* Linux 2.6.27.27
@ 2009-07-20 4:06 Greg KH
2009-07-20 4:07 ` Greg KH
2009-07-20 11:51 ` Krzysztof Oledzki
0 siblings, 2 replies; 43+ messages in thread
From: Greg KH @ 2009-07-20 4:06 UTC (permalink / raw)
To: linux-kernel, Andrew Morton, torvalds, stable; +Cc: lwn
I'm announcing the release of the 2.6.27.26 kernel. All users of the
2.6.27 kernel series are very strongly encouraged to upgrade.
I'll also be replying to this message with a copy of the patch between
2.6.27.26 and 2.6.27.27
The updated 2.6.27.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.27.y.git
and can be browsed at the normal kernel.org git web browser:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=summary
thanks,
greg k-h
------------
Makefile | 7 ++-
drivers/block/floppy.c | 5 ++
drivers/md/dm.c | 4 --
drivers/net/tulip/interrupt.c | 84 +++++++++++++++++++++++++++---------------
drivers/net/tulip/tulip.h | 32 +++++++++++++++-
drivers/pci/iova.c | 26 +++++++++++--
include/linux/mm.h | 2 -
include/linux/personality.h | 5 ++
include/linux/security.h | 2 +
kernel/resource.c | 2 -
kernel/sysctl.c | 2 -
mm/Kconfig | 18 +++++++++
mm/mmap.c | 3 +
security/Kconfig | 22 -----------
security/security.c | 3 -
15 files changed, 145 insertions(+), 72 deletions(-)
Christoph Lameter (1):
security: use mmap_min_addr indepedently of security models
David Woodhouse (1):
Fix iommu address space allocation
Eugene Teo (1):
Add '-fno-delete-null-pointer-checks' to gcc CFLAGS
Greg Kroah-Hartman (2):
Revert "dm: sysfs skip output when device is being destroyed"
Linux 2.6.27.27
Jiri Slaby (1):
floppy: fix lock imbalance
Julien Tinnes (1):
personality: fix PER_CLEAR_ON_SETID (CVE-2009-1895)
Linus Torvalds (1):
Don't use '-fwrapv' compiler option: it's buggy in gcc-4.1.x
Tomasz Lemiech (1):
tulip: Fix for MTU problems with 802.1q tagged frames
Zhang Rui (1):
kernel/resource.c: fix sign extension in reserve_setup()
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: Linux 2.6.27.27 2009-07-20 4:06 Linux 2.6.27.27 Greg KH @ 2009-07-20 4:07 ` Greg KH 2009-07-20 11:51 ` Krzysztof Oledzki 1 sibling, 0 replies; 43+ messages in thread From: Greg KH @ 2009-07-20 4:07 UTC (permalink / raw) To: linux-kernel, Andrew Morton, torvalds, stable; +Cc: lwn diff --git a/Makefile b/Makefile index 90764ee..387a5fd 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ VERSION = 2 PATCHLEVEL = 6 SUBLEVEL = 27 -EXTRAVERSION = .26 +EXTRAVERSION = .27 NAME = Trembling Tortoise # *DOCUMENTATION* @@ -340,7 +340,8 @@ KBUILD_CPPFLAGS := -D__KERNEL__ $(LINUXINCLUDE) KBUILD_CFLAGS := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \ -fno-strict-aliasing -fno-common \ - -Werror-implicit-function-declaration + -Werror-implicit-function-declaration \ + -fno-delete-null-pointer-checks KBUILD_AFLAGS := -D__ASSEMBLY__ # Read KERNELRELEASE from include/config/kernel.release (if it exists) @@ -556,7 +557,7 @@ KBUILD_CFLAGS += $(call cc-option,-Wdeclaration-after-statement,) KBUILD_CFLAGS += $(call cc-option,-Wno-pointer-sign,) # disable invalid "can't wrap" optimzations for signed / pointers -KBUILD_CFLAGS += $(call cc-option,-fwrapv) +KBUILD_CFLAGS += $(call cc-option,-fno-strict-overflow) # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments # But warn user when we do so diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c index 615fcd3..5900f76 100644 --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -3320,7 +3320,10 @@ static inline int set_geometry(unsigned int cmd, struct floppy_struct *g, if (!capable(CAP_SYS_ADMIN)) return -EPERM; mutex_lock(&open_lock); - LOCK_FDC(drive, 1); + if (lock_fdc(drive, 1)) { + mutex_unlock(&open_lock); + return -EINTR; + } floppy_type[type] = *g; floppy_type[type].name = "user format"; for (cnt = type << 2; cnt < (type << 2) + 4; cnt++) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 925efaf..ace998c 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -265,10 +265,6 @@ static int dm_blk_open(struct inode *inode, struct file *file) goto out; } - if (test_bit(DMF_FREEING, &md->flags) || - test_bit(DMF_DELETING, &md->flags)) - return NULL; - dm_get(md); atomic_inc(&md->open_count); diff --git a/drivers/net/tulip/interrupt.c b/drivers/net/tulip/interrupt.c index c6bad98..7faf84f 100644 --- a/drivers/net/tulip/interrupt.c +++ b/drivers/net/tulip/interrupt.c @@ -140,6 +140,7 @@ int tulip_poll(struct napi_struct *napi, int budget) /* If we own the next entry, it is a new packet. Send it up. */ while ( ! (tp->rx_ring[entry].status & cpu_to_le32(DescOwned))) { s32 status = le32_to_cpu(tp->rx_ring[entry].status); + short pkt_len; if (tp->dirty_rx + RX_RING_SIZE == tp->cur_rx) break; @@ -151,8 +152,28 @@ int tulip_poll(struct napi_struct *napi, int budget) if (++work_done >= budget) goto not_done; - if ((status & 0x38008300) != 0x0300) { - if ((status & 0x38000300) != 0x0300) { + /* + * Omit the four octet CRC from the length. + * (May not be considered valid until we have + * checked status for RxLengthOver2047 bits) + */ + pkt_len = ((status >> 16) & 0x7ff) - 4; + + /* + * Maximum pkt_len is 1518 (1514 + vlan header) + * Anything higher than this is always invalid + * regardless of RxLengthOver2047 bits + */ + + if ((status & (RxLengthOver2047 | + RxDescCRCError | + RxDescCollisionSeen | + RxDescRunt | + RxDescDescErr | + RxWholePkt)) != RxWholePkt + || pkt_len > 1518) { + if ((status & (RxLengthOver2047 | + RxWholePkt)) != RxWholePkt) { /* Ingore earlier buffers. */ if ((status & 0xffff) != 0x7fff) { if (tulip_debug > 1) @@ -161,30 +182,23 @@ int tulip_poll(struct napi_struct *napi, int budget) dev->name, status); tp->stats.rx_length_errors++; } - } else if (status & RxDescFatalErr) { + } else { /* There was a fatal error. */ if (tulip_debug > 2) printk(KERN_DEBUG "%s: Receive error, Rx status %8.8x.\n", dev->name, status); tp->stats.rx_errors++; /* end of a packet.*/ - if (status & 0x0890) tp->stats.rx_length_errors++; + if (pkt_len > 1518 || + (status & RxDescRunt)) + tp->stats.rx_length_errors++; + if (status & 0x0004) tp->stats.rx_frame_errors++; if (status & 0x0002) tp->stats.rx_crc_errors++; if (status & 0x0001) tp->stats.rx_fifo_errors++; } } else { - /* Omit the four octet CRC from the length. */ - short pkt_len = ((status >> 16) & 0x7ff) - 4; struct sk_buff *skb; -#ifndef final_version - if (pkt_len > 1518) { - printk(KERN_WARNING "%s: Bogus packet size of %d (%#x).\n", - dev->name, pkt_len, pkt_len); - pkt_len = 1518; - tp->stats.rx_length_errors++; - } -#endif /* Check if the packet is long enough to accept without copying to a minimally-sized skbuff. */ if (pkt_len < tulip_rx_copybreak @@ -357,14 +371,35 @@ static int tulip_rx(struct net_device *dev) /* If we own the next entry, it is a new packet. Send it up. */ while ( ! (tp->rx_ring[entry].status & cpu_to_le32(DescOwned))) { s32 status = le32_to_cpu(tp->rx_ring[entry].status); + short pkt_len; if (tulip_debug > 5) printk(KERN_DEBUG "%s: In tulip_rx(), entry %d %8.8x.\n", dev->name, entry, status); if (--rx_work_limit < 0) break; - if ((status & 0x38008300) != 0x0300) { - if ((status & 0x38000300) != 0x0300) { + + /* + Omit the four octet CRC from the length. + (May not be considered valid until we have + checked status for RxLengthOver2047 bits) + */ + pkt_len = ((status >> 16) & 0x7ff) - 4; + /* + Maximum pkt_len is 1518 (1514 + vlan header) + Anything higher than this is always invalid + regardless of RxLengthOver2047 bits + */ + + if ((status & (RxLengthOver2047 | + RxDescCRCError | + RxDescCollisionSeen | + RxDescRunt | + RxDescDescErr | + RxWholePkt)) != RxWholePkt + || pkt_len > 1518) { + if ((status & (RxLengthOver2047 | + RxWholePkt)) != RxWholePkt) { /* Ingore earlier buffers. */ if ((status & 0xffff) != 0x7fff) { if (tulip_debug > 1) @@ -373,31 +408,22 @@ static int tulip_rx(struct net_device *dev) dev->name, status); tp->stats.rx_length_errors++; } - } else if (status & RxDescFatalErr) { + } else { /* There was a fatal error. */ if (tulip_debug > 2) printk(KERN_DEBUG "%s: Receive error, Rx status %8.8x.\n", dev->name, status); tp->stats.rx_errors++; /* end of a packet.*/ - if (status & 0x0890) tp->stats.rx_length_errors++; + if (pkt_len > 1518 || + (status & RxDescRunt)) + tp->stats.rx_length_errors++; if (status & 0x0004) tp->stats.rx_frame_errors++; if (status & 0x0002) tp->stats.rx_crc_errors++; if (status & 0x0001) tp->stats.rx_fifo_errors++; } } else { - /* Omit the four octet CRC from the length. */ - short pkt_len = ((status >> 16) & 0x7ff) - 4; struct sk_buff *skb; -#ifndef final_version - if (pkt_len > 1518) { - printk(KERN_WARNING "%s: Bogus packet size of %d (%#x).\n", - dev->name, pkt_len, pkt_len); - pkt_len = 1518; - tp->stats.rx_length_errors++; - } -#endif - /* Check if the packet is long enough to accept without copying to a minimally-sized skbuff. */ if (pkt_len < tulip_rx_copybreak diff --git a/drivers/net/tulip/tulip.h b/drivers/net/tulip/tulip.h index 19abbc3..0afa2d4 100644 --- a/drivers/net/tulip/tulip.h +++ b/drivers/net/tulip/tulip.h @@ -201,8 +201,38 @@ enum desc_status_bits { DescStartPkt = 0x20000000, DescEndRing = 0x02000000, DescUseLink = 0x01000000, - RxDescFatalErr = 0x008000, + + /* + * Error summary flag is logical or of 'CRC Error', 'Collision Seen', + * 'Frame Too Long', 'Runt' and 'Descriptor Error' flags generated + * within tulip chip. + */ + RxDescErrorSummary = 0x8000, + RxDescCRCError = 0x0002, + RxDescCollisionSeen = 0x0040, + + /* + * 'Frame Too Long' flag is set if packet length including CRC exceeds + * 1518. However, a full sized VLAN tagged frame is 1522 bytes + * including CRC. + * + * The tulip chip does not block oversized frames, and if this flag is + * set on a receive descriptor it does not indicate the frame has been + * truncated. The receive descriptor also includes the actual length. + * Therefore we can safety ignore this flag and check the length + * ourselves. + */ + RxDescFrameTooLong = 0x0080, + RxDescRunt = 0x0800, + RxDescDescErr = 0x4000, RxWholePkt = 0x00000300, + /* + * Top three bits of 14 bit frame length (status bits 27-29) should + * never be set as that would make frame over 2047 bytes. The Receive + * Watchdog flag (bit 4) may indicate the length is over 2048 and the + * length field is invalid. + */ + RxLengthOver2047 = 0x38000010 }; diff --git a/drivers/pci/iova.c b/drivers/pci/iova.c index 3ef4ac0..078bf8b 100644 --- a/drivers/pci/iova.c +++ b/drivers/pci/iova.c @@ -1,9 +1,19 @@ /* - * Copyright (c) 2006, Intel Corporation. + * Copyright © 2006-2009, Intel Corporation. * - * This file is released under the GPLv2. + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. * - * Copyright (C) 2006-2008 Intel Corporation * Author: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> */ @@ -123,7 +133,15 @@ move_left: /* Insert the new_iova into domain rbtree by holding writer lock */ /* Add new node and rebalance tree. */ { - struct rb_node **entry = &((prev)), *parent = NULL; + struct rb_node **entry, *parent = NULL; + + /* If we have 'prev', it's a valid place to start the + insertion. Otherwise, start from the root. */ + if (prev) + entry = &prev; + else + entry = &iovad->rbroot.rb_node; + /* Figure out where to put new node */ while (*entry) { struct iova *this = container_of(*entry, diff --git a/include/linux/mm.h b/include/linux/mm.h index ae9775d..eeb7e56 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -572,12 +572,10 @@ static inline void set_page_links(struct page *page, enum zone_type zone, */ static inline unsigned long round_hint_to_min(unsigned long hint) { -#ifdef CONFIG_SECURITY hint &= PAGE_MASK; if (((void *)hint != NULL) && (hint < mmap_min_addr)) return PAGE_ALIGN(mmap_min_addr); -#endif return hint; } diff --git a/include/linux/personality.h b/include/linux/personality.h index a84e9ff..1261208 100644 --- a/include/linux/personality.h +++ b/include/linux/personality.h @@ -40,7 +40,10 @@ enum { * Security-relevant compatibility flags that must be * cleared upon setuid or setgid exec: */ -#define PER_CLEAR_ON_SETID (READ_IMPLIES_EXEC|ADDR_NO_RANDOMIZE) +#define PER_CLEAR_ON_SETID (READ_IMPLIES_EXEC | \ + ADDR_NO_RANDOMIZE | \ + ADDR_COMPAT_LAYOUT | \ + MMAP_PAGE_ZERO) /* * Personality types. diff --git a/include/linux/security.h b/include/linux/security.h index 80c4d00..1638afd 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -2134,6 +2134,8 @@ static inline int security_file_mmap(struct file *file, unsigned long reqprot, unsigned long addr, unsigned long addr_only) { + if ((addr < mmap_min_addr) && !capable(CAP_SYS_RAWIO)) + return -EACCES; return 0; } diff --git a/kernel/resource.c b/kernel/resource.c index 03d796c..87f675a 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -741,7 +741,7 @@ static int __init reserve_setup(char *str) static struct resource reserve[MAXRESERVE]; for (;;) { - int io_start, io_num; + unsigned int io_start, io_num; int x = reserved; if (get_option (&str, &io_start) != 2) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 6816e6d..1228d65 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1132,7 +1132,6 @@ static struct ctl_table vm_table[] = { .strategy = &sysctl_jiffies, }, #endif -#ifdef CONFIG_SECURITY { .ctl_name = CTL_UNNUMBERED, .procname = "mmap_min_addr", @@ -1141,7 +1140,6 @@ static struct ctl_table vm_table[] = { .mode = 0644, .proc_handler = &proc_doulongvec_minmax, }, -#endif #ifdef CONFIG_NUMA { .ctl_name = CTL_UNNUMBERED, diff --git a/mm/Kconfig b/mm/Kconfig index 0bd9c2d..07b4ec4 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -208,3 +208,21 @@ config VIRT_TO_BUS config MMU_NOTIFIER bool + +config DEFAULT_MMAP_MIN_ADDR + int "Low address space to protect from user allocation" + default 4096 + help + This is the portion of low virtual memory which should be protected + from userspace allocation. Keeping a user from writing to low pages + can help reduce the impact of kernel NULL pointer bugs. + + For most ia64, ppc64 and x86 users with lots of address space + a value of 65536 is reasonable and should cause no problems. + On arm and other archs it should not be higher than 32768. + Programs which use vm86 functionality would either need additional + permissions from either the LSM or the capabilities module or have + this protection disabled. + + This value can be changed after boot using the + /proc/sys/vm/mmap_min_addr tunable. diff --git a/mm/mmap.c b/mm/mmap.c index 2ae093e..d330758 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -86,6 +86,9 @@ int sysctl_overcommit_ratio = 50; /* default is 50% */ int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT; atomic_long_t vm_committed_space = ATOMIC_LONG_INIT(0); +/* amount of vm to protect from userspace access */ +unsigned long mmap_min_addr = CONFIG_DEFAULT_MMAP_MIN_ADDR; + /* * Check that a process has enough memory to allocate a new virtual * mapping. 0 means there is enough memory for the allocation to diff --git a/security/Kconfig b/security/Kconfig index 5592939..38411dd 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -92,28 +92,8 @@ config SECURITY_ROOTPLUG See <http://www.linuxjournal.com/article.php?sid=6279> for more information about this module. - - If you are unsure how to answer this question, answer N. - -config SECURITY_DEFAULT_MMAP_MIN_ADDR - int "Low address space to protect from user allocation" - depends on SECURITY - default 0 - help - This is the portion of low virtual memory which should be protected - from userspace allocation. Keeping a user from writing to low pages - can help reduce the impact of kernel NULL pointer bugs. - - For most ia64, ppc64 and x86 users with lots of address space - a value of 65536 is reasonable and should cause no problems. - On arm and other archs it should not be higher than 32768. - Programs which use vm86 functionality would either need additional - permissions from either the LSM or the capabilities module or have - this protection disabled. - - This value can be changed after boot using the - /proc/sys/vm/mmap_min_addr tunable. + If you are unsure how to answer this question, answer N. source security/selinux/Kconfig source security/smack/Kconfig diff --git a/security/security.c b/security/security.c index 3a4b4f5..27a315d 100644 --- a/security/security.c +++ b/security/security.c @@ -26,9 +26,6 @@ extern void security_fixup_ops(struct security_operations *ops); struct security_operations *security_ops; /* Initialized to NULL */ -/* amount of vm to protect from userspace access */ -unsigned long mmap_min_addr = CONFIG_SECURITY_DEFAULT_MMAP_MIN_ADDR; - static inline int verify(struct security_operations *ops) { /* verify the security_operations structure exists */ ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-20 4:06 Linux 2.6.27.27 Greg KH 2009-07-20 4:07 ` Greg KH @ 2009-07-20 11:51 ` Krzysztof Oledzki 2009-07-20 15:10 ` Greg KH 1 sibling, 1 reply; 43+ messages in thread From: Krzysztof Oledzki @ 2009-07-20 11:51 UTC (permalink / raw) To: Greg KH; +Cc: linux-kernel, Andrew Morton, torvalds, stable, lwn On 2009-07-20 06:06, Greg KH wrote: > I'm announcing the release of the 2.6.27.26 kernel. All users of the > 2.6.27 kernel series are very strongly encouraged to upgrade. > > I'll also be replying to this message with a copy of the patch between > 2.6.27.26 and 2.6.27.27 > > The updated 2.6.27.y git tree can be found at: > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.27.y.git > and can be browsed at the normal kernel.org git web browser: > http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=summary > > thanks, <CUT> > Linus Torvalds (1): > Don't use '-fwrapv' compiler option: it's buggy in gcc-4.1.x I'm very sorry to say that but 2.6.27.27 compiled with gcc-4.2.4 hangs during boot and it is caused by this fix. I extended http://bugzilla.kernel.org/show_bug.cgi?id=13012 with additional info & .config. Best regards, Krzysztof Olędzki ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-20 11:51 ` Krzysztof Oledzki @ 2009-07-20 15:10 ` Greg KH 2009-07-20 16:01 ` Linus Torvalds 2009-07-23 17:33 ` Krzysztof Olędzki 0 siblings, 2 replies; 43+ messages in thread From: Greg KH @ 2009-07-20 15:10 UTC (permalink / raw) To: Krzysztof Oledzki; +Cc: linux-kernel, Andrew Morton, torvalds, stable, lwn On Mon, Jul 20, 2009 at 01:51:33PM +0200, Krzysztof Oledzki wrote: > On 2009-07-20 06:06, Greg KH wrote: > >I'm announcing the release of the 2.6.27.26 kernel. All users of the > >2.6.27 kernel series are very strongly encouraged to upgrade. > > > >I'll also be replying to this message with a copy of the patch between > >2.6.27.26 and 2.6.27.27 > > > >The updated 2.6.27.y git tree can be found at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.27.y.git > >and can be browsed at the normal kernel.org git web browser: > > http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=summary > > > >thanks, > > <CUT> > >Linus Torvalds (1): > > Don't use '-fwrapv' compiler option: it's buggy in gcc-4.1.x > > I'm very sorry to say that but 2.6.27.27 compiled with gcc-4.2.4 > hangs during boot and it is caused by this fix. > > I extended http://bugzilla.kernel.org/show_bug.cgi?id=13012 with > additional info & .config. Wierd. Linus, any thoughts? Do we need to check the version of the compiler that we are using before turning that option off? thanks, greg k-h ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-20 15:10 ` Greg KH @ 2009-07-20 16:01 ` Linus Torvalds 2009-07-20 21:45 ` Krzysztof Oledzki 2009-07-23 17:33 ` Krzysztof Olędzki 1 sibling, 1 reply; 43+ messages in thread From: Linus Torvalds @ 2009-07-20 16:01 UTC (permalink / raw) To: Greg KH; +Cc: Krzysztof Oledzki, linux-kernel, Andrew Morton, stable, lwn On Mon, 20 Jul 2009, Greg KH wrote: > > > >Linus Torvalds (1): > > > Don't use '-fwrapv' compiler option: it's buggy in gcc-4.1.x > > > > I'm very sorry to say that but 2.6.27.27 compiled with gcc-4.2.4 > > hangs during boot and it is caused by this fix. > > > > I extended http://bugzilla.kernel.org/show_bug.cgi?id=13012 with > > additional info & .config. > > Wierd. Linus, any thoughts? Do we need to check the version of the > compiler that we are using before turning that option off? Dang. That was what we hoped to avoid by not using -fwrapv - we didn't know which compilers were buggy, although it _looked_ like just 4.1.x (I have some dim memory of somebody suspecting a 4.2.x version too, but that may have been just a "we know it should only be needed in 4.3.x, so maybe we shouldn't use it in 4.2.x either"). But it looks like the "buggy window" for -fno-strict-overflow is potentially even _larger_ than it ever was with -fwrapv. And I suspect that in both cases the real problem is that almost nobody ever uses either optimization flag, so coverage is way way smaller. For now, I suspect we need to do something like this: go back to -fwrapv (because it has gotten more testing), but limit it to 4.2.x and newer. However, it would be really good to figure out _why_ it breaks. It's probably some specific configuration issue in addition to the particular kernel vesion (and maybe it even needs some particular hardware to show the bug - ie a specific driver that triggers it or whatever). Linus --- Makefile | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/Makefile b/Makefile index 79957b3..47a3a1a 100644 --- a/Makefile +++ b/Makefile @@ -566,7 +566,7 @@ KBUILD_CFLAGS += $(call cc-option,-Wdeclaration-after-statement,) KBUILD_CFLAGS += $(call cc-option,-Wno-pointer-sign,) # disable invalid "can't wrap" optimizations for signed / pointers -KBUILD_CFLAGS += $(call cc-option,-fno-strict-overflow) +KBUILD_CFLAGS += $(call cc-ifversion, -ge, 0402, -fwrapv) # revert to pre-gcc-4.4 behaviour of .eh_frame KBUILD_CFLAGS += $(call cc-option,-fno-dwarf2-cfi-asm) ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-20 16:01 ` Linus Torvalds @ 2009-07-20 21:45 ` Krzysztof Oledzki 2009-07-20 22:08 ` Linus Torvalds 0 siblings, 1 reply; 43+ messages in thread From: Krzysztof Oledzki @ 2009-07-20 21:45 UTC (permalink / raw) To: Linus Torvalds Cc: Greg KH, Krzysztof Oledzki, linux-kernel, Andrew Morton, stable, lwn [-- Attachment #1: Type: TEXT/PLAIN, Size: 1721 bytes --] On Mon, 20 Jul 2009, Linus Torvalds wrote: > > > On Mon, 20 Jul 2009, Greg KH wrote: >> >>>> Linus Torvalds (1): >>>> Don't use '-fwrapv' compiler option: it's buggy in gcc-4.1.x >>> >>> I'm very sorry to say that but 2.6.27.27 compiled with gcc-4.2.4 >>> hangs during boot and it is caused by this fix. >>> >>> I extended http://bugzilla.kernel.org/show_bug.cgi?id=13012 with >>> additional info & .config. >> >> Wierd. Linus, any thoughts? Do we need to check the version of the >> compiler that we are using before turning that option off? > > Dang. That was what we hoped to avoid by not using -fwrapv - we didn't > know which compilers were buggy, although it _looked_ like just 4.1.x (I > have some dim memory of somebody suspecting a 4.2.x version too, but that > may have been just a "we know it should only be needed in 4.3.x, so maybe > we shouldn't use it in 4.2.x either"). > > But it looks like the "buggy window" for -fno-strict-overflow is > potentially even _larger_ than it ever was with -fwrapv. And I suspect > that in both cases the real problem is that almost nobody ever uses either > optimization flag, so coverage is way way smaller. > > For now, I suspect we need to do something like this: go back to -fwrapv > (because it has gotten more testing), but limit it to 4.2.x and newer. > > However, it would be really good to figure out _why_ it breaks. It's > probably some specific configuration issue in addition to the particular > kernel vesion (and maybe it even needs some particular hardware to show > the bug - ie a specific driver that triggers it or whatever). No problem. Please let me know what should I do to help tracking this issue. Best regards, Krzysztof Olędzki ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-20 21:45 ` Krzysztof Oledzki @ 2009-07-20 22:08 ` Linus Torvalds 2009-07-20 23:47 ` Marc Dionne 2009-07-21 6:33 ` Krzysztof Oledzki 0 siblings, 2 replies; 43+ messages in thread From: Linus Torvalds @ 2009-07-20 22:08 UTC (permalink / raw) To: Krzysztof Oledzki; +Cc: Greg KH, linux-kernel, Andrew Morton, stable, lwn On Mon, 20 Jul 2009, Krzysztof Oledzki wrote: > > No problem. Please let me know what should I do to help tracking this issue. Can you build two kernels: one with -fwrapv, and one with -fno-strict-overflow, and then verify that - they are otherwise identical (ie exact same source code, same compiler etc) - verify that yes, the -fwrapv kernel works, the other does not. Just to avoid the confusion that obviously exists with Debian/sid binutils upgrades that _also_ happens result in nonbootable kernels. - upload the 'vmlinux' images somewhere (I'm not sure what the limits for binary attachments are at the kernel bugzilla, but that would be the logical place) In fact, it would be nice to have a third "identical" kernel build, except with neither -fwrapv/-fno-strict-overflow. Linus ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-20 22:08 ` Linus Torvalds @ 2009-07-20 23:47 ` Marc Dionne 2009-07-20 23:56 ` Linus Torvalds 2009-07-21 6:33 ` Krzysztof Oledzki 1 sibling, 1 reply; 43+ messages in thread From: Marc Dionne @ 2009-07-20 23:47 UTC (permalink / raw) To: Linus Torvalds Cc: Krzysztof Oledzki, Greg KH, linux-kernel, Andrew Morton, stable, lwn On 07/20/2009 06:08 PM, Linus Torvalds wrote: > > On Mon, 20 Jul 2009, Krzysztof Oledzki wrote: >> No problem. Please let me know what should I do to help tracking this issue. > > Can you build two kernels: one with -fwrapv, and one with > -fno-strict-overflow, and then verify that > > - they are otherwise identical (ie exact same source code, same compiler > etc) > > - verify that yes, the -fwrapv kernel works, the other does not. Just to > avoid the confusion that obviously exists with Debian/sid binutils > upgrades that _also_ happens result in nonbootable kernels. > > - upload the 'vmlinux' images somewhere (I'm not sure what the limits for > binary attachments are at the kernel bugzilla, but that would be the > logical place) > > In fact, it would be nice to have a third "identical" kernel build, except > with neither -fwrapv/-fno-strict-overflow. > > Linus I might be seeing a slightly different bug, but in case it's helpful, the behaviour here on Fedora rawhide with gcc-4.4.0-14.x86_64 and binutils-2.19.51.0.11-27.fc12.x86_64 is that I get various .o files that come out as completely empty files (or in one case as a precisely 64K sized file that gives a "File format not recognized" error"), and the latest 2.6.31-rc git can't be built at all. If I replace -fno-strict-overflow with -fwrapv in Makefile everything builds and runs fine. Interestingly though, "-fno-strict-overflow -v" also gives me a good kernel, and comparing the assembly for one of the affected files doesn't show any difference between -fwrapv and -fno-strict-overflow. Marc ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-20 23:47 ` Marc Dionne @ 2009-07-20 23:56 ` Linus Torvalds 2009-07-21 0:37 ` Marc Dionne 0 siblings, 1 reply; 43+ messages in thread From: Linus Torvalds @ 2009-07-20 23:56 UTC (permalink / raw) To: Marc Dionne Cc: Krzysztof Oledzki, Greg KH, linux-kernel, Andrew Morton, stable, lwn On Mon, 20 Jul 2009, Marc Dionne wrote: > > I might be seeing a slightly different bug, but in case it's helpful, the > behaviour here on Fedora rawhide with gcc-4.4.0-14.x86_64 and > binutils-2.19.51.0.11-27.fc12.x86_64 is that I get various .o files that come > out as completely empty files (or in one case as a precisely 64K sized file > that gives a "File format not recognized" error"), and the latest 2.6.31-rc > git can't be built at all. Hmm. This sounds more like the binutils bug that people had. Sounds like an assembler bug if the *.o file ends up being empty or at some fixed size. If it was cc1 that fails, I'd expect to not see an *.o file at all, since it didn't generate good assembly. In fact, your behavior sounds like the thing that produces the *.o files core-dumped or died for other reasons, and had a 64kB buffer that either got flushed or not. That would explain the "zero or exactly 64kB" size. It could be ccache too, of course. > If I replace -fno-strict-overflow with -fwrapv in Makefile everything builds > and runs fine. .. and this is just really really odd. If it was the cc1 front-end that does that with a bad optimization, I'd expect more visible turds. But on the other hand, if it's the binutils, then I don't see why -fwrapv would matter. Some front-end options get passed down to the assembler, but I would definitely not expect -fwrapv/-fno-strict-overflow to be one of those. Linus ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-20 23:56 ` Linus Torvalds @ 2009-07-21 0:37 ` Marc Dionne 2009-07-21 1:01 ` Linus Torvalds 2009-07-21 1:05 ` Linus Torvalds 0 siblings, 2 replies; 43+ messages in thread From: Marc Dionne @ 2009-07-21 0:37 UTC (permalink / raw) To: Linus Torvalds Cc: Krzysztof Oledzki, Greg KH, linux-kernel, Andrew Morton, stable, lwn On 07/20/2009 07:56 PM, Linus Torvalds wrote: > > On Mon, 20 Jul 2009, Marc Dionne wrote: >> I might be seeing a slightly different bug, but in case it's helpful, the >> behaviour here on Fedora rawhide with gcc-4.4.0-14.x86_64 and >> binutils-2.19.51.0.11-27.fc12.x86_64 is that I get various .o files that come >> out as completely empty files (or in one case as a precisely 64K sized file >> that gives a "File format not recognized" error"), and the latest 2.6.31-rc >> git can't be built at all. > > Hmm. This sounds more like the binutils bug that people had. Sounds like > an assembler bug if the *.o file ends up being empty or at some fixed > size. If it was cc1 that fails, I'd expect to not see an *.o file at all, > since it didn't generate good assembly. > > In fact, your behavior sounds like the thing that produces the *.o files > core-dumped or died for other reasons, and had a 64kB buffer that either > got flushed or not. That would explain the "zero or exactly 64kB" size. > > It could be ccache too, of course. Actually in my case it turns out that it is ccache after all - if I remove it from the picture everything is fine. If I re-enable it, even with a clean cache, I get the problem. It might just be a coincidence that it's triggered by the -fwrapv change. Marc ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-21 0:37 ` Marc Dionne @ 2009-07-21 1:01 ` Linus Torvalds 2009-07-21 6:40 ` Krzysztof Oledzki 2009-07-21 1:05 ` Linus Torvalds 1 sibling, 1 reply; 43+ messages in thread From: Linus Torvalds @ 2009-07-21 1:01 UTC (permalink / raw) To: Marc Dionne Cc: Krzysztof Oledzki, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn On Mon, 20 Jul 2009, Marc Dionne wrote: > > > > Hmm. This sounds more like the binutils bug that people had. Sounds like > > an assembler bug if the *.o file ends up being empty or at some fixed > > size. If it was cc1 that fails, I'd expect to not see an *.o file at all, > > since it didn't generate good assembly. > > > > In fact, your behavior sounds like the thing that produces the *.o files > > core-dumped or died for other reasons, and had a 64kB buffer that either > > got flushed or not. That would explain the "zero or exactly 64kB" size. > > > > It could be ccache too, of course. > > Actually in my case it turns out that it is ccache after all - if I remove it > from the picture everything is fine. If I re-enable it, even with a clean > cache, I get the problem. > > It might just be a coincidence that it's triggered by the -fwrapv change. Ok, so this is getting ridiculous. Do we have _three_ different kernel issues going on at the same time, all subtly related to tools issues rather than the kernel source tree itself? That's just completely bizarre. So right now we have: - Krzysztof Oledzki: the only one who so far has really pinpointed it to the -fwrapv change itself. It would be good to really double-check that this is not about ccache, since Marc apparently gets a good kernel without ccache, and -fwrapv seems to be involved. - Marc Dionne: ccache getting confused, with 0-byte and 64kB object files. But why -fwrapv vs -fno-strict-overflow would matter is totally unclear. Just happenstance? Something silly like overflowing the length of the ccache argument buffer? It would be wonderful to figure out what odd issue ccache might have. The kernel command line isn't _that_ long, but it does end up being something reasonably monstrous like gcc -Wp,-MD,kernel/.fork.s.d -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/4.4.0/include -Iinclude -I/home/torvalds/v2.6/linux/arch/x86/include -include include/linux/autoconf.h -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -Os -m64 -march=core2 -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wframe-larger-than=2048 -fno-stack-protector -fno-omit-frame-pointer -fno-optimize-sibling-calls -Wdeclaration-after-statement -Wno-pointer-sign -fwrapv -fno-dwarf2-cfi-asm -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(fork)" -D"KBUILD_MODNAME=KBUILD_STR(fork)" -fverbose-asm -S -o kernel/fork.s kernel/fork.c so we are getting into the kilobyte range for it, and mayeb simply the longer argument made something fail. But other build systems do even worse things, I'm sure. - the Debian/sid binutils package failure, solved by downgrading binutils. Crazy, crazy. Linus ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-21 1:01 ` Linus Torvalds @ 2009-07-21 6:40 ` Krzysztof Oledzki 0 siblings, 0 replies; 43+ messages in thread From: Krzysztof Oledzki @ 2009-07-21 6:40 UTC (permalink / raw) To: Linus Torvalds Cc: Marc Dionne, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn [-- Attachment #1: Type: TEXT/PLAIN, Size: 1809 bytes --] On Mon, 20 Jul 2009, Linus Torvalds wrote: > > > On Mon, 20 Jul 2009, Marc Dionne wrote: >>> >>> Hmm. This sounds more like the binutils bug that people had. Sounds like >>> an assembler bug if the *.o file ends up being empty or at some fixed >>> size. If it was cc1 that fails, I'd expect to not see an *.o file at all, >>> since it didn't generate good assembly. >>> >>> In fact, your behavior sounds like the thing that produces the *.o files >>> core-dumped or died for other reasons, and had a 64kB buffer that either >>> got flushed or not. That would explain the "zero or exactly 64kB" size. >>> >>> It could be ccache too, of course. >> >> Actually in my case it turns out that it is ccache after all - if I remove it >> from the picture everything is fine. If I re-enable it, even with a clean >> cache, I get the problem. >> >> It might just be a coincidence that it's triggered by the -fwrapv change. > > Ok, so this is getting ridiculous. Do we have _three_ different kernel > issues going on at the same time, all subtly related to tools issues > rather than the kernel source tree itself? > > That's just completely bizarre. > > So right now we have: > > - Krzysztof Oledzki: the only one who so far has really pinpointed it to > the -fwrapv change itself. > > It would be good to really double-check that this is not about ccache, > since Marc apparently gets a good kernel without ccache, and -fwrapv > seems to be involved. There is no ccache configured in my systems and the same problem appears on a different servers (both i386 and x86-64). However, the configs are very similar and the hardware is nearly identical. I'm pretty sure the only different between bootable and unbootable kernel is that fwrapv vs strict-overflow change. Best regards, Krzysztof Olędzki ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-21 0:37 ` Marc Dionne 2009-07-21 1:01 ` Linus Torvalds @ 2009-07-21 1:05 ` Linus Torvalds 2009-07-21 2:38 ` Marc Dionne 1 sibling, 1 reply; 43+ messages in thread From: Linus Torvalds @ 2009-07-21 1:05 UTC (permalink / raw) To: Marc Dionne Cc: Krzysztof Oledzki, Greg KH, linux-kernel, Andrew Morton, stable, lwn On Mon, 20 Jul 2009, Marc Dionne wrote: > > > > It could be ccache too, of course. > > Actually in my case it turns out that it is ccache after all - if I remove it > from the picture everything is fine. If I re-enable it, even with a clean > cache, I get the problem. > > It might just be a coincidence that it's triggered by the -fwrapv change. Btw, do you find any core-files lying around if you enable them before the build with ulimit -c unlimited or similar? And how did you clean ccache? There's "-c" and then there's "-C". Linus ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-21 1:05 ` Linus Torvalds @ 2009-07-21 2:38 ` Marc Dionne 0 siblings, 0 replies; 43+ messages in thread From: Marc Dionne @ 2009-07-21 2:38 UTC (permalink / raw) To: Linus Torvalds Cc: Krzysztof Oledzki, Greg KH, linux-kernel, Andrew Morton, stable, lwn On 07/20/2009 09:05 PM, Linus Torvalds wrote: > > On Mon, 20 Jul 2009, Marc Dionne wrote: >>> It could be ccache too, of course. >> Actually in my case it turns out that it is ccache after all - if I remove it >> from the picture everything is fine. If I re-enable it, even with a clean >> cache, I get the problem. >> >> It might just be a coincidence that it's triggered by the -fwrapv change. > > Btw, do you find any core-files lying around if you enable them before the > build with > > ulimit -c unlimited > > or similar? > > And how did you clean ccache? There's "-c" and then there's "-C". > > Linus Unfortunately I'm not able to reproduce anymore, after clearing /var/cache/ccache completely (rm -rf). Earlier I had done ccache -C, which didn't help. I didn't think to copy the contents for more analysis. So perhaps a combination of some odd ccache state along with changing gcc, binutils (which I updated today) and the compile flag. Revving binutils and gcc back and forth didn't reproduce it. What is odd though is that when I straced a single gcc command line that produced an empty .o file, it looked normal - a series of successful writes with the correct amount of data to a temp file, close, unlink .o file, rename temp file -> .o file. But the resulting file was empty. Make me wonder if there was something filesystem/caching related to it. Marc ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-20 22:08 ` Linus Torvalds 2009-07-20 23:47 ` Marc Dionne @ 2009-07-21 6:33 ` Krzysztof Oledzki 2009-07-21 10:16 ` Krzysztof Oledzki 1 sibling, 1 reply; 43+ messages in thread From: Krzysztof Oledzki @ 2009-07-21 6:33 UTC (permalink / raw) To: Linus Torvalds Cc: Krzysztof Oledzki, Greg KH, linux-kernel, Andrew Morton, stable, lwn [-- Attachment #1: Type: TEXT/PLAIN, Size: 1057 bytes --] On Mon, 20 Jul 2009, Linus Torvalds wrote: > > > On Mon, 20 Jul 2009, Krzysztof Oledzki wrote: >> >> No problem. Please let me know what should I do to help tracking this issue. > > Can you build two kernels: one with -fwrapv, and one with > -fno-strict-overflow, and then verify that > > - they are otherwise identical (ie exact same source code, same compiler > etc) > > - verify that yes, the -fwrapv kernel works, the other does not. Just to > avoid the confusion that obviously exists with Debian/sid binutils > upgrades that _also_ happens result in nonbootable kernels. > > - upload the 'vmlinux' images somewhere (I'm not sure what the limits for > binary attachments are at the kernel bugzilla, but that would be the > logical place) > > In fact, it would be nice to have a third "identical" kernel build, except > with neither -fwrapv/-fno-strict-overflow. OK. Right now I'm building the three kernels you asked for (fwrapv/fno-strict-overflow/none). I'll test them and upload vmlinux images. Best regards, Krzysztof Olędzki ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-21 6:33 ` Krzysztof Oledzki @ 2009-07-21 10:16 ` Krzysztof Oledzki 2009-07-21 16:11 ` Linus Torvalds 0 siblings, 1 reply; 43+ messages in thread From: Krzysztof Oledzki @ 2009-07-21 10:16 UTC (permalink / raw) To: Linus Torvalds; +Cc: Greg KH, linux-kernel, Andrew Morton, stable, lwn [-- Attachment #1: Type: TEXT/PLAIN, Size: 2621 bytes --] On Tue, 21 Jul 2009, Krzysztof Oledzki wrote: > > > On Mon, 20 Jul 2009, Linus Torvalds wrote: > >> >> >> On Mon, 20 Jul 2009, Krzysztof Oledzki wrote: >>> >>> No problem. Please let me know what should I do to help tracking this >>> issue. >> >> Can you build two kernels: one with -fwrapv, and one with >> -fno-strict-overflow, and then verify that >> >> - they are otherwise identical (ie exact same source code, same compiler >> etc) >> >> - verify that yes, the -fwrapv kernel works, the other does not. Just to >> avoid the confusion that obviously exists with Debian/sid binutils >> upgrades that _also_ happens result in nonbootable kernels. >> >> - upload the 'vmlinux' images somewhere (I'm not sure what the limits for >> binary attachments are at the kernel bugzilla, but that would be the >> logical place) >> >> In fact, it would be nice to have a third "identical" kernel build, except >> with neither -fwrapv/-fno-strict-overflow. > > OK. Right now I'm building the three kernels you asked for > (fwrapv/fno-strict-overflow/none). I'll test them and upload vmlinux images. OK, there are three kernels, exactly as you requested: http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fno-strict-overflow.bz2 (Hangs) http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fwrapv.bz2 (OK) http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fnone.bz2 (OK) # grep -A1 "can't wrap" linux-2.6.27.27-*/Makefile linux-2.6.27.27-fno-strict-overflow/Makefile:# disable invalid "can't wrap" optimzations for signed / pointers linux-2.6.27.27-fno-strict-overflow/Makefile-KBUILD_CFLAGS += $(call cc-option,-fno-strict-overflow) -- linux-2.6.27.27-fwrapv/Makefile:# disable invalid "can't wrap" optimzations for signed / pointers linux-2.6.27.27-fwrapv/Makefile-KBUILD_CFLAGS += $(call cc-option,-fwrapv) -- linux-2.6.27.27-fnone/Makefile:# disable invalid "can't wrap" optimzations for signed / pointers linux-2.6.27.27-fnone/Makefile-#KBUILD_CFLAGS += $(call cc-option,-fno-strict-overflow) Plwase note the third one has the KBUILD_CFLAGS commented. Kernels are identical and are compiled from the same config, on the same server with gcc-4.2.4, binutils-2.19. There is no ccache installed and the kernels are not patched with any additonal patches - just vanilla linux-2.6.27.27. Screenshot from the hanging kernel (-fno-strict-overflow): http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/2.6.27.27-hang.png Dmesg from a bootable kernel: http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/dmesg Best regards, Krzysztof Olędzki ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-21 10:16 ` Krzysztof Oledzki @ 2009-07-21 16:11 ` Linus Torvalds 2009-07-21 19:15 ` Linus Torvalds 0 siblings, 1 reply; 43+ messages in thread From: Linus Torvalds @ 2009-07-21 16:11 UTC (permalink / raw) To: Krzysztof Oledzki; +Cc: Greg KH, linux-kernel, Andrew Morton, stable, lwn On Tue, 21 Jul 2009, Krzysztof Oledzki wrote: > > OK, there are three kernels, exactly as you requested: > > http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fno-strict-overflow.bz2 (Hangs) > http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fwrapv.bz2 (OK) > http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fnone.bz2 (OK) Perfect. And interestingly, the "fno-strict-overflow" kernel is actually much closer to the "fnone" kernel than to the "fwrapv" one. I have some silly scripts based on 'objdump -d' plus a lot of stupid sed scripting to remove the trivial differences due to instruction addresses, and then doing a 'diff -u' between the munged disassembly of the kernels gives me: [torvalds@nehalem fno-strict-overflow]$ wc -l fnone-to-fno-strict-overflow fwrapv-to-fno-strict-overflow 39309 fnone-to-fno-strict-overflow 91423 fwrapv-to-fno-strict-overflow 130732 total ie the diff from the kernel with no flags is less than twice the size of the diff from fwrapv. Still - it's almost 40kB of diffs of disassembly, so I'm not going to guarantee that I can make any sense of it and find the compiler problem in it. But I'll try. And send you test-patches to see if I can pinpoint the code that causes the problem. > Kernels are identical and are compiled from the same config, on the same > server with gcc-4.2.4, binutils-2.19. There is no ccache installed and the > kernels are not patched with any additonal patches - just vanilla > linux-2.6.27.27. Thank you. > Screenshot from the hanging kernel (-fno-strict-overflow): > http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/2.6.27.27-hang.png > > Dmesg from a bootable kernel: > http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/dmesg Great. This is all about as perfect as could be asked for. Now it's just a question of trying to find the right code generation difference... Linus ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-21 16:11 ` Linus Torvalds @ 2009-07-21 19:15 ` Linus Torvalds 2009-07-21 21:34 ` Troy Moure 0 siblings, 1 reply; 43+ messages in thread From: Linus Torvalds @ 2009-07-21 19:15 UTC (permalink / raw) To: Krzysztof Oledzki; +Cc: Greg KH, linux-kernel, Andrew Morton, stable, lwn On Tue, 21 Jul 2009, Linus Torvalds wrote: > > Great. This is all about as perfect as could be asked for. Now it's just a > question of trying to find the right code generation difference... Ok, that "just" is turning out to be really painful. I've tried to do clever things, but the best I've been able to do is to get the relevant differences down to about 22 thousand lines of assembler diffs that don't match either of the working kernels. Sadly, 22KLOC of assembler diffs isn't something anybody can reasonably read to even start to make a guess about which lines are causing problems. So what I'd love to do is to narrow the failure down a bit, by using -fno-strict-overflow only on _parts_ of the tree and then try a couple of kernels to see if they hang, to see which part it is that mis-compiles. With a newer kernel, we could do something like this: diff --git a/Makefile b/Makefile index 79957b3..b096be2 100644 --- a/Makefile +++ b/Makefile @@ -565,9 +565,6 @@ KBUILD_CFLAGS += $(call cc-option,-Wdeclaration-after-statement,) # disable pointer signed / unsigned warnings in gcc 4.0 KBUILD_CFLAGS += $(call cc-option,-Wno-pointer-sign,) -# disable invalid "can't wrap" optimizations for signed / pointers -KBUILD_CFLAGS += $(call cc-option,-fno-strict-overflow) - # revert to pre-gcc-4.4 behaviour of .eh_frame KBUILD_CFLAGS += $(call cc-option,-fno-dwarf2-cfi-asm) diff --git a/drivers/Makefile b/drivers/Makefile index bc4205d..1250b55 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -5,6 +5,8 @@ # Rewritten to use lists instead of if-statements. # +subdir-ccflags-y += -fno-strict-overflow + obj-y += gpio/ obj-$(CONFIG_PCI) += pci/ obj-$(CONFIG_PARISC) += parisc/ to say "use -fno-strict-overflow only when compiling objects in the drivers/ subdirectories", but I'm pretty sure that whole clever 'subdir-ccflags-y' thing was added pretty recently, and won't work in 2.6.27 However, since there is _some_ reason to wonder about whether the problem could be in radeonfb (because the last printouts before the hang are about that), it would be good to test just that part. So if you have the time and energy, it would be very interesting if you could do something like this: - remove the "KBUILD_CFLAGS += $(call cc-option,-fno-strict-overflow)" entirely from the main Makefile. - one directory at a time, add ccflags-y += -fno-strict-overflow to the Makefile in just that particular directory, and compile and test the kernel. Now, since your old kernel doesn't have that nifty new "subdir-ccflags-y" thing, you can't do it for big parts of the kernel, you can literally do it for just the contents of one subdirectory (non-recusive!) at a time, but while there's two thousand subdirectories in the Linux kernel sources, judicious sprinking of that into the tree could hopefully make it possible to find. - the first Makefile's to test would be 'drivers/video/aty/Makefile'. If that one doesn't work, some scripting might be in order, eg something like for i in $(find drivers -name Makefile) do ( echo "ccflags-y += -fno-strict-overflow" ; cat $i ) > $i.new mv $i.new $i done should add it to all the subdirectories under 'drivers', etc. and if you can find the subdirectory where '-fno-strict-overflow' makes the difference, at that point I'd love to see the kernel image where things worked (ie the last kernel you booted successfully _before_ the kernel that failed) and the kernel that fails - now hopefully the differences should be much smaller (how small will obviously depend on whether you caught the difference in just one subdirectory or whether you scripted it over lots and lots of subdirectories). Of course, the tighter you can do this, the better. If it happens to be in 'drivers/video/aty/' for example, and you end up being really gung-ho about this and want to narrow it down to not just the subdirectory, but a few files, you could remove the per-directory "ccflags-y" line, and do a few per-file CFLAGS entries instead, like: CFLAGS_radeon_base.o += -fno-strict-overflow etc. And hey, if you think this is too much work, then you're right. It's a lot of work. So don't worry if you can't be bothered - it would be wonderful to try to get this thing resolved, but I do realize I'm asking a lot here. Linus ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-21 19:15 ` Linus Torvalds @ 2009-07-21 21:34 ` Troy Moure 2009-07-22 0:53 ` Linus Torvalds 0 siblings, 1 reply; 43+ messages in thread From: Troy Moure @ 2009-07-21 21:34 UTC (permalink / raw) To: Linus Torvalds Cc: Krzysztof Oledzki, Greg KH, linux-kernel, Andrew Morton, stable, lwn On Tue, 21 Jul 2009, Linus Torvalds wrote: > > Great. This is all about as perfect as could be asked for. Now it's just a > > question of trying to find the right code generation difference... > > Ok, that "just" is turning out to be really painful. I think I've found something interesting. Look at the the code generated for edid_checksum() in driver/video/fbmon.c. This is what I see for the -fno-strict-overflow kernel: ... ffffffff803b37ed <edid_checksum>: ffffffff803b37ed: 53 push %rbx ffffffff803b37ee: 48 89 fb mov %rdi,%rbx ffffffff803b37f1: e8 8d fd ff ff callq ffffffff803b3583 <check_edid> ffffffff803b37f6: 85 c0 test %eax,%eax ffffffff803b37f8: 89 c6 mov %eax,%esi ffffffff803b37fa: 74 08 je ffffffff803b3804 <edid_checksum+0x17> ffffffff803b37fc: 48 89 df mov %rbx,%rdi ffffffff803b37ff: e8 c0 fe ff ff callq ffffffff803b36c4 <fix_edid> ffffffff803b3804: eb fe jmp ffffffff803b3804 <edid_checksum+0x17> ffffffff803b3806 <fb_parse_edid>: ffffffff803b3806: 41 54 push %r12 ffffffff803b3808: 48 85 ff test %rdi,%rdi ... That last insn in edid_checksum() doesn't look *quite* right to me... The -fnone kernel has something a lot more sensible-looking: ffffffff803b39dd <edid_checksum>: ffffffff803b39dd: 53 push %rbx ffffffff803b39de: 48 89 fb mov %rdi,%rbx ffffffff803b39e1: e8 8d fd ff ff callq ffffffff803b3773 <check_$ ffffffff803b39e6: 85 c0 test %eax,%eax ffffffff803b39e8: 89 c6 mov %eax,%esi ffffffff803b39ea: 74 08 je ffffffff803b39f4 <edid_c$ ffffffff803b39ec: 48 89 df mov %rbx,%rdi ffffffff803b39ef: e8 c0 fe ff ff callq ffffffff803b38b4 <fix_ed$ ffffffff803b39f4: 31 c9 xor %ecx,%ecx ffffffff803b39f6: 31 f6 xor %esi,%esi ffffffff803b39f8: 31 d2 xor %edx,%edx ffffffff803b39fa: eb 0a jmp ffffffff803b3a06 <edid_c$ ffffffff803b39fc: 0f b6 c0 movzbl %al,%eax ffffffff803b39ff: 8a 04 03 mov (%rbx,%rax,1),%al ffffffff803b3a02: 01 c1 add %eax,%ecx ffffffff803b3a04: 09 c6 or %eax,%esi ffffffff803b3a06: 88 d0 mov %dl,%al ffffffff803b3a08: ff c2 inc %edx ffffffff803b3a0a: 81 fa 81 00 00 00 cmp $0x81,%edx ffffffff803b3a10: 75 ea jne ffffffff803b39fc <edid_c$ ffffffff803b3a12: 84 c9 test %cl,%cl ffffffff803b3a14: 0f 94 c0 sete %al ffffffff803b3a17: 40 84 f6 test %sil,%sil ffffffff803b3a1a: 5b pop %rbx ffffffff803b3a1b: 0f 95 c2 setne %dl ffffffff803b3a1e: 21 d0 and %edx,%eax ffffffff803b3a20: 0f b6 c0 movzbl %al,%eax ffffffff803b3a23: c3 retq ffffffff803b3a24 <fb_parse_edid>: ffffffff803b3a24: 41 54 push %r12 ffffffff803b3a26: 48 85 ff test %rdi,%rdi Hope that helps narrow things down ;) Troy ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-21 21:34 ` Troy Moure @ 2009-07-22 0:53 ` Linus Torvalds 2009-07-22 1:07 ` Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 43+ messages in thread From: Linus Torvalds @ 2009-07-22 0:53 UTC (permalink / raw) To: Troy Moure Cc: Krzysztof Oledzki, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn, Ian Lance Taylor [ Added Ian Lance Taylor to the cc, he knows the background, and unlike me is competent with gcc. ] On Tue, 21 Jul 2009, Troy Moure wrote: > > I think I've found something interesting. Look at the the code generated > for edid_checksum() in driver/video/fbmon.c. This is what I see for the > -fno-strict-overflow kernel: Ooh. Bingo. You're 100% right, and you definitely found it (of course, there may be _other_ cases like this, but that's certainly _one_ of the problems, and probably the only one). Just out of curiosity, how did you find it? Now that I know where to look, it's very obvious in the assembler diffs, but I didn't notice it until you pointed it out just because there is so _much_ of the diffs... And yes, that's very much a compiler bug. And I also bet it's very easily fixed. The code in question is this loop: #define EDID_LENGTH 128 unsigned char i, ... for (i = 0; i < EDID_LENGTH; i++) { csum += edid[i]; all_null |= edid[i]; } and gcc -fno-strict-overflow has apparently decided that that is an infinite loop, even though it clearly is not. So then the stupid and buggy compiler will compile that loop (and the whole rest of the function) to the "optimized" version that is just loop: jmp loop; I even bet I know why: it looks at "unsigned char", and sees that it is an 8-bit variable, and then it looks at "i < EDID_LENGTH" and sees that it is a _signed_ comparison (it's signed because the C type rules mean that 'unsigned char' will be extended to 'int' in an expression), and then it decides that in a signed comparison an 8-bit entry is always going to be smaller than 128. Anyway, I bet we can work around the compiler bug by just changing the type of "i" from "unsigned char" to be a plain "int". Krzysztof? Mind testing that? Ian? This is Linux 2.6.27.27 compiled with gcc-4.2.4. I'm not seeing the bug in the gcc I have on my machine (gcc-4.4.0), but the bug is very clear (once you _find_ it, which was the problem) in the binaries that Krzysztof posted. They're still at: http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fno-strict-overflow.bz2 (Hangs) http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fwrapv.bz2 (OK) http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fnone.bz2 (OK) and you can clearly see the 'edid_checksum' miscompilation in the objdump disassembly. Thanks Troy, Linus --- Leaving in the context for Ian: Buggy -fno-strict-overflow compilation: > ... > ffffffff803b37ed <edid_checksum>: > ffffffff803b37ed: 53 push %rbx > ffffffff803b37ee: 48 89 fb mov %rdi,%rbx > ffffffff803b37f1: e8 8d fd ff ff callq ffffffff803b3583 <check_edid> > ffffffff803b37f6: 85 c0 test %eax,%eax > ffffffff803b37f8: 89 c6 mov %eax,%esi > ffffffff803b37fa: 74 08 je ffffffff803b3804 <edid_checksum+0x17> > ffffffff803b37fc: 48 89 df mov %rbx,%rdi > ffffffff803b37ff: e8 c0 fe ff ff callq ffffffff803b36c4 <fix_edid> > ffffffff803b3804: eb fe jmp ffffffff803b3804 <edid_checksum+0x17> > > ffffffff803b3806 <fb_parse_edid>: > ffffffff803b3806: 41 54 push %r12 > ffffffff803b3808: 48 85 ff test %rdi,%rdi > ... > > That last insn in edid_checksum() doesn't look *quite* right to me... > > The -fnone kernel has something a lot more sensible-looking: > > ffffffff803b39dd <edid_checksum>: > ffffffff803b39dd: 53 push %rbx > ffffffff803b39de: 48 89 fb mov %rdi,%rbx > ffffffff803b39e1: e8 8d fd ff ff callq ffffffff803b3773 <check_$ > ffffffff803b39e6: 85 c0 test %eax,%eax > ffffffff803b39e8: 89 c6 mov %eax,%esi > ffffffff803b39ea: 74 08 je ffffffff803b39f4 <edid_c$ > ffffffff803b39ec: 48 89 df mov %rbx,%rdi > ffffffff803b39ef: e8 c0 fe ff ff callq ffffffff803b38b4 <fix_ed$ > ffffffff803b39f4: 31 c9 xor %ecx,%ecx > ffffffff803b39f6: 31 f6 xor %esi,%esi > ffffffff803b39f8: 31 d2 xor %edx,%edx > ffffffff803b39fa: eb 0a jmp ffffffff803b3a06 <edid_c$ > ffffffff803b39fc: 0f b6 c0 movzbl %al,%eax > ffffffff803b39ff: 8a 04 03 mov (%rbx,%rax,1),%al > ffffffff803b3a02: 01 c1 add %eax,%ecx > ffffffff803b3a04: 09 c6 or %eax,%esi > ffffffff803b3a06: 88 d0 mov %dl,%al > ffffffff803b3a08: ff c2 inc %edx > ffffffff803b3a0a: 81 fa 81 00 00 00 cmp $0x81,%edx > ffffffff803b3a10: 75 ea jne ffffffff803b39fc <edid_c$ > ffffffff803b3a12: 84 c9 test %cl,%cl > ffffffff803b3a14: 0f 94 c0 sete %al > ffffffff803b3a17: 40 84 f6 test %sil,%sil > ffffffff803b3a1a: 5b pop %rbx > ffffffff803b3a1b: 0f 95 c2 setne %dl > ffffffff803b3a1e: 21 d0 and %edx,%eax > ffffffff803b3a20: 0f b6 c0 movzbl %al,%eax > ffffffff803b3a23: c3 retq > > ffffffff803b3a24 <fb_parse_edid>: > ffffffff803b3a24: 41 54 push %r12 > ffffffff803b3a26: 48 85 ff test %rdi,%rdi > > Hope that helps narrow things down ;) > > Troy > ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-22 0:53 ` Linus Torvalds @ 2009-07-22 1:07 ` Linus Torvalds 2009-07-22 6:16 ` Troy Moure 2009-07-22 1:16 ` Linus Torvalds 2009-07-22 11:49 ` Krzysztof Oledzki 2 siblings, 1 reply; 43+ messages in thread From: Linus Torvalds @ 2009-07-22 1:07 UTC (permalink / raw) To: Troy Moure Cc: Krzysztof Oledzki, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn, Ian Lance Taylor On Tue, 21 Jul 2009, Linus Torvalds wrote: > > Just out of curiosity, how did you find it? Now that I know where to look, > it's very obvious in the assembler diffs, but I didn't notice it until you > pointed it out just because there is so _much_ of the diffs... Ahh. I think I see how you found it. Looking at the diffs, there's only a few places where the number of instructions changed by a big fraction. And there's only _one_ place that has a factor-of-three difference (26 lines in the correct cases, and 7 lines in the wrong one). Clever. There's also a case in do_page_fault() where -fno-strict-overflow generates a _lot_ more instructions than the other cases (but not by a factor of three - but it expands 63 instructions to 100). I'm not seeing quite _why_ it does that, but it does various stupid things like multiply by 0x38 etc. But it doesn't look buggy, it just looks stupid. Or did you just brute-force it and spend a lot of time eyeballing things? Linus ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-22 1:07 ` Linus Torvalds @ 2009-07-22 6:16 ` Troy Moure 2009-07-22 15:58 ` Linus Torvalds 0 siblings, 1 reply; 43+ messages in thread From: Troy Moure @ 2009-07-22 6:16 UTC (permalink / raw) To: Linus Torvalds Cc: Troy Moure, Krzysztof Oledzki, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn, Ian Lance Taylor On Tue, 21 Jul 2009, Linus Torvalds wrote: > > Just out of curiosity, how did you find it? Now that I know where to look, > > it's very obvious in the assembler diffs, but I didn't notice it until you > > pointed it out just because there is so _much_ of the diffs... > > Ahh. I think I see how you found it. Looking at the diffs, there's only a > few places where the number of instructions changed by a big fraction. And > there's only _one_ place that has a factor-of-three difference (26 lines > in the correct cases, and 7 lines in the wrong one). Clever. Hmm..that's interesting. But no, I wasn't that clever. I actually just started poking around the radeonfb code, since you mentioned it looked like that might be where the issue was. The last message printed in the hung kernel is "Monitor 2 type no found" - printed from radeon_probe_screens(). And the first message after that in the non-hung kernel is "Console: switching to colour frame buffer device", which I guessed was printed from register_framebuffer() (since that calls notifiers). So I started looking in radeon_fb_register() between the call to radeon_probe_screens() and the call to register_framebuffer(), and tracing through the calls it made. I ignored pci_, sysfs_, etc. calls, thinking the driver code was more likely to have a device probing loop or something odd like that that could be miscompiled. For any functions that had a loop or anything strange-looking, I checked the assembler diffs. And after a little while (a half-hour or so, I think), I found edid_checksum(). Just the name made me think it was a likely culprit, even before I looked at the diff. Obviously I got a bit lucky that problem was actually basically where I started looking for it. But I figured even if I didn't find it, I'd learn something about the radeonfb code. And who would pass up an opportunity to learn about that? Troy ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-22 6:16 ` Troy Moure @ 2009-07-22 15:58 ` Linus Torvalds 0 siblings, 0 replies; 43+ messages in thread From: Linus Torvalds @ 2009-07-22 15:58 UTC (permalink / raw) To: Troy Moure Cc: Krzysztof Oledzki, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn, Ian Lance Taylor On Wed, 22 Jul 2009, Troy Moure wrote: > > Obviously I got a bit lucky that problem was actually basically where I > started looking for it. But I figured even if I didn't find it, I'd learn > something about the radeonfb code. And who would pass up an opportunity to > learn about that? Who indeed? <eyeroll> There are few things more interesting than looking at assembler code from the frame buffer layer. </eyeroll> Linus ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-22 0:53 ` Linus Torvalds 2009-07-22 1:07 ` Linus Torvalds @ 2009-07-22 1:16 ` Linus Torvalds 2009-07-22 8:12 ` Krzysztof Oledzki 2009-07-22 13:48 ` Krzysztof Oledzki 2009-07-22 11:49 ` Krzysztof Oledzki 2 siblings, 2 replies; 43+ messages in thread From: Linus Torvalds @ 2009-07-22 1:16 UTC (permalink / raw) To: Troy Moure Cc: Krzysztof Oledzki, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn, Ian Lance Taylor On Tue, 21 Jul 2009, Linus Torvalds wrote: > > Anyway, I bet we can work around the compiler bug by just changing the > type of "i" from "unsigned char" to be a plain "int". IOW, like this. Linus --- drivers/video/fbmon.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/video/fbmon.c b/drivers/video/fbmon.c index 5c1a2c0..af4a15c 100644 --- a/drivers/video/fbmon.c +++ b/drivers/video/fbmon.c @@ -256,8 +256,8 @@ static void fix_edid(unsigned char *edid, int fix) static int edid_checksum(unsigned char *edid) { - unsigned char i, csum = 0, all_null = 0; - int err = 0, fix = check_edid(edid); + unsigned csum = 0, all_null = 0; + int i, err = 0, fix = check_edid(edid); if (fix) fix_edid(edid, fix); ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-22 1:16 ` Linus Torvalds @ 2009-07-22 8:12 ` Krzysztof Oledzki 2009-07-22 8:32 ` Krzysztof Oledzki 2009-07-22 13:48 ` Krzysztof Oledzki 1 sibling, 1 reply; 43+ messages in thread From: Krzysztof Oledzki @ 2009-07-22 8:12 UTC (permalink / raw) To: Linus Torvalds Cc: Troy Moure, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn, Ian Lance Taylor [-- Attachment #1: Type: TEXT/PLAIN, Size: 1342 bytes --] On Tue, 21 Jul 2009, Linus Torvalds wrote: > > > On Tue, 21 Jul 2009, Linus Torvalds wrote: >> >> Anyway, I bet we can work around the compiler bug by just changing the >> type of "i" from "unsigned char" to be a plain "int". > > IOW, like this. > > Linus > > --- > drivers/video/fbmon.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/video/fbmon.c b/drivers/video/fbmon.c > index 5c1a2c0..af4a15c 100644 > --- a/drivers/video/fbmon.c > +++ b/drivers/video/fbmon.c > @@ -256,8 +256,8 @@ static void fix_edid(unsigned char *edid, int fix) > > static int edid_checksum(unsigned char *edid) > { > - unsigned char i, csum = 0, all_null = 0; > - int err = 0, fix = check_edid(edid); > + unsigned csum = 0, all_null = 0; > + int i, err = 0, fix = check_edid(edid); > > if (fix) > fix_edid(edid, fix); Wow! You guys rock! ;) Indeed, this simple change is enough to make my kernel bootable. However, there is still something wrong. My console is now 80x30 instead of 128x48: -Console: switching to colour frame buffer device 128x48 +Console: switching to colour frame buffer device 80x30 So, it looks like the loop may be still miscompiled. The kernel is here: http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fno-strict-overflow-fixed.bz2 Best regards, Krzysztof Olędzki ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-22 8:12 ` Krzysztof Oledzki @ 2009-07-22 8:32 ` Krzysztof Oledzki 2009-07-22 9:55 ` Krzysztof Oledzki ` (3 more replies) 0 siblings, 4 replies; 43+ messages in thread From: Krzysztof Oledzki @ 2009-07-22 8:32 UTC (permalink / raw) To: Linus Torvalds Cc: Troy Moure, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn, Ian Lance Taylor [-- Attachment #1: Type: TEXT/PLAIN, Size: 2302 bytes --] On Wed, 22 Jul 2009, Krzysztof Oledzki wrote: > > > On Tue, 21 Jul 2009, Linus Torvalds wrote: > >> >> >> On Tue, 21 Jul 2009, Linus Torvalds wrote: >>> >>> Anyway, I bet we can work around the compiler bug by just changing the >>> type of "i" from "unsigned char" to be a plain "int". >> >> IOW, like this. >> >> Linus >> >> --- >> drivers/video/fbmon.c | 4 ++-- >> 1 files changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/video/fbmon.c b/drivers/video/fbmon.c >> index 5c1a2c0..af4a15c 100644 >> --- a/drivers/video/fbmon.c >> +++ b/drivers/video/fbmon.c >> @@ -256,8 +256,8 @@ static void fix_edid(unsigned char *edid, int fix) >> >> static int edid_checksum(unsigned char *edid) >> { >> - unsigned char i, csum = 0, all_null = 0; >> - int err = 0, fix = check_edid(edid); >> + unsigned csum = 0, all_null = 0; >> + int i, err = 0, fix = check_edid(edid); >> >> if (fix) >> fix_edid(edid, fix); > > Wow! You guys rock! ;) > > Indeed, this simple change is enough to make my kernel bootable. However, > there is still something wrong. My console is now 80x30 instead of 128x48: > > -Console: switching to colour frame buffer device 128x48 > +Console: switching to colour frame buffer device 80x30 > > So, it looks like the loop may be still miscompiled. > > The kernel is here: > http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fno-strict-overflow-fixed.bz2 OK, by adding a simple debug printk I'm now sure that the problem is indeed in the loop: --- linux-2.6.27.27-a/drivers/video/fbmon.c 2009-07-20 05:45:22.000000000 +0200 +++ linux-2.6.27.27-b/drivers/video/fbmon.c 2009-07-22 09:45:34.000000000 +0200 @@ -272,6 +272,8 @@ err = 1; } + printk("edid_checksum debug: csum=%u, all_null=%u, err=%d\n", csum, all_null, err); + return err; } Here is a diff between a good and a bad kernel: -edid_checksum debug: csum=0, all_null=255, err=1 -edid_checksum debug: csum=0, all_null=255, err=1 -Console: switching to colour frame buffer device 128x48 +edid_checksum debug: csum=6400, all_null=255, err=0 +Console: switching to colour frame buffer device 80x30 In the good one the function is called twice and it returns err=1 (==OK). In the bad kernel it returns 0 because csum!=0x00 (==6400). Best regards, Krzysztof Olędzki ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-22 8:32 ` Krzysztof Oledzki @ 2009-07-22 9:55 ` Krzysztof Oledzki 2009-07-22 10:44 ` Krzysztof Oledzki 2009-07-22 9:58 ` Jens Rosenboom ` (2 subsequent siblings) 3 siblings, 1 reply; 43+ messages in thread From: Krzysztof Oledzki @ 2009-07-22 9:55 UTC (permalink / raw) To: Krzysztof Oledzki Cc: Linus Torvalds, Troy Moure, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn, Ian Lance Taylor [-- Attachment #1: Type: TEXT/PLAIN, Size: 3751 bytes --] On Wed, 22 Jul 2009, Krzysztof Oledzki wrote: > > > On Wed, 22 Jul 2009, Krzysztof Oledzki wrote: > >> >> >> On Tue, 21 Jul 2009, Linus Torvalds wrote: >> >>> >>> >>> On Tue, 21 Jul 2009, Linus Torvalds wrote: >>>> >>>> Anyway, I bet we can work around the compiler bug by just changing the >>>> type of "i" from "unsigned char" to be a plain "int". >>> >>> IOW, like this. >>> >>> Linus >>> >>> --- >>> drivers/video/fbmon.c | 4 ++-- >>> 1 files changed, 2 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/video/fbmon.c b/drivers/video/fbmon.c >>> index 5c1a2c0..af4a15c 100644 >>> --- a/drivers/video/fbmon.c >>> +++ b/drivers/video/fbmon.c >>> @@ -256,8 +256,8 @@ static void fix_edid(unsigned char *edid, int fix) >>> >>> static int edid_checksum(unsigned char *edid) >>> { >>> - unsigned char i, csum = 0, all_null = 0; >>> - int err = 0, fix = check_edid(edid); >>> + unsigned csum = 0, all_null = 0; >>> + int i, err = 0, fix = check_edid(edid); >>> >>> if (fix) >>> fix_edid(edid, fix); >> >> Wow! You guys rock! ;) >> >> Indeed, this simple change is enough to make my kernel bootable. However, >> there is still something wrong. My console is now 80x30 instead of 128x48: >> >> -Console: switching to colour frame buffer device 128x48 >> +Console: switching to colour frame buffer device 80x30 >> >> So, it looks like the loop may be still miscompiled. >> >> The kernel is here: >> http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fno-strict-overflow-fixed.bz2 > > OK, by adding a simple debug printk I'm now sure that the problem is indeed > in the loop: > > --- linux-2.6.27.27-a/drivers/video/fbmon.c 2009-07-20 05:45:22.000000000 > +0200 > +++ linux-2.6.27.27-b/drivers/video/fbmon.c 2009-07-22 09:45:34.000000000 > +0200 > @@ -272,6 +272,8 @@ > err = 1; > } > > + printk("edid_checksum debug: csum=%u, all_null=%u, err=%d\n", csum, > all_null, err); > + > return err; > } > > > Here is a diff between a good and a bad kernel: > > -edid_checksum debug: csum=0, all_null=255, err=1 > -edid_checksum debug: csum=0, all_null=255, err=1 > -Console: switching to colour frame buffer device 128x48 > +edid_checksum debug: csum=6400, all_null=255, err=0 > +Console: switching to colour frame buffer device 80x30 > > In the good one the function is called twice and it returns err=1 (==OK). In > the bad kernel it returns 0 because csum!=0x00 (==6400). More fun. Actually, the problem is indepenent from -fno-strict-overflow and also exists with -fwrapv. It is probably caused by "unsigned char i" to "int i" translation and the variables reordering. With "unsigned char i" before "unsigned char csum = 0": edid_checksum loop: edid[0]=0, csum=0 edid_checksum loop: edid[1]=255, csum=0 edid_checksum loop: edid[2]=255, csum=255 -edid_checksum loop: edid[3]=255, csum=254 -edid_checksum loop: edid[4]=255, csum=253 -edid_checksum loop: edid[5]=255, csum=252 -edid_checksum loop: edid[6]=255, csum=251 -edid_checksum loop: edid[7]=0, csum=250 -edid_checksum loop: edid[8]=6, csum=250 -edid_checksum loop: edid[9]=207, csum=0 (...) -edid_checksum debug: csum=0, all_null=255, err=1 -> OK With "int i" after "unsigned char csum = 0": edid_checksum loop: edid[0]=0, csum=0 edid_checksum loop: edid[1]=255, csum=0 edid_checksum loop: edid[2]=255, csum=255 +edid_checksum loop: edid[3]=255, csum=510 +edid_checksum loop: edid[4]=255, csum=765 +edid_checksum loop: edid[5]=255, csum=1020 +edid_checksum loop: edid[6]=255, csum=1275 +edid_checksum loop: edid[7]=0, csum=1530 +edid_checksum loop: edid[8]=6, csum=1530 +edid_checksum loop: edid[9]=207, csum=1536 (...) +edid_checksum debug: csum=6400, all_null=255, err=0 -> ERROR Funny. ;) Best regards, Krzysztof Olędzki ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-22 9:55 ` Krzysztof Oledzki @ 2009-07-22 10:44 ` Krzysztof Oledzki 0 siblings, 0 replies; 43+ messages in thread From: Krzysztof Oledzki @ 2009-07-22 10:44 UTC (permalink / raw) To: Linus Torvalds Cc: Troy Moure, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn, Ian Lance Taylor [-- Attachment #1: Type: TEXT/PLAIN, Size: 4483 bytes --] On Wed, 22 Jul 2009, Krzysztof Oledzki wrote: > > > On Wed, 22 Jul 2009, Krzysztof Oledzki wrote: > >> >> >> On Wed, 22 Jul 2009, Krzysztof Oledzki wrote: >> >>> >>> >>> On Tue, 21 Jul 2009, Linus Torvalds wrote: >>> >>>> >>>> >>>> On Tue, 21 Jul 2009, Linus Torvalds wrote: >>>>> >>>>> Anyway, I bet we can work around the compiler bug by just changing the >>>>> type of "i" from "unsigned char" to be a plain "int". >>>> >>>> IOW, like this. >>>> >>>> Linus >>>> >>>> --- >>>> drivers/video/fbmon.c | 4 ++-- >>>> 1 files changed, 2 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/drivers/video/fbmon.c b/drivers/video/fbmon.c >>>> index 5c1a2c0..af4a15c 100644 >>>> --- a/drivers/video/fbmon.c >>>> +++ b/drivers/video/fbmon.c >>>> @@ -256,8 +256,8 @@ static void fix_edid(unsigned char *edid, int fix) >>>> >>>> static int edid_checksum(unsigned char *edid) >>>> { >>>> - unsigned char i, csum = 0, all_null = 0; >>>> - int err = 0, fix = check_edid(edid); >>>> + unsigned csum = 0, all_null = 0; >>>> + int i, err = 0, fix = check_edid(edid); >>>> >>>> if (fix) >>>> fix_edid(edid, fix); >>> >>> Wow! You guys rock! ;) >>> >>> Indeed, this simple change is enough to make my kernel bootable. However, >>> there is still something wrong. My console is now 80x30 instead of 128x48: >>> >>> -Console: switching to colour frame buffer device 128x48 >>> +Console: switching to colour frame buffer device 80x30 >>> >>> So, it looks like the loop may be still miscompiled. >>> >>> The kernel is here: >>> http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fno-strict-overflow-fixed.bz2 >> >> OK, by adding a simple debug printk I'm now sure that the problem is indeed >> in the loop: >> >> --- linux-2.6.27.27-a/drivers/video/fbmon.c 2009-07-20 05:45:22.000000000 >> +0200 >> +++ linux-2.6.27.27-b/drivers/video/fbmon.c 2009-07-22 09:45:34.000000000 >> +0200 >> @@ -272,6 +272,8 @@ >> err = 1; >> } >> >> + printk("edid_checksum debug: csum=%u, all_null=%u, err=%d\n", csum, >> all_null, err); >> + >> return err; >> } >> >> >> Here is a diff between a good and a bad kernel: >> >> -edid_checksum debug: csum=0, all_null=255, err=1 >> -edid_checksum debug: csum=0, all_null=255, err=1 >> -Console: switching to colour frame buffer device 128x48 >> +edid_checksum debug: csum=6400, all_null=255, err=0 >> +Console: switching to colour frame buffer device 80x30 >> >> In the good one the function is called twice and it returns err=1 (==OK). >> In the bad kernel it returns 0 because csum!=0x00 (==6400). > > More fun. > > Actually, the problem is indepenent from -fno-strict-overflow and also exists > with -fwrapv. It is probably caused by "unsigned char i" to "int i" > translation and the variables reordering. > > With "unsigned char i" before "unsigned char csum = 0": > edid_checksum loop: edid[0]=0, csum=0 > edid_checksum loop: edid[1]=255, csum=0 > edid_checksum loop: edid[2]=255, csum=255 > -edid_checksum loop: edid[3]=255, csum=254 > -edid_checksum loop: edid[4]=255, csum=253 > -edid_checksum loop: edid[5]=255, csum=252 > -edid_checksum loop: edid[6]=255, csum=251 > -edid_checksum loop: edid[7]=0, csum=250 > -edid_checksum loop: edid[8]=6, csum=250 > -edid_checksum loop: edid[9]=207, csum=0 > (...) > -edid_checksum debug: csum=0, all_null=255, err=1 -> OK > > With "int i" after "unsigned char csum = 0": > edid_checksum loop: edid[0]=0, csum=0 > edid_checksum loop: edid[1]=255, csum=0 > edid_checksum loop: edid[2]=255, csum=255 > +edid_checksum loop: edid[3]=255, csum=510 > +edid_checksum loop: edid[4]=255, csum=765 > +edid_checksum loop: edid[5]=255, csum=1020 > +edid_checksum loop: edid[6]=255, csum=1275 > +edid_checksum loop: edid[7]=0, csum=1530 > +edid_checksum loop: edid[8]=6, csum=1530 > +edid_checksum loop: edid[9]=207, csum=1536 > (...) > +edid_checksum debug: csum=6400, all_null=255, err=0 -> ERROR > > Funny. ;) OK. I'm just blind. The oryginal patch is wrong. Instead of: - unsigned char i, csum = 0, all_null = 0; - int err = 0, fix = check_edid(edid); + unsigned csum = 0, all_null = 0; + int i, err = 0, fix = check_edid(edid); It should be: - unsigned char i, csum = 0, all_null = 0; - int err = 0, fix = check_edid(edid); + unsigned char csum = 0, all_null = 0; + int i, err = 0, fix = check_edid(edid); The patch should not change "unsigned char (...) csum" to "unsigned (...) csum". Best regards, Krzysztof Olędzki ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-22 8:32 ` Krzysztof Oledzki 2009-07-22 9:55 ` Krzysztof Oledzki @ 2009-07-22 9:58 ` Jens Rosenboom 2009-07-22 10:27 ` Troy Moure 2009-07-22 10:54 ` Krzysztof Oledzki 2009-07-22 10:24 ` Troy Moure 2009-07-22 10:33 ` Dick Streefland 3 siblings, 2 replies; 43+ messages in thread From: Jens Rosenboom @ 2009-07-22 9:58 UTC (permalink / raw) To: Krzysztof Oledzki Cc: Linus Torvalds, Troy Moure, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn, Ian Lance Taylor On Wed, 2009-07-22 at 10:32 +0200, Krzysztof Oledzki wrote: > > On Wed, 22 Jul 2009, Krzysztof Oledzki wrote: > > > > > > > On Tue, 21 Jul 2009, Linus Torvalds wrote: > > > >> > >> > >> On Tue, 21 Jul 2009, Linus Torvalds wrote: > >>> > >>> Anyway, I bet we can work around the compiler bug by just changing the > >>> type of "i" from "unsigned char" to be a plain "int". > >> > >> IOW, like this. > >> > >> Linus > >> > >> --- > >> drivers/video/fbmon.c | 4 ++-- > >> 1 files changed, 2 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/video/fbmon.c b/drivers/video/fbmon.c > >> index 5c1a2c0..af4a15c 100644 > >> --- a/drivers/video/fbmon.c > >> +++ b/drivers/video/fbmon.c > >> @@ -256,8 +256,8 @@ static void fix_edid(unsigned char *edid, int fix) > >> > >> static int edid_checksum(unsigned char *edid) > >> { > >> - unsigned char i, csum = 0, all_null = 0; > >> - int err = 0, fix = check_edid(edid); > >> + unsigned csum = 0, all_null = 0; I guess Linus shouldn't have deleted the "char" here... [...] > Here is a diff between a good and a bad kernel: > > -edid_checksum debug: csum=0, all_null=255, err=1 > -edid_checksum debug: csum=0, all_null=255, err=1 > -Console: switching to colour frame buffer device 128x48 > +edid_checksum debug: csum=6400, all_null=255, err=0 > +Console: switching to colour frame buffer device 80x30 > > In the good one the function is called twice and it returns err=1 (==OK). > In the bad kernel it returns 0 because csum!=0x00 (==6400). 6400 is a multiple of 256, so as unsigned char it is ==0. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-22 9:58 ` Jens Rosenboom @ 2009-07-22 10:27 ` Troy Moure 2009-07-22 10:54 ` Krzysztof Oledzki 1 sibling, 0 replies; 43+ messages in thread From: Troy Moure @ 2009-07-22 10:27 UTC (permalink / raw) To: Jens Rosenboom Cc: Krzysztof Oledzki, Linus Torvalds, Troy Moure, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn, Ian Lance Taylor On Wed, 22 Jul 2009, Jens Rosenboom wrote: > > >> --- a/drivers/video/fbmon.c > > >> +++ b/drivers/video/fbmon.c > > >> @@ -256,8 +256,8 @@ static void fix_edid(unsigned char *edid, int fix) > > >> > > >> static int edid_checksum(unsigned char *edid) > > >> { > > >> - unsigned char i, csum = 0, all_null = 0; > > >> - int err = 0, fix = check_edid(edid); > > >> + unsigned csum = 0, all_null = 0; > > I guess Linus shouldn't have deleted the "char" here... > > [...] > Doh! I missed this - I didn't look closely at his patch. You can disregard my just-sent message, then... Troy ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-22 9:58 ` Jens Rosenboom 2009-07-22 10:27 ` Troy Moure @ 2009-07-22 10:54 ` Krzysztof Oledzki 1 sibling, 0 replies; 43+ messages in thread From: Krzysztof Oledzki @ 2009-07-22 10:54 UTC (permalink / raw) To: Jens Rosenboom Cc: Linus Torvalds, Troy Moure, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn, Ian Lance Taylor [-- Attachment #1: Type: TEXT/PLAIN, Size: 1217 bytes --] On Wed, 22 Jul 2009, Jens Rosenboom wrote: > On Wed, 2009-07-22 at 10:32 +0200, Krzysztof Oledzki wrote: >> >> On Wed, 22 Jul 2009, Krzysztof Oledzki wrote: >> >>> >>> >>> On Tue, 21 Jul 2009, Linus Torvalds wrote: >>> >>>> >>>> >>>> On Tue, 21 Jul 2009, Linus Torvalds wrote: >>>>> >>>>> Anyway, I bet we can work around the compiler bug by just changing the >>>>> type of "i" from "unsigned char" to be a plain "int". >>>> >>>> IOW, like this. >>>> >>>> Linus >>>> >>>> --- >>>> drivers/video/fbmon.c | 4 ++-- >>>> 1 files changed, 2 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/drivers/video/fbmon.c b/drivers/video/fbmon.c >>>> index 5c1a2c0..af4a15c 100644 >>>> --- a/drivers/video/fbmon.c >>>> +++ b/drivers/video/fbmon.c >>>> @@ -256,8 +256,8 @@ static void fix_edid(unsigned char *edid, int fix) >>>> >>>> static int edid_checksum(unsigned char *edid) >>>> { >>>> - unsigned char i, csum = 0, all_null = 0; >>>> - int err = 0, fix = check_edid(edid); >>>> + unsigned csum = 0, all_null = 0; > > I guess Linus shouldn't have deleted the "char" here... Yep! However, it took me one more hour to find it. I should have read my mails more often. ;) Thanks. Best regards, Krzysztof Olędzki ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-22 8:32 ` Krzysztof Oledzki 2009-07-22 9:55 ` Krzysztof Oledzki 2009-07-22 9:58 ` Jens Rosenboom @ 2009-07-22 10:24 ` Troy Moure 2009-07-22 10:33 ` Dick Streefland 3 siblings, 0 replies; 43+ messages in thread From: Troy Moure @ 2009-07-22 10:24 UTC (permalink / raw) To: Krzysztof Oledzki Cc: Linus Torvalds, Troy Moure, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn, Ian Lance Taylor On Wed, 22 Jul 2009, Krzysztof Oledzki wrote: > > > > Indeed, this simple change is enough to make my kernel bootable. However, > > there is still something wrong. My console is now 80x30 instead of 128x48: > > > > -Console: switching to colour frame buffer device 128x48 > > +Console: switching to colour frame buffer device 80x30 > > > > So, it looks like the loop may be still miscompiled. > > Yes. I took a look at the -fixed binary you sent out. edid_checksum() is now compiled to this (I added some notes on the side): ffffffff803b37ed: <edid_checksum>: 53 push %rbx 48 89 fb mov %rdi,%rbx %ebx = edid [... Calls to check_edid and fix_edid ... ] 31 c9 xor %ecx,%ecx csum = 0 31 f6 xor %esi,%esi all_null = 0 31 d2 xor %edx,%edx i = 0 L: 0f b6 04 1a movzbl (%rdx,%rbx,1),%eax %eax = *(i + edid) 48 ff c2 inc %rdx i++ 01 c1 add %eax,%ecx csum += %eax 09 c6 or %eax,%esi all_null |= %eax 48 81 fa 80 00 00 00 cmp $0x80,%rdx 75 ec jne L if i != 80 goto L 85 c9 test %ecx,%ecx 0f 94 c0 sete %al %al == (csum == 0) 85 f6 test %esi,%esi 5b pop %rbx 0f 95 c2 setne %dl %dl == (all_null == 0) 21 d0 and %edx,%eax 0f b6 c0 movzbl %al,%eax %eax == (%al && %dl) c3 retq return %eax The problem is that csum is stored in %ecx (a 32-bit register) at all times and is never truncated to a byte. In other words, the compiler is treating csum like it's an 'int', not an 'unsigned char'. > Here is a diff between a good and a bad kernel: > > -edid_checksum debug: csum=0, all_null=255, err=1 > -edid_checksum debug: csum=0, all_null=255, err=1 > -Console: switching to colour frame buffer device 128x48 > +edid_checksum debug: csum=6400, all_null=255, err=0 > +Console: switching to colour frame buffer device 80x30 > > In the good one the function is called twice and it returns err=1 (==OK). In > the bad kernel it returns 0 because csum!=0x00 (==6400). That makes sense - since csum is being treated like an 'int', it never wraps, so it just ends up holding the total sum of all the bytes, which apparently is 6400. Notice that 6400 % 256 == 0, so if it *had* wrapped, it would have ended up being 0, as expected. One "fix" might be to just make 'csum' an 'int' (since that's what the compiler seems to think anyway :p) and do the wrapping by hand (patch below, if you want to try this). However, I wouldn't be surprised if other kernel functions are also being miscompiled. It seems to me that any function that does arithmetic on 'unsigned char's and depends on the wrapping behaviour could potentially be broken... Best regards, Troy diff --git a/drivers/video/fbmon.c b/drivers/video/fbmon.c index 5c1a2c0..6802b4c 100644 --- a/drivers/video/fbmon.c +++ b/drivers/video/fbmon.c @@ -256,8 +256,8 @@ static void fix_edid(unsigned char *edid, int fix) static int edid_checksum(unsigned char *edid) { - unsigned char i, csum = 0, all_null = 0; - int err = 0, fix = check_edid(edid); + unsigned char all_null = 0; + int i, csum = 0, err = 0, fix = check_edid(edid); if (fix) fix_edid(edid, fix); @@ -267,7 +267,7 @@ static int edid_checksum(unsigned char *edid) all_null |= edid[i]; } - if (csum == 0x00 && all_null) { + if ((csum & 0xff) == 0x00 && all_null) { /* checksum passed, everything's good */ err = 1; } ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-22 8:32 ` Krzysztof Oledzki ` (2 preceding siblings ...) 2009-07-22 10:24 ` Troy Moure @ 2009-07-22 10:33 ` Dick Streefland 3 siblings, 0 replies; 43+ messages in thread From: Dick Streefland @ 2009-07-22 10:33 UTC (permalink / raw) To: linux-kernel Krzysztof Oledzki <olel@ans.pl> wrote: | Here is a diff between a good and a bad kernel: | | -edid_checksum debug: csum=0, all_null=255, err=1 | -edid_checksum debug: csum=0, all_null=255, err=1 | -Console: switching to colour frame buffer device 128x48 | +edid_checksum debug: csum=6400, all_null=255, err=0 | +Console: switching to colour frame buffer device 80x30 | | In the good one the function is called twice and it returns err=1 (==OK). | In the bad kernel it returns 0 because csum!=0x00 (==6400). Linus accidentally dropped the "char" from the declaration of "csum" and "all_null": > - unsigned char i, csum = 0, all_null = 0; > - int err = 0, fix = check_edid(edid); > + unsigned csum = 0, all_null = 0; > + int i, err = 0, fix = check_edid(edid); -- Dick Streefland //// Altium BV dick.streefland@altium.nl (@ @) http://www.altium.com --------------------------------oOO--(_)--OOo--------------------------- ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-22 1:16 ` Linus Torvalds 2009-07-22 8:12 ` Krzysztof Oledzki @ 2009-07-22 13:48 ` Krzysztof Oledzki 2009-07-22 15:48 ` Linus Torvalds 1 sibling, 1 reply; 43+ messages in thread From: Krzysztof Oledzki @ 2009-07-22 13:48 UTC (permalink / raw) To: Linus Torvalds Cc: Troy Moure, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn, Ian Lance Taylor [-- Attachment #1: Type: TEXT/PLAIN, Size: 984 bytes --] On Tue, 21 Jul 2009, Linus Torvalds wrote: > > > On Tue, 21 Jul 2009, Linus Torvalds wrote: >> >> Anyway, I bet we can work around the compiler bug by just changing the >> type of "i" from "unsigned char" to be a plain "int". > > IOW, like this. > > Linus > > --- > drivers/video/fbmon.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/video/fbmon.c b/drivers/video/fbmon.c > index 5c1a2c0..af4a15c 100644 > --- a/drivers/video/fbmon.c > +++ b/drivers/video/fbmon.c > @@ -256,8 +256,8 @@ static void fix_edid(unsigned char *edid, int fix) > > static int edid_checksum(unsigned char *edid) > { > - unsigned char i, csum = 0, all_null = 0; > - int err = 0, fix = check_edid(edid); > + unsigned csum = 0, all_null = 0; > + int i, err = 0, fix = check_edid(edid); > > if (fix) > fix_edid(edid, fix); Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> On condition, that you keep "unsigned char" here. ;) Best regards, Krzysztof Olędzki ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-22 13:48 ` Krzysztof Oledzki @ 2009-07-22 15:48 ` Linus Torvalds 2009-07-29 14:57 ` Pavel Machek 0 siblings, 1 reply; 43+ messages in thread From: Linus Torvalds @ 2009-07-22 15:48 UTC (permalink / raw) To: Krzysztof Oledzki Cc: Troy Moure, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn, Ian Lance Taylor On Wed, 22 Jul 2009, Krzysztof Oledzki wrote: > > > On Tue, 21 Jul 2009, Linus Torvalds wrote: > > > > IOW, like this. Yeah, I'm a moron. Not at _all_ like that. I wish I hadn't sent out the patch, you'd have done it correctly by hand from my description. > > diff --git a/drivers/video/fbmon.c b/drivers/video/fbmon.c > > index 5c1a2c0..af4a15c 100644 > > --- a/drivers/video/fbmon.c > > +++ b/drivers/video/fbmon.c > > @@ -256,8 +256,8 @@ static void fix_edid(unsigned char *edid, int fix) > > > > static int edid_checksum(unsigned char *edid) > > { > > - unsigned char i, csum = 0, all_null = 0; > > - int err = 0, fix = check_edid(edid); > > + unsigned csum = 0, all_null = 0; > > + int i, err = 0, fix = check_edid(edid); I don't know where the 'char' disappeared, but that was obviously not intended. I just meant to move the 'i' from one line to the other. > > > > if (fix) > > fix_edid(edid, fix); > > Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> > > On condition, that you keep "unsigned char" here. ;) Indeed. I'll commit the fixed version. Thanks for testing and sorry for the idiot patch. Linus ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-22 15:48 ` Linus Torvalds @ 2009-07-29 14:57 ` Pavel Machek 2009-07-29 15:59 ` Linus Torvalds 0 siblings, 1 reply; 43+ messages in thread From: Pavel Machek @ 2009-07-29 14:57 UTC (permalink / raw) To: Linus Torvalds Cc: Krzysztof Oledzki, Troy Moure, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn, Ian Lance Taylor > > > diff --git a/drivers/video/fbmon.c b/drivers/video/fbmon.c > > > index 5c1a2c0..af4a15c 100644 > > > --- a/drivers/video/fbmon.c > > > +++ b/drivers/video/fbmon.c > > > @@ -256,8 +256,8 @@ static void fix_edid(unsigned char *edid, int fix) > > > > > > static int edid_checksum(unsigned char *edid) > > > { > > > - unsigned char i, csum = 0, all_null = 0; > > > - int err = 0, fix = check_edid(edid); > > > + unsigned csum = 0, all_null = 0; > > > + int i, err = 0, fix = check_edid(edid); > > I don't know where the 'char' disappeared, but that was obviously not > intended. I just meant to move the 'i' from one line to the other. > > > > > > > if (fix) > > > fix_edid(edid, fix); > > > > Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> > > > > On condition, that you keep "unsigned char" here. ;) > > Indeed. I'll commit the fixed version. Thanks for testing and sorry for > the idiot patch. So... we are going to just work around the gcc bug in the kernel? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-29 14:57 ` Pavel Machek @ 2009-07-29 15:59 ` Linus Torvalds 0 siblings, 0 replies; 43+ messages in thread From: Linus Torvalds @ 2009-07-29 15:59 UTC (permalink / raw) To: Pavel Machek Cc: Krzysztof Oledzki, Troy Moure, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn, Ian Lance Taylor On Wed, 29 Jul 2009, Pavel Machek wrote: > > So... we are going to just work around the gcc bug in the kernel? Well, the gcc people hopefully will fix it in the 4.2.4 tree too. Also, it's not exactly the first time we work around compiler bugs. We've done it before, I'm sure we'll do it again. In this case, the work-around is trivial, and in many ways makes the code more "normal" (it's just a loop counter, might as well use an "int" for it), so there are no downsides to it. We could disallow gcc-4.2.4 entirely, of course, and have a big "this compiler is known to generate broken code" message and refuse to compile the kernel with it, but while that would be a "safer" approach, it would be rather user-unfriendly. Compiler bugs happen. They're really really annoying, and nasty to track down. But they aren't the end of the world, and the pattern of this particular bug doesn't seem like it would likely trigger anywhere else. For example, I suspect it really does need that loop induction variable to be 'unsigned char', and it really needs that limit to be exactly 128. It looks like a combination of loop optimization and broken range logic. I doubt it would hit in some random code that just happens to compare an unsigned char against 128 in general. And using 'unsigned char' as a lop induction variable is _very_ rare. Which is probably why the gcc bug happened in the first place - no testing. Linus ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-22 0:53 ` Linus Torvalds 2009-07-22 1:07 ` Linus Torvalds 2009-07-22 1:16 ` Linus Torvalds @ 2009-07-22 11:49 ` Krzysztof Oledzki 2009-07-22 13:27 ` Henrique de Moraes Holschuh ` (2 more replies) 2 siblings, 3 replies; 43+ messages in thread From: Krzysztof Oledzki @ 2009-07-22 11:49 UTC (permalink / raw) To: Linus Torvalds Cc: Troy Moure, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn, Ian Lance Taylor [-- Attachment #1: Type: TEXT/PLAIN, Size: 3312 bytes --] On Tue, 21 Jul 2009, Linus Torvalds wrote: > > [ Added Ian Lance Taylor to the cc, he knows the background, and unlike me > is competent with gcc. ] > > On Tue, 21 Jul 2009, Troy Moure wrote: >> >> I think I've found something interesting. Look at the the code generated >> for edid_checksum() in driver/video/fbmon.c. This is what I see for the >> -fno-strict-overflow kernel: > > Ooh. > > Bingo. You're 100% right, and you definitely found it (of course, there > may be _other_ cases like this, but that's certainly _one_ of the > problems, and probably the only one). > > Just out of curiosity, how did you find it? Now that I know where to look, > it's very obvious in the assembler diffs, but I didn't notice it until you > pointed it out just because there is so _much_ of the diffs... > > And yes, that's very much a compiler bug. And I also bet it's very easily > fixed. > > The code in question is this loop: > > #define EDID_LENGTH 128 > > unsigned char i, ... > > for (i = 0; i < EDID_LENGTH; i++) { > csum += edid[i]; > all_null |= edid[i]; > } > > and gcc -fno-strict-overflow has apparently decided that that is an > infinite loop, even though it clearly is not. So then the stupid and buggy > compiler will compile that loop (and the whole rest of the function) to > the "optimized" version that is just > > loop: > jmp loop; > > I even bet I know why: it looks at "unsigned char", and sees that it is an > 8-bit variable, and then it looks at "i < EDID_LENGTH" and sees that it is > a _signed_ comparison (it's signed because the C type rules mean that > 'unsigned char' will be extended to 'int' in an expression), and then it > decides that in a signed comparison an 8-bit entry is always going to be > smaller than 128. > > Anyway, I bet we can work around the compiler bug by just changing the > type of "i" from "unsigned char" to be a plain "int". > > Krzysztof? Mind testing that? > > Ian? This is Linux 2.6.27.27 compiled with gcc-4.2.4. I'm not seeing the > bug in the gcc I have on my machine (gcc-4.4.0), but the bug is very clear > (once you _find_ it, which was the problem) in the binaries that Krzysztof > posted. They're still at: > > http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fno-strict-overflow.bz2 (Hangs) > http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fwrapv.bz2 (OK) > http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fnone.bz2 (OK) > > and you can clearly see the 'edid_checksum' miscompilation in the objdump > disassembly. BTW: here is a simple testcase for this bug: --- fno-strict-overflow-fixed-bug.c --- #include <stdio.h> int main() { unsigned char i; for (i = 0; i < 128; i++) printf("loop %u\n", i); return 0; } --- cut here --- The code should be compiled with: cc -o fno-strict-overflow-fixed-bug -Os -fno-strict-overflow fno-strict-overflow-fixed-bug.c or: cc -o fno-strict-overflow-fixed-bug -O2 -fno-strict-overflow fno-strict-overflow-fixed-bug.c This bug does not exist with -O1 or if the loop is controlled by "i < 127" or "i < 129". So, we should make sure there is no unsigned char i; (...) for (i = 0; i < 128; i++) somewhere inside the kernel. Best regards, Krzysztof Olędzki ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-22 11:49 ` Krzysztof Oledzki @ 2009-07-22 13:27 ` Henrique de Moraes Holschuh 2009-07-22 13:45 ` Krzysztof Oledzki 2009-07-22 15:36 ` Ian Lance Taylor 2 siblings, 0 replies; 43+ messages in thread From: Henrique de Moraes Holschuh @ 2009-07-22 13:27 UTC (permalink / raw) To: Krzysztof Oledzki Cc: Linus Torvalds, Troy Moure, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn, Ian Lance Taylor On Wed, 22 Jul 2009, Krzysztof Oledzki wrote: > BTW: here is a simple testcase for this bug: > > --- fno-strict-overflow-fixed-bug.c --- > #include <stdio.h> > > int main() { > > unsigned char i; > > for (i = 0; i < 128; i++) > printf("loop %u\n", i); > > return 0; > } > --- cut here --- > > The code should be compiled with: > cc -o fno-strict-overflow-fixed-bug -Os -fno-strict-overflow fno-strict-overflow-fixed-bug.c > or: > cc -o fno-strict-overflow-fixed-bug -O2 -fno-strict-overflow fno-strict-overflow-fixed-bug.c > > This bug does not exist with -O1 or if the loop is controlled by "i > < 127" or "i < 129". Thank you. Debian stable, ia32, gcc 4.2.4-6: BUG Debian stable, ia32, gcc 4.3.2-1.1: no bug (default) Debian unstable, ia32, gcc 4.2.4-6: BUG Debian unstable, ia32, gcc 4.3.3-14: no bug (default) Someone else already tested gcc 4.4.0, as no bug. I will file bugs against Debian gcc-4.2... -- "One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie." -- The Silicon Valley Tarot Henrique Holschuh ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-22 11:49 ` Krzysztof Oledzki 2009-07-22 13:27 ` Henrique de Moraes Holschuh @ 2009-07-22 13:45 ` Krzysztof Oledzki 2009-07-22 15:36 ` Ian Lance Taylor 2 siblings, 0 replies; 43+ messages in thread From: Krzysztof Oledzki @ 2009-07-22 13:45 UTC (permalink / raw) To: Linus Torvalds Cc: Troy Moure, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn, Ian Lance Taylor [-- Attachment #1: Type: TEXT/PLAIN, Size: 3878 bytes --] On Wed, 22 Jul 2009, Krzysztof Oledzki wrote: > > > On Tue, 21 Jul 2009, Linus Torvalds wrote: > >> >> [ Added Ian Lance Taylor to the cc, he knows the background, and unlike me >> is competent with gcc. ] >> >> On Tue, 21 Jul 2009, Troy Moure wrote: >>> >>> I think I've found something interesting. Look at the the code generated >>> for edid_checksum() in driver/video/fbmon.c. This is what I see for the >>> -fno-strict-overflow kernel: >> >> Ooh. >> >> Bingo. You're 100% right, and you definitely found it (of course, there >> may be _other_ cases like this, but that's certainly _one_ of the >> problems, and probably the only one). >> >> Just out of curiosity, how did you find it? Now that I know where to look, >> it's very obvious in the assembler diffs, but I didn't notice it until you >> pointed it out just because there is so _much_ of the diffs... >> >> And yes, that's very much a compiler bug. And I also bet it's very easily >> fixed. >> >> The code in question is this loop: >> >> #define EDID_LENGTH 128 >> >> unsigned char i, ... >> >> for (i = 0; i < EDID_LENGTH; i++) { >> csum += edid[i]; >> all_null |= edid[i]; >> } >> >> and gcc -fno-strict-overflow has apparently decided that that is an >> infinite loop, even though it clearly is not. So then the stupid and buggy >> compiler will compile that loop (and the whole rest of the function) to >> the "optimized" version that is just >> >> loop: >> jmp loop; >> >> I even bet I know why: it looks at "unsigned char", and sees that it is an >> 8-bit variable, and then it looks at "i < EDID_LENGTH" and sees that it is >> a _signed_ comparison (it's signed because the C type rules mean that >> 'unsigned char' will be extended to 'int' in an expression), and then it >> decides that in a signed comparison an 8-bit entry is always going to be >> smaller than 128. >> >> Anyway, I bet we can work around the compiler bug by just changing the >> type of "i" from "unsigned char" to be a plain "int". >> >> Krzysztof? Mind testing that? >> >> Ian? This is Linux 2.6.27.27 compiled with gcc-4.2.4. I'm not seeing the >> bug in the gcc I have on my machine (gcc-4.4.0), but the bug is very clear >> (once you _find_ it, which was the problem) in the binaries that Krzysztof >> posted. They're still at: >> >> http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fno-strict-overflow.bz2 >> (Hangs) >> http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fwrapv.bz2 >> (OK) >> http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fnone.bz2 >> (OK) >> >> and you can clearly see the 'edid_checksum' miscompilation in the objdump >> disassembly. > > BTW: here is a simple testcase for this bug: > > --- fno-strict-overflow-fixed-bug.c --- > #include <stdio.h> > > int main() { > > unsigned char i; > > for (i = 0; i < 128; i++) > printf("loop %u\n", i); > > return 0; > } > --- cut here --- Or this better one (no infinite loop): --- cut here --- #include <stdio.h> int main() { unsigned char i, j=0; for (i = 0; i <= 127; i++) { if (!i && j++) { printf("Buggy GCC\n"); return 1; } } printf("GCC is OK\n"); return 0; } --- cut here --- > The code should be compiled with: > cc -o fno-strict-overflow-fixed-bug -Os -fno-strict-overflow > fno-strict-overflow-fixed-bug.c > or: > cc -o fno-strict-overflow-fixed-bug -O2 -fno-strict-overflow > fno-strict-overflow-fixed-bug.c > > This bug does not exist with -O1 or if the loop is controlled by "i < 127" or > "i < 129". > > So, we should make sure there is no > unsigned char i; (...) for (i = 0; i < 128; i++) > somewhere inside the kernel. Best regards, Krzysztof Olędzki ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-22 11:49 ` Krzysztof Oledzki 2009-07-22 13:27 ` Henrique de Moraes Holschuh 2009-07-22 13:45 ` Krzysztof Oledzki @ 2009-07-22 15:36 ` Ian Lance Taylor 2 siblings, 0 replies; 43+ messages in thread From: Ian Lance Taylor @ 2009-07-22 15:36 UTC (permalink / raw) To: Krzysztof Oledzki Cc: Linus Torvalds, Troy Moure, Greg KH, Linux Kernel Mailing List, Andrew Morton, stable, lwn Krzysztof Oledzki <olel@ans.pl> writes: > BTW: here is a simple testcase for this bug: > > --- fno-strict-overflow-fixed-bug.c --- > #include <stdio.h> > > int main() { > > unsigned char i; > > for (i = 0; i < 128; i++) > printf("loop %u\n", i); > > return 0; > } > --- cut here --- I can confirm that this is broken in gcc 4.2 using -fno-strict-overflow, and works correctly in gcc 4.3 and later. Sigh. Ian ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-20 15:10 ` Greg KH 2009-07-20 16:01 ` Linus Torvalds @ 2009-07-23 17:33 ` Krzysztof Olędzki 2009-07-24 21:13 ` Greg KH 1 sibling, 1 reply; 43+ messages in thread From: Krzysztof Olędzki @ 2009-07-23 17:33 UTC (permalink / raw) To: Greg KH; +Cc: linux-kernel, Andrew Morton, torvalds, stable, lwn On 2009-07-20 17:10, Greg KH wrote: > On Mon, Jul 20, 2009 at 01:51:33PM +0200, Krzysztof Oledzki wrote: >> On 2009-07-20 06:06, Greg KH wrote: >>> I'm announcing the release of the 2.6.27.26 kernel. All users of the >>> 2.6.27 kernel series are very strongly encouraged to upgrade. >>> >>> I'll also be replying to this message with a copy of the patch between >>> 2.6.27.26 and 2.6.27.27 >>> >>> The updated 2.6.27.y git tree can be found at: >>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.27.y.git >>> and can be browsed at the normal kernel.org git web browser: >>> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=summary >>> >>> thanks, >> <CUT> >>> Linus Torvalds (1): >>> Don't use '-fwrapv' compiler option: it's buggy in gcc-4.1.x >> I'm very sorry to say that but 2.6.27.27 compiled with gcc-4.2.4 >> hangs during boot and it is caused by this fix. >> >> I extended http://bugzilla.kernel.org/show_bug.cgi?id=13012 with >> additional info & .config. > > Wierd. Linus, any thoughts? Do we need to check the version of the > compiler that we are using before turning that option off? This bug is now fixed by http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3730793d457fed79a6d49bae72996d458c8e4f2d Could you please include this patch in the next -stable releases? BTW: the problem exists in gcc-4.2.4 not gcc-2.4.2 so I think it is worth to change the subject when adding the patch. ;) Best regards, Krzysztof Olędzki ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Linux 2.6.27.27 2009-07-23 17:33 ` Krzysztof Olędzki @ 2009-07-24 21:13 ` Greg KH 0 siblings, 0 replies; 43+ messages in thread From: Greg KH @ 2009-07-24 21:13 UTC (permalink / raw) To: Krzysztof Olędzki; +Cc: linux-kernel, Andrew Morton, torvalds, stable, lwn On Thu, Jul 23, 2009 at 07:33:23PM +0200, Krzysztof Olędzki wrote: > On 2009-07-20 17:10, Greg KH wrote: > > On Mon, Jul 20, 2009 at 01:51:33PM +0200, Krzysztof Oledzki wrote: > >> On 2009-07-20 06:06, Greg KH wrote: > >>> I'm announcing the release of the 2.6.27.26 kernel. All users of the > >>> 2.6.27 kernel series are very strongly encouraged to upgrade. > >>> > >>> I'll also be replying to this message with a copy of the patch between > >>> 2.6.27.26 and 2.6.27.27 > >>> > >>> The updated 2.6.27.y git tree can be found at: > >>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.27.y.git > >>> and can be browsed at the normal kernel.org git web browser: > >>> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=summary > >>> > >>> thanks, > >> <CUT> > >>> Linus Torvalds (1): > >>> Don't use '-fwrapv' compiler option: it's buggy in gcc-4.1.x > >> I'm very sorry to say that but 2.6.27.27 compiled with gcc-4.2.4 > >> hangs during boot and it is caused by this fix. > >> > >> I extended http://bugzilla.kernel.org/show_bug.cgi?id=13012 with > >> additional info & .config. > > > > Wierd. Linus, any thoughts? Do we need to check the version of the > > compiler that we are using before turning that option off? > > This bug is now fixed by > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3730793d457fed79a6d49bae72996d458c8e4f2d > > Could you please include this patch in the next -stable releases? Yes, I have it queued up to go out today :) > BTW: the problem exists in gcc-4.2.4 not gcc-2.4.2 so I think it is > worth to change the subject when adding the patch. ;) Hm, I'll see about that. thanks, greg k-h ^ permalink raw reply [flat|nested] 43+ messages in thread
end of thread, other threads:[~2009-07-29 16:01 UTC | newest] Thread overview: 43+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-07-20 4:06 Linux 2.6.27.27 Greg KH 2009-07-20 4:07 ` Greg KH 2009-07-20 11:51 ` Krzysztof Oledzki 2009-07-20 15:10 ` Greg KH 2009-07-20 16:01 ` Linus Torvalds 2009-07-20 21:45 ` Krzysztof Oledzki 2009-07-20 22:08 ` Linus Torvalds 2009-07-20 23:47 ` Marc Dionne 2009-07-20 23:56 ` Linus Torvalds 2009-07-21 0:37 ` Marc Dionne 2009-07-21 1:01 ` Linus Torvalds 2009-07-21 6:40 ` Krzysztof Oledzki 2009-07-21 1:05 ` Linus Torvalds 2009-07-21 2:38 ` Marc Dionne 2009-07-21 6:33 ` Krzysztof Oledzki 2009-07-21 10:16 ` Krzysztof Oledzki 2009-07-21 16:11 ` Linus Torvalds 2009-07-21 19:15 ` Linus Torvalds 2009-07-21 21:34 ` Troy Moure 2009-07-22 0:53 ` Linus Torvalds 2009-07-22 1:07 ` Linus Torvalds 2009-07-22 6:16 ` Troy Moure 2009-07-22 15:58 ` Linus Torvalds 2009-07-22 1:16 ` Linus Torvalds 2009-07-22 8:12 ` Krzysztof Oledzki 2009-07-22 8:32 ` Krzysztof Oledzki 2009-07-22 9:55 ` Krzysztof Oledzki 2009-07-22 10:44 ` Krzysztof Oledzki 2009-07-22 9:58 ` Jens Rosenboom 2009-07-22 10:27 ` Troy Moure 2009-07-22 10:54 ` Krzysztof Oledzki 2009-07-22 10:24 ` Troy Moure 2009-07-22 10:33 ` Dick Streefland 2009-07-22 13:48 ` Krzysztof Oledzki 2009-07-22 15:48 ` Linus Torvalds 2009-07-29 14:57 ` Pavel Machek 2009-07-29 15:59 ` Linus Torvalds 2009-07-22 11:49 ` Krzysztof Oledzki 2009-07-22 13:27 ` Henrique de Moraes Holschuh 2009-07-22 13:45 ` Krzysztof Oledzki 2009-07-22 15:36 ` Ian Lance Taylor 2009-07-23 17:33 ` Krzysztof Olędzki 2009-07-24 21:13 ` Greg KH
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox