* oops/warning report for the week of November 26, 2008
From: Arjan van de Ven @ 2008-11-26 23:11 UTC
To: Linux Kernel Mailing List
Cc: Linus Torvalds, NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox, jesse Barnes

In collecting this report, oopses and warnings with versions prior to
2.6.27 are ignored.

This week, a total of 5450 oopses and warnings have been reported of
version 2.6.27+, compared to 2198 reports in the previous week.

This report is a bit different than the previous weeks; all 2.6.26 and
earlier issues are no longer used, which means the top 12 has shuffled
quite a bit, with some new star appearances.

Also I've reworked the "are these two backtraces the same" algorithm;
the website should now be presenting a more compact/concise view due to
having the backtraces consolidated in a much more logical (for the
human) way.

Per file statistics
  936 external/virtualbox/module
  602 drivers/pci/slot.c
  455 drivers/net/wireless/iwlwifi/iwl-tx.c
  364 kernel/power/main.c
  274 drivers/net/r8169.c
  231 drivers/net/wireless/iwlwifi/iwl-3945-rs.c
  231 fs/jbd/journal.c
  227 arch/x86/include/asm/mtrr.h
  147 drivers/ata/libata-sff.c
  137 drivers/net/sis900.c
   71 net/ipv4/tcp.c
   62 drivers/gpu/drm/radeon/radeon_cp.c

Rank 1: VBoxDrvLinuxIOCtl (warning)
  Reported 934 times (1635 total reports)
  [external] bug in the VirtualBox drivers
  This warning was last seen in version 2.6.28-rc3, and first seen in 2.6.25.11.
  More info: http://www.kerneloops.org/searchweek.php?search=VBoxDrvLinuxIOCtl

Rank 2: pci_create_slot (warning)
  Reported 603 times (639 total reports)
  BIOS provided duplicated slot names, the PCI layer blindly passes to sysfs
  This warning was last seen in version 2.6.27.5, and first seen in 2.6.27-rc7-git1.
  More info: http://www.kerneloops.org/searchweek.php?search=pci_create_slot

Rank 3: iwl_tx_cmd_complete (warning)
  Reported 455 times (693 total reports)
  Bug in the IWL wireless driver; partial fix available
  This warning was last seen in version 2.6.28-rc4, and first seen in 2.6.27-rc9.
  More info: http://www.kerneloops.org/searchweek.php?search=iwl_tx_cmd_complete

Rank 4: suspend_test_finish (warning)
  Reported 362 times (1202 total reports)
  Fedora is shipping with the suspend test on.. and it's failing everywhere.
  The patch to report what fails is in 2.6.28-rc6 and later
  This warning was last seen in version 2.6.28-rc1, and first seen in 2.6.27-rc0-git14.
  More info: http://www.kerneloops.org/searchweek.php?search=suspend_test_finish

Rank 5: dev_watchdog(r8169) (oops)
  Reported 274 times (1414 total reports)
  Network driver not handling timeouts itself.
  This oops was last seen in version 2.6.28-rc4, and first seen in 2.6.26.6.
  More info: http://www.kerneloops.org/searchweek.php?search=dev_watchdog(r8169)

Rank 6: rs_get_rate (oops)
  Reported 232 times (1152 total reports)
  Bug in the Intel IWL wireless drivers
  This oops was last seen in version 2.6.27.5, and first seen in 2.6.25-rc2-git5.
  More info: http://www.kerneloops.org/searchweek.php?search=rs_get_rate

Rank 7: journal_update_superblock (warning)
  Reported 231 times (6506 total reports)
  Likely caused by the user removing a USB stick while mounted
  This warning was last seen in version 2.6.27.7, and first seen in 2.6.24-rc6-git1.
  More info: http://www.kerneloops.org/searchweek.php?search=journal_update_superblock

Rank 8: mtrr_trim_uncached_memory (warning)
  Reported 227 times (619 total reports)
  There is a high number of machines where our MTRR checks trigger.
  I suspect we are too picky in accepting the MTRR configuration.
  This warning was last seen in version 2.6.27.5, and first seen in 2.6.24.
  More info: http://www.kerneloops.org/searchweek.php?search=mtrr_trim_uncached_memory

Rank 9: __atapi_pio_bytes (warning)
  Reported 146 times (224 total reports)
  Alan said this was due to some other layer giving the libata drivers a
  weird scatter gather list. It just happens a lot, and somehow it mostly
  happens in virtualized environments
  This warning was last seen in version 2.6.27.5, and first seen in 2.6.27.4.
  More info: http://www.kerneloops.org/searchweek.php?search=__atapi_pio_bytes

Rank 10: dev_watchdog(sis900) (oops)
  Reported 137 times (1538 total reports)
  This oops was last seen in version 2.6.27.6, and first seen in 2.6.26-rc4-git2.
  More info: http://www.kerneloops.org/searchweek.php?search=dev_watchdog(sis900)

Rank 11: tcp_recvmsg (warning)
  Reported 71 times (167 total reports)
  This warning was last seen in version 2.6.27.5, and first seen in 2.6.25.
  More info: http://www.kerneloops.org/searchweek.php?search=tcp_recvmsg

Rank 12: dev_watchdog(atl1) (oops)
  Reported 56 times (109 total reports)
  This oops was last seen in version 2.6.27.5, and first seen in 2.6.26.6.
  More info: http://www.kerneloops.org/searchweek.php?search=dev_watchdog(atl1)

Rank 13: nv_set_page_attrib_cached (warning)
  Reported 56 times (65 total reports)
  [external] bug in the binary nvidia driver
  warning only shows up in tainted kernels
  This warning was last seen in version 2.6.27.5, and first seen in 2.6.27.5.
  More info: http://www.kerneloops.org/searchweek.php?search=nv_set_page_attrib_cached

* Re: oops/warning report for the week of November 26, 2008
From: Jesse Barnes @ 2008-11-27 0:05 UTC
To: Arjan van de Ven
Cc: Linux Kernel Mailing List, Linus Torvalds, NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox

On Wednesday, November 26, 2008 3:11 pm Arjan van de Ven wrote:
> Rank 2: pci_create_slot (warning)
> Reported 603 times (639 total reports)
> BIOS provided duplicated slot names, the PCI layer blindly passes to sysfs
> This warning was last seen in version 2.6.27.5, and first seen in
> 2.6.27-rc7-git1. More info:
> http://www.kerneloops.org/searchweek.php?search=pci_create_slot

IIRC we fixed this one post-2.6.27. I didn't send the patches back to -stable
because they were a bit big, but if someone were sufficiently motivated I'm
sure the backport wouldn't be that hard...

Jesse

* Re: oops/warning report for the week of November 26, 2008
From: Ingo Molnar @ 2008-11-27 11:48 UTC
To: Jesse Barnes
Cc: Arjan van de Ven, Linux Kernel Mailing List, Linus Torvalds, NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox

* Jesse Barnes <jbarnes@virtuousgeek.org> wrote:

> On Wednesday, November 26, 2008 3:11 pm Arjan van de Ven wrote:
> > Rank 2: pci_create_slot (warning)
> > Reported 603 times (639 total reports)
> > BIOS provided duplicated slot names, the PCI layer blindly passes to sysfs
> > This warning was last seen in version 2.6.27.5, and first seen in
> > 2.6.27-rc7-git1. More info:
> > http://www.kerneloops.org/searchweek.php?search=pci_create_slot
>
> IIRC we fixed this one post-2.6.27. I didn't send the patches back
> to -stable because they were a bit big, but if someone were
> sufficiently motivated I'm sure the backport wouldn't be that
> hard...

having the commit IDs mentioned here would be nice, should anyone feel
motivated.

Ingo

* Re: oops/warning report for the week of November 26, 2008
From: Alex Chiang @ 2008-11-27 19:42 UTC
To: Jesse Barnes
Cc: Arjan van de Ven, Linux Kernel Mailing List, Linus Torvalds, NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox

* Jesse Barnes <jbarnes@virtuousgeek.org>:
> On Wednesday, November 26, 2008 3:11 pm Arjan van de Ven wrote:
> > Rank 2: pci_create_slot (warning)
> > Reported 603 times (639 total reports)
> > BIOS provided duplicated slot names, the PCI layer blindly passes to sysfs
> > This warning was last seen in version 2.6.27.5, and first seen in
> > 2.6.27-rc7-git1. More info:
> > http://www.kerneloops.org/searchweek.php?search=pci_create_slot
>
> IIRC we fixed this one post-2.6.27. I didn't send the patches back to -stable
> because they were a bit big, but if someone were sufficiently motivated I'm
> sure the backport wouldn't be that hard...

I can do this backport. A few questions though...

We're seeing a proliferation of this one presumably because
Fedora10 uses 2.6.27.5 as a starting point? If I just backport
the fixes against Greg's latest tree, do I have to do anything
special to make sure they get into the Fedora kernel?

Also, does kerneloops capture any of the machine information,
like DMI output, etc. or does it just get the oops? It would be
nice to see which machines out there have the broken BIOS that
causes this oops.

Thanks.

/ac

* Re: oops/warning report for the week of November 26, 2008
From: Arjan van de Ven @ 2008-11-27 19:49 UTC
To: Alex Chiang
Cc: Jesse Barnes, Linux Kernel Mailing List, Linus Torvalds, NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox

On Thu, 27 Nov 2008 12:42:10 -0700
Alex Chiang <achiang@hp.com> wrote:

> * Jesse Barnes <jbarnes@virtuousgeek.org>:
> > On Wednesday, November 26, 2008 3:11 pm Arjan van de Ven wrote:
> > > Rank 2: pci_create_slot (warning)
> > > Reported 603 times (639 total reports)
> > > BIOS provided duplicated slot names, the PCI layer
> > > blindly passes to sysfs This warning was last seen in version
> > > 2.6.27.5, and first seen in 2.6.27-rc7-git1. More info:
> > > http://www.kerneloops.org/searchweek.php?search=pci_create_slot
> >
> > IIRC we fixed this one post-2.6.27. I didn't send the patches back
> > to -stable because they were a bit big, but if someone were
> > sufficiently motivated I'm sure the backport wouldn't be that
> > hard...
>
> I can do this backport. A few questions though...
>
> We're seeing a proliferation of this one presumably because
> Fedora10 uses 2.6.27.5 as a starting point? If I just backport
> the fixes against Greg's latest tree, do I have to do anything
> special to make sure they get into the Fedora kernel?

Fedora tends to follow -stable quite closely so that ought to be enough

>
> Also, does kerneloops capture any of the machine information,
> like DMI output, etc. or does it just get the oops? It would be
> nice to see which machines out there have the broken BIOS that
> causes this oops.

right now we do this for oopses, but not for warnings ;(
I'll make a patch to add this; it's generally useful.

--
Arjan van de Ven
Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

* Re: oops/warning report for the week of November 26, 2008
From: Ingo Molnar @ 2008-11-27 11:52 UTC
To: Arjan van de Ven, Yinghai Lu
Cc: Linux Kernel Mailing List, Linus Torvalds, NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox, jesse Barnes

* Arjan van de Ven <arjan@linux.intel.com> wrote:

> Rank 8: mtrr_trim_uncached_memory (warning)
> Reported 227 times (619 total reports)
> There is a high number of machines where our MTRR checks
> trigger. I suspect we are too picky in accepting the MTRR
> configuration.

the warning here means: "the BIOS messed up but we fixed it up for
you just fine".

Should we print a DMI descriptor so that it can be tracked back to the
bad BIOSen in question? Or should we (partially) silence the warning
itself? Those BIOS bugs need fixing really: older kernels will boot up
with bad MTRR settings - resulting in a super-slow system or other
weirdnesses. We can tone down the message so that it doesn't show up in
kerneloops.org. It's up to you.

Ingo

* Re: oops/warning report for the week of November 26, 2008
From: Jesse Barnes @ 2008-11-27 17:02 UTC
To: Ingo Molnar
Cc: Arjan van de Ven, Yinghai Lu, Linux Kernel Mailing List, Linus Torvalds, NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox

On Thursday, November 27, 2008 3:52 am Ingo Molnar wrote:
> * Arjan van de Ven <arjan@linux.intel.com> wrote:
> > Rank 8: mtrr_trim_uncached_memory (warning)
> > Reported 227 times (619 total reports)
> > There is a high number of machines where our MTRR checks
> > trigger. I suspect we are too picky in accepting the MTRR
> > configuration.
>
> the warning here means: "the BIOS messed up but we fixed it up for
> you just fine".
>
> Should we print a DMI descriptor so that it can be tracked back to the
> bad BIOSen in question? Or should we (partially) silence the warning
> itself? Those BIOS bugs need fixing really: older kernels will boot up
> with bad MTRR settings - resulting in a super-slow system or other
> weirdnesses. We can tone down the message so that it doesn't show up in
> kerneloops.org. It's up to you.

I actually think we're doing something wrong here, since so many platforms have
this behavior. It's likely that there's an undocumented, additional check
needed to determine whether a slot is hot pluggable. Matthew Garrett recently
posted a patch to check for ACPI _RMV methods, which should be an improvement.
I'll be putting that into linux-next soon for testing.

--
Jesse Barnes, Intel Open Source Technology Center

* Re: oops/warning report for the week of November 26, 2008
From: Arjan van de Ven @ 2008-11-27 18:01 UTC
To: Ingo Molnar
Cc: Yinghai Lu, Linux Kernel Mailing List, Linus Torvalds, NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox, jesse Barnes

Ingo Molnar wrote:
> * Arjan van de Ven <arjan@linux.intel.com> wrote:
>
>> Rank 8: mtrr_trim_uncached_memory (warning)
>> Reported 227 times (619 total reports)
>> There is a high number of machines where our MTRR checks
>> trigger. I suspect we are too picky in accepting the MTRR
>> configuration.
>
> the warning here means: "the BIOS messed up but we fixed it up for
> you just fine".

I don't believe that right now. we see so many of these, including many
"there's no MTRRs at all", that I am seriously suspecting that our code
is just incorrect somehow and triggering too much.

* Re: oops/warning report for the week of November 26, 2008
From: Ingo Molnar @ 2008-11-27 20:18 UTC
To: Arjan van de Ven
Cc: Yinghai Lu, Linux Kernel Mailing List, Linus Torvalds, NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox, jesse Barnes

* Arjan van de Ven <arjan@linux.intel.com> wrote:

> Ingo Molnar wrote:
>> * Arjan van de Ven <arjan@linux.intel.com> wrote:
>>
>>> Rank 8: mtrr_trim_uncached_memory (warning)
>>> Reported 227 times (619 total reports)
>>> There is a high number of machines where our MTRR checks trigger. I
>>> suspect we are too picky in accepting the MTRR configuration.
>>
>> the warning here means: "the BIOS messed up but we fixed it up for you
>> just fine".
>
> I don't believe that right now. we see so many of these, including
> many "there's no MTRRs at all", that I am seriously suspecting that
> our code is just incorrect somehow and triggering too much.

well we looked at existing reports and Linux was right to fix them
up. Show us one that is incorrect, then we can fix it up.

the "no MTRR's" are vmware/(also qemu?) guests not implementing a
full CPU emulation.

Ingo

* Re: oops/warning report for the week of November 26, 2008
From: Arjan van de Ven @ 2008-11-27 20:28 UTC
To: Ingo Molnar
Cc: Yinghai Lu, Linux Kernel Mailing List, Linus Torvalds, NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox, jesse Barnes

On Thu, 27 Nov 2008 21:18:36 +0100
Ingo Molnar <mingo@elte.hu> wrote:

>
> * Arjan van de Ven <arjan@linux.intel.com> wrote:
>
> > Ingo Molnar wrote:
> >> * Arjan van de Ven <arjan@linux.intel.com> wrote:
> >>
> >>> Rank 8: mtrr_trim_uncached_memory (warning)
> >>> Reported 227 times (619 total reports)
> >>> There is a high number of machines where our MTRR checks
> >>> trigger. I suspect we are too picky in accepting the MTRR
> >>> configuration.
> >>
> >> the warning here means: "the BIOS messed up but we fixed it up for
> >> you just fine".
> >
> > I don't believe that right now. we see so many of these, including
> > many "there's no MTRRs at all", that I am seriously suspecting that
> > our code is just incorrect somehow and triggering too much.
>
> well we looked at existing reports and Linux was right to fix them
> up. Show us one that is incorrect, then we can fix it up.
>
> the "no MTRR's" are vmware/(also qemu?) guests not implementing a
> full CPU emulation.

... and it's still our fault in part, since we don't even check to see
if a cpu claims to support MTRR before complaining about it...

easy to fix though:

From 7e987ae541c41ce908b414fee9d8e2fd2099a083 Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux.intel.com>
Date: Thu, 27 Nov 2008 12:25:47 -0800
Subject: [PATCH] x86: make sure the CPU advertizes MTRR support before complaining about the lack thereoff...

We complain loudly if a CPU does not have MTRR support... but we don't check if the CPU
exposes MTRR support in the CPUID flags first. While this might not fix all of the
broken virtualization systems out there, it will at least fix those that properly don't
advertize things they don't support.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/x86/kernel/cpu/mtrr/main.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 1159e26..0044e61 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -1567,6 +1567,8 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
 	 * Make sure we only trim uncachable memory on machines that
 	 * support the Intel MTRR architecture:
 	 */
+	if (!cpu_has_mtrr)
+		return 0;
 	if (!is_cpu(INTEL) || disable_mtrr_trim)
 		return 0;
 	rdmsr(MTRRdefType_MSR, def, dummy);
--
1.6.0.4

--
Arjan van de Ven
Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

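For readers following along outside the kernel tree, the guard this patch adds can be sketched in plain userspace C. This is only an illustration, assuming that `cpu_has_mtrr` corresponds to the architectural MTRR feature flag (CPUID leaf 1, EDX bit 12). The helper names `edx_advertises_mtrr` and `trim_would_proceed` and their parameters are hypothetical and do not exist in the kernel; the EDX value is passed in rather than read from the CPU so the sketch stays portable.

```c
#include <stdint.h>

/* CPUID leaf 1 reports MTRR support in EDX bit 12. */
#define CPUID1_EDX_MTRR_BIT 12u

/* Hypothetical helper: 1 if the given CPUID.1:EDX value advertises MTRR. */
static int edx_advertises_mtrr(uint32_t edx)
{
	return (int)((edx >> CPUID1_EDX_MTRR_BIT) & 1u);
}

/*
 * Hypothetical model of the guard order in the patched function:
 * returns 1 if the trimming logic would go on to read the MTRR MSRs,
 * 0 if it would return early. intel_style_if stands in for
 * is_cpu(INTEL); disable_mtrr_trim mirrors the variable in the diff.
 */
static int trim_would_proceed(uint32_t edx, int intel_style_if,
			      int disable_mtrr_trim)
{
	if (!edx_advertises_mtrr(edx))		/* the check the patch adds */
		return 0;
	if (!intel_style_if || disable_mtrr_trim)	/* the existing check */
		return 0;
	return 1;
}
```

Calling `trim_would_proceed(0, 1, 0)` models a guest that does not advertise MTRR: the function bails out before any warning path is reached. Whether the new check is actually needed in the kernel is a separate question, which the replies in this thread take up.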
* Re: oops/warning report for the week of November 26, 2008
From: Ingo Molnar @ 2008-11-27 20:47 UTC
To: Arjan van de Ven
Cc: Yinghai Lu, Linux Kernel Mailing List, Linus Torvalds, NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox, jesse Barnes, H. Peter Anvin

* Arjan van de Ven <arjan@linux.intel.com> wrote:

> On Thu, 27 Nov 2008 21:18:36 +0100
> Ingo Molnar <mingo@elte.hu> wrote:
>
> >
> > * Arjan van de Ven <arjan@linux.intel.com> wrote:
> >
> > > Ingo Molnar wrote:
> > >> * Arjan van de Ven <arjan@linux.intel.com> wrote:
> > >>
> > >>> Rank 8: mtrr_trim_uncached_memory (warning)
> > >>> Reported 227 times (619 total reports)
> > >>> There is a high number of machines where our MTRR checks
> > >>> trigger. I suspect we are too picky in accepting the MTRR
> > >>> configuration.
> > >>
> > >> the warning here means: "the BIOS messed up but we fixed it up for
> > >> you just fine".
> > >
> > > I don't believe that right now. we see so many of these, including
> > > many "there's no MTRRs at all", that I am seriously suspecting that
> > > our code is just incorrect somehow and triggering too much.
> >
> > well we looked at existing reports and Linux was right to fix them
> > up. Show us one that is incorrect, then we can fix it up.
> >
> > the "no MTRR's" are vmware/(also qemu?) guests not implementing a
> > full CPU emulation.
>
> ... and it's still our fault in part, since we don't even check to
> see if a cpu claims to support MTRR before complaining about it...
>
> easy to fix though:

IIRC the problem is that vmware _does_ claim that it supports MTRRs.

Ingo

* Re: oops/warning report for the week of November 26, 2008
From: Arjan van de Ven @ 2008-11-27 20:53 UTC
To: Ingo Molnar
Cc: Yinghai Lu, Linux Kernel Mailing List, Linus Torvalds, NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox, jesse Barnes, H. Peter Anvin

On Thu, 27 Nov 2008 21:47:14 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> IIRC the problem is that vmware _does_ claim that it supports MTRRs.

it might.
but even if they would fix that, we would still WARN (
at least we should do our side correctly...

--
Arjan van de Ven
Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

* Re: oops/warning report for the week of November 26, 2008
From: Ingo Molnar @ 2008-11-28 8:34 UTC
To: Arjan van de Ven
Cc: Yinghai Lu, Linux Kernel Mailing List, Linus Torvalds, NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox, jesse Barnes, H. Peter Anvin

* Arjan van de Ven <arjan@linux.intel.com> wrote:

> On Thu, 27 Nov 2008 21:47:14 +0100
> Ingo Molnar <mingo@elte.hu> wrote:
>
> > IIRC the problem is that vmware _does_ claim that it supports MTRRs.
>
> it might.
> but even if they would fix that, we would still WARN (
> at least we should do our side correctly...

As pointed out in other parts of the thread, that is not the case.

Anyway, as I said at the outset, if you think we should remove the
warning altogether, or tweak it, we can do that - it is important to
have relevant warnings show up in kerneloops.org.

To sum it up: the only remaining MTRR warnings we know of are either:

 1) apparently genuine BIOS bugs that do cause problems if the (new)
    kernel does not fix them up. The MTRR warning is relevant and
    correct in those cases.

or:

 2) sucky virtualization solutions that cheat the guest OS by faking
    "MTRR support" in the CPUID info, but not actually showing any
    MTRRs. These virtualization solutions do not even properly
    identify themselves to the kernel. The MTRR warning is unnecessary
    in this case.

So what we did in the x86 tree to remove the warning in the second
case is to properly identify vmware (and in general, virtualization)
guests. It was not a simple oneliner:

  earth4:~/tip> gll linus..x86/detect-hyper
  4e42ebd: x86: hypervisor - fix sparse warnings
  c450d78: x86: vmware - fix sparse warnings
  fd8cd7e: x86: vmware: look for DMI string in the product serial key
  6bdbfe9: x86: VMware: Fix vmware_get_tsc code
  395628e: x86: Skip verification by the watchdog for TSC clocksource.
  eca0cd0: x86: Add a synthetic TSC_RELIABLE feature bit.
  88b094f: x86: Hypervisor detection and get tsc_freq from hypervisor
  49ab56a: x86: add X86_FEATURE_HYPERVISOR feature bit
  b2bcc7b: x86: add a synthetic TSC_RELIABLE feature bit

and it will benefit vmware guests in many more areas than just a
sharper MTRR warning message. That code is queued up for v2.6.29.

Ingo

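The idea behind case 2 can be illustrated with a small userspace C sketch. It assumes only the architectural convention that CPUID leaf 1 sets ECX bit 31 when running under a hypervisor, which is the bit the `X86_FEATURE_HYPERVISOR` commit title refers to. The helper names and the warning policy below are hypothetical and are not taken from the queued series; as the message above notes, some guests do not identify themselves this way, which is why the series also looks at DMI strings.

```c
#include <stdint.h>

/* CPUID leaf 1 sets ECX bit 31 when a hypervisor is present. */
#define CPUID1_ECX_HYPERVISOR_BIT 31u

/* Hypothetical helper: 1 if the given CPUID.1:ECX value reports a hypervisor. */
static int ecx_reports_hypervisor(uint32_t ecx)
{
	return (int)((ecx >> CPUID1_ECX_HYPERVISOR_BIT) & 1u);
}

/*
 * Hypothetical policy, not the kernel's actual code: complain about
 * "MTRR advertised but no valid ranges" only on what looks like bare
 * metal, where an empty MTRR setup points at a BIOS problem.
 */
static int should_warn_empty_mtrrs(uint32_t ecx, unsigned int valid_ranges)
{
	if (valid_ranges != 0)
		return 0;	/* MTRRs are populated, nothing to report */
	return !ecx_reports_hypervisor(ecx);
}
```

With this policy a guest that sets the hypervisor bit and exposes no MTRR ranges stays quiet, while a bare-metal machine in the same state still produces a report.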
* Re: oops/warning report for the week of November 26, 2008
From: H. Peter Anvin @ 2008-11-27 21:18 UTC
To: Arjan van de Ven
Cc: Ingo Molnar, Yinghai Lu, Linux Kernel Mailing List, Linus Torvalds, NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox, jesse Barnes

Arjan van de Ven wrote:
> +	if (!cpu_has_mtrr)
> +		return 0;
> 	if (!is_cpu(INTEL) || disable_mtrr_trim)
> 		return 0;
> 	rdmsr(MTRRdefType_MSR, def, dummy);

cpu_has_mtrr there should presumably replace is_cpu(INTEL).

I'm not sure if this can be replaced by use_intel(); in particular
use_intel() relies on mtrr_if having been initialized. Looking...

	-hpa (out of town for Thanksgiving)

* Re: oops/warning report for the week of November 26, 2008
From: Yinghai Lu @ 2008-11-27 21:18 UTC
To: Arjan van de Ven
Cc: Ingo Molnar, Linux Kernel Mailing List, Linus Torvalds, NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox, jesse Barnes

Arjan van de Ven wrote:
> On Thu, 27 Nov 2008 21:18:36 +0100
> Ingo Molnar <mingo@elte.hu> wrote:
>
>> * Arjan van de Ven <arjan@linux.intel.com> wrote:
>>
>>> Ingo Molnar wrote:
>>>> * Arjan van de Ven <arjan@linux.intel.com> wrote:
>>>>
>>>>> Rank 8: mtrr_trim_uncached_memory (warning)
>>>>> Reported 227 times (619 total reports)
>>>>> There is a high number of machines where our MTRR checks
>>>>> trigger. I suspect we are too picky in accepting the MTRR
>>>>> configuration.
>>>> the warning here means: "the BIOS messed up but we fixed it up for
>>>> you just fine".
>>> I don't believe that right now. we see so many of these, including
>>> many "there's no MTRRs at all", that I am seriously suspecting that
>>> our code is just incorrect somehow and triggering too much.
>> well we looked at existing reports and Linux was right to fix them
>> up. Show us one that is incorrect, then we can fix it up.
>>
>> the "no MTRR's" are vmware/(also qemu?) guests not implementing a
>> full CPU emulation.
>
> ... and it's still our fault in part, since we don't even check to see
> if a cpu claims to support MTRR before complaining about it...
>
> easy to fix though:
>
> From 7e987ae541c41ce908b414fee9d8e2fd2099a083 Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven <arjan@linux.intel.com>
> Date: Thu, 27 Nov 2008 12:25:47 -0800
> Subject: [PATCH] x86: make sure the CPU advertizes MTRR support before complaining about the lack thereoff...
>
> We complain loudly if a CPU does not have MTRR support... but we don't check if the CPU
> exposes MTRR support in the CPUID flags first. While this might not fix all of the
> broken virtualization systems out there, it will at least fix those that properly don't
> advertize things they don't support.
>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  arch/x86/kernel/cpu/mtrr/main.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
> index 1159e26..0044e61 100644
> --- a/arch/x86/kernel/cpu/mtrr/main.c
> +++ b/arch/x86/kernel/cpu/mtrr/main.c
> @@ -1567,6 +1567,8 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
>  	 * Make sure we only trim uncachable memory on machines that
>  	 * support the Intel MTRR architecture:
>  	 */
> +	if (!cpu_has_mtrr)
> +		return 0;

that is not needed, we already check that in mtrr_bp_init before this
function is called, and it will assign mtrr_if

and

#define is_cpu(vnd) (mtrr_if && mtrr_if->vendor == X86_VENDOR_##vnd)

will make it sure mtrr is there.

ps: here INTEL mean any cpu has same interface like intel cpu's

YH

> 	if (!is_cpu(INTEL) || disable_mtrr_trim)
> 		return 0;
> 	rdmsr(MTRRdefType_MSR, def, dummy);

* Re: oops/warning report for the week of November 26, 2008
From: H. Peter Anvin @ 2008-11-27 21:42 UTC
To: Arjan van de Ven
Cc: Ingo Molnar, Yinghai Lu, Linux Kernel Mailing List, Linus Torvalds, NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox, jesse Barnes

Arjan van de Ven wrote:
>
> diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
> index 1159e26..0044e61 100644
> --- a/arch/x86/kernel/cpu/mtrr/main.c
> +++ b/arch/x86/kernel/cpu/mtrr/main.c
> @@ -1567,6 +1567,8 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
>  	 * Make sure we only trim uncachable memory on machines that
>  	 * support the Intel MTRR architecture:
>  	 */
> +	if (!cpu_has_mtrr)
> +		return 0;
> 	if (!is_cpu(INTEL) || disable_mtrr_trim)
> 		return 0;
> 	rdmsr(MTRRdefType_MSR, def, dummy);

Okay... is_cpu() here is defined as:

#define is_cpu(vnd) (mtrr_if && mtrr_if->vendor == X86_VENDOR_##vnd)

... so an MTRR interface has been identified. Therefore testing
cpu_has_mtrr is redundant.

As far as use_intel() versus is_cpu(INTEL), it looks to me as though the
two are identical in the current code -- mtrr_if->vendor is never set in
the generic code, and so defaults to 0 - meaning X86_VENDOR_INTEL.

All in all, it looks like the vendor ID stuff is a bad case of "works by
accident" in the MTRR code, however, *given the current code* I conclude
that is_cpu(INTEL) == use_intel() and that neither can be true without
MTRRs enabled.

	-hpa

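The "works by accident" observation can be reproduced in a few lines of userspace C. This is a simplified sketch, not the kernel's code: `struct mtrr_ops` is reduced to two fields, the `name` field and the `generic_ops`/`amd_ops` objects are illustrative, and `is_intel_with` is a hypothetical helper. The `is_cpu()` macro is copied from the message above, and `X86_VENDOR_INTEL` being 0 is taken from the same message; the value used for `X86_VENDOR_AMD` is only illustrative.

```c
#include <stddef.h>

#define X86_VENDOR_INTEL 0	/* 0, as noted in the thread */
#define X86_VENDOR_AMD   2	/* illustrative non-zero vendor value */

/* Simplified stand-in for the kernel's struct mtrr_ops. */
struct mtrr_ops {
	unsigned int vendor;
	const char *name;
};

/* vendor is never set here, so static initialization leaves it at 0. */
static struct mtrr_ops generic_ops = { .name = "generic" };
static struct mtrr_ops amd_ops = { .vendor = X86_VENDOR_AMD, .name = "amd" };

static struct mtrr_ops *mtrr_if;

#define is_cpu(vnd) (mtrr_if && mtrr_if->vendor == X86_VENDOR_##vnd)

/* Hypothetical helper: install an interface and evaluate is_cpu(INTEL). */
static int is_intel_with(struct mtrr_ops *ops)
{
	mtrr_if = ops;
	return is_cpu(INTEL);
}
```

In this model `is_intel_with(&generic_ops)` is true only because the unset `vendor` field happens to equal `X86_VENDOR_INTEL`, and `is_intel_with(NULL)` is false because no interface was registered, which matches the conclusion that `is_cpu(INTEL)` cannot be true without an MTRR interface.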
end of thread, newest message ~2008-11-28 8:35 UTC

Thread overview: 16+ messages
2008-11-26 23:11 oops/warning report for the week of November 26, 2008 Arjan van de Ven
2008-11-27  0:05 ` Jesse Barnes
2008-11-27 11:48   ` Ingo Molnar
2008-11-27 19:42   ` Alex Chiang
2008-11-27 19:49     ` Arjan van de Ven
2008-11-27 11:52 ` Ingo Molnar
2008-11-27 17:02   ` Jesse Barnes
2008-11-27 18:01   ` Arjan van de Ven
2008-11-27 20:18     ` Ingo Molnar
2008-11-27 20:28       ` Arjan van de Ven
2008-11-27 20:47         ` Ingo Molnar
2008-11-27 20:53           ` Arjan van de Ven
2008-11-28  8:34             ` Ingo Molnar
2008-11-27 21:18         ` H. Peter Anvin
2008-11-27 21:18         ` Yinghai Lu
2008-11-27 21:42         ` H. Peter Anvin