* Re: [Bug 25042] New: RAM buffer I/O resource badly interacts with memory hot-add
From: Andrew Morton @ 2011-01-04 21:51 UTC
To: linux-mm, linux-acpi; +Cc: bugzilla-daemon, Linus Torvalds, petr, akataria

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

I'm not sure who to blame here so I'll just spray it at everyone I've
ever met ;)

On Thu, 16 Dec 2010 23:00:12 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=25042
>
>            Summary: RAM buffer I/O resource badly interacts with memory
>                     hot-add
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 2.6.35
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: akpm@linux-foundation.org
>         ReportedBy: petr@vandrovec.name
>                 CC: akataria@vmware.com
>         Regression: Yes
>
>
> Created an attachment (id=40502)
>  --> (https://bugzilla.kernel.org/attachment.cgi?id=40502)
> /proc/iomem after issuing hot-add, one from 3076 to 3200, other from 3200 to
> 3456MB
>
> Linus's commit 45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90 in May 2009 added code
> to create 'RAM buffer' above top of RAM to ensure that I/O resources do not
> start immediately after RAM, but sometime later.  Originally it was enforcing
> 32MB alignment, now it enforces 64MB.  Which means that in VMs with memory size
> which is not multiple of 64MB there will be additional 'RAM buffer' resource
> present:
>
> 100000000-1003fffff : System RAM
> 100400000-103ffffff : RAM buffer
>
> When we try to hot-add memory, kernel complains that there was resource
> conflict with this fake 'RAM buffer' and hot-added memory is not recognized:
>
> [  115.324952] Hotplug Mem Device
> [  115.325549] System RAM resource 100400000 - 10fffffff cannot be added
> [  115.325553] ACPI:memory_hp:add_memory failed
> [  115.326519] ACPI:memory_hp:Error in acpi_memory_enable_device
> [  115.327183] acpi_memhotplug: probe of PNP0C80:00 failed with error -22
> [  115.327347]
> [  115.327350]  driver data not found
> [  115.328808] ACPI:memory_hp:Cannot find driver data
>
> For now we've modified hotplug code to split hot-added request into smaller
> ranges, so only first <= 252MB are unusable, rather than whole xxxGB chunk, but
> if 'RAM buffer' could be made dependent on memory hot-plug not available on the
> platform, it would be much better.
>
> Another approach is resurrecting
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-07/msg06501.html and using
> this range instead of all "unclaimed" ranges for placing I/O devices.  Then
> "RAM buffer" would not be necessary at all.
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
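The conflict described in the report can be checked with a little arithmetic. The following is a minimal sketch, not kernel code: it assumes the 'RAM buffer' simply runs from the end of System RAM up to the next 64MB boundary, as the report states, and the names `ram_buffer` and `overlaps` are hypothetical helpers introduced only for illustration.

```python
MB = 1 << 20
RAM_BUFFER_ALIGN = 64 * MB  # alignment quoted in the report; assumed here


def ram_buffer(ram_end_inclusive, align=RAM_BUFFER_ALIGN):
    """Return the inclusive (start, end) 'RAM buffer' range above RAM, or None.

    None means RAM already ends on an `align` boundary, so no buffer is needed.
    """
    start = ram_end_inclusive + 1
    rounded = (start + align - 1) & ~(align - 1)  # round up to the boundary
    if rounded == start:
        return None
    return (start, rounded - 1)


def overlaps(a, b):
    """True when two inclusive (start, end) ranges share at least one address."""
    return a[0] <= b[1] and b[0] <= a[1]


# Values taken from the /proc/iomem excerpt and the kernel log in the report.
buf = ram_buffer(0x1003fffff)
hot_add = (0x100400000, 0x10fffffff)
print(hex(buf[0]), hex(buf[1]), overlaps(buf, hot_add))
```

With the values from the report, the computed buffer is 100400000-103ffffff, which is exactly where the hot-added range begins, so the two ranges overlap and the resource request fails.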
* Re: [Bug 25042] New: RAM buffer I/O resource badly interacts with memory hot-add
From: Linus Torvalds @ 2011-01-04 22:32 UTC
To: Andrew Morton
Cc: linux-mm, linux-acpi, bugzilla-daemon, petr, akataria, Bjorn Helgaas

On Tue, Jan 4, 2011 at 1:51 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
>> Linus's commit 45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90 in May 2009 added code
>> to create 'RAM buffer' above top of RAM to ensure that I/O resources do not
>> start immediately after RAM, but sometime later.  Originally it was enforcing
>> 32MB alignment, now it enforces 64MB.  Which means that in VMs with memory size
>> which is not multiple of 64MB there will be additional 'RAM buffer' resource
>> present:
>>
>> 100000000-1003fffff : System RAM
>> 100400000-103ffffff : RAM buffer

I'd suggest just working around it by hotplugging in 64MB chunks.

IOW, the old "it hurts when I do that - don't do that then" solution
to the problem. There is no reason why a VM should export some random
8MB-aligned region that I can see.

>> Another approach is resurrecting
>> http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-07/msg06501.html and using
>> this range instead of all "unclaimed" ranges for placing I/O devices.  Then
>> "RAM buffer" would not be necessary at all.

Yeah, not going to happen. There's no point (see above), and it is
fundamentally wrong to even think that the firmware tables - ACPI or
otherwise - would be so perfect that you can just always trust them.

Every time somebody makes the mistake of thinking they can do that
(and it happens distressingly often), they are quickly shown to be
wrong, and there's some random hardware out there that simply doesn't
list the ranges it uses.

What could happen these days is to move the "gap" logic from the e820
table (and /proc/iomem) and into the "arch_remove_reservations()"
logic. See commit fcb119183c73bf0781009713f303e28b1fb13d3e. That might
make memory hotplug happier.

That said, I do repeat: why the hell do you keep digging that hole in
the first place. Do memory hotplug in 256MB chunks, naturally aligned,
and don't bother with any of this crazy crap.

                    Linus
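For readers unfamiliar with the term, "naturally aligned" means the start address is a multiple of the chunk size. A minimal sketch of that check follows; `is_naturally_aligned` is a hypothetical helper, and the 256MB default is simply the chunk size suggested in the message above.

```python
MB = 1 << 20
CHUNK = 256 * MB  # chunk size suggested in the message; assumed as default


def is_naturally_aligned(start, size=CHUNK):
    """True when size is a power of two and start is a multiple of size."""
    is_power_of_two = size > 0 and (size & (size - 1)) == 0
    return is_power_of_two and start % size == 0


# The hot-add start address from the bug report versus the next 256MB boundary.
print(is_naturally_aligned(0x100400000))
print(is_naturally_aligned(0x110000000))
```

The start address from the bug report, 0x100400000, fails this check, while 0x110000000 passes it.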
* Re: [Bug 25042] New: RAM buffer I/O resource badly interacts with memory hot-add
From: Petr Vandrovec @ 2011-01-04 23:55 UTC
To: Linus Torvalds
Cc: Andrew Morton, linux-mm, linux-acpi, bugzilla-daemon, akataria, Bjorn Helgaas

On Tue, Jan 4, 2011 at 2:32 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Jan 4, 2011 at 1:51 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
>>> Linus's commit 45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90 in May 2009 added code
>>> to create 'RAM buffer' above top of RAM to ensure that I/O resources do not
>>> start immediately after RAM, but sometime later.  Originally it was enforcing
>>> 32MB alignment, now it enforces 64MB.  Which means that in VMs with memory size
>>> which is not multiple of 64MB there will be additional 'RAM buffer' resource
>>> present:
>>>
>>> 100000000-1003fffff : System RAM
>>> 100400000-103ffffff : RAM buffer
>
> I'd suggest just working around it by hotplugging in 64MB chunks.

Unfortunately that does not work - kernels configured for sparsemem
hate adding memory in chunks smaller than the section size: on x86-64 a
region must be at least 128MB large, with its end aligned to 128MB.  If
a smaller region is added, then either non-existent memory is activated,
or nothing happens at all, depending on exact values and kernel
versions.  So we align the end of the hot-added region to 128MB on
x86-64, and 1GB on ia32.  But we do not align the start because there
was no need...

> IOW, the old "it hurts when I do that - don't do that then" solution
> to the problem. There is no reason why a VM should export some random
> 8MB-aligned region that I can see.

It just adds memory where it ended - power-on memory ended at
0x1003fffff, and so the platform now naturally tries to continue where
it left off - from 0x100400000 to 0x10fffffff.  It has no idea that the
OS inside has some special requirements, and the OS inside
unfortunately does not support _PRS/_SRS on memory devices either, so
we cannot offer possible choices hoping that the guest will pick one it
likes more than the default placement/size.

> That said, I do repeat: why the hell do you keep digging that hole in
> the first place. Do memory hotplug in 256MB chunks, naturally aligned,
> and don't bother with any of this crazy crap.

So that we can provide a contiguous memory area to the VM, and the
layout of a VM created with some amount of memory is the same as that
of a VM which was hot-added to the required size - that's important for
supporting hibernate, and it is easier to implement than discontiguous
ranges.

I've modified the code so that we hot-add two regions: the first aligns
the memory size to 256MB (that one is not activated successfully if the
memory size is not a multiple of 64MB, but we cannot do smaller due to
the sparsemem restrictions listed above), and the remainder (if more
than 256MB is added) is added from there.  That makes the workaround
similar to the one for the clash between OPROM base addresses assigned
by the kernel and ranges reserved in SRAT for memory hot-add...

                                            Petr
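The two-region workaround described in the message above can be sketched as follows. This is an illustration under stated assumptions, not the actual hypervisor code: `split_hot_add` is a hypothetical name, ranges are half-open (`end_exclusive`), and the 0x140000000 upper bound in the example is an invented value chosen only to produce a second region.

```python
MB = 1 << 20
SPLIT_ALIGN = 256 * MB  # boundary used by the workaround, per the message


def split_hot_add(start, end_exclusive, align=SPLIT_ALIGN):
    """Split [start, end_exclusive) at the first `align` boundary above start.

    Returns one region when start is already aligned or the request does not
    reach past the boundary, otherwise two: the unaligned head and the rest.
    """
    boundary = (start + align - 1) & ~(align - 1)  # round start up
    if boundary == start or boundary >= end_exclusive:
        return [(start, end_exclusive)]
    return [(start, boundary), (boundary, end_exclusive)]


# Start address from the bug report; the end address is a made-up example.
for lo, hi in split_hot_add(0x100400000, 0x140000000):
    print(f"{lo:#x}-{hi - 1:#x} ({(hi - lo) // MB} MB)")
```

For the start address in the bug report, the head region is 252MB, which matches the "only first <= 252MB are unusable" figure in the original report: only that head collides with the RAM buffer, and the aligned remainder can be added normally.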
* Re: [Bug 25042] New: RAM buffer I/O resource badly interacts with memory hot-add
From: KAMEZAWA Hiroyuki @ 2011-01-06 2:12 UTC
To: Andrew Morton
Cc: linux-mm, linux-acpi, bugzilla-daemon, Linus Torvalds, petr, akataria

On Tue, 4 Jan 2011 13:51:48 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

>
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> I'm not sure who to blame here so I'll just spray it at everyone I've
> ever met ;)
>
> On Thu, 16 Dec 2010 23:00:12 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=25042
> >
> >            Summary: RAM buffer I/O resource badly interacts with memory
> >                     hot-add
> >            Product: Memory Management
> >            Version: 2.5
> >     Kernel Version: 2.6.35
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Other
> >         AssignedTo: akpm@linux-foundation.org
> >         ReportedBy: petr@vandrovec.name
> >                 CC: akataria@vmware.com
> >         Regression: Yes
> >
> >
> > Created an attachment (id=40502)
> >  --> (https://bugzilla.kernel.org/attachment.cgi?id=40502)
> > /proc/iomem after issuing hot-add, one from 3076 to 3200, other from 3200 to
> > 3456MB
> >
> > Linus's commit 45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90 in May 2009 added code
> > to create 'RAM buffer' above top of RAM to ensure that I/O resources do not
> > start immediately after RAM, but sometime later.  Originally it was enforcing
> > 32MB alignment, now it enforces 64MB.  Which means that in VMs with memory size
> > which is not multiple of 64MB there will be additional 'RAM buffer' resource
> > present:
> >
> > 100000000-1003fffff : System RAM
> > 100400000-103ffffff : RAM buffer
> >
> > When we try to hot-add memory, kernel complains that there was resource
> > conflict with this fake 'RAM buffer' and hot-added memory is not recognized:
> >
> > [  115.324952] Hotplug Mem Device
> > [  115.325549] System RAM resource 100400000 - 10fffffff cannot be added
> > [  115.325553] ACPI:memory_hp:add_memory failed
> > [  115.326519] ACPI:memory_hp:Error in acpi_memory_enable_device
> > [  115.327183] acpi_memhotplug: probe of PNP0C80:00 failed with error -22
> > [  115.327347]
> > [  115.327350]  driver data not found
> > [  115.328808] ACPI:memory_hp:Cannot find driver data
> >
> > For now we've modified hotplug code to split hot-added request into smaller
> > ranges, so only first <= 252MB are unusable, rather than whole xxxGB chunk, but
> > if 'RAM buffer' could be made dependent on memory hot-plug not available on the
> > platform, it would be much better.
> >

Hmm ? Why do you need to place "hot-added" memory's address range next to System
RAM ? Sparsemem allows sparse memory layout.

Is it very difficult ?

Thanks,
-Kame
* Re: [Bug 25042] New: RAM buffer I/O resource badly interacts with memory hot-add
From: Linus Torvalds @ 2011-01-06 2:20 UTC
To: KAMEZAWA Hiroyuki
Cc: Andrew Morton, linux-mm, linux-acpi, bugzilla-daemon, petr, akataria

On Wed, Jan 5, 2011 at 6:12 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> Hmm ? Why do you need to place "hot-added" memory's address range next to System
> RAM ? Sparsemem allows sparse memory layout.

Well, even without sparsemem, why couldn't the initial memory image
just be more nicely aligned to 256MB or something?

                    Linus