From mboxrd@z Thu Jan 1 00:00:00 1970 From: Russell King Subject: Re: please don't apply : bootmem: avoid DMA32 zone by default Date: Sun, 7 Mar 2010 09:16:21 +0000 Message-ID: <20100307091621.GA5761@flint.arm.linux.org.uk> References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> <4B916BD6.8010701@kernel.org> <4B91EBC6.6080509@kernel.org> <20100306162234.e2cc84fb.akpm@linux-foundation.org> <20100307010327.GD15725@brick.ozlabs.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Return-path: Content-Disposition: inline In-Reply-To: <20100307010327.GD15725@brick.ozlabs.ibm.com> Sender: owner-linux-mm@kvack.org To: Paul Mackerras Cc: Andrew Morton , Yinghai Lu , Greg Thelen , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org List-Id: linux-arch.vger.kernel.org On Sun, Mar 07, 2010 at 12:03:27PM +1100, Paul Mackerras wrote: > On Sat, Mar 06, 2010 at 04:22:34PM -0800, Andrew Morton wrote: > > Earlier, Johannes wrote > > > > : Humm, now that is a bit disappointing. Because it means we will never > > : get rid of bootmem as long as it works for the other architectures. > > : And your changeset just added ~900 lines of code, some of it being a > > : rather ugly compatibility layer in bootmem that I hoped could go away > > : again sooner than later. > > Whoa! Who's proposing to get rid of bootmem, and why? It would be nice if this stuff was copied to linux-arch since it impacts all architectures. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Paul Mackerras Subject: Re: please don't apply : bootmem: avoid DMA32 zone by default Date: Sun, 7 Mar 2010 12:03:27 +1100 Message-ID: <20100307010327.GD15725@brick.ozlabs.ibm.com> References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> <4B916BD6.8010701@kernel.org> <4B91EBC6.6080509@kernel.org> <20100306162234.e2cc84fb.akpm@linux-foundation.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Return-path: Content-Disposition: inline In-Reply-To: <20100306162234.e2cc84fb.akpm@linux-foundation.org> Sender: owner-linux-mm@kvack.org To: Andrew Morton Cc: Yinghai Lu , Greg Thelen , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org List-Id: linux-arch.vger.kernel.org On Sat, Mar 06, 2010 at 04:22:34PM -0800, Andrew Morton wrote: > Earlier, Johannes wrote > > : Humm, now that is a bit disappointing. Because it means we will never > : get rid of bootmem as long as it works for the other architectures. > : And your changeset just added ~900 lines of code, some of it being a > : rather ugly compatibility layer in bootmem that I hoped could go away > : again sooner than later. Whoa! Who's proposing to get rid of bootmem, and why? Paul.
From mboxrd@z Thu Jan 1 00:00:00 1970 From: Stephen Rothwell Subject: Re: please don't apply : bootmem: avoid DMA32 zone by default Date: Sun, 7 Mar 2010 12:48:26 +1100 Message-ID: <20100307124826.6c70a779.sfr@canb.auug.org.au> References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> <4B916BD6.8010701@kernel.org> <4B91EBC6.6080509@kernel.org> <20100306162234.e2cc84fb.akpm@linux-foundation.org> <20100307010327.GD15725@brick.ozlabs.ibm.com> Mime-Version: 1.0 Content-Type: multipart/signed; protocol="application/pgp-signature"; micalg="PGP-SHA1"; boundary="Signature=_Sun__7_Mar_2010_12_48_26_+1100_OZJqAAANxhscbimK" Return-path: Received: from chilli.pcug.org.au ([203.10.76.44]:53358 "EHLO smtps.tip.net.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752552Ab0CGBsg (ORCPT ); Sat, 6 Mar 2010 20:48:36 -0500 In-Reply-To: <20100307010327.GD15725@brick.ozlabs.ibm.com> Sender: linux-arch-owner@vger.kernel.org List-ID: To: Paul Mackerras Cc: Andrew Morton , Yinghai Lu , Greg Thelen , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org Hi Paul, On Sun, 7 Mar 2010 12:03:27 +1100 Paul Mackerras wrote: > > On Sat, Mar 06, 2010 at 04:22:34PM -0800, Andrew Morton wrote: > > Earlier, Johannes wrote > > > > : Humm, now that is a bit disappointing.
Because it means we will never > > : get rid of bootmem as long as it works for the other architectures. > > : And your changeset just added ~900 lines of code, some of it being a > > : rather ugly compatibility layer in bootmem that I hoped could go away > > : again sooner than later. > > Whoa! Who's proposing to get rid of bootmem, and why? I assume that is the point of the "early_res" work already in Linus' tree starting from commit 27811d8cabe56e0c3622251b049086f49face4ff ("x86: Move range related operation to one file"). -- Cheers, Stephen Rothwell sfr@canb.auug.org.au http://www.canb.auug.org.au/~sfr/
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753854Ab0CEJFs (ORCPT ); Fri, 5 Mar 2010 04:05:48 -0500 Received: from hera.kernel.org ([140.211.167.34]:60690 "EHLO hera.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751691Ab0CEJFn (ORCPT ); Fri, 5 Mar 2010 04:05:43 -0500 Message-ID: <4B90C921.6060908@kernel.org> Date: Fri, 05 Mar 2010 01:04:33 -0800 From: Yinghai Lu User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091130 SUSE/3.0.0-1.1.1 Thunderbird/3.0 MIME-Version: 1.0 To: Johannes Weiner , Jiri Slaby CC: Greg Thelen , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" , Andrew Morton Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> In-Reply-To: <20100305032106.GA12065@cmpxchg.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender:
linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/04/2010 07:21 PM, Johannes Weiner wrote: > Hello Greg, > > On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: >> On several systems I am seeing a boot panic if I use mmotm >> (stamp-2010-03-02-18-38). If I remove >> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I >> find that: >> * 2.6.33 boots fine. >> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. >> * 2.6.33 + mmotm (including >> bootmem-avoid-dma32-zone-by-default.patch): panics. >> Note: I had to enable earlyprintk to see the panic. Without >> earlyprintk no console output was seen. The system appeared to hang >> after the loader. > > where sparse_index_init(), in the SPARSEMEM_EXTREME case, will allocate > the mem_section descriptor with bootmem. If this would fail, the box > would panic immediately earlier, but NO_BOOTMEM does not seem to get it > right. > > Greg, could you retry _with_ my bootmem patch applied, but with setting > CONFIG_NO_BOOTMEM=n up front? > > I think NO_BOOTMEM has several problems. Yinghai, can you verify them? ... > > 1. It does not seem to handle goal appropriately: bootmem would try > without the goal if it does not make sense. And in this case, the > goal is 4G (above DMA32) and the amount of memory is 256M. > > And if I did not miss something, this is the difference with my patch: > without it, the default goal is 16M, which is no problem as it is well > within your available memory. But the change of the default goal moved > it outside it which the bootmem replacement can not handle. > > 2. The early reservation stuff seems to return NULL but callsites assume > that the bootmem interface never does that. Okay, the result is the same, > we crash. But it still moves error reporting to a possibly much later > point where somebody actually dereferences the returned pointer. 
under CONFIG_NO_BOOTMEM for alloc_bootmem_node it will honor goal, if someone input big goal it will not fallback to get a small one below that goal. return NULL, could make caller have more choice and more control. anyway we should honor the goal, otherwise should use _nopanic instead. according to context http://patchwork.kernel.org/patch/73893/ Jiri, please check current linus tree still have problem about mem_map is using that much low mem? on my 1024g system first node has 128G ram, [2g, 4g) are mmio range.

with NO_BOOTMEM
[ 0.000000] a - 11
[ 0.000000] 19 40 - 80 95
[ 0.000000] 702 740 - 1000 1000
[ 0.000000] 331f 3340 - 3400 3400
[ 0.000000] 35dd - 3600
[ 0.000000] 37dd - 3800
[ 0.000000] 39dd - 3a00
[ 0.000000] 3bdd - 3c00
[ 0.000000] 3ddd - 3e00
[ 0.000000] 3fdd - 4000
[ 0.000000] 41dd - 4200
[ 0.000000] 43dd - 4400
[ 0.000000] 45dd - 4600
[ 0.000000] 47dd - 4800
[ 0.000000] 49dd - 4a00
[ 0.000000] 4bdd - 4c00
[ 0.000000] 4ddd - 4e00
[ 0.000000] 4fdd - 5000
[ 0.000000] 51dd - 5200
[ 0.000000] 93dd 9400 - 7d500 7d53b
[ 0.000000] 7f730 - 7f750
[ 0.000000] 100012 100040 - 100200 100200
[ 0.000000] 170200 170200 - 2080000 2080000
[ 0.000000] 2080065 2080080 - 2080200 2080200

so PFN: 9400 - 7d500 are free.
without NO_BOOTMEM
[ 0.000000] nid=0 start=0x0000000000 end=0x0002080000 aligned=1
[ 0.000000] free [0x000000000a - 0x0000000095]
[ 0.000000] free [0x0000000702 - 0x0000001000]
[ 0.000000] free [0x00000032c4 - 0x0000003400]
[ 0.000000] free [0x00000035de - 0x0000003600]
[ 0.000000] free [0x00000037dd - 0x0000003800]
[ 0.000000] free [0x00000039dd - 0x0000003a00]
[ 0.000000] free [0x0000003bdd - 0x0000003c00]
[ 0.000000] free [0x0000003ddd - 0x0000003e00]
[ 0.000000] free [0x0000003fdd - 0x0000004000]
[ 0.000000] free [0x00000041dd - 0x0000004200]
[ 0.000000] free [0x00000043dd - 0x0000004400]
[ 0.000000] free [0x00000045dd - 0x0000004600]
[ 0.000000] free [0x00000047dd - 0x0000004800]
[ 0.000000] free [0x00000049dd - 0x0000004a00]
[ 0.000000] free [0x0000004bdd - 0x0000004c00]
[ 0.000000] free [0x0000004ddd - 0x0000004e00]
[ 0.000000] free [0x0000004fdd - 0x0000005000]
[ 0.000000] free [0x00000051dd - 0x0000005200]
[ 0.000000] free [0x00000053dd - 0x000007d53b]
[ 0.000000] free [0x000007f730 - 0x000007f750]
[ 0.000000] free [0x000010041f - 0x0000100a00]
[ 0.000000] free [0x0000170a00 - 0x0000180a00]
[ 0.000000] free [0x0000180a03 - 0x0002080000]

so pfn: 53dd 7d53b are free

looks like we don't need to change the default goal in alloc_bootmem_node.
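The free ranges in the two logs are page frame numbers (PFNs) in hex. As a rough way to read them, assuming 4 KiB pages (PAGE_SHIFT = 12, the usual x86 value), a PFN becomes a physical address by shifting it left by PAGE_SHIFT. The userspace sketch below uses hypothetical helper names (pfn_to_phys, pfn_range_bytes) and assumes each printed range is half-open; it is not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages (PAGE_SHIFT == 12), as on x86. */
#define SKETCH_PAGE_SHIFT 12

/* Physical address of the first byte of page frame 'pfn'. */
static uint64_t pfn_to_phys(uint64_t pfn)
{
	return pfn << SKETCH_PAGE_SHIFT;
}

/* Bytes covered by the PFN range [start_pfn, end_pfn); treating the end
 * as exclusive is an assumption about how the logs print ranges. */
static uint64_t pfn_range_bytes(uint64_t start_pfn, uint64_t end_pfn)
{
	return pfn_to_phys(end_pfn) - pfn_to_phys(start_pfn);
}
```

With these helpers, the largest NO_BOOTMEM range (9400 - 7d500) comes to 1857 MiB, and 53dd - 7d53b without NO_BOOTMEM comes to about 1921 MiB; both ranges end below the 2 GiB boundary, i.e. under the [2g, 4g) MMIO hole mentioned above.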
YH From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755103Ab0CEK0L (ORCPT ); Fri, 5 Mar 2010 05:26:11 -0500 Received: from fg-out-1718.google.com ([72.14.220.152]:32052 "EHLO fg-out-1718.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754881Ab0CEK0I (ORCPT ); Fri, 5 Mar 2010 05:26:08 -0500 DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:x-enigmail-version:content-type :content-transfer-encoding; b=GoDjtaK5dhtylJ8ygt2DvcRJZC4FHzgkU7neYHtjBjTadZy3V7NCRF+GJuCXjEqof5 wllXYQOrLqx0eqwhjvGXSPKACS1ZjX1S0K8v+LXZ5Tv0nJDoo9Pzx9hPCUK8BR0GXWzX bWdQsfGrxOT5+8Sb+BvafOW6c/l5ReR46wdE8= Message-ID: <4B90DC3C.1060000@gmail.com> Date: Fri, 05 Mar 2010 11:26:04 +0100 From: Jiri Slaby User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; cs-CZ; rv:1.9.1.7) Gecko/20100111 SUSE/3.0.1-11.2 Thunderbird/3.0.1 MIME-Version: 1.0 To: Yinghai Lu CC: Johannes Weiner , Greg Thelen , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" , Andrew Morton Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <4B90C921.6060908@kernel.org> In-Reply-To: <4B90C921.6060908@kernel.org> X-Enigmail-Version: 1.0.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/05/2010 10:04 AM, Yinghai Lu wrote: > according to context > http://patchwork.kernel.org/patch/73893/ > > Jiri, > please check current linus tree still have problem about mem_map is using that much low mem? Hi! Sorry, I don't have direct access to the machine. I might try to ask the owners to do so. > on my 1024g system first node has 128G ram, [2g, 4g) are mmio range. 
So where gets your mem_map allocated (I suppose you're running flat model)? Note that the failure we were seeing was with different amount of memory on different machines. Obviously because of different e820 reservations and driver requirements at boot time. So the required memory to trigger the error oscillated around 128G, sometimes being 130G. It triggered when mem_map fit exactly into 0-2G (and 2-4G was reserved) and no more space was there. If RAM was more than 130G, mem_map was above 4G boundary implicitly, so that there was enough space in the first 4G of memory for others with specific bootmem limitations. > with NO_BOOTMEM > [ 0.000000] a - 11 > [ 0.000000] 19 40 - 80 95 > [ 0.000000] 702 740 - 1000 1000 > [ 0.000000] 331f 3340 - 3400 3400 > [ 0.000000] 35dd - 3600 > [ 0.000000] 37dd - 3800 > [ 0.000000] 39dd - 3a00 > [ 0.000000] 3bdd - 3c00 > [ 0.000000] 3ddd - 3e00 > [ 0.000000] 3fdd - 4000 > [ 0.000000] 41dd - 4200 > [ 0.000000] 43dd - 4400 > [ 0.000000] 45dd - 4600 > [ 0.000000] 47dd - 4800 > [ 0.000000] 49dd - 4a00 > [ 0.000000] 4bdd - 4c00 > [ 0.000000] 4ddd - 4e00 > [ 0.000000] 4fdd - 5000 > [ 0.000000] 51dd - 5200 > [ 0.000000] 93dd 9400 - 7d500 7d53b > [ 0.000000] 7f730 - 7f750 > [ 0.000000] 100012 100040 - 100200 100200 > [ 0.000000] 170200 170200 - 2080000 2080000 > [ 0.000000] 2080065 2080080 - 2080200 2080200 > > so PFN: 9400 - 7d500 are free. Could you explain more the dmesg output? 
> without NO_BOOTMEM > [ 0.000000] nid=0 start=0x0000000000 end=0x0002080000 aligned=1 > [ 0.000000] free [0x000000000a - 0x0000000095] > [ 0.000000] free [0x0000000702 - 0x0000001000] > [ 0.000000] free [0x00000032c4 - 0x0000003400] > [ 0.000000] free [0x00000035de - 0x0000003600] > [ 0.000000] free [0x00000037dd - 0x0000003800] > [ 0.000000] free [0x00000039dd - 0x0000003a00] > [ 0.000000] free [0x0000003bdd - 0x0000003c00] > [ 0.000000] free [0x0000003ddd - 0x0000003e00] > [ 0.000000] free [0x0000003fdd - 0x0000004000] > [ 0.000000] free [0x00000041dd - 0x0000004200] > [ 0.000000] free [0x00000043dd - 0x0000004400] > [ 0.000000] free [0x00000045dd - 0x0000004600] > [ 0.000000] free [0x00000047dd - 0x0000004800] > [ 0.000000] free [0x00000049dd - 0x0000004a00] > [ 0.000000] free [0x0000004bdd - 0x0000004c00] > [ 0.000000] free [0x0000004ddd - 0x0000004e00] > [ 0.000000] free [0x0000004fdd - 0x0000005000] > [ 0.000000] free [0x00000051dd - 0x0000005200] > [ 0.000000] free [0x00000053dd - 0x000007d53b] > [ 0.000000] free [0x000007f730 - 0x000007f750] > [ 0.000000] free [0x000010041f - 0x0000100a00] > [ 0.000000] free [0x0000170a00 - 0x0000180a00] > [ 0.000000] free [0x0000180a03 - 0x0002080000] > so pfn: 53dd 7d53b are free > > looks like we don't need to change the default goal in alloc_bootmem_node. 
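A back-of-the-envelope sketch of why the failure clusters around 128G: in a flat memory model, mem_map holds one struct page per page frame. sizeof(struct page) varies with kernel version and configuration, so the sketch below takes it as a parameter; the 64-byte figure used in the note is an assumption, not a value from this thread, and mem_map_bytes is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Bytes of mem_map needed to describe ram_bytes of RAM in a flat memory
 * model, given an assumed per-page descriptor size. */
static uint64_t mem_map_bytes(uint64_t ram_bytes, uint64_t struct_page_size)
{
	uint64_t nr_pages = ram_bytes / SKETCH_PAGE_SIZE;

	return nr_pages * struct_page_size;
}
```

With 4 KiB pages and an assumed 64-byte struct page, 128G of RAM needs exactly 2G of mem_map, which is all of [0, 2G) when [2G, 4G) is MMIO; a 56-byte struct page would give 1.75G instead. The real threshold also moves with e820 reservations and other early allocations, as Jiri notes.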
thanks, -- js From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751311Ab0CENIo (ORCPT ); Fri, 5 Mar 2010 08:08:44 -0500 Received: from f0.cmpxchg.org ([85.214.51.133]:50066 "EHLO cmpxchg.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750745Ab0CENIn (ORCPT ); Fri, 5 Mar 2010 08:08:43 -0500 Date: Fri, 5 Mar 2010 14:08:34 +0100 From: Johannes Weiner To: Yinghai Lu Cc: Jiri Slaby , Greg Thelen , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" , Andrew Morton Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch Message-ID: <20100305130834.GB13726@cmpxchg.org> References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <4B90C921.6060908@kernel.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4B90C921.6060908@kernel.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Mar 05, 2010 at 01:04:33AM -0800, Yinghai Lu wrote: > On 03/04/2010 07:21 PM, Johannes Weiner wrote: > > Hello Greg, > > > > On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: > >> On several systems I am seeing a boot panic if I use mmotm > >> (stamp-2010-03-02-18-38). If I remove > >> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I > >> find that: > >> * 2.6.33 boots fine. > >> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. > >> * 2.6.33 + mmotm (including > >> bootmem-avoid-dma32-zone-by-default.patch): panics. > >> Note: I had to enable earlyprintk to see the panic. Without > >> earlyprintk no console output was seen. The system appeared to hang > >> after the loader. > > > > where sparse_index_init(), in the SPARSEMEM_EXTREME case, will allocate > > the mem_section descriptor with bootmem. 
If this would fail, the box > > would panic immediately earlier, but NO_BOOTMEM does not seem to get it > > right. > > > > Greg, could you retry _with_ my bootmem patch applied, but with setting > > CONFIG_NO_BOOTMEM=n up front? > > > > I think NO_BOOTMEM has several problems. Yinghai, can you verify them? > ... > > > > 1. It does not seem to handle goal appropriately: bootmem would try > > without the goal if it does not make sense. And in this case, the > > goal is 4G (above DMA32) and the amount of memory is 256M. > > > > And if I did not miss something, this is the difference with my patch: > > without it, the default goal is 16M, which is no problem as it is well > > within your available memory. But the change of the default goal moved > > it outside it which the bootmem replacement can not handle. > > > > 2. The early reservation stuff seems to return NULL but callsites assume > > that the bootmem interface never does that. Okay, the result is the same, > > we crash. But it still moves error reporting to a possibly much later > > point where somebody actually dereferences the returned pointer. > > under CONFIG_NO_BOOTMEM > for alloc_bootmem_node it will honor goal, if someone input big goal it will not > fallback to get a small one below that goal. Yes, that's the problem. > return NULL, could make caller have more choice and more control. Most callers do not need it as there is no real way to handle allocation failures at this point of time in the boot process. For everything else, there is the _nopanic API. 
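The split between the plain and _nopanic interfaces can be modelled in userspace: a core allocator reports failure with NULL, the plain wrapper stops at the allocation site (abort() stands in for the kernel's panic()), and the _nopanic wrapper hands NULL back to the caller. All names here are hypothetical, and the fixed pool exists only to make failure easy to trigger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical early allocator backed by a small fixed pool.
 * Alignment is ignored for brevity. */
static char sketch_pool[4096];
static size_t sketch_pool_used;

/* Core allocator: reports failure by returning NULL. */
static void *core_alloc_early(size_t size)
{
	void *p;

	if (size > sizeof(sketch_pool) - sketch_pool_used)
		return NULL;
	p = sketch_pool + sketch_pool_used;
	sketch_pool_used += size;
	return p;
}

/* _nopanic flavour: NULL is handed back for the caller to handle. */
static void *sketch_alloc_nopanic(size_t size)
{
	return core_alloc_early(size);
}

/* Plain flavour: callers assume success, so stop right here instead of
 * returning NULL. abort() stands in for the kernel's panic(). */
static void *sketch_alloc(size_t size)
{
	void *p = core_alloc_early(size);

	if (!p) {
		fprintf(stderr, "early allocation of %zu bytes failed\n", size);
		abort();
	}
	return p;
}
```

The design choice is where a failure surfaces: with the plain wrapper it is reported at the failed allocation, not later when some caller dereferences a NULL pointer, which is the concern raised in point 2 above.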
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755188Ab0CESn0 (ORCPT ); Fri, 5 Mar 2010 13:43:26 -0500 Received: from hera.kernel.org ([140.211.167.34]:35687 "EHLO hera.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754151Ab0CESnZ (ORCPT ); Fri, 5 Mar 2010 13:43:25 -0500 Message-ID: <4B915074.4020704@kernel.org> Date: Fri, 05 Mar 2010 10:41:56 -0800 From: Yinghai Lu User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091130 SUSE/3.0.0-1.1.1 Thunderbird/3.0 MIME-Version: 1.0 To: Greg Thelen , Andrew Morton , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar CC: Johannes Weiner , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> In-Reply-To: <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/04/2010 09:17 PM, Greg Thelen wrote: > On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner wrote: >> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: >>> On several systems I am seeing a boot panic if I use mmotm >>> (stamp-2010-03-02-18-38). If I remove >>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I >>> find that: >>> * 2.6.33 boots fine. >>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. >>> * 2.6.33 + mmotm (including >>> bootmem-avoid-dma32-zone-by-default.patch): panics. ... > > Note: mmotm has been recently updated to stamp-2010-03-04-18-05. I > re-tested with 'make defconfig' to confirm the panic with this later > mmotm. 
please check

[PATCH] early_res: double check with updated goal in alloc_memory_core_early

Johannes Weiner pointed out that new early_res replacement for alloc_bootmem_node change the behavior about goal.
original bootmem one will try go further regardless of goal.

and it will break his patch about default goal from MAX_DMA to MAX_DMA32...
also broke uncommon machines with <=16M of memory.
(really? our x86 kernel still can run on 16M system?)

so try again with updated goal.

Reported-by: Greg Thelen
Signed-off-by: Yinghai Lu

---
 mm/bootmem.c |   28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/bootmem.c
===================================================================
--- linux-2.6.orig/mm/bootmem.c
+++ linux-2.6/mm/bootmem.c
@@ -170,6 +170,28 @@ void __init free_bootmem_late(unsigned l
 }
 
 #ifdef CONFIG_NO_BOOTMEM
+static void * __init ___alloc_memory_core_early(pg_data_t *pgdat, u64 size,
+						u64 align, u64 goal, u64 limit)
+{
+	void *ptr;
+	unsigned long end_pfn;
+
+	ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
+					 goal, limit);
+	if (ptr)
+		return ptr;
+
+	/* check goal according  */
+	end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages;
+	if ((end_pfn << PAGE_SHIFT) < (goal + size)) {
+		goal = pgdat->node_start_pfn << PAGE_SHIFT;
+		ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
+						 goal, limit);
+	}
+
+	return ptr;
+}
+
 static void __init __free_pages_memory(unsigned long start, unsigned long end)
 {
 	int i;
@@ -836,7 +858,7 @@ void * __init __alloc_bootmem_node(pg_da
 		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
 
 #ifdef CONFIG_NO_BOOTMEM
-	return __alloc_memory_core_early(pgdat->node_id, size, align,
+	return  ___alloc_memory_core_early(pgdat, size, align,
 					 goal, -1ULL);
 #else
 	return ___alloc_bootmem_node(pgdat->bdata, size, align, goal, 0);
@@ -920,7 +942,7 @@ void * __init __alloc_bootmem_node_nopan
 		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
 
 #ifdef CONFIG_NO_BOOTMEM
-	ptr =  __alloc_memory_core_early(pgdat->node_id, size, align,
+	ptr =  ___alloc_memory_core_early(pgdat, size, align,
 						 goal, -1ULL);
 #else
 	ptr = alloc_arch_preferred_bootmem(pgdat->bdata, size, align, goal, 0);
@@ -980,7 +1002,7 @@ void * __init __alloc_bootmem_low_node(p
 		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
 
 #ifdef CONFIG_NO_BOOTMEM
-	return __alloc_memory_core_early(pgdat->node_id, size, align,
+	return ___alloc_memory_core_early(pgdat, size, align,
 				goal, ARCH_LOW_ADDRESS_LIMIT);
 #else
 	return ___alloc_bootmem_node(pgdat->bdata, size, align,
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755633Ab0CETKJ (ORCPT ); Fri, 5 Mar 2010 14:10:09 -0500 Received: from smtp-out.google.com ([216.239.33.17]:55304 "EHLO smtp-out.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755473Ab0CETKH convert rfc822-to-8bit (ORCPT ); Fri, 5 Mar 2010 14:10:07 -0500 DomainKey-Signature: a=rsa-sha1; s=beta; d=google.com; c=nofws; q=dns; h=mime-version:in-reply-to:references:from:date:message-id: subject:to:cc:content-type:content-transfer-encoding:x-system-of-record; b=iMrEMzFJ5dRXb03Z/uTDmnHPzJTx1eJtxpEL9eqafZH/4Ei2lTQTso7O4XAlPIu+Y 76XahHGkn2y6m0GNJtoAA== MIME-Version: 1.0 In-Reply-To: <4B915074.4020704@kernel.org> References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> From: Greg Thelen Date: Fri, 5 Mar 2010 11:09:41 -0800 Message-ID: <49b004811003051109t3215f86dy280a6317bdab9b15@mail.gmail.com> Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch To: Yinghai Lu Cc: Andrew Morton , "H.
Peter Anvin" , Thomas Gleixner , Ingo Molnar , Johannes Weiner , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8BIT X-System-Of-Record: true Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Mar 5, 2010 at 10:41 AM, Yinghai Lu wrote: > On 03/04/2010 09:17 PM, Greg Thelen wrote: >> On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner wrote: >>> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: >>>> On several systems I am seeing a boot panic if I use mmotm >>>> (stamp-2010-03-02-18-38).  If I remove >>>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen.  I >>>> find that: >>>> * 2.6.33 boots fine. >>>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. >>>> * 2.6.33 + mmotm (including >>>> bootmem-avoid-dma32-zone-by-default.patch): panics. > ... >> >> Note: mmotm has been recently updated to stamp-2010-03-04-18-05.  I >> re-tested with 'make defconfig' to confirm the panic with this later >> mmotm. > > please check > > [PATCH] early_res: double check with updated goal in alloc_memory_core_early > > Johannes Weiner pointed out that new early_res replacement for alloc_bootmem_node > change the behavoir about goal. > original bootmem one will try go further regardless of goal. > > and it will break his patch about default goal from MAX_DMA to MAX_DMA32... > also broke uncommon machines with <=16M of memory. > (really? our x86 kernel still can run on 16M system?) > > so try again with update goal. 
> > Reported-by: Greg Thelen > Signed-off-by: Yinghai Lu > > --- >  mm/bootmem.c |   28 +++++++++++++++++++++++++--- >  1 file changed, 25 insertions(+), 3 deletions(-) > > Index: linux-2.6/mm/bootmem.c > =================================================================== > --- linux-2.6.orig/mm/bootmem.c > +++ linux-2.6/mm/bootmem.c > @@ -170,6 +170,28 @@ void __init free_bootmem_late(unsigned l >  } > >  #ifdef CONFIG_NO_BOOTMEM > +static void * __init ___alloc_memory_core_early(pg_data_t *pgdat, u64 size, > +                                                u64 align, u64 goal, u64 limit) > +{ > +       void *ptr; > +       unsigned long end_pfn; > + > +       ptr = __alloc_memory_core_early(pgdat->node_id, size, align, > +                                        goal, limit); > +       if (ptr) > +               return ptr; > + > +       /* check goal according  */ > +       end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages; > +       if ((end_pfn << PAGE_SHIFT) < (goal + size)) { > +               goal = pgdat->node_start_pfn << PAGE_SHIFT; > +               ptr = __alloc_memory_core_early(pgdat->node_id, size, align, > +                                                goal, limit); > +       } > + > +       return ptr; > +} > + >  static void __init __free_pages_memory(unsigned long start, unsigned long end) >  { >        int i; > @@ -836,7 +858,7 @@ void * __init __alloc_bootmem_node(pg_da >                return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id); > >  #ifdef CONFIG_NO_BOOTMEM > -       return __alloc_memory_core_early(pgdat->node_id, size, align, > +       return  ___alloc_memory_core_early(pgdat, size, align, >                                         goal, -1ULL); >  #else >        return ___alloc_bootmem_node(pgdat->bdata, size, align, goal, 0); > @@ -920,7 +942,7 @@ void * __init __alloc_bootmem_node_nopan >                return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id); > >  #ifdef CONFIG_NO_BOOTMEM > -       ptr =  
__alloc_memory_core_early(pgdat->node_id, size, align, > +       ptr =  ___alloc_memory_core_early(pgdat, size, align, >                                                 goal, -1ULL); >  #else >        ptr = alloc_arch_preferred_bootmem(pgdat->bdata, size, align, goal, 0); > @@ -980,7 +1002,7 @@ void * __init __alloc_bootmem_low_node(p >                return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id); > >  #ifdef CONFIG_NO_BOOTMEM > -       return __alloc_memory_core_early(pgdat->node_id, size, align, > +       return ___alloc_memory_core_early(pgdat, size, align, >                                goal, ARCH_LOW_ADDRESS_LIMIT); >  #else >        return ___alloc_bootmem_node(pgdat->bdata, size, align, > On my 256MB VM, which detected the problem starting this thread, the "double check with updated goal in alloc_memory_core_early" patch (above) boots without panic. My initial impression is that this fixes the reported problem. Note: I have not tested to see if any other issues are introduced. 
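To see why the DMA32 default goal broke small machines, here is a minimal user-space sketch of the two-pass fallback that the quoted early_res patch adds in ___alloc_memory_core_early(). It assumes 4 KiB pages, so the default goal MAX_DMA32_PFN << PAGE_SHIFT is 4 GiB. It also replaces __alloc_memory_core_early() with a hypothetical toy_alloc() that only checks whether an aligned block fits between the goal and the end of a single node. struct toy_node, toy_alloc() and toy_alloc_goal_fallback() are illustrative names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Toy stand-in for pg_data_t: one node spanning [start_pfn, start_pfn + spanned_pages). */
struct toy_node {
	uint64_t start_pfn;
	uint64_t spanned_pages;
};

/*
 * Hypothetical stand-in for __alloc_memory_core_early(): succeeds only if
 * an aligned block of @size starting at or above @goal fits inside the
 * node and below @limit. It does not track earlier allocations.
 */
static bool toy_alloc(const struct toy_node *n, uint64_t size, uint64_t align,
		      uint64_t goal, uint64_t limit, uint64_t *addr_out)
{
	uint64_t node_start = n->start_pfn << PAGE_SHIFT;
	uint64_t node_end = (n->start_pfn + n->spanned_pages) << PAGE_SHIFT;
	uint64_t addr = goal > node_start ? goal : node_start;

	assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
	addr = (addr + align - 1) & ~(align - 1);
	if (addr + size > node_end || addr + size > limit)
		return false;
	*addr_out = addr;
	return true;
}

/* Mirrors the control flow of ___alloc_memory_core_early() in the quoted patch. */
static bool toy_alloc_goal_fallback(const struct toy_node *n, uint64_t size,
				    uint64_t align, uint64_t goal,
				    uint64_t limit, uint64_t *addr_out)
{
	uint64_t end_pfn;

	if (toy_alloc(n, size, align, goal, limit, addr_out))
		return true;

	/* Retry from the node start only if the goal lies past the node's end. */
	end_pfn = n->start_pfn + n->spanned_pages;
	if ((end_pfn << PAGE_SHIFT) < goal + size)
		return toy_alloc(n, size, align, n->start_pfn << PAGE_SHIFT,
				 limit, addr_out);
	return false;
}
```

On a 256 MB node like the VM above, the first pass with a 4 GiB goal cannot succeed, while the retry from the node start does. Without the retry the caller would get a failure, which is consistent with the boot panic reported at the start of the thread.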
-- Greg From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753626Ab0CEU2N (ORCPT ); Fri, 5 Mar 2010 15:28:13 -0500 Received: from hera.kernel.org ([140.211.167.34]:37431 "EHLO hera.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752675Ab0CEU2M (ORCPT ); Fri, 5 Mar 2010 15:28:12 -0500 Message-ID: <4B916922.3010807@kernel.org> Date: Fri, 05 Mar 2010 12:27:14 -0800 From: Yinghai Lu User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091130 SUSE/3.0.0-1.1.1 Thunderbird/3.0 MIME-Version: 1.0 To: Jiri Slaby CC: Johannes Weiner , Greg Thelen , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" , Andrew Morton Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <4B90C921.6060908@kernel.org> <4B90DC3C.1060000@gmail.com> In-Reply-To: <4B90DC3C.1060000@gmail.com> Content-Type: multipart/mixed; boundary="------------010205080801090505000906" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This is a multi-part message in MIME format. --------------010205080801090505000906 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit On 03/05/2010 02:26 AM, Jiri Slaby wrote: > On 03/05/2010 10:04 AM, Yinghai Lu wrote: >> according to context >> http://patchwork.kernel.org/patch/73893/ >> >> Jiri, >> please check current linus tree still have problem about mem_map is using that much low mem? > > Hi! > > Sorry, I don't have direct access to the machine. I might try to ask the > owners to do so. > >> on my 1024g system first node has 128G ram, [2g, 4g) are mmio range. > > So where gets your mem_map allocated (I suppose you're running flat model)? > > Note that the failure we were seeing was with different amount of memory > on different machines. 
> Obviously because of different e820 reservations
> and driver requirements at boot time. So the required memory to trigger
> the error oscillated around 128G, sometimes being 130G.
>
> It triggered when mem_map fit exactly into 0-2G (and 2-4G was reserved)
> and no more space was there. If RAM was more than 130G, mem_map was
> above 4G boundary implicitly, so that there was enough space in the
> first 4G of memory for others with specific bootmem limitations.
>
>> with NO_BOOTMEM
>> [ 0.000000] a - 11
>> [ 0.000000] 19 40 - 80 95
>> [ 0.000000] 702 740 - 1000 1000
>> [ 0.000000] 331f 3340 - 3400 3400
>> [ 0.000000] 35dd - 3600
>> [ 0.000000] 37dd - 3800
>> [ 0.000000] 39dd - 3a00
>> [ 0.000000] 3bdd - 3c00
>> [ 0.000000] 3ddd - 3e00
>> [ 0.000000] 3fdd - 4000
>> [ 0.000000] 41dd - 4200
>> [ 0.000000] 43dd - 4400
>> [ 0.000000] 45dd - 4600
>> [ 0.000000] 47dd - 4800
>> [ 0.000000] 49dd - 4a00
>> [ 0.000000] 4bdd - 4c00
>> [ 0.000000] 4ddd - 4e00
>> [ 0.000000] 4fdd - 5000
>> [ 0.000000] 51dd - 5200
>> [ 0.000000] 93dd 9400 - 7d500 7d53b
>> [ 0.000000] 7f730 - 7f750
>> [ 0.000000] 100012 100040 - 100200 100200
>> [ 0.000000] 170200 170200 - 2080000 2080000
>> [ 0.000000] 2080065 2080080 - 2080200 2080200
>>
>> so PFN: 9400 - 7d500 are free.
>
> Could you explain more the dmesg output?

It lists the free PFN ranges that will be used for slab... Attached is a debug patch that prints them out without CONFIG_NO_BOOTMEM set.
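The attached debug patch prints runs of free page frames from the bootmem bitmap. As a rough, simplified model of that output, the following sketch coalesces the clear bits of a bootmem-style bitmap into [start, end) PFN ranges. In this bitmap a set bit marks a reserved page, as in mm/bootmem.c. The sketch works one bit at a time and omits the patch's whole-word fast path. collect_free_ranges(), pfn_reserved(), struct pfn_range and TOY_BITS_PER_LONG are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits in one bitmap word (the kernel has its own BITS_PER_LONG). */
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

struct pfn_range {
	unsigned long start; /* first free pfn */
	unsigned long end;   /* one past the last free pfn */
};

/* A set bit means the page is reserved; idx is relative to min_pfn. */
static int pfn_reserved(const unsigned long *map, unsigned long idx)
{
	return (int)((map[idx / TOY_BITS_PER_LONG] >> (idx % TOY_BITS_PER_LONG)) & 1UL);
}

/*
 * Walk pfns [min_pfn, max_pfn) and store runs of free pages in out[].
 * Returns the number of ranges stored (at most max_ranges). *total_free
 * counts every free page, like the patch's "total free" line.
 */
static size_t collect_free_ranges(const unsigned long *map,
				  unsigned long min_pfn, unsigned long max_pfn,
				  struct pfn_range *out, size_t max_ranges,
				  unsigned long *total_free)
{
	size_t n = 0;
	int in_run = 0;
	unsigned long pfn;

	assert(min_pfn <= max_pfn);
	*total_free = 0;
	for (pfn = min_pfn; pfn < max_pfn; pfn++) {
		if (pfn_reserved(map, pfn - min_pfn)) {
			in_run = 0; /* a reserved page ends the current run */
			continue;
		}
		(*total_free)++;
		if (in_run) {
			out[n - 1].end = pfn + 1; /* extend the current run */
		} else if (n < max_ranges) {
			out[n].start = pfn; /* start a new run */
			out[n].end = pfn + 1;
			n++;
			in_run = 1;
		}
	}
	return n;
}
```

For example, a one-word map of 0x10F over PFNs 0x100-0x10f yields the ranges [0x104, 0x108) and [0x109, 0x110), 11 free pages in total.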
YH --------------010205080801090505000906 Content-Type: text/x-patch; name="print_free_bootmem.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="print_free_bootmem.patch" Subject: [PATCH -v3] x86: print bootmem free before and free_all_bootmem so we could double check if we have enough low pages later -v2: fix errors checkpatch.pl reported -v3: move after pci_iommu_alloc, so could compare it with NO_BOOTMEM Signed-off-by: Yinghai Lu --- arch/x86/mm/init_64.c | 2 + include/linux/bootmem.h | 3 + mm/bootmem.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 96 insertions(+) Index: linux-2.6/mm/bootmem.c =================================================================== --- linux-2.6.orig/mm/bootmem.c +++ linux-2.6/mm/bootmem.c @@ -335,6 +335,97 @@ static void __init __free(bootmem_data_t BUG(); } +static void __init print_all_bootmem_free_core(bootmem_data_t *bdata) +{ + int aligned; + unsigned long *map; + unsigned long start, end, count = 0; + unsigned long free_start = -1UL, free_end = 0; + + if (!bdata->node_bootmem_map) + return; + + start = bdata->node_min_pfn; + end = bdata->node_low_pfn; + + /* + * If the start is aligned to the machines wordsize, we might + * be able to count it in bulks of that order. 
+ */ + aligned = !(start & (BITS_PER_LONG - 1)); + + printk(KERN_DEBUG "nid=%td start=0x%010lx end=0x%010lx aligned=%d\n", + bdata - bootmem_node_data, start, end, aligned); + map = bdata->node_bootmem_map; + + while (start < end) { + unsigned long idx, vec; + + idx = start - bdata->node_min_pfn; + vec = ~map[idx / BITS_PER_LONG]; + + if (aligned && vec == ~0UL && start + BITS_PER_LONG < end) { + if (free_start == -1UL) { + free_start = idx; + free_end = free_start + BITS_PER_LONG; + } else { + if (free_end == idx) { + free_end += BITS_PER_LONG; + } else { + /* there is gap, print old */ + printk(KERN_DEBUG " free [0x%010lx - 0x%010lx]\n", + free_start + bdata->node_min_pfn, + free_end + bdata->node_min_pfn); + free_start = idx; + free_end = idx + BITS_PER_LONG; + } + } + count += BITS_PER_LONG; + } else { + unsigned long off = 0; + + while (vec && off < BITS_PER_LONG) { + if (vec & 1) { + if (free_start == -1UL) { + free_start = idx + off; + free_end = free_start + 1; + } else { + if (free_end == (idx + off)) { + free_end++; + } else { + /* there is gap, print old */ + printk(KERN_DEBUG " free [0x%010lx - 0x%010lx]\n", + free_start + bdata->node_min_pfn, + free_end + bdata->node_min_pfn); + free_start = idx + off; + free_end = free_start + 1; + } + } + count++; + } + vec >>= 1; + off++; + } + } + start += BITS_PER_LONG; + } + + /* last one */ + if (free_start != -1UL) + printk(KERN_DEBUG " free [0x%010lx - 0x%010lx]\n", + free_start + bdata->node_min_pfn, + free_end + bdata->node_min_pfn); + printk(KERN_DEBUG " total free 0x%010lx\n", count); +} + +void __init print_bootmem_free(void) +{ + bootmem_data_t *bdata; + + list_for_each_entry(bdata, &bdata_list, list) + print_all_bootmem_free_core(bdata); +} + static int __init __reserve(bootmem_data_t *bdata, unsigned long sidx, unsigned long eidx, int flags) { Index: linux-2.6/arch/x86/mm/init_64.c =================================================================== --- linux-2.6.orig/arch/x86/mm/init_64.c +++ 
linux-2.6/arch/x86/mm/init_64.c @@ -679,6 +679,8 @@ void __init mem_init(void) pci_iommu_alloc(); + print_bootmem_free(); + /* clear_bss() already clear the empty_zero_page */ reservedpages = 0; Index: linux-2.6/include/linux/bootmem.h =================================================================== --- linux-2.6.orig/include/linux/bootmem.h +++ linux-2.6/include/linux/bootmem.h @@ -38,6 +38,9 @@ typedef struct bootmem_data { } bootmem_data_t; extern bootmem_data_t bootmem_node_data[]; +void print_bootmem_free(void); +#else +static inline void print_bootmem_free(void) {} #endif extern unsigned long bootmem_bootmap_pages(unsigned long); --------------010205080801090505000906-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754254Ab0CEUjz (ORCPT ); Fri, 5 Mar 2010 15:39:55 -0500 Received: from hera.kernel.org ([140.211.167.34]:56885 "EHLO hera.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752106Ab0CEUjy (ORCPT ); Fri, 5 Mar 2010 15:39:54 -0500 Message-ID: <4B916BD6.8010701@kernel.org> Date: Fri, 05 Mar 2010 12:38:46 -0800 From: Yinghai Lu User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091130 SUSE/3.0.0-1.1.1 Thunderbird/3.0 MIME-Version: 1.0 To: Andrew Morton CC: Greg Thelen , "H. 
Peter Anvin" , Thomas Gleixner , Ingo Molnar , Johannes Weiner , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" Subject: [PATCH] x86/bootmem: introduce bootmem_default_goal References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> In-Reply-To: <4B915074.4020704@kernel.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org if you don't want to drop | bootmem: avoid DMA32 zone by default today mainline tree actually DO NOT need that patch according to print out ... please apply this one too. [PATCH] x86/bootmem: introduce bootmem_default_goal don't punish the 64bit systems with less 4G RAM. they should use _pa(MAX_DMA_ADDRESS) at first pass instead of failback... Signed-off-by: Yinghai Lu --- arch/x86/kernel/setup.c | 13 +++++++++++++ include/linux/bootmem.h | 3 ++- mm/bootmem.c | 4 ++++ 3 files changed, 19 insertions(+), 1 deletion(-) Index: linux-2.6/arch/x86/kernel/setup.c =================================================================== --- linux-2.6.orig/arch/x86/kernel/setup.c +++ linux-2.6/arch/x86/kernel/setup.c @@ -686,6 +686,18 @@ static void __init trim_bios_range(void) sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map); } +#ifdef MAX_DMA32_PFN +static void __init set_bootmem_default_goal(void) +{ + if (max_pfn_mapped < MAX_DMA32_PFN) + bootmem_default_goal = __pa(MAX_DMA_ADDRESS); +} +#else +static void __init set_bootmem_default_goal(void) +{ +} +#endif + /* * Determine if we were loaded by an EFI loader. 
If so, then we have also been * passed the efi memmap, systab, etc., so we should use these data structures @@ -931,6 +943,7 @@ void __init setup_arch(char **cmdline_p) max_low_pfn = max_pfn; } #endif + set_bootmem_default_goal(); /* * NOTE: On x86-32, only from this point on, fixmaps are ready for use. Index: linux-2.6/include/linux/bootmem.h =================================================================== --- linux-2.6.orig/include/linux/bootmem.h +++ linux-2.6/include/linux/bootmem.h @@ -104,7 +104,8 @@ extern void *__alloc_bootmem_low_node(pg unsigned long goal); #ifdef MAX_DMA32_PFN -#define BOOTMEM_DEFAULT_GOAL (MAX_DMA32_PFN << PAGE_SHIFT) +extern unsigned long bootmem_default_goal; +#define BOOTMEM_DEFAULT_GOAL bootmem_default_goal #else #define BOOTMEM_DEFAULT_GOAL __pa(MAX_DMA_ADDRESS) #endif Index: linux-2.6/mm/bootmem.c =================================================================== --- linux-2.6.orig/mm/bootmem.c +++ linux-2.6/mm/bootmem.c @@ -25,6 +25,10 @@ unsigned long max_low_pfn; unsigned long min_low_pfn; unsigned long max_pfn; +#ifdef MAX_DMA32_PFN +unsigned long bootmem_default_goal = (MAX_DMA32_PFN << PAGE_SHIFT); +#endif + #ifdef CONFIG_CRASH_DUMP /* * If we have booted due to a crash, max_pfn will be a very low value. We need From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755886Ab0CEX6s (ORCPT ); Fri, 5 Mar 2010 18:58:48 -0500 Received: from f0.cmpxchg.org ([85.214.51.133]:39166 "EHLO cmpxchg.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755407Ab0CEX6r (ORCPT ); Fri, 5 Mar 2010 18:58:47 -0500 Date: Sat, 6 Mar 2010 00:58:12 +0100 From: Johannes Weiner To: Yinghai Lu Cc: Greg Thelen , Andrew Morton , "H. 
Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch Message-ID: <20100305235812.GA15249@cmpxchg.org> References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4B915074.4020704@kernel.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello Yinghai, On Fri, Mar 05, 2010 at 10:41:56AM -0800, Yinghai Lu wrote: > On 03/04/2010 09:17 PM, Greg Thelen wrote: > > On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner wrote: > >> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: > >>> On several systems I am seeing a boot panic if I use mmotm > >>> (stamp-2010-03-02-18-38). If I remove > >>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I > >>> find that: > >>> * 2.6.33 boots fine. > >>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. > >>> * 2.6.33 + mmotm (including > >>> bootmem-avoid-dma32-zone-by-default.patch): panics. > ... > > > > Note: mmotm has been recently updated to stamp-2010-03-04-18-05. I > > re-tested with 'make defconfig' to confirm the panic with this later > > mmotm. > > please check > > [PATCH] early_res: double check with updated goal in alloc_memory_core_early > > Johannes Weiner pointed out that new early_res replacement for alloc_bootmem_node > change the behavoir about goal. > original bootmem one will try go further regardless of goal. > > and it will break his patch about default goal from MAX_DMA to MAX_DMA32... > also broke uncommon machines with <=16M of memory. > (really? our x86 kernel still can run on 16M system?) > > so try again with update goal. 
Thanks for the patch, it seems to be correct. However, I have a more generic question about it, regarding the future of the early_res allocator. Did you plan on keeping the bootmem API for longer? Because my impression was, emulating it is a temporary measure until all users are gone and bootmem can be finally dropped. But then this would require some sort of handling of 'user does not need DMA[32] memory, so avoid it' and 'user can only use DMA[32] memory' in the early_res allocator as well. I ask this specifically because you move this fix into the bootmem compatibility code while there is not yet a way to tell early_res the same thing, so switching a user that _needs_ to specify this requirement from bootmem to early_res is not yet possible, is it? > Reported-by: Greg Thelen > Signed-off-by: Yinghai Lu > > --- > mm/bootmem.c | 28 +++++++++++++++++++++++++--- > 1 file changed, 25 insertions(+), 3 deletions(-) > > Index: linux-2.6/mm/bootmem.c > =================================================================== > --- linux-2.6.orig/mm/bootmem.c > +++ linux-2.6/mm/bootmem.c > @@ -170,6 +170,28 @@ void __init free_bootmem_late(unsigned l > } > > #ifdef CONFIG_NO_BOOTMEM > +static void * __init ___alloc_memory_core_early(pg_data_t *pgdat, u64 size, > + u64 align, u64 goal, u64 limit) > +{ > + void *ptr; > + unsigned long end_pfn; > + > + ptr = __alloc_memory_core_early(pgdat->node_id, size, align, > + goal, limit); > + if (ptr) > + return ptr; > + > + /* check goal according */ > + end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages; > + if ((end_pfn << PAGE_SHIFT) < (goal + size)) { > + goal = pgdat->node_start_pfn << PAGE_SHIFT; > + ptr = __alloc_memory_core_early(pgdat->node_id, size, align, > + goal, limit); > + } > + > + return ptr; I think it would make sense to move the parameter check before doing the allocation. Then you save the second call. And a second nitpick: naming the inner function __foo and the outer one ___foo seems confusing to me. 
Could you maybe rename the wrapper? bootmem_compat_alloc_early() or something like that? Thanks, Hannes From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753313Ab0CFBwI (ORCPT ); Fri, 5 Mar 2010 20:52:08 -0500 Received: from hera.kernel.org ([140.211.167.34]:50850 "EHLO hera.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751721Ab0CFBwG (ORCPT ); Fri, 5 Mar 2010 20:52:06 -0500 Message-ID: <4B91B4EF.5090502@kernel.org> Date: Fri, 05 Mar 2010 17:50:39 -0800 From: Yinghai Lu User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091130 SUSE/3.0.0-1.1.1 Thunderbird/3.0 MIME-Version: 1.0 To: Johannes Weiner CC: Greg Thelen , Andrew Morton , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> <20100305235812.GA15249@cmpxchg.org> In-Reply-To: <20100305235812.GA15249@cmpxchg.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/05/2010 03:58 PM, Johannes Weiner wrote: > Hello Yinghai, > > On Fri, Mar 05, 2010 at 10:41:56AM -0800, Yinghai Lu wrote: >> On 03/04/2010 09:17 PM, Greg Thelen wrote: >>> On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner wrote: >>>> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: >>>>> On several systems I am seeing a boot panic if I use mmotm >>>>> (stamp-2010-03-02-18-38). If I remove >>>>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I >>>>> find that: >>>>> * 2.6.33 boots fine. 
>>>>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. >>>>> * 2.6.33 + mmotm (including >>>>> bootmem-avoid-dma32-zone-by-default.patch): panics. >> ... >>> >>> Note: mmotm has been recently updated to stamp-2010-03-04-18-05. I >>> re-tested with 'make defconfig' to confirm the panic with this later >>> mmotm. >> >> please check >> >> [PATCH] early_res: double check with updated goal in alloc_memory_core_early >> >> Johannes Weiner pointed out that new early_res replacement for alloc_bootmem_node >> change the behavoir about goal. >> original bootmem one will try go further regardless of goal. >> >> and it will break his patch about default goal from MAX_DMA to MAX_DMA32... >> also broke uncommon machines with <=16M of memory. >> (really? our x86 kernel still can run on 16M system?) >> >> so try again with update goal. > > Thanks for the patch, it seems to be correct. > > However, I have a more generic question about it, regarding the future of the > early_res allocator. > > Did you plan on keeping the bootmem API for longer? Because my impression was, > emulating it is a temporary measure until all users are gone and bootmem can > be finally dropped. that depends on every arch maintainer. user can compare them on x86 to check if... next step will be make fw_mem_map to generiaized and combine them with lmb. > > But then this would require some sort of handling of 'user does not need DMA[32] > memory, so avoid it' and 'user can only use DMA[32] memory' in the early_res > allocator as well. > > I ask this specifically because you move this fix into the bootmem compatibility > code while there is not yet a way to tell early_res the same thing, so switching > a user that _needs_ to specify this requirement from bootmem to early_res is not > yet possible, is it? just let caller set the goal. 
> >> Reported-by: Greg Thelen >> Signed-off-by: Yinghai Lu >> >> --- >> mm/bootmem.c | 28 +++++++++++++++++++++++++--- >> 1 file changed, 25 insertions(+), 3 deletions(-) >> >> Index: linux-2.6/mm/bootmem.c >> =================================================================== >> --- linux-2.6.orig/mm/bootmem.c >> +++ linux-2.6/mm/bootmem.c >> @@ -170,6 +170,28 @@ void __init free_bootmem_late(unsigned l >> } >> >> #ifdef CONFIG_NO_BOOTMEM >> +static void * __init ___alloc_memory_core_early(pg_data_t *pgdat, u64 size, >> + u64 align, u64 goal, u64 limit) >> +{ >> + void *ptr; >> + unsigned long end_pfn; >> + >> + ptr = __alloc_memory_core_early(pgdat->node_id, size, align, >> + goal, limit); >> + if (ptr) >> + return ptr; >> + >> + /* check goal according */ >> + end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages; >> + if ((end_pfn << PAGE_SHIFT) < (goal + size)) { >> + goal = pgdat->node_start_pfn << PAGE_SHIFT; >> + ptr = __alloc_memory_core_early(pgdat->node_id, size, align, >> + goal, limit); >> + } >> + >> + return ptr; > > I think it would make sense to move the parameter check before doing the > allocation. Then you save the second call. I am trying to avoid the second call. please check another patch about "introduce bootmem_default_goal : don't punish 64bit system without 4g ram" > > And a second nitpick: naming the inner function __foo and the outer one ___foo seems > confusing to me. Could you maybe rename the wrapper? bootmem_compat_alloc_early() or > something like that? ok. 
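Johannes's suggestion is to evaluate the goal check first and then call the allocator once. It can be sketched as below. compat_pick_goal() is a hypothetical helper, and 4 KiB pages are assumed. The commented wrapper uses the name bootmem_compat_alloc_early() only because that name was proposed in the thread.

The sketch also assumes, as the retry condition in the patch implies, that the first allocation cannot succeed when the goal lies past the end of the node. Under that assumption, it selects the same goal that the patch's second pass would use.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/*
 * Hypothetical helper: decide the goal *before* calling the early
 * allocator, using the same condition as the posted patch.
 */
static uint64_t compat_pick_goal(uint64_t node_start_pfn,
				 uint64_t node_spanned_pages,
				 uint64_t goal, uint64_t size)
{
	uint64_t end_pfn = node_start_pfn + node_spanned_pages;

	assert(size != 0);
	if ((end_pfn << PAGE_SHIFT) < goal + size)
		return node_start_pfn << PAGE_SHIFT; /* goal is past the node */
	return goal;
}

/*
 * A bootmem_compat_alloc_early() wrapper would then be roughly:
 *
 *   goal = compat_pick_goal(pgdat->node_start_pfn,
 *                           pgdat->node_spanned_pages, goal, size);
 *   return __alloc_memory_core_early(pgdat->node_id, size, align,
 *                                    goal, limit);
 */
```

Yinghai's reply points to a different way to avoid the second call. His bootmem_default_goal patch lowers the default goal itself to __pa(MAX_DMA_ADDRESS) when max_pfn_mapped is below MAX_DMA32_PFN.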
Thanks Yinghai From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753621Ab0CFCYk (ORCPT ); Fri, 5 Mar 2010 21:24:40 -0500 Received: from f0.cmpxchg.org ([85.214.51.133]:39861 "EHLO cmpxchg.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753462Ab0CFCYi (ORCPT ); Fri, 5 Mar 2010 21:24:38 -0500 Date: Sat, 6 Mar 2010 03:24:15 +0100 From: Johannes Weiner To: Yinghai Lu Cc: Greg Thelen , Andrew Morton , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch Message-ID: <20100306022415.GB16967@cmpxchg.org> References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> <20100305235812.GA15249@cmpxchg.org> <4B91B4EF.5090502@kernel.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4B91B4EF.5090502@kernel.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Mar 05, 2010 at 05:50:39PM -0800, Yinghai Lu wrote: > On 03/05/2010 03:58 PM, Johannes Weiner wrote: > > Hello Yinghai, > > > > On Fri, Mar 05, 2010 at 10:41:56AM -0800, Yinghai Lu wrote: > >> On 03/04/2010 09:17 PM, Greg Thelen wrote: > >>> On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner wrote: > >>>> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: > >>>>> On several systems I am seeing a boot panic if I use mmotm > >>>>> (stamp-2010-03-02-18-38). If I remove > >>>>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I > >>>>> find that: > >>>>> * 2.6.33 boots fine. > >>>>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. 
> >>>>> * 2.6.33 + mmotm (including > >>>>> bootmem-avoid-dma32-zone-by-default.patch): panics. > >> ... > >>> > >>> Note: mmotm has been recently updated to stamp-2010-03-04-18-05. I > >>> re-tested with 'make defconfig' to confirm the panic with this later > >>> mmotm. > >> > >> please check > >> > >> [PATCH] early_res: double check with updated goal in alloc_memory_core_early > >> > >> Johannes Weiner pointed out that new early_res replacement for alloc_bootmem_node > >> change the behavoir about goal. > >> original bootmem one will try go further regardless of goal. > >> > >> and it will break his patch about default goal from MAX_DMA to MAX_DMA32... > >> also broke uncommon machines with <=16M of memory. > >> (really? our x86 kernel still can run on 16M system?) > >> > >> so try again with update goal. > > > > Thanks for the patch, it seems to be correct. > > > > However, I have a more generic question about it, regarding the future of the > > early_res allocator. > > > > Did you plan on keeping the bootmem API for longer? Because my impression was, > > emulating it is a temporary measure until all users are gone and bootmem can > > be finally dropped. > > that depends on every arch maintainer. > > user can compare them on x86 to check if... Humm, now that is a bit disappointing. Because it means we will never get rid of bootmem as long as it works for the other architectures. And your changeset just added ~900 lines of code, some of it being a rather ugly compatibility layer in bootmem that I hoped could go away again sooner than later. I do not know what the upsides for x86 are from no longer using bootmem but it would suck from a code maintainance point of view to get stuck half way through this transition and have now TWO implementations of the bootmem interface we would like to get rid of. > next step will be make fw_mem_map to generiaized and combine them with lmb. 
> > > > > But then this would require some sort of handling of 'user does not need DMA[32] > > memory, so avoid it' and 'user can only use DMA[32] memory' in the early_res > > allocator as well. > > > > I ask this specifically because you move this fix into the bootmem compatibility > > code while there is not yet a way to tell early_res the same thing, so switching > > a user that _needs_ to specify this requirement from bootmem to early_res is not > > yet possible, is it? > > just let caller set the goal. That means that every caller must be aware of where the DMA zone ends and if it is non-empty and open-code the fallback to the DMA zone if the non-DMA zone is exhausted? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753599Ab0CFCdS (ORCPT ); Fri, 5 Mar 2010 21:33:18 -0500 Received: from hera.kernel.org ([140.211.167.34]:37451 "EHLO hera.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752604Ab0CFCdR (ORCPT ); Fri, 5 Mar 2010 21:33:17 -0500 Message-ID: <4B91BE93.8000401@kernel.org> Date: Fri, 05 Mar 2010 18:31:47 -0800 From: Yinghai Lu User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091130 SUSE/3.0.0-1.1.1 Thunderbird/3.0 MIME-Version: 1.0 To: Johannes Weiner CC: Greg Thelen , Andrew Morton , "H. 
Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> <20100305235812.GA15249@cmpxchg.org> <4B91B4EF.5090502@kernel.org> <20100306022415.GB16967@cmpxchg.org> In-Reply-To: <20100306022415.GB16967@cmpxchg.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/05/2010 06:24 PM, Johannes Weiner wrote: > On Fri, Mar 05, 2010 at 05:50:39PM -0800, Yinghai Lu wrote: >> On 03/05/2010 03:58 PM, Johannes Weiner wrote: >>> Hello Yinghai, >>> >>> On Fri, Mar 05, 2010 at 10:41:56AM -0800, Yinghai Lu wrote: >>>> On 03/04/2010 09:17 PM, Greg Thelen wrote: >>>>> On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner wrote: >>>>>> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: >>>>>>> On several systems I am seeing a boot panic if I use mmotm >>>>>>> (stamp-2010-03-02-18-38). If I remove >>>>>>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I >>>>>>> find that: >>>>>>> * 2.6.33 boots fine. >>>>>>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. >>>>>>> * 2.6.33 + mmotm (including >>>>>>> bootmem-avoid-dma32-zone-by-default.patch): panics. >>>> ... >>>>> >>>>> Note: mmotm has been recently updated to stamp-2010-03-04-18-05. I >>>>> re-tested with 'make defconfig' to confirm the panic with this later >>>>> mmotm. >>>> >>>> please check >>>> >>>> [PATCH] early_res: double check with updated goal in alloc_memory_core_early >>>> >>>> Johannes Weiner pointed out that new early_res replacement for alloc_bootmem_node >>>> change the behavoir about goal. 
>>>> original bootmem one will try go further regardless of goal. >>>> >>>> and it will break his patch about default goal from MAX_DMA to MAX_DMA32... >>>> also broke uncommon machines with <=16M of memory. >>>> (really? our x86 kernel still can run on 16M system?) >>>> >>>> so try again with update goal. >>> >>> Thanks for the patch, it seems to be correct. >>> >>> However, I have a more generic question about it, regarding the future of the >>> early_res allocator. >>> >>> Did you plan on keeping the bootmem API for longer? Because my impression was, >>> emulating it is a temporary measure until all users are gone and bootmem can >>> be finally dropped. >> >> that depends on every arch maintainer. >> >> user can compare them on x86 to check if... > > Humm, now that is a bit disappointing. Because it means we will never get rid > of bootmem as long as it works for the other architectures. And your changeset > just added ~900 lines of code, some of it being a rather ugly compatibility > layer in bootmem that I hoped could go away again sooner than later. > > I do not know what the upsides for x86 are from no longer using bootmem but it > would suck from a code maintainance point of view to get stuck half way through > this transition and have now TWO implementations of the bootmem interface we > would like to get rid of. some data, and others can compare them more on x86 systems... I didn't plan to post this data before you said .... 
for my 1T system

nobootmem:
    text    data      bss      dec     hex filename
19185736 4148404 12170736 35504876 21dc2ec vmlinux.nobootmem
Memory: 1058662820k/1075838976k available (11388k kernel code, 2106480k absent, 15069676k reserved, 8589k data, 2744k init
[ 220.947157] calling ip_auto_config+0x0/0x24d @ 1

bootmem:
    text    data      bss      dec     hex filename
19188441 4153956 12170736 35513133 21de32d vmlinux.bootmem
Memory: 1058662796k/1075838976k available (11388k kernel code, 2106480k absent, 15069700k reserved, 8589k data, 2752k init)
[ 236.765364] calling ip_auto_config+0x0/0x24d @ 1

YH From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751647Ab0CFFqT (ORCPT ); Sat, 6 Mar 2010 00:46:19 -0500 Received: from hera.kernel.org ([140.211.167.34]:44550 "EHLO hera.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750904Ab0CFFqR (ORCPT ); Sat, 6 Mar 2010 00:46:17 -0500 Message-ID: <4B91EBC6.6080509@kernel.org> Date: Fri, 05 Mar 2010 21:44:38 -0800 From: Yinghai Lu User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091130 SUSE/3.0.0-1.1.1 Thunderbird/3.0 MIME-Version: 1.0 To: Andrew Morton CC: Greg Thelen , "H.
Peter Anvin" , Thomas Gleixner , Ingo Molnar , Johannes Weiner , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" Subject: please don't apply : bootmem: avoid DMA32 zone by default References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> <4B916BD6.8010701@kernel.org> In-Reply-To: <4B916BD6.8010701@kernel.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/05/2010 12:38 PM, Yinghai Lu wrote: > if you don't want to drop > | bootmem: avoid DMA32 zone by default > > today mainline tree actually DO NOT need that patch according to print out ... > > please apply this one too. > > [PATCH] x86/bootmem: introduce bootmem_default_goal > > don't punish the 64bit systems with less 4G RAM. > they should use _pa(MAX_DMA_ADDRESS) at first pass instead of failback... andrew, please drop Johannes' patch : bootmem: avoid DMA32 zone by default so you don't need to apply two fix patches from me: [PATCH] early_res: double check with updated goal in alloc_memory_core_early [PATCH] x86/bootmem: introduce bootmem_default_goal Moving all bootmem allocations above 4G makes system performance worse... Thanks Yinghai Lu From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753518Ab0CGAXL (ORCPT ); Sat, 6 Mar 2010 19:23:11 -0500 Received: from smtp1.linux-foundation.org ([140.211.169.13]:45766 "EHLO smtp1.linux-foundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752857Ab0CGAXI (ORCPT ); Sat, 6 Mar 2010 19:23:08 -0500 Date: Sat, 6 Mar 2010 16:22:34 -0800 From: Andrew Morton To: Yinghai Lu Cc: Greg Thelen , "H.
Peter Anvin" , Thomas Gleixner , Ingo Molnar , Johannes Weiner , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" Subject: Re: please don't apply : bootmem: avoid DMA32 zone by default Message-Id: <20100306162234.e2cc84fb.akpm@linux-foundation.org> In-Reply-To: <4B91EBC6.6080509@kernel.org> References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> <4B916BD6.8010701@kernel.org> <4B91EBC6.6080509@kernel.org> X-Mailer: Sylpheed 2.4.8 (GTK+ 2.12.5; x86_64-redhat-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 05 Mar 2010 21:44:38 -0800 Yinghai Lu wrote: > On 03/05/2010 12:38 PM, Yinghai Lu wrote: > > if you don't want to drop > > | bootmem: avoid DMA32 zone by default > > > > today mainline tree actually DO NOT need that patch according to print out ... > > > > please apply this one too. > > > > [PATCH] x86/bootmem: introduce bootmem_default_goal > > > > don't punish the 64bit systems with less 4G RAM. > > they should use _pa(MAX_DMA_ADDRESS) at first pass instead of failback... > > andrew, > > please drop Johannes' patch : bootmem: avoid DMA32 zone by default I'd rather not. That patch is said to fix a runtime problem which is present in 2.6.33 and hence we planned on backporting it into 2.6.33.x. I don't have a clue what your patches do. Can you tell us? Earlier, Johannes wrote : Humm, now that is a bit disappointing. Because it means we will never : get rid of bootmem as long as it works for the other architectures. : And your changeset just added ~900 lines of code, some of it being a : rather ugly compatibility layer in bootmem that I hoped could go away : again sooner than later. 
: : I do not know what the upsides for x86 are from no longer using bootmem : but it would suck from a code maintainance point of view to get stuck : half way through this transition and have now TWO implementations of : the bootmem interface we would like to get rid of. Which is a pretty good-sounding argument. Perhaps we should be dropping your patches. What patches _are_ these x86 bootmem changes, anyway? Please identify them so people can take a look and see what they do. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753573Ab0CGAn0 (ORCPT ); Sat, 6 Mar 2010 19:43:26 -0500 Received: from hera.kernel.org ([140.211.167.34]:34587 "EHLO hera.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752182Ab0CGAnZ (ORCPT ); Sat, 6 Mar 2010 19:43:25 -0500 Message-ID: <4B92F65A.5060305@kernel.org> Date: Sat, 06 Mar 2010 16:42:02 -0800 From: Yinghai Lu User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091130 SUSE/3.0.0-1.1.1 Thunderbird/3.0 MIME-Version: 1.0 To: Andrew Morton CC: Greg Thelen , "H. 
Peter Anvin" , Thomas Gleixner , Ingo Molnar , Johannes Weiner , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" , Linus Torvalds Subject: Re: please don't apply : bootmem: avoid DMA32 zone by default References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> <4B916BD6.8010701@kernel.org> <4B91EBC6.6080509@kernel.org> <20100306162234.e2cc84fb.akpm@linux-foundation.org> In-Reply-To: <20100306162234.e2cc84fb.akpm@linux-foundation.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/06/2010 04:22 PM, Andrew Morton wrote: > On Fri, 05 Mar 2010 21:44:38 -0800 Yinghai Lu wrote: > >> On 03/05/2010 12:38 PM, Yinghai Lu wrote: >>> if you don't want to drop >>> | bootmem: avoid DMA32 zone by default >>> >>> today mainline tree actually DO NOT need that patch according to print out ... >>> >>> please apply this one too. >>> >>> [PATCH] x86/bootmem: introduce bootmem_default_goal >>> >>> don't punish the 64bit systems with less 4G RAM. >>> they should use _pa(MAX_DMA_ADDRESS) at first pass instead of failback... >> >> andrew, >> >> please drop Johannes' patch : bootmem: avoid DMA32 zone by default > > I'd rather not. That patch is said to fix a runtime problem which is > present in 2.6.33 and hence we planned on backporting it into 2.6.33.x. that patch make my box booting time from 215s to 265s. should have better way to fix the problem: just put the mem_map or the big chunk on high. instead put everything above 4g. 
some thing like
static void * __init_refok __earlyonly_bootmem_alloc(int node,
				unsigned long size,
				unsigned long align,
				unsigned long goal)
{
	return __alloc_bootmem_node_high(NODE_DATA(node), size, align, goal);
}

void * __init __alloc_bootmem_node_high(pg_data_t *pgdat, unsigned long size,
				   unsigned long align, unsigned long goal)
{
#ifdef MAX_DMA32_PFN
	unsigned long end_pfn;

	if (WARN_ON_ONCE(slab_is_available()))
		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);

	/* update goal according ...MAX_DMA32_PFN */
	end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages;

	if (end_pfn > MAX_DMA32_PFN + (128 >> (20 - PAGE_SHIFT)) &&
	    (goal >> PAGE_SHIFT) < MAX_DMA32_PFN) {
		void *ptr;
		unsigned long new_goal;

		new_goal = MAX_DMA32_PFN << PAGE_SHIFT;
#ifdef CONFIG_NO_BOOTMEM
		ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
						 new_goal, -1ULL);
#else
		ptr = alloc_bootmem_core(pgdat->bdata, size, align,
						 new_goal, 0);
#endif
		if (ptr)
			return ptr;
	}
#endif

	return __alloc_bootmem_node(pgdat, size, align, goal);

}
>
> I don't have a clue what your patches do. Can you tell us?

do use bootmem, and use early_res instead.
you are on the to list... please check...
http://lkml.org/lkml/2010/2/10/39

>
> Earlier, Johannes wrote
>
> : Humm, now that is a bit disappointing. Because it means we will never
> : get rid of bootmem as long as it works for the other architectures.
> : And your changeset just added ~900 lines of code, some of it being a
> : rather ugly compatibility layer in bootmem that I hoped could go away
> : again sooner than later.
> :
> : I do not know what the upsides for x86 are from no longer using bootmem
> : but it would suck from a code maintainance point of view to get stuck
> : half way through this transition and have now TWO implementations of
> : the bootmem interface we would like to get rid of.
>
> Which is a pretty good-sounding argument. Perhaps we should be
> dropping your patches.
>
> What patches _are_ these x86 bootmem changes, anyway?
Please identify > them so people can take a look and see what they do. http://lkml.org/lkml/2010/2/10/39 and you and linus, ingo, hpa, tglx on the To list. Yinghai From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753645Ab0CGAzF (ORCPT ); Sat, 6 Mar 2010 19:55:05 -0500 Received: from hera.kernel.org ([140.211.167.34]:52076 "EHLO hera.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753372Ab0CGAzD (ORCPT ); Sat, 6 Mar 2010 19:55:03 -0500 Message-ID: <4B92F91A.5040607@kernel.org> Date: Sat, 06 Mar 2010 16:53:46 -0800 From: Yinghai Lu User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091130 SUSE/3.0.0-1.1.1 Thunderbird/3.0 MIME-Version: 1.0 To: Andrew Morton , Jiri Slaby CC: Greg Thelen , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Johannes Weiner , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" , Linus Torvalds Subject: Re: please don't apply : bootmem: avoid DMA32 zone by default References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> <4B916BD6.8010701@kernel.org> <4B91EBC6.6080509@kernel.org> <20100306162234.e2cc84fb.akpm@linux-foundation.org> <4B92F65A.5060305@kernel.org> In-Reply-To: <4B92F65A.5060305@kernel.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/06/2010 04:42 PM, Yinghai Lu wrote: > On 03/06/2010 04:22 PM, Andrew Morton wrote: >> On Fri, 05 Mar 2010 21:44:38 -0800 Yinghai Lu wrote: >> >>> On 03/05/2010 12:38 PM, Yinghai Lu wrote: >>>> if you don't want to drop >>>> | bootmem: avoid DMA32 zone by default >>>> >>>> today mainline tree actually DO NOT need that patch according to print out ... >>>> >>>> please apply this one too. 
>>>>
>>>> [PATCH] x86/bootmem: introduce bootmem_default_goal
>>>>
>>>> don't punish the 64bit systems with less 4G RAM.
>>>> they should use _pa(MAX_DMA_ADDRESS) at first pass instead of failback...
>>>
>>> andrew,
>>>
>>> please drop Johannes' patch : bootmem: avoid DMA32 zone by default
>>
>> I'd rather not. That patch is said to fix a runtime problem which is
>> present in 2.6.33 and hence we planned on backporting it into 2.6.33.x.
>
> that patch make my box booting time from 215s to 265s.
>
> should have better way to fix the problem:
> just put the mem_map or the big chunk on high.
> instead put everything above 4g.
>
> some thing like
> static void * __init_refok __earlyonly_bootmem_alloc(int node,
> 				unsigned long size,
> 				unsigned long align,
> 				unsigned long goal)
> {
> 	return __alloc_bootmem_node_high(NODE_DATA(node), size, align, goal);
> }
>
> void * __init __alloc_bootmem_node_high(pg_data_t *pgdat, unsigned long size,
> 				   unsigned long align, unsigned long goal)
> {
> #ifdef MAX_DMA32_PFN
> 	unsigned long end_pfn;
>
> 	if (WARN_ON_ONCE(slab_is_available()))
> 		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
>
> 	/* update goal according ...MAX_DMA32_PFN */
> 	end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages;
>
> 	if (end_pfn > MAX_DMA32_PFN + (128 >> (20 - PAGE_SHIFT)) &&
> 	    (goal >> PAGE_SHIFT) < MAX_DMA32_PFN) {
> 		void *ptr;
> 		unsigned long new_goal;
>
> 		new_goal = MAX_DMA32_PFN << PAGE_SHIFT;
> #ifdef CONFIG_NO_BOOTMEM
> 		ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
> 						 new_goal, -1ULL);
> #else
> 		ptr = alloc_bootmem_core(pgdat->bdata, size, align,
> 						 new_goal, 0);
> #endif
> 		if (ptr)
> 			return ptr;
> 	}
> #endif
>
> 	return __alloc_bootmem_node(pgdat, size, align, goal);
>
> }

Jiri, can you send out your bootlog and .config?
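The first-pass condition in the `__alloc_bootmem_node_high()` code above can be checked in isolation. What follows is a minimal user-space model, not kernel code: it assumes 4 KiB pages (PAGE_SHIFT 12, as on x86_64) and takes MAX_DMA32_PFN to be the page frame at 4 GiB, which is its x86_64 meaning; the `SKETCH_*`/`MARGIN_*` macros and `try_high_first()` are hypothetical names. It also makes one arithmetic detail visible: as posted, `128 >> (20 - PAGE_SHIFT)` evaluates to 0 with 4 KiB pages, whereas 128 MiB expressed in pages would be `128 << (20 - PAGE_SHIFT)`, i.e. 32768. Which margin was intended is not stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumptions for this model: 4 KiB pages and the x86_64 meaning of
 * MAX_DMA32_PFN, i.e. the first page frame at 4 GiB. */
#define SKETCH_PAGE_SHIFT    12
#define SKETCH_MAX_DMA32_PFN (1UL << (32 - SKETCH_PAGE_SHIFT))

/* Margin exactly as written in the posted code: 128 >> 8 == 0 pages. */
#define MARGIN_AS_POSTED (128UL >> (20 - SKETCH_PAGE_SHIFT))
/* 128 MiB expressed in 4 KiB pages: 128 << 8 == 32768 pages. */
#define MARGIN_128_MIB   (128UL << (20 - SKETCH_PAGE_SHIFT))

/*
 * Hypothetical model of the first-pass test: aim the allocation at 4 GiB
 * only when the node ends more than `margin` pages past 4 GiB and the
 * caller's goal (a physical address) lies below 4 GiB. Otherwise the
 * caller falls straight through to the ordinary allocation at `goal`.
 */
static bool try_high_first(unsigned long node_start_pfn,
                           unsigned long node_spanned_pages,
                           unsigned long goal, unsigned long margin)
{
    unsigned long end_pfn = node_start_pfn + node_spanned_pages;

    return end_pfn > SKETCH_MAX_DMA32_PFN + margin &&
           (goal >> SKETCH_PAGE_SHIFT) < SKETCH_MAX_DMA32_PFN;
}
```

Under this model, a node spanning PFNs 0 to 0x2080000 (like the first node in the free-range dump later in the thread) takes the high path for a 16 MiB goal, which is what `__pa(MAX_DMA_ADDRESS)` amounts to on x86_64; the 256 MiB QEMU guest in Greg's report (node end PFN 0xfff0) never does; and a node ending only 64 MiB past 4 GiB is treated differently depending on which margin is used.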
Yinghai From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753923Ab0CGBSh (ORCPT ); Sat, 6 Mar 2010 20:18:37 -0500 Received: from hera.kernel.org ([140.211.167.34]:41163 "EHLO hera.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752552Ab0CGBSg (ORCPT ); Sat, 6 Mar 2010 20:18:36 -0500 Message-ID: <4B92FE9E.90600@kernel.org> Date: Sat, 06 Mar 2010 17:17:18 -0800 From: Yinghai Lu User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091130 SUSE/3.0.0-1.1.1 Thunderbird/3.0 MIME-Version: 1.0 To: Jiri Slaby , Andrew Morton CC: Johannes Weiner , Greg Thelen , "linux-kernel@vger.kernel.org" , Ingo Molnar , "H. Peter Anvin" , Thomas Gleixner , cl@linux-foundation.com Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <4B90C921.6060908@kernel.org> <4B90DC3C.1060000@gmail.com> In-Reply-To: <4B90DC3C.1060000@gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/05/2010 02:26 AM, Jiri Slaby wrote: > On 03/05/2010 10:04 AM, Yinghai Lu wrote: >> according to context >> http://patchwork.kernel.org/patch/73893/ >> >> Jiri, >> please check current linus tree still have problem about mem_map is using that much low mem? > > Hi! > > Sorry, I don't have direct access to the machine. I might try to ask the > owners to do so. > >> on my 1024g system first node has 128G ram, [2g, 4g) are mmio range. > > So where gets your mem_map allocated (I suppose you're running flat model)? what kernel version? 2.6.27? x86 64bit now only support SPARSEMEM. 
Yinghai From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754344Ab0CGCQc (ORCPT ); Sat, 6 Mar 2010 21:16:32 -0500 Received: from hera.kernel.org ([140.211.167.34]:57170 "EHLO hera.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754123Ab0CGCQa (ORCPT ); Sat, 6 Mar 2010 21:16:30 -0500 Message-ID: <4B930C24.3030400@kernel.org> Date: Sat, 06 Mar 2010 18:15:00 -0800 From: Yinghai Lu User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091130 SUSE/3.0.0-1.1.1 Thunderbird/3.0 MIME-Version: 1.0 To: Andrew Morton , Jiri Slaby , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Linus Torvalds , Christoph Lameter CC: Greg Thelen , Johannes Weiner , "linux-kernel@vger.kernel.org" Subject: [PATCH] sparsemem: on no vmemmap path put mem_map on node high too References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> <4B916BD6.8010701@kernel.org> <4B91EBC6.6080509@kernel.org> <20100306162234.e2cc84fb.akpm@linux-foundation.org> <4B92F65A.5060305@kernel.org> <4B92F91A.5040607@kernel.org> In-Reply-To: <4B92F91A.5040607@kernel.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org we need to put mem_map high when virtual memmap is not used. 
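The patch in this message changes where each section's mem_map is allocated, not how large it is. As a rough illustration of that size, the sketch below evaluates `PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION)` under stated assumptions: 4 KiB pages, 128 MiB sections (SECTION_SIZE_BITS 27, the x86_64 value in kernels of this period), and a `sizeof(struct page)` passed in by the caller because it depends on the kernel configuration. The `SKETCH_*` macros and `sketch_page_align()`/`section_mem_map_bytes()` are hypothetical stand-ins for the kernel macros, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT        12                 /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE         (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_SECTION_SIZE_BITS 27                 /* assumption: 128 MiB sections */
#define SKETCH_PAGES_PER_SECTION \
    (1UL << (SKETCH_SECTION_SIZE_BITS - SKETCH_PAGE_SHIFT))

/* Round up to the next multiple of the page size, like PAGE_ALIGN(). */
static unsigned long sketch_page_align(unsigned long bytes)
{
    return (bytes + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Bytes of mem_map needed for one present section, given the
 * configuration-dependent size of struct page. */
static unsigned long section_mem_map_bytes(size_t struct_page_size)
{
    assert(struct_page_size > 0);
    return sketch_page_align(struct_page_size * SKETCH_PAGES_PER_SECTION);
}
```

With a 56-byte struct page this gives 1.75 MiB per 128 MiB section (2 MiB with 64 bytes). Note that `sparse_mem_maps_populate_node()` in the patch requests `size * map_count` bytes in a single call, so a node with many present sections asks for one very large block.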
before this patch free mem pfn range on first node:
[ 0.000000] 19 - 1f
[ 0.000000] 28 40 - 80 95
[ 0.000000] 702 740 - 1000 1000
[ 0.000000] 347c - 347e
[ 0.000000] 34e7 3500 - 3b80 3b8b
[ 0.000000] 73b8b 73bc0 - 73c00 73c00
[ 0.000000] 73ddd - 73e00
[ 0.000000] 73fdd - 74000
[ 0.000000] 741dd - 74200
[ 0.000000] 743dd - 74400
[ 0.000000] 745dd - 74600
[ 0.000000] 747dd - 74800
[ 0.000000] 749dd - 74a00
[ 0.000000] 74bdd - 74c00
[ 0.000000] 74ddd - 74e00
[ 0.000000] 74fdd - 75000
[ 0.000000] 751dd - 75200
[ 0.000000] 753dd - 75400
[ 0.000000] 755dd - 75600
[ 0.000000] 757dd - 75800
[ 0.000000] 759dd - 75a00
[ 0.000000] 79bdd 79c00 - 7d540 7d550
[ 0.000000] 7f745 - 7f750
[ 0.000000] 10000b 100040 - 2080000 2080000
so only 79c00 - 7d540 are major free block under 4g...

after this patch, we will get
[ 0.000000] 19 - 1f
[ 0.000000] 28 40 - 80 95
[ 0.000000] 702 740 - 1000 1000
[ 0.000000] 347c - 347e
[ 0.000000] 34e7 3500 - 3600 3600
[ 0.000000] 37dd - 3800
[ 0.000000] 39dd - 3a00
[ 0.000000] 3bdd - 3c00
[ 0.000000] 3ddd - 3e00
[ 0.000000] 3fdd - 4000
[ 0.000000] 41dd - 4200
[ 0.000000] 43dd - 4400
[ 0.000000] 45dd - 4600
[ 0.000000] 47dd - 4800
[ 0.000000] 49dd - 4a00
[ 0.000000] 4bdd - 4c00
[ 0.000000] 4ddd - 4e00
[ 0.000000] 4fdd - 5000
[ 0.000000] 51dd - 5200
[ 0.000000] 53dd - 5400
[ 0.000000] 95dd 9600 - 7d540 7d550
[ 0.000000] 7f745 - 7f750
[ 0.000000] 17000b 170040 - 2080000 2080000
we will have 9600 - 7d540 for major free block...
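The free ranges above are page frame numbers in hex. Converting them to sizes makes the effect of the patch concrete. This sketch assumes 4 KiB pages and treats each printed range as half-open [start, end), which is an assumption about the print format; `pfn_range_pages()` and `pfn_range_kib()` are hypothetical helpers.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Number of page frames in the half-open PFN range [start, end). */
static unsigned long long pfn_range_pages(unsigned long long start,
                                          unsigned long long end)
{
    assert(end >= start);
    return end - start;
}

/* Size of the same range in KiB. */
static unsigned long long pfn_range_kib(unsigned long long start,
                                        unsigned long long end)
{
    return (pfn_range_pages(start, end) << SKETCH_PAGE_SHIFT) >> 10;
}
```

By this count the largest free block below 4 GiB grows from 14656 pages, about 57 MiB (79c00 - 7d540), to 474944 pages, about 1855 MiB (9600 - 7d540). Meanwhile, the start of the free range above 4 GiB moves from PFN 100040 to 170040, i.e. 0x70000 pages or 1792 MiB, which is consistent with roughly that much having been relocated above 4 GiB.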
sparse-vmemmap path already used __alloc_bootmem_node_high()

Signed-off-by: Yinghai Lu

---
 mm/sparse.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/sparse.c
===================================================================
--- linux-2.6.orig/mm/sparse.c
+++ linux-2.6/mm/sparse.c
@@ -381,13 +381,15 @@ static void __init sparse_early_usemaps_
 struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid)
 {
 	struct page *map;
+	unsigned long size;
 
 	map = alloc_remap(nid, sizeof(struct page) * PAGES_PER_SECTION);
 	if (map)
 		return map;
 
-	map = alloc_bootmem_pages_node(NODE_DATA(nid),
-		       PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION));
+	size = PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION);
+	map = __alloc_bootmem_node_high(NODE_DATA(nid), size,
+					 PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
 	return map;
 }
 void __init sparse_mem_maps_populate_node(struct page **map_map,
@@ -411,7 +413,8 @@ void __init sparse_mem_maps_populate_nod
 	}
 
 	size = PAGE_ALIGN(size);
-	map = alloc_bootmem_pages_node(NODE_DATA(nodeid), size * map_count);
+	map = __alloc_bootmem_node_high(NODE_DATA(nodeid), size * map_count,
+					 PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
 	if (map) {
 		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 			if (!present_section_nr(pnum))

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757119Ab0CKKyJ (ORCPT ); Thu, 11 Mar 2010 05:54:09 -0500 Received: from fg-out-1718.google.com ([72.14.220.152]:14424 "EHLO fg-out-1718.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757033Ab0CKKyF (ORCPT ); Thu, 11 Mar 2010 05:54:05 -0500 DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; b=lY+ggT50El8TfOCfGS+Q389OWtrNTviS/sAWfNPRP0hu9X5H7CltokJr7FRa5Ea2gx
13XKvyVyIFF2PCJEMkdVr9SmhxeYZnTnUpO+YsUtbw9eJl7y+hKk7eARJdVYKreIxshR T0Ea1l10xMAIToDVxYd5tf/1RgLH/DSN61zuA= Message-ID: <4B98CBC8.6080008@gmail.com> Date: Thu, 11 Mar 2010 11:54:00 +0100 From: Jiri Slaby User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; cs-CZ; rv:1.9.2.2pre) Gecko/20100308 SUSE/3.1b1-5.1 Thunderbird/3.1b1 MIME-Version: 1.0 To: Yinghai Lu CC: Andrew Morton , Johannes Weiner , Greg Thelen , "linux-kernel@vger.kernel.org" , Ingo Molnar , "H. Peter Anvin" , Thomas Gleixner , cl@linux-foundation.com Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <4B90C921.6060908@kernel.org> <4B90DC3C.1060000@gmail.com> <4B92FE9E.90600@kernel.org> In-Reply-To: <4B92FE9E.90600@kernel.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/07/2010 02:17 AM, Yinghai Lu wrote: > On 03/05/2010 02:26 AM, Jiri Slaby wrote: >> On 03/05/2010 10:04 AM, Yinghai Lu wrote: >>> according to context >>> http://patchwork.kernel.org/patch/73893/ >>> >>> Jiri, >>> please check current linus tree still have problem about mem_map is using that much low mem? >> >> Hi! >> >> Sorry, I don't have direct access to the machine. I might try to ask the >> owners to do so. >> >>> on my 1024g system first node has 128G ram, [2g, 4g) are mmio range. >> >> So where gets your mem_map allocated (I suppose you're running flat model)? > > what kernel version? 2.6.27? Hi, yes, it is 2.6.27. > x86 64bit now only support SPARSEMEM. 
-- js From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758154Ab0CKUPB (ORCPT ); Thu, 11 Mar 2010 15:15:01 -0500 Received: from hera.kernel.org ([140.211.167.34]:32858 "EHLO hera.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758120Ab0CKUO7 (ORCPT ); Thu, 11 Mar 2010 15:14:59 -0500 Message-ID: <4B994EC4.2050705@kernel.org> Date: Thu, 11 Mar 2010 12:12:52 -0800 From: Yinghai Lu User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.8) Gecko/20100228 SUSE/3.0.3-1.1.1 Thunderbird/3.0.3 MIME-Version: 1.0 To: Jiri Slaby CC: Andrew Morton , Johannes Weiner , Greg Thelen , "linux-kernel@vger.kernel.org" , Ingo Molnar , "H. Peter Anvin" , Thomas Gleixner , cl@linux-foundation.com Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <4B90C921.6060908@kernel.org> <4B90DC3C.1060000@gmail.com> <4B92FE9E.90600@kernel.org> <4B98CBC8.6080008@gmail.com> In-Reply-To: <4B98CBC8.6080008@gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/11/2010 02:54 AM, Jiri Slaby wrote: > On 03/07/2010 02:17 AM, Yinghai Lu wrote: >> On 03/05/2010 02:26 AM, Jiri Slaby wrote: >>> On 03/05/2010 10:04 AM, Yinghai Lu wrote: >>>> according to context >>>> http://patchwork.kernel.org/patch/73893/ >>>> >>>> Jiri, >>>> please check current linus tree still have problem about mem_map is >>>> using that much low mem? >>> >>> Hi! >>> >>> Sorry, I don't have direct access to the machine. I might try to ask the >>> owners to do so. >>> >>>> on my 1024g system first node has 128G ram, [2g, 4g) are mmio range. >>> >>> So where gets your mem_map allocated (I suppose you're running flat >>> model)? >> >> what kernel version? 2.6.27? 
> > Hi, yes, it is 2.6.27. SLES 11? Yinghai From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754871Ab0CKVkO (ORCPT ); Thu, 11 Mar 2010 16:40:14 -0500 Received: from mail-fx0-f227.google.com ([209.85.220.227]:35704 "EHLO mail-fx0-f227.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754316Ab0CKVkJ (ORCPT ); Thu, 11 Mar 2010 16:40:09 -0500 DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; b=pE5iOHtFfYrXwiPRbJO5kGFMRQVbAFltk4+OeB+7eIEIAbBM5/AOgcKQD/uCdL2WKx MkE1+mpPQAO/BEUHXF1etxKaclt5wwn2UNdBoFTqz/tKLiQxsuXS6gNxDKJPj/5kG9p4 NZHcBsvsr74x4X2+GtNkfNc75QxtF6OoRs4IQ= Message-ID: <4B996335.7070907@gmail.com> Date: Thu, 11 Mar 2010 22:40:05 +0100 From: Jiri Slaby User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; cs-CZ; rv:1.9.2.2pre) Gecko/20100308 SUSE/3.1b1-5.1 Thunderbird/3.1b1 MIME-Version: 1.0 To: Yinghai Lu CC: Andrew Morton , Johannes Weiner , Greg Thelen , "linux-kernel@vger.kernel.org" , Ingo Molnar , "H. Peter Anvin" , Thomas Gleixner , cl@linux-foundation.com Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <4B90C921.6060908@kernel.org> <4B90DC3C.1060000@gmail.com> <4B92FE9E.90600@kernel.org> <4B98CBC8.6080008@gmail.com> <4B994EC4.2050705@kernel.org> In-Reply-To: <4B994EC4.2050705@kernel.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/11/2010 09:12 PM, Yinghai Lu wrote: > On 03/11/2010 02:54 AM, Jiri Slaby wrote: >> Hi, yes, it is 2.6.27. > > SLES 11? Sorry I wrote that in haste. It is SLES 10 in the end. That means it is 2.6.16, not 2.6.27. 
Hence no sparsemem whatsoever. With SLES11 it should be OK, we are using flatmem only for i386. Whatever, it should be no issue now, as flatmem currently (as of 2.6.25) depends on i386. On the other hand I still considered the patch as applicable to contemporary kernels since there might be weird bios e820 maps and huge (and sparse) bootmem allocations/reservations (memory cgroups, initrd) so that code requiring much memory below 4g (swiotlb) will fail then. Whatever, in the current kernel, the particular issue I was referring to *is not reproducible*. thanks, -- js From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754937Ab0CKVoU (ORCPT ); Thu, 11 Mar 2010 16:44:20 -0500 Received: from hera.kernel.org ([140.211.167.34]:51536 "EHLO hera.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754574Ab0CKVoT (ORCPT ); Thu, 11 Mar 2010 16:44:19 -0500 Message-ID: <4B9963D2.10002@kernel.org> Date: Thu, 11 Mar 2010 13:42:42 -0800 From: Yinghai Lu User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.8) Gecko/20100228 SUSE/3.0.3-1.1.1 Thunderbird/3.0.3 MIME-Version: 1.0 To: Jiri Slaby CC: Andrew Morton , Johannes Weiner , Greg Thelen , "linux-kernel@vger.kernel.org" , Ingo Molnar , "H. 
Peter Anvin" , Thomas Gleixner , cl@linux-foundation.com Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <4B90C921.6060908@kernel.org> <4B90DC3C.1060000@gmail.com> <4B92FE9E.90600@kernel.org> <4B98CBC8.6080008@gmail.com> <4B994EC4.2050705@kernel.org> <4B996335.7070907@gmail.com> In-Reply-To: <4B996335.7070907@gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/11/2010 01:40 PM, Jiri Slaby wrote: > On 03/11/2010 09:12 PM, Yinghai Lu wrote: >> On 03/11/2010 02:54 AM, Jiri Slaby wrote: >>> Hi, yes, it is 2.6.27. >> >> SLES 11? > > Sorry I wrote that in haste. It is SLES 10 in the end. That means it is > 2.6.16, not 2.6.27. Hence no sparsemem whatsoever. With SLES11 it should > be OK, we are using flatmem only for i386. > > Whatever, it should be no issue now, as flatmem currently (as of 2.6.25) > depends on i386. > > On the other hand I still considered the patch as applicable to > contemporary kernels since there might be weird bios e820 maps and huge > (and sparse) bootmem allocations/reservations (memory cgroups, initrd) > so that code requiring much memory below 4g (swiotlb) will fail then. > > Whatever, in the current kernel, the particular issue I was referring to > *is not reproducible*. the point is: we should only put the memmap put high. that is big chunk... other users should be ok... and leave them alone. 
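To see why the mem_map is the "big chunk", multiply the number of pages by the size of one `struct page`. This is back-of-the-envelope arithmetic under stated assumptions: 4 KiB pages, and a configuration-dependent `sizeof(struct page)` supplied by the caller. `mem_map_bytes()` is a hypothetical helper.

```c
#include <assert.h>

/* Bytes of mem_map needed to describe `ram_bytes` of memory, assuming
 * 4 KiB pages and one struct page of `struct_page_size` bytes per page. */
static unsigned long long mem_map_bytes(unsigned long long ram_bytes,
                                        unsigned long long struct_page_size)
{
    const unsigned long long page_size = 4096ULL;

    assert(ram_bytes % page_size == 0);
    return (ram_bytes / page_size) * struct_page_size;
}
```

For the 128 GiB first node of the 1024g machine described earlier, an assumed 56-byte struct page gives 1792 MiB, which is consistent with the 0x70000-page shift in the free-range dump above; with 64-byte entries it would be 2 GiB. Either way it is most of the roughly 2 GiB of RAM that node has below the [2g, 4g) MMIO hole. For a full 1 TiB of RAM and 64-byte entries the total is 16 GiB.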
YH From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51]) by kanga.kvack.org (Postfix) with ESMTP id 749056B008A for ; Thu, 4 Mar 2010 16:22:03 -0500 (EST) Received: from kpbe13.cbf.corp.google.com (kpbe13.cbf.corp.google.com [172.25.105.77]) by smtp-out.google.com with ESMTP id o24LM2Ir012306 for ; Thu, 4 Mar 2010 21:22:03 GMT Received: from pwj9 (pwj9.prod.google.com [10.241.219.73]) by kpbe13.cbf.corp.google.com with ESMTP id o24LM1Dn007879 for ; Thu, 4 Mar 2010 15:22:01 -0600 Received: by pwj9 with SMTP id 9so2102371pwj.0 for ; Thu, 04 Mar 2010 13:22:01 -0800 (PST) MIME-Version: 1.0 From: Greg Thelen Date: Thu, 4 Mar 2010 13:21:41 -0800 Message-ID: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> Subject: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org To: linux-mm@kvack.org List-ID: On several systems I am seeing a boot panic if I use mmotm (stamp-2010-03-02-18-38). If I remove bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I find that: * 2.6.33 boots fine. * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. * 2.6.33 + mmotm (including bootmem-avoid-dma32-zone-by-default.patch): panics. Note: I had to enable earlyprintk to see the panic. Without earlyprintk no console output was seen. The system appeared to hang after the loader. Here's the panic seen with earlyprintk using 2.6.33 + mmotm: Starting up ... 
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 2.6.33-mm1+ (gthelen@ninji.mtv.corp.google.com) (gcc version 4.2.4 (Ubuntu 4.2.4-1ubuntu4)) #1 SMP Thu Mar 4 12:03:29 PST 2010
[ 0.000000] Command line: root=UUID=a77f406a-7cc7-4f49-9cc2-818b2b4159ae ro console=tty0 console=ttyS0,115200n8 earlyprintk=serial,ttyS0,9600
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
[ 0.000000] BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000000fff0000 (usable)
[ 0.000000] BIOS-e820: 000000000fff0000 - 0000000010000000 (ACPI data)
[ 0.000000] BIOS-e820: 00000000fffbd000 - 0000000100000000 (reserved)
[ 0.000000] bootconsole [earlyser0] enabled
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI 2.4 present.
[ 0.000000] No AGP bridge found
[ 0.000000] last_pfn = 0xfff0 max_arch_pfn = 0x400000000
[ 0.000000] PAT not supported by CPU.
[ 0.000000] CPU MTRRs all blank - virtualized system.
[ 0.000000] Scanning 1 areas for low memory corruption
[ 0.000000] modified physical RAM map:
[ 0.000000] modified: 0000000000000000 - 0000000000010000 (reserved)
[ 0.000000] modified: 0000000000010000 - 000000000009fc00 (usable)
[ 0.000000] modified: 000000000009fc00 - 00000000000a0000 (reserved)
[ 0.000000] modified: 00000000000e8000 - 0000000000100000 (reserved)
[ 0.000000] modified: 0000000000100000 - 000000000fff0000 (usable)
[ 0.000000] modified: 000000000fff0000 - 0000000010000000 (ACPI data)
[ 0.000000] modified: 00000000fffbd000 - 0000000100000000 (reserved)
[ 0.000000] init_memory_mapping: 0000000000000000-000000000fff0000
[ 0.000000] RAMDISK: 0fd9d000 - 0ffdf539
[ 0.000000] ACPI: RSDP 00000000000fb450 00014 (v00 QEMU )
[ 0.000000] ACPI: RSDT 000000000fff0000 00030 (v01 QEMU QEMURSDT 00000001 QEMU 00000001)
[ 0.000000] ACPI: FACP 000000000fff0030 00074 (v01 QEMU QEMUFACP 00000001 QEMU 00000001)
[ 0.000000] ACPI: DSDT 000000000fff0100 0089D (v01 BXPC BXDSDT 00000001 INTL 20061109)
[ 0.000000] ACPI: FACS 000000000fff00c0 00040
[ 0.000000] ACPI: APIC 000000000fff09d8 00068 (v01 QEMU QEMUAPIC 00000001 QEMU 00000001)
[ 0.000000] ACPI: SSDT 000000000fff099d 00037 (v01 QEMU QEMUSSDT 00000001 QEMU 00000001)
[ 0.000000] No NUMA configuration found
[ 0.000000] Faking a node at 0000000000000000-000000000fff0000
[ 0.000000] Initmem setup node 0 0000000000000000-000000000fff0000
[ 0.000000] NODE_DATA [0000000001c4e040 - 0000000001c5303f]
[ 0.000000] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 0.000000] IP: [] memory_present+0x9a/0xbf
[ 0.000000] PGD 0
[ 0.000000] Oops: 0000 [#1] SMP
[ 0.000000] last sysfs file:
[ 0.000000] CPU 0
[ 0.000000] Modules linked in:
[ 0.000000]
[ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.33-mm1+ #1 /
[ 0.000000] RIP: 0010:[] [] memory_present+0x9a/0xbf
[ 0.000000] RSP: 0000:ffffffff81a01e18 EFLAGS: 00010046
[ 0.000000] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
[ 0.000000] RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
[ 0.000000] RBP: ffffffff81a01e58 R08: ffffffffffffffff R09: 0000000000000040
[ 0.000000] R10: ffff880001c4e040 R11: 0000000000004100 R12: 0000000000000000
[ 0.000000] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
[ 0.000000] FS: 0000000000000000(0000) GS:ffffffff81adf000(0000) knlGS:0000000000000000
[ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.000000] CR2: 0000000000000000 CR3: 0000000001a08000 CR4: 00000000000000b0
[ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 0.000000] Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a10020)
[ 0.000000] Stack:
[ 0.000000] 000000000fff0000 000000000000009f 0000000000000000 0000000000000000
[ 0.000000] <0> 0000000000000040 ffffffff81a01ef8 0000000000000000 0000000000000000
[ 0.000000] <0> ffffffff81a01e78 ffffffff81b0dd0e ffffffff81a01e88 000000000fff0000
[ 0.000000] Call Trace:
[ 0.000000] [] sparse_memory_present_with_active_regions+0x31/0x47
[ 0.000000] [] paging_init+0x3f/0x5b
[ 0.000000] [] setup_arch+0x964/0xa03
[ 0.000000] [] ? need_resched+0x1e/0x28
[ 0.000000] [] ? should_resched+0x9/0x2a
[ 0.000000] [] ? _cond_resched+0x9/0x1d
[ 0.000000] [] start_kernel+0x9f/0x382
[ 0.000000] [] x86_64_start_reservations+0xa9/0xad
[ 0.000000] [] x86_64_start_kernel+0xe6/0xed
[ 0.000000] Code: c7 00 56 c2 81 e8 a0 f9 a1 ff 48 83 3c dd 00 16 c2 81 00 75 08 4c 89 2c dd 00 16 c2 81 fe 05 11 60 11 00 4c 89 ff e8 85 3b 5c ff <48> 83 38 00 75 03 4c 89 30 49 81 c4 00 80 00 00 4c 3b 65 c8 72
[ 0.000000] RIP [] memory_present+0x9a/0xbf
[ 0.000000] RSP
[ 0.000000] CR2: 0000000000000000
[ 0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---
[ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.000000] Pid: 0, comm: swapper Tainted: G D 2.6.33-mm1+ #1 [ 0.000000] Call Trace: [ 0.000000] [] panic+0x9e/0x113 [ 0.000000] [] ? printk+0x67/0x69 [ 0.000000] [] ? blocking_notifier_call_chain+0xf/0x11 [ 0.000000] [] do_exit+0x78/0x70f [ 0.000000] [] ? spin_unlock_irqrestore+0x9/0xb [ 0.000000] [] ? kmsg_dump+0x112/0x138 [ 0.000000] [] oops_end+0xb2/0xba [ 0.000000] [] no_context+0x1f5/0x204 [ 0.000000] [] __bad_area_nosemaphore+0x17f/0x1a2 [ 0.000000] [] bad_area_nosemaphore+0xe/0x10 [ 0.000000] [] do_page_fault+0x122/0x24c [ 0.000000] [] page_fault+0x1f/0x30 [ 0.000000] [] ? memory_present+0x9a/0xbf [ 0.000000] [] ? memory_present+0x9a/0xbf [ 0.000000] [] sparse_memory_present_with_active_regions+0x31/0x47 [ 0.000000] [] paging_init+0x3f/0x5b [ 0.000000] [] setup_arch+0x964/0xa03 [ 0.000000] [] ? need_resched+0x1e/0x28 [ 0.000000] [] ? should_resched+0x9/0x2a [ 0.000000] [] ? _cond_resched+0x9/0x1d [ 0.000000] [] start_kernel+0x9f/0x382 [ 0.000000] [] x86_64_start_reservations+0xa9/0xad [ 0.000000] [] x86_64_start_kernel+0xe6/0xed The kernel was built with 'make mrproper && make defconfig && make ARCH=x86_64 CONFIG=smp -j 6'. This panic is seen on every attempt, so I can provide more diagnostics. -- Greg
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51]) by kanga.kvack.org (Postfix) with ESMTP id 94A346B00B1 for ; Thu, 4 Mar 2010 22:21:32 -0500 (EST) Date: Fri, 5 Mar 2010 04:21:06 +0100 From: Johannes Weiner Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch Message-ID: <20100305032106.GA12065@cmpxchg.org> References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> Sender: owner-linux-mm@kvack.org To: Greg Thelen Cc: Yinghai Lu , linux-mm@kvack.org List-ID: Hello Greg, On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: > On several systems I am seeing a boot panic if I use mmotm > (stamp-2010-03-02-18-38). If I remove > bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I > find that: > * 2.6.33 boots fine. > * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. > * 2.6.33 + mmotm (including > bootmem-avoid-dma32-zone-by-default.patch): panics. > Note: I had to enable earlyprintk to see the panic. Without > earlyprintk no console output was seen. The system appeared to hang > after the loader. Thanks for your report. A few notes below. > Here's the panic seen with earlyprintk using 2.6.33 + mmotm: > > Starting up ...
> [ 0.000000] Initializing cgroup subsys cpuset > [ 0.000000] Initializing cgroup subsys cpu > [ 0.000000] Linux version 2.6.33-mm1+ > (gthelen@ninji.mtv.corp.google.com) (gcc version 4.2.4 (Ubuntu > 4.2.4-1ubuntu4)) #1 SMP Thu Mar 4 12:03:29 PST 2010 > [ 0.000000] Command line: > root=UUID=a77f406a-7cc7-4f49-9cc2-818b2b4159ae ro console=tty0 > console=ttyS0,115200n8 earlyprintk=serial,ttyS0,9600 > [ 0.000000] BIOS-provided physical RAM map: > [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009fc00 (usable) > [ 0.000000] BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved) > [ 0.000000] BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved) > [ 0.000000] BIOS-e820: 0000000000100000 - 000000000fff0000 (usable) > [ 0.000000] BIOS-e820: 000000000fff0000 - 0000000010000000 (ACPI data) > [ 0.000000] BIOS-e820: 00000000fffbd000 - 0000000100000000 (reserved) > [ 0.000000] bootconsole [earlyser0] enabled > [ 0.000000] NX (Execute Disable) protection: active > [ 0.000000] DMI 2.4 present. > [ 0.000000] No AGP bridge found > [ 0.000000] last_pfn = 0xfff0 max_arch_pfn = 0x400000000 > [ 0.000000] PAT not supported by CPU. > [ 0.000000] CPU MTRRs all blank - virtualized system. > [ 0.000000] Scanning 1 areas for low memory corruption > [ 0.000000] modified physical RAM map: > [ 0.000000] modified: 0000000000000000 - 0000000000010000 (reserved) > [ 0.000000] modified: 0000000000010000 - 000000000009fc00 (usable) > [ 0.000000] modified: 000000000009fc00 - 00000000000a0000 (reserved) > [ 0.000000] modified: 00000000000e8000 - 0000000000100000 (reserved) > [ 0.000000] modified: 0000000000100000 - 000000000fff0000 (usable) > [ 0.000000] modified: 000000000fff0000 - 0000000010000000 (ACPI data) > [ 0.000000] modified: 00000000fffbd000 - 0000000100000000 (reserved) > [ 0.000000] init_memory_mapping: 0000000000000000-000000000fff0000 256MB of memory, right? 
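The 256MB figure can be checked against the log: the last usable e820 range ends at 0x000000000fff0000 and the kernel reports last_pfn = 0xfff0. The C sketch below assumes 4 KiB pages (PAGE_SHIFT = 12) and places the DMA32 boundary at 4 GiB, as on x86_64; the SKETCH_* macros and both helper names are hypothetical illustrations, not kernel API. It converts a page-frame count to bytes and tests whether an allocation goal lies at or above the end of RAM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86_64 values (hypothetical names): 4 KiB pages, DMA32 ends at 4 GiB. */
#define SKETCH_PAGE_SHIFT    12
#define SKETCH_MAX_DMA32_PFN (UINT64_C(1) << (32 - SKETCH_PAGE_SHIFT))

/* Bytes of RAM covered by page frames [0, last_pfn). */
static uint64_t pfn_range_bytes(uint64_t last_pfn)
{
    /* Guard against overflow of the shift on absurdly large inputs. */
    assert(last_pfn <= (UINT64_MAX >> SKETCH_PAGE_SHIFT));
    return last_pfn << SKETCH_PAGE_SHIFT;
}

/* True if a goal (physical address) is at or beyond the end of RAM,
 * i.e. no memory exists at or above the requested starting point. */
static bool goal_beyond_memory(uint64_t goal, uint64_t last_pfn)
{
    return goal >= pfn_range_bytes(last_pfn);
}
```

For last_pfn = 0xfff0 this yields 0xfff0000 bytes, 64 KiB short of 256 MiB. A 4 GiB goal therefore sits entirely above installed memory, while the 16M default goal discussed later in this thread falls well inside it.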
> [ 0.000000] RAMDISK: 0fd9d000 - 0ffdf539 > [ 0.000000] ACPI: RSDP 00000000000fb450 00014 (v00 QEMU ) > [ 0.000000] ACPI: RSDT 000000000fff0000 00030 (v01 QEMU QEMURSDT > 00000001 QEMU 00000001) > [ 0.000000] ACPI: FACP 000000000fff0030 00074 (v01 QEMU QEMUFACP > 00000001 QEMU 00000001) > [ 0.000000] ACPI: DSDT 000000000fff0100 0089D (v01 BXPC BXDSDT > 00000001 INTL 20061109) > [ 0.000000] ACPI: FACS 000000000fff00c0 00040 > [ 0.000000] ACPI: APIC 000000000fff09d8 00068 (v01 QEMU QEMUAPIC > 00000001 QEMU 00000001) > [ 0.000000] ACPI: SSDT 000000000fff099d 00037 (v01 QEMU QEMUSSDT > 00000001 QEMU 00000001) > [ 0.000000] No NUMA configuration found > [ 0.000000] Faking a node at 0000000000000000-000000000fff0000 > [ 0.000000] Initmem setup node 0 0000000000000000-000000000fff0000 > [ 0.000000] NODE_DATA [0000000001c4e040 - 0000000001c5303f] > [ 0.000000] BUG: unable to handle kernel NULL pointer dereference at (null) > [ 0.000000] IP: [] memory_present+0x9a/0xbf > [ 0.000000] PGD 0 > [ 0.000000] Oops: 0000 [#1] SMP > [ 0.000000] last sysfs file: > [ 0.000000] CPU 0 > [ 0.000000] Modules linked in: > [ 0.000000] > [ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.33-mm1+ #1 / > [ 0.000000] RIP: 0010:[] [] > memory_present+0x9a/0xbf > [ 0.000000] RSP: 0000:ffffffff81a01e18 EFLAGS: 00010046 > [ 0.000000] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002 > [ 0.000000] RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000 > [ 0.000000] RBP: ffffffff81a01e58 R08: ffffffffffffffff R09: 0000000000000040 > [ 0.000000] R10: ffff880001c4e040 R11: 0000000000004100 R12: 0000000000000000 > [ 0.000000] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000 > [ 0.000000] FS: 0000000000000000(0000) GS:ffffffff81adf000(0000) > knlGS:0000000000000000 > [ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 0.000000] CR2: 0000000000000000 CR3: 0000000001a08000 CR4: 00000000000000b0 > [ 0.000000] DR0: 0000000000000000 DR1: 
0000000000000000 DR2: 0000000000000000 > [ 0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [ 0.000000] Process swapper (pid: 0, threadinfo ffffffff81a00000, > task ffffffff81a10020) > [ 0.000000] Stack: > [ 0.000000] 000000000fff0000 000000000000009f 0000000000000000 > 0000000000000000 > [ 0.000000] <0> 0000000000000040 ffffffff81a01ef8 0000000000000000 > 0000000000000000 > [ 0.000000] <0> ffffffff81a01e78 ffffffff81b0dd0e ffffffff81a01e88 > 000000000fff0000 > [ 0.000000] Call Trace: > [ 0.000000] [] > sparse_memory_present_with_active_regions+0x31/0x47 > [ 0.000000] [] paging_init+0x3f/0x5b > [ 0.000000] [] setup_arch+0x964/0xa03 > [ 0.000000] [] ? need_resched+0x1e/0x28 > [ 0.000000] [] ? should_resched+0x9/0x2a > [ 0.000000] [] ? _cond_resched+0x9/0x1d > [ 0.000000] [] start_kernel+0x9f/0x382 > [ 0.000000] [] x86_64_start_reservations+0xa9/0xad > [ 0.000000] [] x86_64_start_kernel+0xe6/0xed > [ 0.000000] Code: c7 00 56 c2 81 e8 a0 f9 a1 ff 48 83 3c dd 00 16 > c2 81 00 75 08 4c 89 2c dd 00 16 c2 81 fe 05 11 60 11 00 4c 89 ff e8 > 85 3b 5c ff <48> 83 38 00 75 03 4c 89 30 49 81 c4 00 80 00 00 4c 3b 65 > c8 72 > [ 0.000000] RIP [] memory_present+0x9a/0xbf > [ 0.000000] RSP > [ 0.000000] CR2: 0000000000000000 > [ 0.000000] ---[ end trace 4eaa2a86a8e2da22 ]--- > [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! > [ 0.000000] Pid: 0, comm: swapper Tainted: G D 2.6.33-mm1+ #1 > [ 0.000000] Call Trace: > [ 0.000000] [] panic+0x9e/0x113 > [ 0.000000] [] ? printk+0x67/0x69 > [ 0.000000] [] ? blocking_notifier_call_chain+0xf/0x11 > [ 0.000000] [] do_exit+0x78/0x70f > [ 0.000000] [] ? spin_unlock_irqrestore+0x9/0xb > [ 0.000000] [] ? 
kmsg_dump+0x112/0x138 > [ 0.000000] [] oops_end+0xb2/0xba > [ 0.000000] [] no_context+0x1f5/0x204 > [ 0.000000] [] __bad_area_nosemaphore+0x17f/0x1a2 > [ 0.000000] [] bad_area_nosemaphore+0xe/0x10 > [ 0.000000] [] do_page_fault+0x122/0x24c > [ 0.000000] [] page_fault+0x1f/0x30 > [ 0.000000] [] ? memory_present+0x9a/0xbf > [ 0.000000] [] ? memory_present+0x9a/0xbf > [ 0.000000] [] > sparse_memory_present_with_active_regions+0x31/0x47 > [ 0.000000] [] paging_init+0x3f/0x5b > [ 0.000000] [] setup_arch+0x964/0xa03 > [ 0.000000] [] ? need_resched+0x1e/0x28 > [ 0.000000] [] ? should_resched+0x9/0x2a > [ 0.000000] [] ? _cond_resched+0x9/0x1d > [ 0.000000] [] start_kernel+0x9f/0x382 > [ 0.000000] [] x86_64_start_reservations+0xa9/0xad > [ 0.000000] [] x86_64_start_kernel+0xe6/0xed > > The kernel was built with 'make mrproper && make defconfig && make > ARCH=x86_64 CONFIG=smp -j 6'. This panic is seen on every attempt, so > I can provide more diagnostics.

Okay, if you ran defconfig and just hit enter for all questions, you should have SPARSEMEM_EXTREME and NO_BOOTMEM enabled. This means that 'mem_section' is an array of pointers, and the following happens in memory_present():

	for_one_pfn_in_each_section() {
		sparse_index_init();		/* no return value check */
		ms = __nr_to_section();
		if (!ms->section_mem_map)	/* bang */
			...;
	}

where sparse_index_init(), in the SPARSEMEM_EXTREME case, allocates the mem_section descriptor with bootmem. If that allocation failed with the real bootmem allocator, the box would panic immediately, at the allocation itself; NO_BOOTMEM does not seem to get this right.

Greg, could you retry _with_ my bootmem patch applied, but with CONFIG_NO_BOOTMEM=n set up front?

I think NO_BOOTMEM has several problems. Yinghai, can you verify them?

1. It does not seem to handle the goal appropriately: bootmem would retry without the goal if the goal does not make sense. In this case, the goal is 4G (above DMA32) and the amount of memory is 256M.

   If I did not miss something, this is the difference my patch makes: without it, the default goal is 16M, which is no problem because it is well within your available memory. Changing the default goal moved it outside that memory, which the bootmem replacement cannot handle.

2. The early reservation code seems to return NULL, but callsites assume that the bootmem interface never does that. Okay, the result is the same: we crash. But it moves error reporting to a possibly much later point, where somebody actually dereferences the returned pointer.

Hannes

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with ESMTP id 86C546B004D for ; Fri, 5 Mar 2010 00:01:35 -0500 (EST) Message-ID: <4B908FF3.5000303@kernel.org> Date: Thu, 04 Mar 2010 21:00:35 -0800 From: Yinghai Lu MIME-Version: 1.0 Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> In-Reply-To: <20100305032106.GA12065@cmpxchg.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Johannes Weiner Cc: Greg Thelen , linux-mm@kvack.org List-ID: On 03/04/2010 07:21 PM, Johannes Weiner wrote: > Hello Greg, > > On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: >> On several systems I am seeing a boot panic if I use mmotm >> (stamp-2010-03-02-18-38). If I remove >> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I >> find that: >> * 2.6.33 boots fine. >> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine.
>> * 2.6.33 + mmotm (including >> bootmem-avoid-dma32-zone-by-default.patch): panics. >> Note: I had to enable earlyprintk to see the panic. Without >> earlyprintk no console output was seen. The system appeared to hang >> after the loader. > > Thanks for your report. A few notes below. > >> Here's the panic seen with earlyprintk using 2.6.33 + mmotm: >> >> Starting up ... >> [ 0.000000] Initializing cgroup subsys cpuset >> [ 0.000000] Initializing cgroup subsys cpu >> [ 0.000000] Linux version 2.6.33-mm1+ >> (gthelen@ninji.mtv.corp.google.com) (gcc version 4.2.4 (Ubuntu >> 4.2.4-1ubuntu4)) #1 SMP Thu Mar 4 12:03:29 PST 2010 >> [ 0.000000] Command line: >> root=UUID=a77f406a-7cc7-4f49-9cc2-818b2b4159ae ro console=tty0 >> console=ttyS0,115200n8 earlyprintk=serial,ttyS0,9600 >> [ 0.000000] BIOS-provided physical RAM map: >> [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009fc00 (usable) >> [ 0.000000] BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved) >> [ 0.000000] BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved) >> [ 0.000000] BIOS-e820: 0000000000100000 - 000000000fff0000 (usable) >> [ 0.000000] BIOS-e820: 000000000fff0000 - 0000000010000000 (ACPI data) >> [ 0.000000] BIOS-e820: 00000000fffbd000 - 0000000100000000 (reserved) >> [ 0.000000] bootconsole [earlyser0] enabled >> [ 0.000000] NX (Execute Disable) protection: active >> [ 0.000000] DMI 2.4 present. >> [ 0.000000] No AGP bridge found >> [ 0.000000] last_pfn = 0xfff0 max_arch_pfn = 0x400000000 >> [ 0.000000] PAT not supported by CPU. >> [ 0.000000] CPU MTRRs all blank - virtualized system. 
>> [ 0.000000] Scanning 1 areas for low memory corruption >> [ 0.000000] modified physical RAM map: >> [ 0.000000] modified: 0000000000000000 - 0000000000010000 (reserved) >> [ 0.000000] modified: 0000000000010000 - 000000000009fc00 (usable) >> [ 0.000000] modified: 000000000009fc00 - 00000000000a0000 (reserved) >> [ 0.000000] modified: 00000000000e8000 - 0000000000100000 (reserved) >> [ 0.000000] modified: 0000000000100000 - 000000000fff0000 (usable) >> [ 0.000000] modified: 000000000fff0000 - 0000000010000000 (ACPI data) >> [ 0.000000] modified: 00000000fffbd000 - 0000000100000000 (reserved) >> [ 0.000000] init_memory_mapping: 0000000000000000-000000000fff0000 > > 256MB of memory, right? > >> [ 0.000000] RAMDISK: 0fd9d000 - 0ffdf539 >> [ 0.000000] ACPI: RSDP 00000000000fb450 00014 (v00 QEMU ) >> [ 0.000000] ACPI: RSDT 000000000fff0000 00030 (v01 QEMU QEMURSDT >> 00000001 QEMU 00000001) >> [ 0.000000] ACPI: FACP 000000000fff0030 00074 (v01 QEMU QEMUFACP >> 00000001 QEMU 00000001) >> [ 0.000000] ACPI: DSDT 000000000fff0100 0089D (v01 BXPC BXDSDT >> 00000001 INTL 20061109) >> [ 0.000000] ACPI: FACS 000000000fff00c0 00040 >> [ 0.000000] ACPI: APIC 000000000fff09d8 00068 (v01 QEMU QEMUAPIC >> 00000001 QEMU 00000001) >> [ 0.000000] ACPI: SSDT 000000000fff099d 00037 (v01 QEMU QEMUSSDT >> 00000001 QEMU 00000001) >> [ 0.000000] No NUMA configuration found >> [ 0.000000] Faking a node at 0000000000000000-000000000fff0000 >> [ 0.000000] Initmem setup node 0 0000000000000000-000000000fff0000 >> [ 0.000000] NODE_DATA [0000000001c4e040 - 0000000001c5303f] >> [ 0.000000] BUG: unable to handle kernel NULL pointer dereference at (null) >> [ 0.000000] IP: [] memory_present+0x9a/0xbf >> [ 0.000000] PGD 0 >> [ 0.000000] Oops: 0000 [#1] SMP >> [ 0.000000] last sysfs file: >> [ 0.000000] CPU 0 >> [ 0.000000] Modules linked in: >> [ 0.000000] >> [ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.33-mm1+ #1 / >> [ 0.000000] RIP: 0010:[] [] >> memory_present+0x9a/0xbf >> [ 0.000000] RSP: 
0000:ffffffff81a01e18 EFLAGS: 00010046 >> [ 0.000000] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002 >> [ 0.000000] RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000 >> [ 0.000000] RBP: ffffffff81a01e58 R08: ffffffffffffffff R09: 0000000000000040 >> [ 0.000000] R10: ffff880001c4e040 R11: 0000000000004100 R12: 0000000000000000 >> [ 0.000000] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000 >> [ 0.000000] FS: 0000000000000000(0000) GS:ffffffff81adf000(0000) >> knlGS:0000000000000000 >> [ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 0.000000] CR2: 0000000000000000 CR3: 0000000001a08000 CR4: 00000000000000b0 >> [ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> [ 0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >> [ 0.000000] Process swapper (pid: 0, threadinfo ffffffff81a00000, >> task ffffffff81a10020) >> [ 0.000000] Stack: >> [ 0.000000] 000000000fff0000 000000000000009f 0000000000000000 >> 0000000000000000 >> [ 0.000000] <0> 0000000000000040 ffffffff81a01ef8 0000000000000000 >> 0000000000000000 >> [ 0.000000] <0> ffffffff81a01e78 ffffffff81b0dd0e ffffffff81a01e88 >> 000000000fff0000 >> [ 0.000000] Call Trace: >> [ 0.000000] [] >> sparse_memory_present_with_active_regions+0x31/0x47 >> [ 0.000000] [] paging_init+0x3f/0x5b >> [ 0.000000] [] setup_arch+0x964/0xa03 >> [ 0.000000] [] ? need_resched+0x1e/0x28 >> [ 0.000000] [] ? should_resched+0x9/0x2a >> [ 0.000000] [] ? 
_cond_resched+0x9/0x1d >> [ 0.000000] [] start_kernel+0x9f/0x382 >> [ 0.000000] [] x86_64_start_reservations+0xa9/0xad >> [ 0.000000] [] x86_64_start_kernel+0xe6/0xed >> [ 0.000000] Code: c7 00 56 c2 81 e8 a0 f9 a1 ff 48 83 3c dd 00 16 >> c2 81 00 75 08 4c 89 2c dd 00 16 c2 81 fe 05 11 60 11 00 4c 89 ff e8 >> 85 3b 5c ff <48> 83 38 00 75 03 4c 89 30 49 81 c4 00 80 00 00 4c 3b 65 >> c8 72 >> [ 0.000000] RIP [] memory_present+0x9a/0xbf >> [ 0.000000] RSP >> [ 0.000000] CR2: 0000000000000000 >> [ 0.000000] ---[ end trace 4eaa2a86a8e2da22 ]--- >> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! >> [ 0.000000] Pid: 0, comm: swapper Tainted: G D 2.6.33-mm1+ #1 >> [ 0.000000] Call Trace: >> [ 0.000000] [] panic+0x9e/0x113 >> [ 0.000000] [] ? printk+0x67/0x69 >> [ 0.000000] [] ? blocking_notifier_call_chain+0xf/0x11 >> [ 0.000000] [] do_exit+0x78/0x70f >> [ 0.000000] [] ? spin_unlock_irqrestore+0x9/0xb >> [ 0.000000] [] ? kmsg_dump+0x112/0x138 >> [ 0.000000] [] oops_end+0xb2/0xba >> [ 0.000000] [] no_context+0x1f5/0x204 >> [ 0.000000] [] __bad_area_nosemaphore+0x17f/0x1a2 >> [ 0.000000] [] bad_area_nosemaphore+0xe/0x10 >> [ 0.000000] [] do_page_fault+0x122/0x24c >> [ 0.000000] [] page_fault+0x1f/0x30 >> [ 0.000000] [] ? memory_present+0x9a/0xbf >> [ 0.000000] [] ? memory_present+0x9a/0xbf >> [ 0.000000] [] >> sparse_memory_present_with_active_regions+0x31/0x47 >> [ 0.000000] [] paging_init+0x3f/0x5b >> [ 0.000000] [] setup_arch+0x964/0xa03 >> [ 0.000000] [] ? need_resched+0x1e/0x28 >> [ 0.000000] [] ? should_resched+0x9/0x2a >> [ 0.000000] [] ? _cond_resched+0x9/0x1d >> [ 0.000000] [] start_kernel+0x9f/0x382 >> [ 0.000000] [] x86_64_start_reservations+0xa9/0xad >> [ 0.000000] [] x86_64_start_kernel+0xe6/0xed >> >> The kernel was built with 'make mrproper && make defconfig && make >> ARCH=x86_64 CONFIG=smp -j 6'. This panic is seen on every attempt, so >> I can provide more diagnostics. 
> > Okay, if you did defconfig and just hit enter to all questions, you > should have SPARSEMEM_EXTREME and NO_BOOTMEM enabled. This means that > the 'mem_section' is an array of pointers and the following happens in > memory_present(): > > for_one_pfn_in_each_section() { > sparse_index_init(); /* no return value check */ > ms = __nr_to_section(); > if (!ms->section_mem_map) /* bang */ > ...; > } > > where sparse_index_init(), in the SPARSEMEM_EXTREME case, will allocate > the mem_section descriptor with bootmem. If this would fail, the box > would panic immediately earlier, but NO_BOOTMEM does not seem to get it > right. > > Greg, could you retry _with_ my bootmem patch applied, but with setting > CONFIG_NO_BOOTMEM=n up front? > > I think NO_BOOTMEM has several problems. Yinghai, can you verify them? > > 1. It does not seem to handle goal appropriately: bootmem would try > without the goal if it does not make sense. And in this case, the > goal is 4G (above DMA32) and the amount of memory is 256M. > > And if I did not miss something, this is the difference with my patch: > without it, the default goal is 16M, which is no problem as it is well > within your available memory. But the change of the default goal moved > it outside it which the bootmem replacement can not handle. > > 2. The early reservation stuff seems to return NULL but callsites assume > that the bootmem interface never does that. Okay, the result is the same, > we crash. But it still moves error reporting to a possibly much later > point where somebody actually dereferences the returned pointer. related change could be: __alloc_bootmem_node_high... 
void * __init __alloc_bootmem_node_high(pg_data_t *pgdat, unsigned long size,
				     unsigned long align, unsigned long goal)
{
#ifdef MAX_DMA32_PFN
	unsigned long end_pfn;

	if (WARN_ON_ONCE(slab_is_available()))
		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);

	/* update goal according ...MAX_DMA32_PFN */
	end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages;

	if (end_pfn > MAX_DMA32_PFN + (128 >> (20 - PAGE_SHIFT)) &&
	    (goal >> PAGE_SHIFT) < MAX_DMA32_PFN) {
		void *ptr;
		unsigned long new_goal;

		new_goal = MAX_DMA32_PFN << PAGE_SHIFT;
#ifdef CONFIG_NO_BOOTMEM
		ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
						new_goal, -1ULL);
#else
		ptr = alloc_bootmem_core(pgdat->bdata, size, align,
					 new_goal, 0);
#endif
		if (ptr)
			return ptr;
	}
#endif

	return __alloc_bootmem_node(pgdat, size, align, goal);
}

Also, __alloc_bootmem_node will not fall back if you specify a goal that is too big.

static void * __init_refok __earlyonly_bootmem_alloc(int node,
				unsigned long size,
				unsigned long align,
				unsigned long goal)
{
	return __alloc_bootmem_node_high(NODE_DATA(node), size, align, goal);
}

static void *vmemmap_buf;
static void *vmemmap_buf_end;

void * __meminit vmemmap_alloc_block(unsigned long size, int node)
{
	/* If the main allocator is up use that, fallback to bootmem. */
	if (slab_is_available()) {
		struct page *page;

		if (node_state(node, N_HIGH_MEMORY))
			page = alloc_pages_node(node,
				GFP_KERNEL | __GFP_ZERO, get_order(size));
		else
			page = alloc_pages(GFP_KERNEL | __GFP_ZERO,
				get_order(size));
		if (page)
			return page_address(page);
		return NULL;
	} else
		return __earlyonly_bootmem_alloc(node, size, size,
				__pa(MAX_DMA_ADDRESS));
}

So does your patch change the goal in vmemmap_alloc_block?

YH

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51]) by kanga.kvack.org (Postfix) with ESMTP id 6A5C66B007B for ; Fri, 5 Mar 2010 00:15:12 -0500 (EST) Message-ID: <4B909327.2030702@kernel.org> Date: Thu, 04 Mar 2010 21:14:15 -0800 From: Yinghai Lu MIME-Version: 1.0 Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <4B908FF3.5000303@kernel.org> In-Reply-To: <4B908FF3.5000303@kernel.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Johannes Weiner Cc: Greg Thelen , linux-mm@kvack.org List-ID: On 03/04/2010 09:00 PM, Yinghai Lu wrote: > On 03/04/2010 07:21 PM, Johannes Weiner wrote: >> Hello Greg, >> >> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: >>> On several systems I am seeing a boot panic if I use mmotm >>> (stamp-2010-03-02-18-38). If I remove >>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I >>> find that: >>> * 2.6.33 boots fine. >>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. >>> * 2.6.33 + mmotm (including >>> bootmem-avoid-dma32-zone-by-default.patch): panics. >>> Note: I had to enable earlyprintk to see the panic. Without >>> earlyprintk no console output was seen. The system appeared to hang >>> after the loader. >> >> Thanks for your report. A few notes below. >> >>> Here's the panic seen with earlyprintk using 2.6.33 + mmotm: >>> >>> Starting up ...
>>> [ 0.000000] Initializing cgroup subsys cpuset >>> [ 0.000000] Initializing cgroup subsys cpu >>> [ 0.000000] Linux version 2.6.33-mm1+ >>> (gthelen@ninji.mtv.corp.google.com) (gcc version 4.2.4 (Ubuntu >>> 4.2.4-1ubuntu4)) #1 SMP Thu Mar 4 12:03:29 PST 2010 >>> [ 0.000000] Command line: >>> root=UUID=a77f406a-7cc7-4f49-9cc2-818b2b4159ae ro console=tty0 >>> console=ttyS0,115200n8 earlyprintk=serial,ttyS0,9600 >>> [ 0.000000] BIOS-provided physical RAM map: >>> [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009fc00 (usable) >>> [ 0.000000] BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved) >>> [ 0.000000] BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved) >>> [ 0.000000] BIOS-e820: 0000000000100000 - 000000000fff0000 (usable) >>> [ 0.000000] BIOS-e820: 000000000fff0000 - 0000000010000000 (ACPI data) >>> [ 0.000000] BIOS-e820: 00000000fffbd000 - 0000000100000000 (reserved) >>> [ 0.000000] bootconsole [earlyser0] enabled >>> [ 0.000000] NX (Execute Disable) protection: active >>> [ 0.000000] DMI 2.4 present. >>> [ 0.000000] No AGP bridge found >>> [ 0.000000] last_pfn = 0xfff0 max_arch_pfn = 0x400000000 >>> [ 0.000000] PAT not supported by CPU. >>> [ 0.000000] CPU MTRRs all blank - virtualized system. >>> [ 0.000000] Scanning 1 areas for low memory corruption >>> [ 0.000000] modified physical RAM map: >>> [ 0.000000] modified: 0000000000000000 - 0000000000010000 (reserved) >>> [ 0.000000] modified: 0000000000010000 - 000000000009fc00 (usable) >>> [ 0.000000] modified: 000000000009fc00 - 00000000000a0000 (reserved) >>> [ 0.000000] modified: 00000000000e8000 - 0000000000100000 (reserved) >>> [ 0.000000] modified: 0000000000100000 - 000000000fff0000 (usable) >>> [ 0.000000] modified: 000000000fff0000 - 0000000010000000 (ACPI data) >>> [ 0.000000] modified: 00000000fffbd000 - 0000000100000000 (reserved) >>> [ 0.000000] init_memory_mapping: 0000000000000000-000000000fff0000 >> >> 256MB of memory, right? 
>> >>> [ 0.000000] RAMDISK: 0fd9d000 - 0ffdf539 >>> [ 0.000000] ACPI: RSDP 00000000000fb450 00014 (v00 QEMU ) >>> [ 0.000000] ACPI: RSDT 000000000fff0000 00030 (v01 QEMU QEMURSDT >>> 00000001 QEMU 00000001) >>> [ 0.000000] ACPI: FACP 000000000fff0030 00074 (v01 QEMU QEMUFACP >>> 00000001 QEMU 00000001) >>> [ 0.000000] ACPI: DSDT 000000000fff0100 0089D (v01 BXPC BXDSDT >>> 00000001 INTL 20061109) >>> [ 0.000000] ACPI: FACS 000000000fff00c0 00040 >>> [ 0.000000] ACPI: APIC 000000000fff09d8 00068 (v01 QEMU QEMUAPIC >>> 00000001 QEMU 00000001) >>> [ 0.000000] ACPI: SSDT 000000000fff099d 00037 (v01 QEMU QEMUSSDT >>> 00000001 QEMU 00000001) >>> [ 0.000000] No NUMA configuration found >>> [ 0.000000] Faking a node at 0000000000000000-000000000fff0000 >>> [ 0.000000] Initmem setup node 0 0000000000000000-000000000fff0000 >>> [ 0.000000] NODE_DATA [0000000001c4e040 - 0000000001c5303f] >>> [ 0.000000] BUG: unable to handle kernel NULL pointer dereference at (null) >>> [ 0.000000] IP: [] memory_present+0x9a/0xbf >>> [ 0.000000] PGD 0 >>> [ 0.000000] Oops: 0000 [#1] SMP >>> [ 0.000000] last sysfs file: >>> [ 0.000000] CPU 0 >>> [ 0.000000] Modules linked in: >>> [ 0.000000] >>> [ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.33-mm1+ #1 / >>> [ 0.000000] RIP: 0010:[] [] >>> memory_present+0x9a/0xbf >>> [ 0.000000] RSP: 0000:ffffffff81a01e18 EFLAGS: 00010046 >>> [ 0.000000] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002 >>> [ 0.000000] RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000 >>> [ 0.000000] RBP: ffffffff81a01e58 R08: ffffffffffffffff R09: 0000000000000040 >>> [ 0.000000] R10: ffff880001c4e040 R11: 0000000000004100 R12: 0000000000000000 >>> [ 0.000000] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000 >>> [ 0.000000] FS: 0000000000000000(0000) GS:ffffffff81adf000(0000) >>> knlGS:0000000000000000 >>> [ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [ 0.000000] CR2: 0000000000000000 CR3: 0000000001a08000 
CR4: 00000000000000b0 >>> [ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>> [ 0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>> [ 0.000000] Process swapper (pid: 0, threadinfo ffffffff81a00000, >>> task ffffffff81a10020) >>> [ 0.000000] Stack: >>> [ 0.000000] 000000000fff0000 000000000000009f 0000000000000000 >>> 0000000000000000 >>> [ 0.000000] <0> 0000000000000040 ffffffff81a01ef8 0000000000000000 >>> 0000000000000000 >>> [ 0.000000] <0> ffffffff81a01e78 ffffffff81b0dd0e ffffffff81a01e88 >>> 000000000fff0000 >>> [ 0.000000] Call Trace: >>> [ 0.000000] [] >>> sparse_memory_present_with_active_regions+0x31/0x47 >>> [ 0.000000] [] paging_init+0x3f/0x5b >>> [ 0.000000] [] setup_arch+0x964/0xa03 >>> [ 0.000000] [] ? need_resched+0x1e/0x28 >>> [ 0.000000] [] ? should_resched+0x9/0x2a >>> [ 0.000000] [] ? _cond_resched+0x9/0x1d >>> [ 0.000000] [] start_kernel+0x9f/0x382 >>> [ 0.000000] [] x86_64_start_reservations+0xa9/0xad >>> [ 0.000000] [] x86_64_start_kernel+0xe6/0xed >>> [ 0.000000] Code: c7 00 56 c2 81 e8 a0 f9 a1 ff 48 83 3c dd 00 16 >>> c2 81 00 75 08 4c 89 2c dd 00 16 c2 81 fe 05 11 60 11 00 4c 89 ff e8 >>> 85 3b 5c ff <48> 83 38 00 75 03 4c 89 30 49 81 c4 00 80 00 00 4c 3b 65 >>> c8 72 >>> [ 0.000000] RIP [] memory_present+0x9a/0xbf >>> [ 0.000000] RSP >>> [ 0.000000] CR2: 0000000000000000 >>> [ 0.000000] ---[ end trace 4eaa2a86a8e2da22 ]--- >>> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! >>> [ 0.000000] Pid: 0, comm: swapper Tainted: G D 2.6.33-mm1+ #1 >>> [ 0.000000] Call Trace: >>> [ 0.000000] [] panic+0x9e/0x113 >>> [ 0.000000] [] ? printk+0x67/0x69 >>> [ 0.000000] [] ? blocking_notifier_call_chain+0xf/0x11 >>> [ 0.000000] [] do_exit+0x78/0x70f >>> [ 0.000000] [] ? spin_unlock_irqrestore+0x9/0xb >>> [ 0.000000] [] ? 
kmsg_dump+0x112/0x138 >>> [ 0.000000] [] oops_end+0xb2/0xba >>> [ 0.000000] [] no_context+0x1f5/0x204 >>> [ 0.000000] [] __bad_area_nosemaphore+0x17f/0x1a2 >>> [ 0.000000] [] bad_area_nosemaphore+0xe/0x10 >>> [ 0.000000] [] do_page_fault+0x122/0x24c >>> [ 0.000000] [] page_fault+0x1f/0x30 >>> [ 0.000000] [] ? memory_present+0x9a/0xbf >>> [ 0.000000] [] ? memory_present+0x9a/0xbf >>> [ 0.000000] [] >>> sparse_memory_present_with_active_regions+0x31/0x47 >>> [ 0.000000] [] paging_init+0x3f/0x5b >>> [ 0.000000] [] setup_arch+0x964/0xa03 >>> [ 0.000000] [] ? need_resched+0x1e/0x28 >>> [ 0.000000] [] ? should_resched+0x9/0x2a >>> [ 0.000000] [] ? _cond_resched+0x9/0x1d >>> [ 0.000000] [] start_kernel+0x9f/0x382 >>> [ 0.000000] [] x86_64_start_reservations+0xa9/0xad >>> [ 0.000000] [] x86_64_start_kernel+0xe6/0xed >>> >>> The kernel was built with 'make mrproper && make defconfig && make >>> ARCH=x86_64 CONFIG=smp -j 6'. This panic is seen on every attempt, so >>> I can provide more diagnostics. >> >> Okay, if you did defconfig and just hit enter to all questions, you >> should have SPARSEMEM_EXTREME and NO_BOOTMEM enabled. This means that >> the 'mem_section' is an array of pointers and the following happens in >> memory_present(): >> >> for_one_pfn_in_each_section() { >> sparse_index_init(); /* no return value check */ >> ms = __nr_to_section(); >> if (!ms->section_mem_map) /* bang */ >> ...; >> } >> >> where sparse_index_init(), in the SPARSEMEM_EXTREME case, will allocate >> the mem_section descriptor with bootmem. If this would fail, the box >> would panic immediately earlier, but NO_BOOTMEM does not seem to get it >> right. >> >> Greg, could you retry _with_ my bootmem patch applied, but with setting >> CONFIG_NO_BOOTMEM=n up front? >> >> I think NO_BOOTMEM has several problems. Yinghai, can you verify them? >> >> 1. It does not seem to handle goal appropriately: bootmem would try >> without the goal if it does not make sense. 
And in this case, the >> goal is 4G (above DMA32) and the amount of memory is 256M. >> >> And if I did not miss something, this is the difference with my patch: >> without it, the default goal is 16M, which is no problem as it is well >> within your available memory. But the change of the default goal moved >> it outside it which the bootmem replacement can not handle. >> >> 2. The early reservation stuff seems to return NULL but callsites assume >> that the bootmem interface never does that. Okay, the result is the same, >> we crash. But it still moves error reporting to a possibly much later >> point where somebody actually dereferences the returned pointer. > > related change could be: __alloc_bootmem_node_high... no should be here...

static struct mem_section noinline __init_refok *sparse_index_alloc(int nid)
{
	struct mem_section *section = NULL;
	unsigned long array_size = SECTIONS_PER_ROOT *
				   sizeof(struct mem_section);

	if (slab_is_available()) {
		if (node_state(nid, N_HIGH_MEMORY))
			section = kmalloc_node(array_size, GFP_KERNEL, nid);
		else
			section = kmalloc(array_size, GFP_KERNEL);
	} else
		section = alloc_bootmem_node(NODE_DATA(nid), array_size);

and

#define alloc_bootmem_node(pgdat, x) \
	__alloc_bootmem_node(pgdat, x, SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))

then you change that goal MAX_DMA_ADDRESS to 4g..., but the system only have 256M

YH
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51]) by kanga.kvack.org (Postfix) with ESMTP id D04136B007E for ; Fri, 5 Mar 2010 00:17:44 -0500 (EST) Received: from spaceape12.eur.corp.google.com (spaceape12.eur.corp.google.com [172.28.16.146]) by smtp-out.google.com with ESMTP id o255HeYP001129 for ; Fri, 5 Mar 2010 05:17:41 GMT Received: from pwi1 (pwi1.prod.google.com [10.241.219.1]) by spaceape12.eur.corp.google.com with ESMTP id o255Hasb030802 for ; Thu, 4 Mar 2010 21:17:37 -0800 Received: by pwi1 with SMTP id 1so94415pwi.26 for ; Thu, 04 Mar 2010 21:17:36 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <20100305032106.GA12065@cmpxchg.org> References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> From: Greg Thelen Date: Thu, 4 Mar 2010 21:17:16 -0800 Message-ID: <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org To: Johannes Weiner Cc: Yinghai Lu , linux-mm@kvack.org List-ID: On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner wrote: > On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: >> On several systems I am seeing a boot panic if I use mmotm >> (stamp-2010-03-02-18-38). If I remove >> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I >> find that: >> * 2.6.33 boots fine. >> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. >> * 2.6.33 + mmotm (including >> bootmem-avoid-dma32-zone-by-default.patch): panics.
>> Here's the panic seen with earlyprintk using 2.6.33 + mmotm: >> [ 0.000000] modified: 0000000000000000 - 0000000000010000 (reserved) >> [ 0.000000] modified: 0000000000010000 - 000000000009fc00 (usable) >> [ 0.000000] modified: 000000000009fc00 - 00000000000a0000 (reserved) >> [ 0.000000] modified: 00000000000e8000 - 0000000000100000 (reserved) >> [ 0.000000] modified: 0000000000100000 - 000000000fff0000 (usable) >> [ 0.000000] modified: 000000000fff0000 - 0000000010000000 (ACPI data) >> [ 0.000000] modified: 00000000fffbd000 - 0000000100000000 (reserved) >> [ 0.000000] init_memory_mapping: 0000000000000000-000000000fff0000 > 256MB of memory, right? yes, I am testing in a 256MB VM. >> The kernel was built with 'make mrproper && make defconfig && make >> ARCH=x86_64 CONFIG=smp -j 6'. This panic is seen on every attempt, so >> I can provide more diagnostics. > > Okay, if you did defconfig and just hit enter to all questions, you > should have SPARSEMEM_EXTREME and NO_BOOTMEM enabled. Correct. > This means that the 'mem_section' is an array of pointers and the following > happens in memory_present(): > > for_one_pfn_in_each_section() { > sparse_index_init(); /* no return value check */ > ms = __nr_to_section(); > if (!ms->section_mem_map) /* bang */ > ...; > } > > where sparse_index_init(), in the SPARSEMEM_EXTREME case, will allocate > the mem_section descriptor with bootmem. If this would fail, the box > would panic immediately earlier, but NO_BOOTMEM does not seem to get it > right. > > Greg, could you retry _with_ my bootmem patch applied, but with setting > CONFIG_NO_BOOTMEM=n up front? Note: mmotm has been recently updated to stamp-2010-03-04-18-05.
I re-tested with 'make defconfig' to confirm the panic with this later mmotm. Then, as you suggested, I set CONFIG_NO_BOOTMEM=n. The system booted fine (no panic). -- Greg From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail191.messagelabs.com (mail191.messagelabs.com [216.82.242.19]) by kanga.kvack.org (Postfix) with ESMTP id 3995E6B004D for ; Fri, 5 Mar 2010 00:34:39 -0500 (EST) Received: from spaceape11.eur.corp.google.com (spaceape11.eur.corp.google.com [172.28.16.145]) by smtp-out.google.com with ESMTP id o255Yb7E023810 for ; Thu, 4 Mar 2010 21:34:37 -0800 Received: from pzk41 (pzk41.prod.google.com [10.243.19.169]) by spaceape11.eur.corp.google.com with ESMTP id o255YF5j000914 for ; Thu, 4 Mar 2010 21:34:36 -0800 Received: by pzk41 with SMTP id 41so2237315pzk.23 for ; Thu, 04 Mar 2010 21:34:34 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> From: Greg Thelen Date: Thu, 4 Mar 2010 21:34:14 -0800 Message-ID: <49b004811003042134s4bbd0425n1517a1cb0e9879d9@mail.gmail.com> Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org To: Johannes Weiner Cc: Yinghai Lu , linux-mm@kvack.org List-ID: On Thu, Mar 4, 2010 at 9:17 PM, Greg Thelen wrote: > On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner wrote: >> 256MB of memory, right? > > yes, I am testing in a 256MB VM.
I also performed a 6GB test and found that the system booted fine with defconfig: CONFIG_NO_BOOTMEM=y CONFIG_SPARSEMEM_EXTREME=y -- Greg From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail138.messagelabs.com (mail138.messagelabs.com [216.82.249.35]) by kanga.kvack.org (Postfix) with ESMTP id 8B69F6B004D for ; Fri, 5 Mar 2010 04:05:41 -0500 (EST) Message-ID: <4B90C921.6060908@kernel.org> Date: Fri, 05 Mar 2010 01:04:33 -0800 From: Yinghai Lu MIME-Version: 1.0 Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> In-Reply-To: <20100305032106.GA12065@cmpxchg.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Johannes Weiner , Jiri Slaby Cc: Greg Thelen , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" , Andrew Morton List-ID: On 03/04/2010 07:21 PM, Johannes Weiner wrote: > Hello Greg, > > On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: >> On several systems I am seeing a boot panic if I use mmotm >> (stamp-2010-03-02-18-38). If I remove >> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I >> find that: >> * 2.6.33 boots fine. >> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. >> * 2.6.33 + mmotm (including >> bootmem-avoid-dma32-zone-by-default.patch): panics. >> Note: I had to enable earlyprintk to see the panic. Without >> earlyprintk no console output was seen. The system appeared to hang >> after the loader. > > where sparse_index_init(), in the SPARSEMEM_EXTREME case, will allocate > the mem_section descriptor with bootmem. If this would fail, the box >
If this would fail, the box > would panic immediately earlier, but NO_BOOTMEM does not seem to get it > right. > > Greg, could you retry _with_ my bootmem patch applied, but with setting > CONFIG_NO_BOOTMEM=n up front? > > I think NO_BOOTMEM has several problems. Yinghai, can you verify them? ... > > 1. It does not seem to handle goal appropriately: bootmem would try > without the goal if it does not make sense. And in this case, the > goal is 4G (above DMA32) and the amount of memory is 256M. > > And if I did not miss something, this is the difference with my patch: > without it, the default goal is 16M, which is no problem as it is well > within your available memory. But the change of the default goal moved > it outside it which the bootmem replacement can not handle. > > 2. The early reservation stuff seems to return NULL but callsites assume > that the bootmem interface never does that. Okay, the result is the same, > we crash. But it still moves error reporting to a possibly much later > point where somebody actually dereferences the returned pointer. under CONFIG_NO_BOOTMEM for alloc_bootmem_node it will honor goal, if someone input big goal it will not fallback to get a small one below that goal. return NULL, could make caller have more choice and more control. anyway we should honor the goal, otherwise should use _nopanic instead. according to context http://patchwork.kernel.org/patch/73893/ Jiri, please check current linus tree still have problem about mem_map is using that much low mem? on my 1024g system first node has 128G ram, [2g, 4g) are mmio range. 
with NO_BOOTMEM [ 0.000000] a - 11 [ 0.000000] 19 40 - 80 95 [ 0.000000] 702 740 - 1000 1000 [ 0.000000] 331f 3340 - 3400 3400 [ 0.000000] 35dd - 3600 [ 0.000000] 37dd - 3800 [ 0.000000] 39dd - 3a00 [ 0.000000] 3bdd - 3c00 [ 0.000000] 3ddd - 3e00 [ 0.000000] 3fdd - 4000 [ 0.000000] 41dd - 4200 [ 0.000000] 43dd - 4400 [ 0.000000] 45dd - 4600 [ 0.000000] 47dd - 4800 [ 0.000000] 49dd - 4a00 [ 0.000000] 4bdd - 4c00 [ 0.000000] 4ddd - 4e00 [ 0.000000] 4fdd - 5000 [ 0.000000] 51dd - 5200 [ 0.000000] 93dd 9400 - 7d500 7d53b [ 0.000000] 7f730 - 7f750 [ 0.000000] 100012 100040 - 100200 100200 [ 0.000000] 170200 170200 - 2080000 2080000 [ 0.000000] 2080065 2080080 - 2080200 2080200 so PFN: 9400 - 7d500 are free. without NO_BOOTMEM [ 0.000000] nid=0 start=0x0000000000 end=0x0002080000 aligned=1 [ 0.000000] free [0x000000000a - 0x0000000095] [ 0.000000] free [0x0000000702 - 0x0000001000] [ 0.000000] free [0x00000032c4 - 0x0000003400] [ 0.000000] free [0x00000035de - 0x0000003600] [ 0.000000] free [0x00000037dd - 0x0000003800] [ 0.000000] free [0x00000039dd - 0x0000003a00] [ 0.000000] free [0x0000003bdd - 0x0000003c00] [ 0.000000] free [0x0000003ddd - 0x0000003e00] [ 0.000000] free [0x0000003fdd - 0x0000004000] [ 0.000000] free [0x00000041dd - 0x0000004200] [ 0.000000] free [0x00000043dd - 0x0000004400] [ 0.000000] free [0x00000045dd - 0x0000004600] [ 0.000000] free [0x00000047dd - 0x0000004800] [ 0.000000] free [0x00000049dd - 0x0000004a00] [ 0.000000] free [0x0000004bdd - 0x0000004c00] [ 0.000000] free [0x0000004ddd - 0x0000004e00] [ 0.000000] free [0x0000004fdd - 0x0000005000] [ 0.000000] free [0x00000051dd - 0x0000005200] [ 0.000000] free [0x00000053dd - 0x000007d53b] [ 0.000000] free [0x000007f730 - 0x000007f750] [ 0.000000] free [0x000010041f - 0x0000100a00] [ 0.000000] free [0x0000170a00 - 0x0000180a00] [ 0.000000] free [0x0000180a03 - 0x0002080000] so pfn: 53dd 7d53b are free looks like we don't need to change the default goal in alloc_bootmem_node. 
YH From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail143.messagelabs.com (mail143.messagelabs.com [216.82.254.35]) by kanga.kvack.org (Postfix) with SMTP id D2DAA6B0047 for ; Fri, 5 Mar 2010 05:26:00 -0500 (EST) Received: by fg-out-1718.google.com with SMTP id 19so136116fgg.8 for ; Fri, 05 Mar 2010 02:26:06 -0800 (PST) Message-ID: <4B90DC3C.1060000@gmail.com> Date: Fri, 05 Mar 2010 11:26:04 +0100 From: Jiri Slaby MIME-Version: 1.0 Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <4B90C921.6060908@kernel.org> In-Reply-To: <4B90C921.6060908@kernel.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Yinghai Lu Cc: Johannes Weiner , Greg Thelen , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" , Andrew Morton List-ID: On 03/05/2010 10:04 AM, Yinghai Lu wrote: > according to context > http://patchwork.kernel.org/patch/73893/ > > Jiri, > please check current linus tree still have problem about mem_map is using that much low mem? Hi! Sorry, I don't have direct access to the machine. I might try to ask the owners to do so. > on my 1024g system first node has 128G ram, [2g, 4g) are mmio range. So where gets your mem_map allocated (I suppose you're running flat model)? Note that the failure we were seeing was with different amount of memory on different machines. Obviously because of different e820 reservations and driver requirements at boot time. So the required memory to trigger the error oscillated around 128G, sometimes being 130G. It triggered when mem_map fit exactly into 0-2G (and 2-4G was reserved) and no more space was there.
If RAM was more than 130G, mem_map was above 4G boundary implicitly, so that there was enough space in the first 4G of memory for others with specific bootmem limitations. > with NO_BOOTMEM > [ 0.000000] a - 11 > [ 0.000000] 19 40 - 80 95 > [ 0.000000] 702 740 - 1000 1000 > [ 0.000000] 331f 3340 - 3400 3400 > [ 0.000000] 35dd - 3600 > [ 0.000000] 37dd - 3800 > [ 0.000000] 39dd - 3a00 > [ 0.000000] 3bdd - 3c00 > [ 0.000000] 3ddd - 3e00 > [ 0.000000] 3fdd - 4000 > [ 0.000000] 41dd - 4200 > [ 0.000000] 43dd - 4400 > [ 0.000000] 45dd - 4600 > [ 0.000000] 47dd - 4800 > [ 0.000000] 49dd - 4a00 > [ 0.000000] 4bdd - 4c00 > [ 0.000000] 4ddd - 4e00 > [ 0.000000] 4fdd - 5000 > [ 0.000000] 51dd - 5200 > [ 0.000000] 93dd 9400 - 7d500 7d53b > [ 0.000000] 7f730 - 7f750 > [ 0.000000] 100012 100040 - 100200 100200 > [ 0.000000] 170200 170200 - 2080000 2080000 > [ 0.000000] 2080065 2080080 - 2080200 2080200 > > so PFN: 9400 - 7d500 are free. Could you explain more the dmesg output? > without NO_BOOTMEM > [ 0.000000] nid=0 start=0x0000000000 end=0x0002080000 aligned=1 > [ 0.000000] free [0x000000000a - 0x0000000095] > [ 0.000000] free [0x0000000702 - 0x0000001000] > [ 0.000000] free [0x00000032c4 - 0x0000003400] > [ 0.000000] free [0x00000035de - 0x0000003600] > [ 0.000000] free [0x00000037dd - 0x0000003800] > [ 0.000000] free [0x00000039dd - 0x0000003a00] > [ 0.000000] free [0x0000003bdd - 0x0000003c00] > [ 0.000000] free [0x0000003ddd - 0x0000003e00] > [ 0.000000] free [0x0000003fdd - 0x0000004000] > [ 0.000000] free [0x00000041dd - 0x0000004200] > [ 0.000000] free [0x00000043dd - 0x0000004400] > [ 0.000000] free [0x00000045dd - 0x0000004600] > [ 0.000000] free [0x00000047dd - 0x0000004800] > [ 0.000000] free [0x00000049dd - 0x0000004a00] > [ 0.000000] free [0x0000004bdd - 0x0000004c00] > [ 0.000000] free [0x0000004ddd - 0x0000004e00] > [ 0.000000] free [0x0000004fdd - 0x0000005000] > [ 0.000000] free [0x00000051dd - 0x0000005200] > [ 0.000000] free [0x00000053dd - 0x000007d53b] > 
[ 0.000000] free [0x000007f730 - 0x000007f750] > [ 0.000000] free [0x000010041f - 0x0000100a00] > [ 0.000000] free [0x0000170a00 - 0x0000180a00] > [ 0.000000] free [0x0000180a03 - 0x0002080000] > so pfn: 53dd 7d53b are free > > looks like we don't need to change the default goal in alloc_bootmem_node. thanks, -- js From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3]) by kanga.kvack.org (Postfix) with ESMTP id A2D2E6B004D for ; Fri, 5 Mar 2010 07:51:48 -0500 (EST) Date: Fri, 5 Mar 2010 13:51:39 +0100 From: Johannes Weiner Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch Message-ID: <20100305125139.GA13726@cmpxchg.org> References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <4B908FF3.5000303@kernel.org> <4B909327.2030702@kernel.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4B909327.2030702@kernel.org> Sender: owner-linux-mm@kvack.org To: Yinghai Lu Cc: Greg Thelen , linux-mm@kvack.org List-ID: Hi, On Thu, Mar 04, 2010 at 09:14:15PM -0800, Yinghai Lu wrote: > On 03/04/2010 09:00 PM, Yinghai Lu wrote: > > On 03/04/2010 07:21 PM, Johannes Weiner wrote: > >> Hello Greg, > >> > >> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: > >>> On several systems I am seeing a boot panic if I use mmotm > >>> (stamp-2010-03-02-18-38). If I remove > >>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I > >>> find that: > >>> * 2.6.33 boots fine. > >>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. > >>> * 2.6.33 + mmotm (including > >>> bootmem-avoid-dma32-zone-by-default.patch): panics.
> >>> Note: I had to enable earlyprintk to see the panic. Without > >>> earlyprintk no console output was seen. The system appeared to hang > >>> after the loader. > >> > >> Thanks for your report. A few notes below. > >> > >>> Here's the panic seen with earlyprintk using 2.6.33 + mmotm: > >>> > >>> Starting up ... > >>> [ 0.000000] Initializing cgroup subsys cpuset > >>> [ 0.000000] Initializing cgroup subsys cpu > >>> [ 0.000000] Linux version 2.6.33-mm1+ > >>> (gthelen@ninji.mtv.corp.google.com) (gcc version 4.2.4 (Ubuntu > >>> 4.2.4-1ubuntu4)) #1 SMP Thu Mar 4 12:03:29 PST 2010 > >>> [ 0.000000] Command line: > >>> root=UUID=a77f406a-7cc7-4f49-9cc2-818b2b4159ae ro console=tty0 > >>> console=ttyS0,115200n8 earlyprintk=serial,ttyS0,9600 > >>> [ 0.000000] BIOS-provided physical RAM map: > >>> [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009fc00 (usable) > >>> [ 0.000000] BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved) > >>> [ 0.000000] BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved) > >>> [ 0.000000] BIOS-e820: 0000000000100000 - 000000000fff0000 (usable) > >>> [ 0.000000] BIOS-e820: 000000000fff0000 - 0000000010000000 (ACPI data) > >>> [ 0.000000] BIOS-e820: 00000000fffbd000 - 0000000100000000 (reserved) > >>> [ 0.000000] bootconsole [earlyser0] enabled > >>> [ 0.000000] NX (Execute Disable) protection: active > >>> [ 0.000000] DMI 2.4 present. > >>> [ 0.000000] No AGP bridge found > >>> [ 0.000000] last_pfn = 0xfff0 max_arch_pfn = 0x400000000 > >>> [ 0.000000] PAT not supported by CPU. > >>> [ 0.000000] CPU MTRRs all blank - virtualized system. 
> >>> [ 0.000000] Scanning 1 areas for low memory corruption > >>> [ 0.000000] modified physical RAM map: > >>> [ 0.000000] modified: 0000000000000000 - 0000000000010000 (reserved) > >>> [ 0.000000] modified: 0000000000010000 - 000000000009fc00 (usable) > >>> [ 0.000000] modified: 000000000009fc00 - 00000000000a0000 (reserved) > >>> [ 0.000000] modified: 00000000000e8000 - 0000000000100000 (reserved) > >>> [ 0.000000] modified: 0000000000100000 - 000000000fff0000 (usable) > >>> [ 0.000000] modified: 000000000fff0000 - 0000000010000000 (ACPI data) > >>> [ 0.000000] modified: 00000000fffbd000 - 0000000100000000 (reserved) > >>> [ 0.000000] init_memory_mapping: 0000000000000000-000000000fff0000 > >> > >> 256MB of memory, right? > >> > >>> [ 0.000000] RAMDISK: 0fd9d000 - 0ffdf539 > >>> [ 0.000000] ACPI: RSDP 00000000000fb450 00014 (v00 QEMU ) > >>> [ 0.000000] ACPI: RSDT 000000000fff0000 00030 (v01 QEMU QEMURSDT > >>> 00000001 QEMU 00000001) > >>> [ 0.000000] ACPI: FACP 000000000fff0030 00074 (v01 QEMU QEMUFACP > >>> 00000001 QEMU 00000001) > >>> [ 0.000000] ACPI: DSDT 000000000fff0100 0089D (v01 BXPC BXDSDT > >>> 00000001 INTL 20061109) > >>> [ 0.000000] ACPI: FACS 000000000fff00c0 00040 > >>> [ 0.000000] ACPI: APIC 000000000fff09d8 00068 (v01 QEMU QEMUAPIC > >>> 00000001 QEMU 00000001) > >>> [ 0.000000] ACPI: SSDT 000000000fff099d 00037 (v01 QEMU QEMUSSDT > >>> 00000001 QEMU 00000001) > >>> [ 0.000000] No NUMA configuration found > >>> [ 0.000000] Faking a node at 0000000000000000-000000000fff0000 > >>> [ 0.000000] Initmem setup node 0 0000000000000000-000000000fff0000 > >>> [ 0.000000] NODE_DATA [0000000001c4e040 - 0000000001c5303f] > >>> [ 0.000000] BUG: unable to handle kernel NULL pointer dereference at (null) > >>> [ 0.000000] IP: [] memory_present+0x9a/0xbf > >>> [ 0.000000] PGD 0 > >>> [ 0.000000] Oops: 0000 [#1] SMP > >>> [ 0.000000] last sysfs file: > >>> [ 0.000000] CPU 0 > >>> [ 0.000000] Modules linked in: > >>> [ 0.000000] > >>> [ 0.000000] Pid: 0, comm: 
swapper Not tainted 2.6.33-mm1+ #1 / > >>> [ 0.000000] RIP: 0010:[] [] > >>> memory_present+0x9a/0xbf > >>> [ 0.000000] RSP: 0000:ffffffff81a01e18 EFLAGS: 00010046 > >>> [ 0.000000] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002 > >>> [ 0.000000] RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000 > >>> [ 0.000000] RBP: ffffffff81a01e58 R08: ffffffffffffffff R09: 0000000000000040 > >>> [ 0.000000] R10: ffff880001c4e040 R11: 0000000000004100 R12: 0000000000000000 > >>> [ 0.000000] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000 > >>> [ 0.000000] FS: 0000000000000000(0000) GS:ffffffff81adf000(0000) > >>> knlGS:0000000000000000 > >>> [ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>> [ 0.000000] CR2: 0000000000000000 CR3: 0000000001a08000 CR4: 00000000000000b0 > >>> [ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>> [ 0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > >>> [ 0.000000] Process swapper (pid: 0, threadinfo ffffffff81a00000, > >>> task ffffffff81a10020) > >>> [ 0.000000] Stack: > >>> [ 0.000000] 000000000fff0000 000000000000009f 0000000000000000 > >>> 0000000000000000 > >>> [ 0.000000] <0> 0000000000000040 ffffffff81a01ef8 0000000000000000 > >>> 0000000000000000 > >>> [ 0.000000] <0> ffffffff81a01e78 ffffffff81b0dd0e ffffffff81a01e88 > >>> 000000000fff0000 > >>> [ 0.000000] Call Trace: > >>> [ 0.000000] [] > >>> sparse_memory_present_with_active_regions+0x31/0x47 > >>> [ 0.000000] [] paging_init+0x3f/0x5b > >>> [ 0.000000] [] setup_arch+0x964/0xa03 > >>> [ 0.000000] [] ? need_resched+0x1e/0x28 > >>> [ 0.000000] [] ? should_resched+0x9/0x2a > >>> [ 0.000000] [] ? 
_cond_resched+0x9/0x1d > >>> [ 0.000000] [] start_kernel+0x9f/0x382 > >>> [ 0.000000] [] x86_64_start_reservations+0xa9/0xad > >>> [ 0.000000] [] x86_64_start_kernel+0xe6/0xed > >>> [ 0.000000] Code: c7 00 56 c2 81 e8 a0 f9 a1 ff 48 83 3c dd 00 16 > >>> c2 81 00 75 08 4c 89 2c dd 00 16 c2 81 fe 05 11 60 11 00 4c 89 ff e8 > >>> 85 3b 5c ff <48> 83 38 00 75 03 4c 89 30 49 81 c4 00 80 00 00 4c 3b 65 > >>> c8 72 > >>> [ 0.000000] RIP [] memory_present+0x9a/0xbf > >>> [ 0.000000] RSP > >>> [ 0.000000] CR2: 0000000000000000 > >>> [ 0.000000] ---[ end trace 4eaa2a86a8e2da22 ]--- > >>> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! > >>> [ 0.000000] Pid: 0, comm: swapper Tainted: G D 2.6.33-mm1+ #1 > >>> [ 0.000000] Call Trace: > >>> [ 0.000000] [] panic+0x9e/0x113 > >>> [ 0.000000] [] ? printk+0x67/0x69 > >>> [ 0.000000] [] ? blocking_notifier_call_chain+0xf/0x11 > >>> [ 0.000000] [] do_exit+0x78/0x70f > >>> [ 0.000000] [] ? spin_unlock_irqrestore+0x9/0xb > >>> [ 0.000000] [] ? kmsg_dump+0x112/0x138 > >>> [ 0.000000] [] oops_end+0xb2/0xba > >>> [ 0.000000] [] no_context+0x1f5/0x204 > >>> [ 0.000000] [] __bad_area_nosemaphore+0x17f/0x1a2 > >>> [ 0.000000] [] bad_area_nosemaphore+0xe/0x10 > >>> [ 0.000000] [] do_page_fault+0x122/0x24c > >>> [ 0.000000] [] page_fault+0x1f/0x30 > >>> [ 0.000000] [] ? memory_present+0x9a/0xbf > >>> [ 0.000000] [] ? memory_present+0x9a/0xbf > >>> [ 0.000000] [] > >>> sparse_memory_present_with_active_regions+0x31/0x47 > >>> [ 0.000000] [] paging_init+0x3f/0x5b > >>> [ 0.000000] [] setup_arch+0x964/0xa03 > >>> [ 0.000000] [] ? need_resched+0x1e/0x28 > >>> [ 0.000000] [] ? should_resched+0x9/0x2a > >>> [ 0.000000] [] ? _cond_resched+0x9/0x1d > >>> [ 0.000000] [] start_kernel+0x9f/0x382 > >>> [ 0.000000] [] x86_64_start_reservations+0xa9/0xad > >>> [ 0.000000] [] x86_64_start_kernel+0xe6/0xed > >>> > >>> The kernel was built with 'make mrproper && make defconfig && make > >>> ARCH=x86_64 CONFIG=smp -j 6'. 
This panic is seen on every attempt, so > >>> I can provide more diagnostics. > >> > >> Okay, if you did defconfig and just hit enter to all questions, you > >> should have SPARSEMEM_EXTREME and NO_BOOTMEM enabled. This means that > >> the 'mem_section' is an array of pointers and the following happens in > >> memory_present(): > >> > >> for_one_pfn_in_each_section() { > >> sparse_index_init(); /* no return value check */ > >> ms = __nr_to_section(); > >> if (!ms->section_mem_map) /* bang */ > >> ...; > >> } > >> > >> where sparse_index_init(), in the SPARSEMEM_EXTREME case, will allocate > >> the mem_section descriptor with bootmem. If this would fail, the box > >> would panic immediately earlier, but NO_BOOTMEM does not seem to get it > >> right. > >> > >> Greg, could you retry _with_ my bootmem patch applied, but with setting > >> CONFIG_NO_BOOTMEM=n up front? > >> > >> I think NO_BOOTMEM has several problems. Yinghai, can you verify them? > >> > >> 1. It does not seem to handle goal appropriately: bootmem would try > >> without the goal if it does not make sense. And in this case, the > >> goal is 4G (above DMA32) and the amount of memory is 256M. > >> > >> And if I did not miss something, this is the difference with my patch: > >> without it, the default goal is 16M, which is no problem as it is well > >> within your available memory. But the change of the default goal moved > >> it outside it which the bootmem replacement can not handle. > >> > >> 2. The early reservation stuff seems to return NULL but callsites assume > >> that the bootmem interface never does that. Okay, the result is the same, > >> we crash. But it still moves error reporting to a possibly much later > >> point where somebody actually dereferences the returned pointer. > > > > related change could be: __alloc_bootmem_node_high... > > no should be here... 
> > static struct mem_section noinline __init_refok *sparse_index_alloc(int nid) > { > struct mem_section *section = NULL; > unsigned long array_size = SECTIONS_PER_ROOT * > sizeof(struct mem_section); > > if (slab_is_available()) { > if (node_state(nid, N_HIGH_MEMORY)) > section = kmalloc_node(array_size, GFP_KERNEL, nid); > else > section = kmalloc(array_size, GFP_KERNEL); > } else > section = alloc_bootmem_node(NODE_DATA(nid), array_size); > > and > > #define alloc_bootmem_node(pgdat, x) \ > __alloc_bootmem_node(pgdat, x, SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS)) > > > then you change that goal MAX_DMA_ADDRESS to 4g..., but the system only have 256M and alloc_bootmem_core() will handle it. The principle of the default goal is: if you have memory outside the DMA zone, use that if possible. If not, just use what's there. So increasing the default goal to above the DMA32 zone and falling back if not possible is a sensible change in itself. Replacing the bootmem API implementation with something incompatible is NOT a sensible change, however. You have to do the fallback or review all callers and make sure they conform to your new semantics. My patch just shows that with common machines: those with <=4G of memory but you already broke uncommon machines without my patch, those with <=16M of memory. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51]) by kanga.kvack.org (Postfix) with ESMTP id 8E4E26B0087 for ; Fri, 5 Mar 2010 08:08:50 -0500 (EST) Date: Fri, 5 Mar 2010 14:08:34 +0100 From: Johannes Weiner Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch Message-ID: <20100305130834.GB13726@cmpxchg.org> References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <4B90C921.6060908@kernel.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4B90C921.6060908@kernel.org> Sender: owner-linux-mm@kvack.org To: Yinghai Lu Cc: Jiri Slaby , Greg Thelen , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" , Andrew Morton List-ID: On Fri, Mar 05, 2010 at 01:04:33AM -0800, Yinghai Lu wrote: > On 03/04/2010 07:21 PM, Johannes Weiner wrote: > > Hello Greg, > > > > On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: > >> On several systems I am seeing a boot panic if I use mmotm > >> (stamp-2010-03-02-18-38). If I remove > >> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I > >> find that: > >> * 2.6.33 boots fine. > >> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. > >> * 2.6.33 + mmotm (including > >> bootmem-avoid-dma32-zone-by-default.patch): panics. > >> Note: I had to enable earlyprintk to see the panic. Without > >> earlyprintk no console output was seen. The system appeared to hang > >> after the loader. > > > > where sparse_index_init(), in the SPARSEMEM_EXTREME case, will allocate > > the mem_section descriptor with bootmem. If this would fail, the box > > would panic immediately earlier, but NO_BOOTMEM does not seem to get it > > right. > > > > Greg, could you retry _with_ my bootmem patch applied, but with setting > > CONFIG_NO_BOOTMEM=n up front?
> > > > I think NO_BOOTMEM has several problems. Yinghai, can you verify them? > ... > > > > 1. It does not seem to handle goal appropriately: bootmem would try > > without the goal if it does not make sense. And in this case, the > > goal is 4G (above DMA32) and the amount of memory is 256M. > > > > And if I did not miss something, this is the difference with my patch: > > without it, the default goal is 16M, which is no problem as it is well > > within your available memory. But the change of the default goal moved > > it outside it which the bootmem replacement can not handle. > > > > 2. The early reservation stuff seems to return NULL but callsites assume > > that the bootmem interface never does that. Okay, the result is the same, > > we crash. But it still moves error reporting to a possibly much later > > point where somebody actually dereferences the returned pointer. > > under CONFIG_NO_BOOTMEM > for alloc_bootmem_node it will honor goal, if someone input big goal it will not > fallback to get a small one below that goal. Yes, that's the problem. > return NULL, could make caller have more choice and more control. Most callers do not need it as there is no real way to handle allocation failures at this point of time in the boot process. For everything else, there is the _nopanic API. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail138.messagelabs.com (mail138.messagelabs.com [216.82.249.35]) by kanga.kvack.org (Postfix) with ESMTP id 819B96B0047 for ; Fri, 5 Mar 2010 11:38:28 -0500 (EST) References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <4B908FF3.5000303@kernel.org> <4B909327.2030702@kernel.org> <20100305125139.GA13726@cmpxchg.org> Message-Id: <7EE97A35-A262-4040-BA2A-7F7A8347E8D5@kernel.org> From: Yinghai In-Reply-To: <20100305125139.GA13726@cmpxchg.org> Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Mime-Version: 1.0 (iPhone Mail 7D11) Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch Date: Fri, 5 Mar 2010 08:38:02 -0800 Sender: owner-linux-mm@kvack.org To: Johannes Weiner Cc: Greg Thelen , "linux-mm@kvack.org" List-ID: On Mar 5, 2010, at 4:51 AM, Johannes Weiner wrote: > > My patch just shows that with common machines: those with <=4G of > memory > but you already broke uncommon machines without my patch, those with > <=16M of memory. Ok Will fix it
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail191.messagelabs.com (mail191.messagelabs.com [216.82.242.19]) by kanga.kvack.org (Postfix) with ESMTP id BB5BB6B0047 for ; Fri, 5 Mar 2010 13:43:18 -0500 (EST) Message-ID: <4B915074.4020704@kernel.org> Date: Fri, 05 Mar 2010 10:41:56 -0800 From: Yinghai Lu MIME-Version: 1.0 Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> In-Reply-To: <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Greg Thelen , Andrew Morton , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar Cc: Johannes Weiner , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" List-ID: On 03/04/2010 09:17 PM, Greg Thelen wrote: > On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner wrote: >> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: >>> On several systems I am seeing a boot panic if I use mmotm >>> (stamp-2010-03-02-18-38). If I remove >>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I >>> find that: >>> * 2.6.33 boots fine. >>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. >>> * 2.6.33 + mmotm (including >>> bootmem-avoid-dma32-zone-by-default.patch): panics. ... > > Note: mmotm has been recently updated to stamp-2010-03-04-18-05. I > re-tested with 'make defconfig' to confirm the panic with this later > mmotm. please check [PATCH] early_res: double check with updated goal in alloc_memory_core_early Johannes Weiner pointed out that new early_res replacement for alloc_bootmem_node change the behavior about goal. original bootmem one will try go further regardless of goal.
and it will break his patch about default goal from MAX_DMA to MAX_DMA32... also broke uncommon machines with <=16M of memory. (really? our x86 kernel still can run on 16M system?) so try again with update goal.

Reported-by: Greg Thelen
Signed-off-by: Yinghai Lu

---
 mm/bootmem.c |   28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/bootmem.c
===================================================================
--- linux-2.6.orig/mm/bootmem.c
+++ linux-2.6/mm/bootmem.c
@@ -170,6 +170,28 @@ void __init free_bootmem_late(unsigned l
 }

 #ifdef CONFIG_NO_BOOTMEM
+static void * __init ___alloc_memory_core_early(pg_data_t *pgdat, u64 size,
+						u64 align, u64 goal, u64 limit)
+{
+	void *ptr;
+	unsigned long end_pfn;
+
+	ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
+					goal, limit);
+	if (ptr)
+		return ptr;
+
+	/* check goal according */
+	end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages;
+	if ((end_pfn << PAGE_SHIFT) < (goal + size)) {
+		goal = pgdat->node_start_pfn << PAGE_SHIFT;
+		ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
+						goal, limit);
+	}
+
+	return ptr;
+}
+
 static void __init __free_pages_memory(unsigned long start, unsigned long end)
 {
 	int i;
@@ -836,7 +858,7 @@ void * __init __alloc_bootmem_node(pg_da
 		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);

 #ifdef CONFIG_NO_BOOTMEM
-	return __alloc_memory_core_early(pgdat->node_id, size, align,
+	return ___alloc_memory_core_early(pgdat, size, align,
 					 goal, -1ULL);
 #else
 	return ___alloc_bootmem_node(pgdat->bdata, size, align, goal, 0);
@@ -920,7 +942,7 @@ void * __init __alloc_bootmem_node_nopan
 		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);

 #ifdef CONFIG_NO_BOOTMEM
-	ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
+	ptr = ___alloc_memory_core_early(pgdat, size, align,
 					 goal, -1ULL);
 #else
 	ptr = alloc_arch_preferred_bootmem(pgdat->bdata, size, align, goal, 0);
@@ -980,7 +1002,7 @@ void * __init __alloc_bootmem_low_node(p
 		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);

 #ifdef CONFIG_NO_BOOTMEM
-	return __alloc_memory_core_early(pgdat->node_id, size, align,
+	return ___alloc_memory_core_early(pgdat, size, align,
 				goal, ARCH_LOW_ADDRESS_LIMIT);
 #else
 	return ___alloc_bootmem_node(pgdat->bdata, size, align,
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail143.messagelabs.com (mail143.messagelabs.com [216.82.254.35]) by kanga.kvack.org (Postfix) with ESMTP id 7B1AE6B0047 for ; Fri, 5 Mar 2010 14:10:08 -0500 (EST) Received: from wpaz21.hot.corp.google.com (wpaz21.hot.corp.google.com [172.24.198.85]) by smtp-out.google.com with ESMTP id o25JA3i7007977 for ; Fri, 5 Mar 2010 19:10:03 GMT Received: from pxi27 (pxi27.prod.google.com [10.243.27.27]) by wpaz21.hot.corp.google.com with ESMTP id o25JA1oT028920 for ; Fri, 5 Mar 2010 11:10:02 -0800 Received: by pxi27 with SMTP id 27so157131pxi.31 for ; Fri, 05 Mar 2010 11:10:01 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <4B915074.4020704@kernel.org> References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> From: Greg Thelen Date: Fri, 5 Mar 2010 11:09:41 -0800 Message-ID: <49b004811003051109t3215f86dy280a6317bdab9b15@mail.gmail.com> Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org To: Yinghai Lu Cc: Andrew Morton , "H.
Peter Anvin" , Thomas Gleixner , Ingo Molnar , Johannes Weiner , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" List-ID: On Fri, Mar 5, 2010 at 10:41 AM, Yinghai Lu wrote: > On 03/04/2010 09:17 PM, Greg Thelen wrote: >> On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner wrote: >>> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: >>>> On several systems I am seeing a boot panic if I use mmotm >>>> (stamp-2010-03-02-18-38). If I remove >>>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I >>>> find that: >>>> * 2.6.33 boots fine. >>>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. >>>> * 2.6.33 + mmotm (including >>>> bootmem-avoid-dma32-zone-by-default.patch): panics. > ... >> >> Note: mmotm has been recently updated to stamp-2010-03-04-18-05. I >> re-tested with 'make defconfig' to confirm the panic with this later >> mmotm. > > please check > > [PATCH] early_res: double check with updated goal in alloc_memory_core_early > > Johannes Weiner pointed out that new early_res replacement for alloc_bootmem_node > change the behavoir about goal. > original bootmem one will try go further regardless of goal. > > and it will break his patch about default goal from MAX_DMA to MAX_DMA32... > also broke uncommon machines with <=16M of memory. > (really? our x86 kernel still can run on 16M system?) > > so try again with update goal.
> > Reported-by: Greg Thelen > Signed-off-by: Yinghai Lu > > --- > mm/bootmem.c | 28 +++++++++++++++++++++++++--- > 1 file changed, 25 insertions(+), 3 deletions(-) > > Index: linux-2.6/mm/bootmem.c > =================================================================== > --- linux-2.6.orig/mm/bootmem.c > +++ linux-2.6/mm/bootmem.c > @@ -170,6 +170,28 @@ void __init free_bootmem_late(unsigned l > } > > #ifdef CONFIG_NO_BOOTMEM > +static void * __init ___alloc_memory_core_early(pg_data_t *pgdat, u64 size, > + u64 align, u64 goal, u64 limit) > +{ > + void *ptr; > + unsigned long end_pfn; > + > + ptr = __alloc_memory_core_early(pgdat->node_id, size, align, > + goal, limit); > + if (ptr) > + return ptr; > + > + /* check goal according */ > + end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages; > + if ((end_pfn << PAGE_SHIFT) < (goal + size)) { > + goal = pgdat->node_start_pfn << PAGE_SHIFT; > + ptr = __alloc_memory_core_early(pgdat->node_id, size, align, > + goal, limit); > + } > + > + return ptr; > +} > + > static void __init __free_pages_memory(unsigned long start, unsigned long end) > { > int i; > @@ -836,7 +858,7 @@ void * __init __alloc_bootmem_node(pg_da > return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id); > > #ifdef CONFIG_NO_BOOTMEM > - return __alloc_memory_core_early(pgdat->node_id, size, align, > + return ___alloc_memory_core_early(pgdat, size, align, > goal, -1ULL); > #else > return ___alloc_bootmem_node(pgdat->bdata, size, align, goal, 0); > @@ -920,7 +942,7 @@ void * __init __alloc_bootmem_node_nopan > return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id); > > #ifdef CONFIG_NO_BOOTMEM > - ptr = __alloc_memory_core_early(pgdat->node_id, size, align, > + ptr = ___alloc_memory_core_early(pgdat, size, align, > goal, -1ULL); > #else > ptr = alloc_arch_preferred_bootmem(pgdat->bdata, size, align, goal, 0); > @@ -980,7 +1002,7 @@ void * __init __alloc_bootmem_low_node(p > return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id); > > #ifdef CONFIG_NO_BOOTMEM > - return __alloc_memory_core_early(pgdat->node_id, size, align, > + return ___alloc_memory_core_early(pgdat, size, align, > goal, ARCH_LOW_ADDRESS_LIMIT); > #else > return ___alloc_bootmem_node(pgdat->bdata, size, align, >
On my 256MB VM, which detected the problem starting this thread, the "double check with updated goal in alloc_memory_core_early" patch (above) boots without panic. My initial impression is that this fixes the reported problem. Note: I have not tested to see if any other issues are introduced. -- Greg
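Read on its own, the retry condition in the patch Greg tested reduces to a small predicate. The sketch below restates it in userspace C, assuming 4 KiB pages; `toy_goal_beyond_node()` and `toy_retry_goal()` are hypothetical names, not kernel functions. For a 256MB VM modelled as a single node starting at pfn 0 (65536 pages, an assumption), a 4G goal lies past the end of the node, so the wrapper retries from the node start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. */
#define TOY_PAGE_SHIFT 12

/* Mirrors the test in the posted ___alloc_memory_core_early(): the second
 * attempt is made only if goal + size lies beyond the end of the node. */
static bool toy_goal_beyond_node(uint64_t node_start_pfn,
				 uint64_t node_spanned_pages,
				 uint64_t goal, uint64_t size)
{
	uint64_t end_pfn = node_start_pfn + node_spanned_pages;

	return (end_pfn << TOY_PAGE_SHIFT) < goal + size;
}

/* Goal used for the second attempt: the first byte of the node. */
static uint64_t toy_retry_goal(uint64_t node_start_pfn)
{
	return node_start_pfn << TOY_PAGE_SHIFT;
}
```

As we read the patch, a goal that lies inside the node is never relaxed: for an 8 GiB node (2097152 pages) and a 4 GiB goal, `toy_goal_beyond_node()` is false, so if memory above the goal were exhausted the wrapper would return NULL even though lower memory might be free. That is narrower than the "try without the goal" behavior Johannes attributes to bootmem.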
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51]) by kanga.kvack.org (Postfix) with ESMTP id 09EA76B0047 for ; Fri, 5 Mar 2010 15:39:47 -0500 (EST) Message-ID: <4B916BD6.8010701@kernel.org> Date: Fri, 05 Mar 2010 12:38:46 -0800 From: Yinghai Lu MIME-Version: 1.0 Subject: [PATCH] x86/bootmem: introduce bootmem_default_goal References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> In-Reply-To: <4B915074.4020704@kernel.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Andrew Morton Cc: Greg Thelen , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Johannes Weiner , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" List-ID: if you don't want to drop | bootmem: avoid DMA32 zone by default today mainline tree actually DO NOT need that patch according to print out ... please apply this one too. [PATCH] x86/bootmem: introduce bootmem_default_goal don't punish the 64bit systems with less 4G RAM. they should use __pa(MAX_DMA_ADDRESS) at first pass instead of fallback...
Signed-off-by: Yinghai Lu

---
 arch/x86/kernel/setup.c |   13 +++++++++++++
 include/linux/bootmem.h |    3 ++-
 mm/bootmem.c            |    4 ++++
 3 files changed, 19 insertions(+), 1 deletion(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -686,6 +686,18 @@ static void __init trim_bios_range(void)
 	sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
 }

+#ifdef MAX_DMA32_PFN
+static void __init set_bootmem_default_goal(void)
+{
+	if (max_pfn_mapped < MAX_DMA32_PFN)
+		bootmem_default_goal = __pa(MAX_DMA_ADDRESS);
+}
+#else
+static void __init set_bootmem_default_goal(void)
+{
+}
+#endif
+
 /*
  * Determine if we were loaded by an EFI loader. If so, then we have also been
  * passed the efi memmap, systab, etc., so we should use these data structures
@@ -931,6 +943,7 @@ void __init setup_arch(char **cmdline_p)
 		max_low_pfn = max_pfn;
 	}
 #endif
+	set_bootmem_default_goal();

 	/*
 	 * NOTE: On x86-32, only from this point on, fixmaps are ready for use.
Index: linux-2.6/include/linux/bootmem.h
===================================================================
--- linux-2.6.orig/include/linux/bootmem.h
+++ linux-2.6/include/linux/bootmem.h
@@ -104,7 +104,8 @@ extern void *__alloc_bootmem_low_node(pg
 				      unsigned long goal);

 #ifdef MAX_DMA32_PFN
-#define BOOTMEM_DEFAULT_GOAL (MAX_DMA32_PFN << PAGE_SHIFT)
+extern unsigned long bootmem_default_goal;
+#define BOOTMEM_DEFAULT_GOAL bootmem_default_goal
 #else
 #define BOOTMEM_DEFAULT_GOAL __pa(MAX_DMA_ADDRESS)
 #endif
Index: linux-2.6/mm/bootmem.c
===================================================================
--- linux-2.6.orig/mm/bootmem.c
+++ linux-2.6/mm/bootmem.c
@@ -25,6 +25,10 @@ unsigned long max_low_pfn;
 unsigned long min_low_pfn;
 unsigned long max_pfn;

+#ifdef MAX_DMA32_PFN
+unsigned long bootmem_default_goal = (MAX_DMA32_PFN << PAGE_SHIFT);
+#endif
+
 #ifdef CONFIG_CRASH_DUMP
 /*
  * If we have booted due to a crash, max_pfn will be a very low value. We need
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail202.messagelabs.com (mail202.messagelabs.com [216.82.254.227]) by kanga.kvack.org (Postfix) with ESMTP id 30ACE6B0047 for ; Fri, 5 Mar 2010 18:58:47 -0500 (EST) Date: Sat, 6 Mar 2010 00:58:12 +0100 From: Johannes Weiner Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch Message-ID: <20100305235812.GA15249@cmpxchg.org> References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4B915074.4020704@kernel.org> Sender: owner-linux-mm@kvack.org To: Yinghai Lu Cc: Greg Thelen , Andrew Morton , "H.
Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" List-ID: Hello Yinghai, On Fri, Mar 05, 2010 at 10:41:56AM -0800, Yinghai Lu wrote: > On 03/04/2010 09:17 PM, Greg Thelen wrote: > > On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner wrote: > >> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: > >>> On several systems I am seeing a boot panic if I use mmotm > >>> (stamp-2010-03-02-18-38). If I remove > >>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I > >>> find that: > >>> * 2.6.33 boots fine. > >>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. > >>> * 2.6.33 + mmotm (including > >>> bootmem-avoid-dma32-zone-by-default.patch): panics. > ... > > > > Note: mmotm has been recently updated to stamp-2010-03-04-18-05. I > > re-tested with 'make defconfig' to confirm the panic with this later > > mmotm. > > please check > > [PATCH] early_res: double check with updated goal in alloc_memory_core_early > > Johannes Weiner pointed out that new early_res replacement for alloc_bootmem_node > change the behavoir about goal. > original bootmem one will try go further regardless of goal. > > and it will break his patch about default goal from MAX_DMA to MAX_DMA32... > also broke uncommon machines with <=16M of memory. > (really? our x86 kernel still can run on 16M system?) > > so try again with update goal. Thanks for the patch, it seems to be correct. However, I have a more generic question about it, regarding the future of the early_res allocator. Did you plan on keeping the bootmem API for longer? Because my impression was, emulating it is a temporary measure until all users are gone and bootmem can be finally dropped. But then this would require some sort of handling of 'user does not need DMA[32] memory, so avoid it' and 'user can only use DMA[32] memory' in the early_res allocator as well. 
I ask this specifically because you move this fix into the bootmem compatibility code while there is not yet a way to tell early_res the same thing, so switching a user that _needs_ to specify this requirement from bootmem to early_res is not yet possible, is it? > Reported-by: Greg Thelen > Signed-off-by: Yinghai Lu > > --- > mm/bootmem.c | 28 +++++++++++++++++++++++++--- > 1 file changed, 25 insertions(+), 3 deletions(-) > > Index: linux-2.6/mm/bootmem.c > =================================================================== > --- linux-2.6.orig/mm/bootmem.c > +++ linux-2.6/mm/bootmem.c > @@ -170,6 +170,28 @@ void __init free_bootmem_late(unsigned l > } > > #ifdef CONFIG_NO_BOOTMEM > +static void * __init ___alloc_memory_core_early(pg_data_t *pgdat, u64 size, > + u64 align, u64 goal, u64 limit) > +{ > + void *ptr; > + unsigned long end_pfn; > + > + ptr = __alloc_memory_core_early(pgdat->node_id, size, align, > + goal, limit); > + if (ptr) > + return ptr; > + > + /* check goal according */ > + end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages; > + if ((end_pfn << PAGE_SHIFT) < (goal + size)) { > + goal = pgdat->node_start_pfn << PAGE_SHIFT; > + ptr = __alloc_memory_core_early(pgdat->node_id, size, align, > + goal, limit); > + } > + > + return ptr; I think it would make sense to move the parameter check before doing the allocation. Then you save the second call. And a second nitpick: naming the inner function __foo and the outer one ___foo seems confusing to me. Could you maybe rename the wrapper? bootmem_compat_alloc_early() or something like that? Thanks, Hannes
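Johannes' first nitpick is to test the goal before allocating rather than after a failed attempt. A minimal userspace sketch of that ordering follows. `bootmem_compat_alloc_early()` is the name he proposes, but the body here is our own illustration, and `struct toy_pgdat`, `toy_core_alloc()` (a stub standing in for `__alloc_memory_core_early()`) and the call counter are hypothetical; 4 KiB pages are assumed. For a 256 MiB node and a 4 GiB goal it makes a single call into the core allocator, where the posted wrapper makes two.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12          /* assumption: 4 KiB pages */
#define TOY_FAIL       UINT64_MAX  /* stands in for a NULL return */

/* Hypothetical stand-in for pg_data_t, reduced to the two fields used. */
struct toy_pgdat {
	uint64_t node_start_pfn;
	uint64_t node_spanned_pages;
};

static unsigned int toy_core_calls;  /* counts calls into the stub below */

static uint64_t toy_node_start(const struct toy_pgdat *p)
{
	return p->node_start_pfn << TOY_PAGE_SHIFT;
}

static uint64_t toy_node_end(const struct toy_pgdat *p)
{
	return (p->node_start_pfn + p->node_spanned_pages) << TOY_PAGE_SHIFT;
}

/* Stub core allocator: returns the goal (clamped to the node start) if
 * [goal, goal + size) fits inside the node, else TOY_FAIL. */
static uint64_t toy_core_alloc(const struct toy_pgdat *p, uint64_t size,
			       uint64_t goal)
{
	toy_core_calls++;
	if (goal < toy_node_start(p))
		goal = toy_node_start(p);
	if (goal >= toy_node_end(p) || toy_node_end(p) - goal < size)
		return TOY_FAIL;
	return goal;
}

/* Check-first wrapper: a goal that cannot fit in the node is replaced
 * up front, so only one call into the core allocator is made. */
static uint64_t bootmem_compat_alloc_early(const struct toy_pgdat *p,
					   uint64_t size, uint64_t goal)
{
	if (toy_node_end(p) < goal + size)
		goal = toy_node_start(p);
	return toy_core_alloc(p, size, goal);
}
```

With `struct toy_pgdat vm = { 0, 65536 };` (256 MiB), `bootmem_compat_alloc_early(&vm, 4096, 4ULL << 30)` returns 0 after exactly one `toy_core_alloc()` call.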
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3]) by kanga.kvack.org (Postfix) with ESMTP id 08AF56B0078 for ; Fri, 5 Mar 2010 20:51:56 -0500 (EST) Message-ID: <4B91B4EF.5090502@kernel.org> Date: Fri, 05 Mar 2010 17:50:39 -0800 From: Yinghai Lu MIME-Version: 1.0 Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> <20100305235812.GA15249@cmpxchg.org> In-Reply-To: <20100305235812.GA15249@cmpxchg.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Johannes Weiner Cc: Greg Thelen , Andrew Morton , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" List-ID: On 03/05/2010 03:58 PM, Johannes Weiner wrote: > Hello Yinghai, > > On Fri, Mar 05, 2010 at 10:41:56AM -0800, Yinghai Lu wrote: >> On 03/04/2010 09:17 PM, Greg Thelen wrote: >>> On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner wrote: >>>> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: >>>>> On several systems I am seeing a boot panic if I use mmotm >>>>> (stamp-2010-03-02-18-38). If I remove >>>>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I >>>>> find that: >>>>> * 2.6.33 boots fine. >>>>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. >>>>> * 2.6.33 + mmotm (including >>>>> bootmem-avoid-dma32-zone-by-default.patch): panics. >> ... >>> >>> Note: mmotm has been recently updated to stamp-2010-03-04-18-05. I >>> re-tested with 'make defconfig' to confirm the panic with this later >>> mmotm.
>> >> please check >> >> [PATCH] early_res: double check with updated goal in alloc_memory_core_early >> >> Johannes Weiner pointed out that new early_res replacement for alloc_bootmem_node >> change the behavoir about goal. >> original bootmem one will try go further regardless of goal. >> >> and it will break his patch about default goal from MAX_DMA to MAX_DMA32... >> also broke uncommon machines with <=16M of memory. >> (really? our x86 kernel still can run on 16M system?) >> >> so try again with update goal. > > Thanks for the patch, it seems to be correct. > > However, I have a more generic question about it, regarding the future of the > early_res allocator. > > Did you plan on keeping the bootmem API for longer? Because my impression was, > emulating it is a temporary measure until all users are gone and bootmem can > be finally dropped. that depends on every arch maintainer. user can compare them on x86 to check if... next step will be make fw_mem_map to generiaized and combine them with lmb. > > But then this would require some sort of handling of 'user does not need DMA[32] > memory, so avoid it' and 'user can only use DMA[32] memory' in the early_res > allocator as well. > > I ask this specifically because you move this fix into the bootmem compatibility > code while there is not yet a way to tell early_res the same thing, so switching > a user that _needs_ to specify this requirement from bootmem to early_res is not > yet possible, is it? just let caller set the goal. 
> >> Reported-by: Greg Thelen >> Signed-off-by: Yinghai Lu >> >> --- >> mm/bootmem.c | 28 +++++++++++++++++++++++++--- >> 1 file changed, 25 insertions(+), 3 deletions(-) >> >> Index: linux-2.6/mm/bootmem.c >> =================================================================== >> --- linux-2.6.orig/mm/bootmem.c >> +++ linux-2.6/mm/bootmem.c >> @@ -170,6 +170,28 @@ void __init free_bootmem_late(unsigned l >> } >> >> #ifdef CONFIG_NO_BOOTMEM >> +static void * __init ___alloc_memory_core_early(pg_data_t *pgdat, u64 size, >> + u64 align, u64 goal, u64 limit) >> +{ >> + void *ptr; >> + unsigned long end_pfn; >> + >> + ptr = __alloc_memory_core_early(pgdat->node_id, size, align, >> + goal, limit); >> + if (ptr) >> + return ptr; >> + >> + /* check goal according */ >> + end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages; >> + if ((end_pfn << PAGE_SHIFT) < (goal + size)) { >> + goal = pgdat->node_start_pfn << PAGE_SHIFT; >> + ptr = __alloc_memory_core_early(pgdat->node_id, size, align, >> + goal, limit); >> + } >> + >> + return ptr; > > I think it would make sense to move the parameter check before doing the > allocation. Then you save the second call. I am trying to avoid the second call. please check another patch about "introduce bootmem_default_goal : don't punish 64bit system without 4g ram" > > And a second nitpick: naming the inner function __foo and the outer one ___foo seems > confusing to me. Could you maybe rename the wrapper? bootmem_compat_alloc_early() or > something like that? ok. Thanks Yinghai
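The bootmem_default_goal patch Yinghai points to avoids the second call by choosing the goal once at boot instead of retrying. A minimal userspace model of that choice follows. The constant values are our assumption about x86_64 (4 KiB pages, ZONE_DMA ending at 16 MiB, ZONE_DMA32 at 4 GiB), and `toy_bootmem_default_goal()` is a hypothetical name combining the initializer added to mm/bootmem.c with the test in `set_bootmem_default_goal()`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 values (not spelled out in the thread). */
#define TOY_PAGE_SHIFT    12
#define TOY_MAX_DMA_PHYS  (16ULL << 20)  /* models __pa(MAX_DMA_ADDRESS) */
#define TOY_MAX_DMA32_PFN ((4ULL << 30) >> TOY_PAGE_SHIFT)

/* Start from the 4 GiB default and drop to 16 MiB when all mapped
 * memory lies below 4 GiB, as set_bootmem_default_goal() does. */
static uint64_t toy_bootmem_default_goal(uint64_t max_pfn_mapped)
{
	uint64_t goal = TOY_MAX_DMA32_PFN << TOY_PAGE_SHIFT;

	if (max_pfn_mapped < TOY_MAX_DMA32_PFN)
		goal = TOY_MAX_DMA_PHYS;
	return goal;
}
```

Assuming max_pfn_mapped tracks the top of mapped RAM, a 256 MiB machine (max_pfn_mapped = 65536) gets the old 16 MiB goal, so its first attempt no longer starts above the end of memory, while a machine with 4 GiB or more keeps the 4 GiB goal.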
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3]) by kanga.kvack.org (Postfix) with ESMTP id 8196F6B007B for ; Fri, 5 Mar 2010 21:24:39 -0500 (EST) Date: Sat, 6 Mar 2010 03:24:15 +0100 From: Johannes Weiner Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch Message-ID: <20100306022415.GB16967@cmpxchg.org> References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> <20100305235812.GA15249@cmpxchg.org> <4B91B4EF.5090502@kernel.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4B91B4EF.5090502@kernel.org> Sender: owner-linux-mm@kvack.org To: Yinghai Lu Cc: Greg Thelen , Andrew Morton , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" List-ID: On Fri, Mar 05, 2010 at 05:50:39PM -0800, Yinghai Lu wrote: > On 03/05/2010 03:58 PM, Johannes Weiner wrote: > > Hello Yinghai, > > > > On Fri, Mar 05, 2010 at 10:41:56AM -0800, Yinghai Lu wrote: > >> On 03/04/2010 09:17 PM, Greg Thelen wrote: > >>> On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner wrote: > >>>> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: > >>>>> On several systems I am seeing a boot panic if I use mmotm > >>>>> (stamp-2010-03-02-18-38). If I remove > >>>>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I > >>>>> find that: > >>>>> * 2.6.33 boots fine. > >>>>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. > >>>>> * 2.6.33 + mmotm (including > >>>>> bootmem-avoid-dma32-zone-by-default.patch): panics. > >> ... > >>> > >>> Note: mmotm has been recently updated to stamp-2010-03-04-18-05. I > >>> re-tested with 'make defconfig' to confirm the panic with this later > >>> mmotm.
> >> > >> please check > >> > >> [PATCH] early_res: double check with updated goal in alloc_memory_core_early > >> > >> Johannes Weiner pointed out that new early_res replacement for alloc_bootmem_node > >> change the behavoir about goal. > >> original bootmem one will try go further regardless of goal. > >> > >> and it will break his patch about default goal from MAX_DMA to MAX_DMA32... > >> also broke uncommon machines with <=16M of memory. > >> (really? our x86 kernel still can run on 16M system?) > >> > >> so try again with update goal. > > > > Thanks for the patch, it seems to be correct. > > > > However, I have a more generic question about it, regarding the future of the > > early_res allocator. > > > > Did you plan on keeping the bootmem API for longer? Because my impression was, > > emulating it is a temporary measure until all users are gone and bootmem can > > be finally dropped. > > that depends on every arch maintainer. > > user can compare them on x86 to check if... Humm, now that is a bit disappointing. Because it means we will never get rid of bootmem as long as it works for the other architectures. And your changeset just added ~900 lines of code, some of it being a rather ugly compatibility layer in bootmem that I hoped could go away again sooner than later. I do not know what the upsides for x86 are from no longer using bootmem but it would suck from a code maintainance point of view to get stuck half way through this transition and have now TWO implementations of the bootmem interface we would like to get rid of. > next step will be make fw_mem_map to generiaized and combine them with lmb. > > > > > But then this would require some sort of handling of 'user does not need DMA[32] > > memory, so avoid it' and 'user can only use DMA[32] memory' in the early_res > > allocator as well. 
> > > > I ask this specifically because you move this fix into the bootmem compatibility > > code while there is not yet a way to tell early_res the same thing, so switching > > a user that _needs_ to specify this requirement from bootmem to early_res is not > > yet possible, is it? > > just let caller set the goal. That means that every caller must be aware of where the DMA zone ends and if it is non-empty and open-code the fallback to the DMA zone if the non-DMA zone is exhausted?
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail191.messagelabs.com (mail191.messagelabs.com [216.82.242.19]) by kanga.kvack.org (Postfix) with ESMTP id 233FB6B0047 for ; Fri, 5 Mar 2010 21:33:08 -0500 (EST) Message-ID: <4B91BE93.8000401@kernel.org> Date: Fri, 05 Mar 2010 18:31:47 -0800 From: Yinghai Lu MIME-Version: 1.0 Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> <20100305235812.GA15249@cmpxchg.org> <4B91B4EF.5090502@kernel.org> <20100306022415.GB16967@cmpxchg.org> In-Reply-To: <20100306022415.GB16967@cmpxchg.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Johannes Weiner Cc: Greg Thelen , Andrew Morton , "H.
Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" List-ID: On 03/05/2010 06:24 PM, Johannes Weiner wrote: > On Fri, Mar 05, 2010 at 05:50:39PM -0800, Yinghai Lu wrote: >> On 03/05/2010 03:58 PM, Johannes Weiner wrote: >>> Hello Yinghai, >>> >>> On Fri, Mar 05, 2010 at 10:41:56AM -0800, Yinghai Lu wrote: >>>> On 03/04/2010 09:17 PM, Greg Thelen wrote: >>>>> On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner wrote: >>>>>> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote: >>>>>>> On several systems I am seeing a boot panic if I use mmotm >>>>>>> (stamp-2010-03-02-18-38). If I remove >>>>>>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I >>>>>>> find that: >>>>>>> * 2.6.33 boots fine. >>>>>>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine. >>>>>>> * 2.6.33 + mmotm (including >>>>>>> bootmem-avoid-dma32-zone-by-default.patch): panics. >>>> ... >>>>> >>>>> Note: mmotm has been recently updated to stamp-2010-03-04-18-05. I >>>>> re-tested with 'make defconfig' to confirm the panic with this later >>>>> mmotm. >>>> >>>> please check >>>> >>>> [PATCH] early_res: double check with updated goal in alloc_memory_core_early >>>> >>>> Johannes Weiner pointed out that new early_res replacement for alloc_bootmem_node >>>> change the behavoir about goal. >>>> original bootmem one will try go further regardless of goal. >>>> >>>> and it will break his patch about default goal from MAX_DMA to MAX_DMA32... >>>> also broke uncommon machines with <=16M of memory. >>>> (really? our x86 kernel still can run on 16M system?) >>>> >>>> so try again with update goal. >>> >>> Thanks for the patch, it seems to be correct. >>> >>> However, I have a more generic question about it, regarding the future of the >>> early_res allocator. >>> >>> Did you plan on keeping the bootmem API for longer? 
Because my impression was, >>> emulating it is a temporary measure until all users are gone and bootmem can >>> be finally dropped. >> >> that depends on every arch maintainer. >> >> user can compare them on x86 to check if... > > Humm, now that is a bit disappointing. Because it means we will never get rid > of bootmem as long as it works for the other architectures. And your changeset > just added ~900 lines of code, some of it being a rather ugly compatibility > layer in bootmem that I hoped could go away again sooner than later. > > I do not know what the upsides for x86 are from no longer using bootmem but it > would suck from a code maintenance point of view to get stuck half way through > this transition and have now TWO implementations of the bootmem interface we > would like to get rid of. Some data, so others can compare them on more x86 systems... I didn't plan to post this data before you said that. For my 1T system:

nobootmem:
    text     data      bss      dec      hex filename
19185736  4148404 12170736 35504876  21dc2ec vmlinux.nobootmem
Memory: 1058662820k/1075838976k available (11388k kernel code, 2106480k absent, 15069676k reserved, 8589k data, 2744k init)
[ 220.947157] calling ip_auto_config+0x0/0x24d @ 1

bootmem:
    text     data      bss      dec      hex filename
19188441  4153956 12170736 35513133  21de32d vmlinux.bootmem
Memory: 1058662796k/1075838976k available (11388k kernel code, 2106480k absent, 15069700k reserved, 8589k data, 2752k init)
[ 236.765364] calling ip_auto_config+0x0/0x24d @ 1

YH
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51]) by kanga.kvack.org (Postfix) with ESMTP id 04E376B0047 for ; Sat, 6 Mar 2010 00:46:09 -0500 (EST) Message-ID: <4B91EBC6.6080509@kernel.org> Date: Fri, 05 Mar 2010 21:44:38 -0800 From: Yinghai Lu MIME-Version: 1.0 Subject: please don't apply : bootmem: avoid DMA32 zone by default References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> <4B916BD6.8010701@kernel.org> In-Reply-To: <4B916BD6.8010701@kernel.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Andrew Morton Cc: Greg Thelen , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Johannes Weiner , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" List-ID: On 03/05/2010 12:38 PM, Yinghai Lu wrote: > if you don't want to drop > | bootmem: avoid DMA32 zone by default > > today's mainline tree actually does NOT need that patch, according to the printout ... > > please apply this one too. > > [PATCH] x86/bootmem: introduce bootmem_default_goal > > don't punish the 64bit systems with less than 4G RAM. > they should use __pa(MAX_DMA_ADDRESS) on the first pass instead of as a fallback... Andrew, please drop Johannes' patch : bootmem: avoid DMA32 zone by default so you don't need to apply two fix patches from me: [PATCH] early_res: double check with updated goal in alloc_memory_core_early [PATCH] x86/bootmem: introduce bootmem_default_goal Moving all bootmem allocations above 4G makes system performance worse... Thanks Yinghai Lu
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail138.messagelabs.com (mail138.messagelabs.com [216.82.249.35]) by kanga.kvack.org (Postfix) with ESMTP id DCAEC6B0047 for ; Sat, 6 Mar 2010 19:22:59 -0500 (EST) Date: Sat, 6 Mar 2010 16:22:34 -0800 From: Andrew Morton Subject: Re: please don't apply : bootmem: avoid DMA32 zone by default Message-Id: <20100306162234.e2cc84fb.akpm@linux-foundation.org> In-Reply-To: <4B91EBC6.6080509@kernel.org> References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> <4B916BD6.8010701@kernel.org> <4B91EBC6.6080509@kernel.org> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Yinghai Lu Cc: Greg Thelen , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Johannes Weiner , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" List-ID: On Fri, 05 Mar 2010 21:44:38 -0800 Yinghai Lu wrote: > On 03/05/2010 12:38 PM, Yinghai Lu wrote: > > if you don't want to drop > > | bootmem: avoid DMA32 zone by default > > > > today's mainline tree actually does NOT need that patch, according to the printout ... > > > > please apply this one too. > > > > [PATCH] x86/bootmem: introduce bootmem_default_goal > > > > don't punish the 64bit systems with less than 4G RAM. > > they should use __pa(MAX_DMA_ADDRESS) on the first pass instead of as a fallback... > > Andrew, > > please drop Johannes' patch : bootmem: avoid DMA32 zone by default I'd rather not. That patch is said to fix a runtime problem which is present in 2.6.33 and hence we planned on backporting it into 2.6.33.x. I don't have a clue what your patches do. Can you tell us? Earlier, Johannes wrote : Humm, now that is a bit disappointing. Because it means we will never : get rid of bootmem as long as it works for the other architectures.
: And your changeset just added ~900 lines of code, some of it being a : rather ugly compatibility layer in bootmem that I hoped could go away : again sooner than later. : : I do not know what the upsides for x86 are from no longer using bootmem : but it would suck from a code maintenance point of view to get stuck : half way through this transition and have now TWO implementations of : the bootmem interface we would like to get rid of. Which is a pretty good-sounding argument. Perhaps we should be dropping your patches. What patches _are_ these x86 bootmem changes, anyway? Please identify them so people can take a look and see what they do.
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51]) by kanga.kvack.org (Postfix) with ESMTP id 7D2236B0078 for ; Sat, 6 Mar 2010 19:43:18 -0500 (EST) Message-ID: <4B92F65A.5060305@kernel.org> Date: Sat, 06 Mar 2010 16:42:02 -0800 From: Yinghai Lu MIME-Version: 1.0 Subject: Re: please don't apply : bootmem: avoid DMA32 zone by default References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> <4B916BD6.8010701@kernel.org> <4B91EBC6.6080509@kernel.org> <20100306162234.e2cc84fb.akpm@linux-foundation.org> In-Reply-To: <20100306162234.e2cc84fb.akpm@linux-foundation.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Andrew Morton Cc: Greg Thelen , "H.
Peter Anvin" , Thomas Gleixner , Ingo Molnar , Johannes Weiner , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" , Linus Torvalds List-ID: On 03/06/2010 04:22 PM, Andrew Morton wrote: > On Fri, 05 Mar 2010 21:44:38 -0800 Yinghai Lu wrote: > >> On 03/05/2010 12:38 PM, Yinghai Lu wrote: >>> if you don't want to drop >>> | bootmem: avoid DMA32 zone by default >>> >>> today's mainline tree actually does NOT need that patch, according to the printout ... >>> >>> please apply this one too. >>> >>> [PATCH] x86/bootmem: introduce bootmem_default_goal >>> >>> don't punish the 64bit systems with less than 4G RAM. >>> they should use __pa(MAX_DMA_ADDRESS) on the first pass instead of as a fallback... >> >> Andrew, >> >> please drop Johannes' patch : bootmem: avoid DMA32 zone by default > > I'd rather not. That patch is said to fix a runtime problem which is > present in 2.6.33 and hence we planned on backporting it into 2.6.33.x. That patch increases my box's boot time from 215s to 265s. There should be a better way to fix the problem: just put the mem_map or other big chunks high, instead of putting everything above 4G.
Something like:
static void * __init_refok __earlyonly_bootmem_alloc(int node,
				unsigned long size,
				unsigned long align,
				unsigned long goal)
{
	return __alloc_bootmem_node_high(NODE_DATA(node), size, align, goal);
}

void * __init __alloc_bootmem_node_high(pg_data_t *pgdat, unsigned long size,
				   unsigned long align, unsigned long goal)
{
#ifdef MAX_DMA32_PFN
	unsigned long end_pfn;

	if (WARN_ON_ONCE(slab_is_available()))
		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);

	/* update goal according ...MAX_DMA32_PFN */
	end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages;

	if (end_pfn > MAX_DMA32_PFN + (128 >> (20 - PAGE_SHIFT)) &&
	    (goal >> PAGE_SHIFT) < MAX_DMA32_PFN) {
		void *ptr;
		unsigned long new_goal;

		new_goal = MAX_DMA32_PFN << PAGE_SHIFT;
#ifdef CONFIG_NO_BOOTMEM
		ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
						 new_goal, -1ULL);
#else
		ptr = alloc_bootmem_core(pgdat->bdata, size, align,
						 new_goal, 0);
#endif
		if (ptr)
			return ptr;
	}
#endif

	return __alloc_bootmem_node(pgdat, size, align, goal);

}
> > I don't have a clue what your patches do. Can you tell us? They don't use bootmem; they use early_res instead. You were on the To: list... please check: http://lkml.org/lkml/2010/2/10/39 > > Earlier, Johannes wrote > > : Humm, now that is a bit disappointing. Because it means we will never > : get rid of bootmem as long as it works for the other architectures. > : And your changeset just added ~900 lines of code, some of it being a > : rather ugly compatibility layer in bootmem that I hoped could go away > : again sooner than later. > : > : I do not know what the upsides for x86 are from no longer using bootmem > : but it would suck from a code maintenance point of view to get stuck > : half way through this transition and have now TWO implementations of > : the bootmem interface we would like to get rid of. > > Which is a pretty good-sounding argument. Perhaps we should be > dropping your patches. > > What patches _are_ these x86 bootmem changes, anyway?
Please identify > them so people can take a look and see what they do. http://lkml.org/lkml/2010/2/10/39 (you, Linus, Ingo, hpa and tglx were on the To: list). Yinghai
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail137.messagelabs.com (mail137.messagelabs.com [216.82.249.19]) by kanga.kvack.org (Postfix) with ESMTP id CEE2A6B0047 for ; Sat, 6 Mar 2010 19:54:52 -0500 (EST) Message-ID: <4B92F91A.5040607@kernel.org> Date: Sat, 06 Mar 2010 16:53:46 -0800 From: Yinghai Lu MIME-Version: 1.0 Subject: Re: please don't apply : bootmem: avoid DMA32 zone by default References: <49b004811003041321g2567bac8yb73235be32a27e7c@mail.gmail.com> <20100305032106.GA12065@cmpxchg.org> <49b004811003042117n720f356h7e10997a1a783475@mail.gmail.com> <4B915074.4020704@kernel.org> <4B916BD6.8010701@kernel.org> <4B91EBC6.6080509@kernel.org> <20100306162234.e2cc84fb.akpm@linux-foundation.org> <4B92F65A.5060305@kernel.org> In-Reply-To: <4B92F65A.5060305@kernel.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Andrew Morton , Jiri Slaby Cc: Greg Thelen , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , Johannes Weiner , linux-mm@kvack.org, "linux-kernel@vger.kernel.org" , Linus Torvalds List-ID: On 03/06/2010 04:42 PM, Yinghai Lu wrote: > On 03/06/2010 04:22 PM, Andrew Morton wrote: >> On Fri, 05 Mar 2010 21:44:38 -0800 Yinghai Lu wrote: >> >>> On 03/05/2010 12:38 PM, Yinghai Lu wrote: >>>> if you don't want to drop >>>> | bootmem: avoid DMA32 zone by default >>>> >>>> today's mainline tree actually does NOT need that patch, according to the printout ... >>>> >>>> please apply this one too. >>>> >>>> [PATCH] x86/bootmem: introduce bootmem_default_goal >>>> >>>> don't punish the 64bit systems with less than 4G RAM.
>>>> they should use __pa(MAX_DMA_ADDRESS) on the first pass instead of as a fallback... >>> >>> Andrew, >>> >>> please drop Johannes' patch : bootmem: avoid DMA32 zone by default >> >> I'd rather not. That patch is said to fix a runtime problem which is >> present in 2.6.33 and hence we planned on backporting it into 2.6.33.x. > > That patch increases my box's boot time from 215s to 265s. > > There should be a better way to fix the problem: > just put the mem_map or other big chunks high, > instead of putting everything above 4G. >
> Something like:
> static void * __init_refok __earlyonly_bootmem_alloc(int node,
> 				unsigned long size,
> 				unsigned long align,
> 				unsigned long goal)
> {
> 	return __alloc_bootmem_node_high(NODE_DATA(node), size, align, goal);
> }
>
> void * __init __alloc_bootmem_node_high(pg_data_t *pgdat, unsigned long size,
> 				   unsigned long align, unsigned long goal)
> {
> #ifdef MAX_DMA32_PFN
> 	unsigned long end_pfn;
>
> 	if (WARN_ON_ONCE(slab_is_available()))
> 		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
>
> 	/* update goal according ...MAX_DMA32_PFN */
> 	end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages;
>
> 	if (end_pfn > MAX_DMA32_PFN + (128 >> (20 - PAGE_SHIFT)) &&
> 	    (goal >> PAGE_SHIFT) < MAX_DMA32_PFN) {
> 		void *ptr;
> 		unsigned long new_goal;
>
> 		new_goal = MAX_DMA32_PFN << PAGE_SHIFT;
> #ifdef CONFIG_NO_BOOTMEM
> 		ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
> 						 new_goal, -1ULL);
> #else
> 		ptr = alloc_bootmem_core(pgdat->bdata, size, align,
> 						 new_goal, 0);
> #endif
> 		if (ptr)
> 			return ptr;
> 	}
> #endif
>
> 	return __alloc_bootmem_node(pgdat, size, align, goal);
>
> }

Jiri, can you send out your bootlog and .config? Yinghai