From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pf0-f198.google.com (mail-pf0-f198.google.com [209.85.192.198]) by kanga.kvack.org (Postfix) with ESMTP id CEBC06B0279 for ; Wed, 24 May 2017 04:20:31 -0400 (EDT)
Received: by mail-pf0-f198.google.com with SMTP id q27so5009080pfi.8 for ; Wed, 24 May 2017 01:20:31 -0700 (PDT)
Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com. [148.163.156.1]) by mx.google.com with ESMTPS id a2si23093769pln.77.2017.05.24.01.20.30 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 24 May 2017 01:20:30 -0700 (PDT)
Received: from pps.filterd (m0098396.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.20/8.16.0.20) with SMTP id v4O8F0FJ114394 for ; Wed, 24 May 2017 04:20:30 -0400
Received: from e06smtp11.uk.ibm.com (e06smtp11.uk.ibm.com [195.75.94.107]) by mx0a-001b2d01.pphosted.com with ESMTP id 2amvavnk74-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Wed, 24 May 2017 04:20:29 -0400
Received: from localhost by e06smtp11.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only!
Violators will be prosecuted for from ; Wed, 24 May 2017 09:20:27 +0100
Date: Wed, 24 May 2017 10:20:22 +0200
From: Heiko Carstens
Subject: [-next] memory hotplug regression
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Message-Id: <20170524082022.GC5427@osiris>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Michal Hocko
Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org

Hello Michal,

I just re-tested linux-next with respect to your memory hotplug changes and
actually (finally) figured out that your patch ("mm, memory_hotplug: do not
associate hotadded memory to zones until online") changes behaviour on
s390:

before your patch memory blocks that were offline and located behind the
last online memory block were added by default to ZONE_MOVABLE:

# cat /sys/devices/system/memory/memory16/valid_zones
Movable Normal

With your patch this changes, so that they will be added to ZONE_NORMAL by
default instead:

# cat /sys/devices/system/memory/memory16/valid_zones
Normal Movable

Sorry, that I didn't realize this earlier!

Having the ZONE_MOVABLE default was actually the only point why s390's
arch_add_memory() was rather complex compared to other architectures.

We always had this behaviour, since we always wanted to be able to offline
memory after it was brought online. Given that back then "online_movable"
did not exist, the initial s390 memory hotplug support simply added all
additional memory to ZONE_MOVABLE.

Keeping the default the same would be quite important.

FWIW, and a bit unrelated: we had/have very basic lsmem and chmem tools
which can be used to list memory states and bring memory online and
offline. These tools were part of the s390-tools package and only recently
moved to util-linux.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wm0-f72.google.com (mail-wm0-f72.google.com [74.125.82.72]) by kanga.kvack.org (Postfix) with ESMTP id 0ED886B02B4 for ; Wed, 24 May 2017 04:40:00 -0400 (EDT)
Received: by mail-wm0-f72.google.com with SMTP id g143so36538149wme.13 for ; Wed, 24 May 2017 01:40:00 -0700 (PDT)
Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id z9si22083770edb.89.2017.05.24.01.39.58 for (version=TLS1 cipher=AES128-SHA bits=128/128); Wed, 24 May 2017 01:39:58 -0700 (PDT)
Date: Wed, 24 May 2017 10:39:57 +0200
From: Michal Hocko
Subject: Re: [-next] memory hotplug regression
Message-ID: <20170524083956.GC14733@dhcp22.suse.cz>
References: <20170524082022.GC5427@osiris>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170524082022.GC5427@osiris>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Heiko Carstens
Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Wed 24-05-17 10:20:22, Heiko Carstens wrote:
> Hello Michal,
>
> I just re-tested linux-next with respect to your memory hotplug changes and
> actually (finally) figured out that your patch ("mm, memory_hotplug: do not
> associate hotadded memory to zones until online") changes behaviour on
> s390:
>
> before your patch memory blocks that were offline and located behind the
> last online memory block were added by default to ZONE_MOVABLE:
>
> # cat /sys/devices/system/memory/memory16/valid_zones
> Movable Normal
>
> With your patch this changes, so that they will be added to ZONE_NORMAL by
> default instead:
>
> # cat /sys/devices/system/memory/memory16/valid_zones
> Normal Movable
>
> Sorry, that I didn't realize this earlier!
>
> Having the ZONE_MOVABLE default was actually the only point why s390's
> arch_add_memory() was rather complex compared to other architectures.
>
> We always had this behaviour, since we always wanted to be able to offline
> memory after it was brought online. Given that back then "online_movable"
> did not exist, the initial s390 memory hotplug support simply added all
> additional memory to ZONE_MOVABLE.
>
> Keeping the default the same would be quite important.

Hmm, that is really unfortunate because I would _really_ like to get rid
of the previous semantic which was really awkward. The whole point of
the rework is to get rid of the nasty zone shifting.

Is it an option to use `online_movable' rather than `online' in your setup?
Btw. my long term plan is to remove the zone range constraints altogether
so you could online each memblock to the type you want. Would that be
sufficient for you in general?
--
Michal Hocko
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wr0-f199.google.com (mail-wr0-f199.google.com [209.85.128.199]) by kanga.kvack.org (Postfix) with ESMTP id 89EFA6B0292 for ; Fri, 26 May 2017 08:25:19 -0400 (EDT)
Received: by mail-wr0-f199.google.com with SMTP id b28so226522wrb.2 for ; Fri, 26 May 2017 05:25:19 -0700 (PDT)
Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com.
[148.163.158.5]) by mx.google.com with ESMTPS id 60si710522wri.313.2017.05.26.05.25.17 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 26 May 2017 05:25:18 -0700 (PDT)
Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.20/8.16.0.20) with SMTP id v4QCNxcr129305 for ; Fri, 26 May 2017 08:25:16 -0400
Received: from e06smtp12.uk.ibm.com (e06smtp12.uk.ibm.com [195.75.94.108]) by mx0b-001b2d01.pphosted.com with ESMTP id 2apk7ybbpf-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Fri, 26 May 2017 08:25:16 -0400
Received: from localhost by e06smtp12.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Fri, 26 May 2017 13:25:14 +0100
Date: Fri, 26 May 2017 14:25:09 +0200
From: Heiko Carstens
Subject: Re: [-next] memory hotplug regression
References: <20170524082022.GC5427@osiris> <20170524083956.GC14733@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170524083956.GC14733@dhcp22.suse.cz>
Message-Id: <20170526122509.GB14849@osiris>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Michal Hocko
Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Wed, May 24, 2017 at 10:39:57AM +0200, Michal Hocko wrote:
> On Wed 24-05-17 10:20:22, Heiko Carstens wrote:
> > Having the ZONE_MOVABLE default was actually the only point why s390's
> > arch_add_memory() was rather complex compared to other architectures.
> >
> > We always had this behaviour, since we always wanted to be able to offline
> > memory after it was brought online. Given that back then "online_movable"
> > did not exist, the initial s390 memory hotplug support simply added all
> > additional memory to ZONE_MOVABLE.
> >
> > Keeping the default the same would be quite important.
>
> Hmm, that is really unfortunate because I would _really_ like to get rid
> of the previous semantic which was really awkward. The whole point of
> the rework is to get rid of the nasty zone shifting.
>
> Is it an option to use `online_movable' rather than `online' in your setup?
> Btw. my long term plan is to remove the zone range constraints altogether
> so you could online each memblock to the type you want. Would that be
> sufficient for you in general?

Why is it a problem to change the default for 'online'? As far as I can see
that doesn't have too much to do with the order of zones, no?

By the way: we played around a bit with the changes wrt memory
hotplug. There are two odd things:

1) With the new code I can generate overlapping zones for ZONE_DMA and
ZONE_NORMAL:

--- new code:

DMA      [mem 0x0000000000000000-0x000000007fffffff]
Normal   [mem 0x0000000080000000-0x000000017fffffff]

# cat /sys/devices/system/memory/block_size_bytes
10000000
# cat /sys/devices/system/memory/memory5/valid_zones
DMA
# echo 0 > /sys/devices/system/memory/memory5/online
# cat /sys/devices/system/memory/memory5/valid_zones
Normal
# echo 1 > /sys/devices/system/memory/memory5/online
Normal

# cat /proc/zoneinfo
Node 0, zone      DMA
        spanned  524288        <-----
        present  458752
        managed  455078
        start_pfn:   0         <-----

Node 0, zone   Normal
        spanned  720896
        present  589824
        managed  571648
        start_pfn:   327680    <-----

So ZONE_DMA ends within ZONE_NORMAL. This shouldn't be possible, unless
this restriction is gone?
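
The overlap can be sanity-checked from the zoneinfo numbers alone. The following is only a rough sketch, assuming 4 KiB pages; the helper names (`zone_end_pfn`, `zones_overlap`) are made up for illustration and are not kernel or tool APIs. It also shows that the start_pfn reported for Normal coincides with the first pfn of memory block 5, given the block size printed by block_size_bytes.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def zone_end_pfn(start_pfn, spanned):
    """Return the first pfn past the end of a zone."""
    return start_pfn + spanned


def zones_overlap(a, b):
    """a and b are (start_pfn, spanned_pages) tuples taken from zoneinfo."""
    return a[0] < zone_end_pfn(*b) and b[0] < zone_end_pfn(*a)


# Values copied from the /proc/zoneinfo output quoted in this message.
dma = (0, 524288)
normal = (327680, 720896)

# block_size_bytes is printed in hex: 0x10000000 bytes per memory block.
block_size = 0x10000000
memory5_start_pfn = 5 * block_size // PAGE_SIZE

print("DMA ends at pfn", zone_end_pfn(*dma))
print("memory5 starts at pfn", memory5_start_pfn)
print("DMA and Normal overlap:", zones_overlap(dma, normal))
```

In other words, under these assumptions Normal starts at memory block 5 while DMA still spans up to the 2 GiB boundary.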
--- old code:

# echo 0 > /sys/devices/system/memory/memory5/online
# cat /sys/devices/system/memory/memory5/valid_zones
DMA
# echo online_movable > /sys/devices/system/memory/memory5/state
-bash: echo: write error: Invalid argument
# echo online_kernel > /sys/devices/system/memory/memory5/state
-bash: echo: write error: Invalid argument
# echo online > /sys/devices/system/memory/memory5/state
# cat /sys/devices/system/memory/memory5/valid_zones
DMA


2) Another oddity is that after a memory block was brought online its
association to ZONE_NORMAL or ZONE_MOVABLE seems to be fixed. Even if it
is brought offline afterwards:

# cat /sys/devices/system/memory/memory16/valid_zones
Normal Movable
# echo online_movable > /sys/devices/system/memory/memory16/state
# echo offline > /sys/devices/system/memory/memory16/state
# cat /sys/devices/system/memory/memory16/valid_zones
Movable   <---- should be "Normal Movable"

I assume this happens because start_pfn and spanned pages of the zones
aren't updated if a memory block at the beginning or end of a zone is
brought offline.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wm0-f69.google.com (mail-wm0-f69.google.com [74.125.82.69]) by kanga.kvack.org (Postfix) with ESMTP id 243E06B0292 for ; Mon, 29 May 2017 04:52:34 -0400 (EDT)
Received: by mail-wm0-f69.google.com with SMTP id r203so12159943wmb.2 for ; Mon, 29 May 2017 01:52:34 -0700 (PDT)
Received: from mx1.suse.de (mx2.suse.de.
[195.135.220.15]) by mx.google.com with ESMTPS id x27si9631246eda.67.2017.05.29.01.52.32 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 29 May 2017 01:52:32 -0700 (PDT)
Date: Mon, 29 May 2017 10:52:31 +0200
From: Michal Hocko
Subject: Re: [-next] memory hotplug regression
Message-ID: <20170529085231.GE19725@dhcp22.suse.cz>
References: <20170524082022.GC5427@osiris> <20170524083956.GC14733@dhcp22.suse.cz> <20170526122509.GB14849@osiris>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170526122509.GB14849@osiris>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Heiko Carstens
Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Fri 26-05-17 14:25:09, Heiko Carstens wrote:
> On Wed, May 24, 2017 at 10:39:57AM +0200, Michal Hocko wrote:
> > On Wed 24-05-17 10:20:22, Heiko Carstens wrote:
> > > Having the ZONE_MOVABLE default was actually the only point why s390's
> > > arch_add_memory() was rather complex compared to other architectures.
> > >
> > > We always had this behaviour, since we always wanted to be able to offline
> > > memory after it was brought online. Given that back then "online_movable"
> > > did not exist, the initial s390 memory hotplug support simply added all
> > > additional memory to ZONE_MOVABLE.
> > >
> > > Keeping the default the same would be quite important.
> >
> > Hmm, that is really unfortunate because I would _really_ like to get rid
> > of the previous semantic which was really awkward. The whole point of
> > the rework is to get rid of the nasty zone shifting.
> >
> > Is it an option to use `online_movable' rather than `online' in your setup?
> > Btw. my long term plan is to remove the zone range constraints altogether
> > so you could online each memblock to the type you want. Would that be
> > sufficient for you in general?
>
> Why is it a problem to change the default for 'online'? As far as I can see
> that doesn't have too much to do with the order of zones, no?

`online' (aka MMOP_ONLINE_KEEP) should always inherit its current zone.
The previous implementation made an exception to allow to shift to
another zone if it is on the border of two zones. This is what I wanted
to get rid of because it is just too ugly to live.

But now I am not really sure what is the usecase here. I assume you know
how to online the memory. That's why you had to play tricks with the
zones previously. All you need now is to use the proper MMOP_ONLINE*

> By the way: we played around a bit with the changes wrt memory
> hotplug. There are two odd things:
>
> 1) With the new code I can generate overlapping zones for ZONE_DMA and
> ZONE_NORMAL:
>
> --- new code:
>
> DMA      [mem 0x0000000000000000-0x000000007fffffff]
> Normal   [mem 0x0000000080000000-0x000000017fffffff]
>
> # cat /sys/devices/system/memory/block_size_bytes
> 10000000
> # cat /sys/devices/system/memory/memory5/valid_zones
> DMA
> # echo 0 > /sys/devices/system/memory/memory5/online
> # cat /sys/devices/system/memory/memory5/valid_zones
> Normal
> # echo 1 > /sys/devices/system/memory/memory5/online
> Normal

OK, interesting. I will double check the code.

> # cat /proc/zoneinfo
> Node 0, zone      DMA
>         spanned  524288        <-----
>         present  458752
>         managed  455078
>         start_pfn:   0         <-----
>
> Node 0, zone   Normal
>         spanned  720896
>         present  589824
>         managed  571648
>         start_pfn:   327680    <-----
>
> So ZONE_DMA ends within ZONE_NORMAL. This shouldn't be possible, unless
> this restriction is gone?
>
> --- old code:
>
> # echo 0 > /sys/devices/system/memory/memory5/online
> # cat /sys/devices/system/memory/memory5/valid_zones
> DMA
> # echo online_movable > /sys/devices/system/memory/memory5/state
> -bash: echo: write error: Invalid argument
> # echo online_kernel > /sys/devices/system/memory/memory5/state
> -bash: echo: write error: Invalid argument
> # echo online > /sys/devices/system/memory/memory5/state
> # cat /sys/devices/system/memory/memory5/valid_zones
> DMA
>
>
> 2) Another oddity is that after a memory block was brought online its
> association to ZONE_NORMAL or ZONE_MOVABLE seems to be fixed. Even if it
> is brought offline afterwards:

This is intended behavior because I got rid of the tricky&ugly zone
shifting code. Ultimately I would like to allow for overlapping zones
so the explicit online_{movable,kernel} will _always_ work.
--
Michal Hocko
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wm0-f71.google.com (mail-wm0-f71.google.com [74.125.82.71]) by kanga.kvack.org (Postfix) with ESMTP id 93B236B0292 for ; Mon, 29 May 2017 06:11:37 -0400 (EDT)
Received: by mail-wm0-f71.google.com with SMTP id d127so12501568wmf.15 for ; Mon, 29 May 2017 03:11:37 -0700 (PDT)
Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com.
[148.163.158.5]) by mx.google.com with ESMTPS id z24si10130211edc.188.2017.05.29.03.11.35 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 29 May 2017 03:11:36 -0700 (PDT)
Received: from pps.filterd (m0098417.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.20/8.16.0.20) with SMTP id v4TA8pom053439 for ; Mon, 29 May 2017 06:11:34 -0400
Received: from e06smtp15.uk.ibm.com (e06smtp15.uk.ibm.com [195.75.94.111]) by mx0a-001b2d01.pphosted.com with ESMTP id 2arha48thm-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Mon, 29 May 2017 06:11:34 -0400
Received: from localhost by e06smtp15.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 29 May 2017 11:11:32 +0100
Date: Mon, 29 May 2017 12:11:28 +0200
From: Heiko Carstens
Subject: Re: [-next] memory hotplug regression
References: <20170524082022.GC5427@osiris> <20170524083956.GC14733@dhcp22.suse.cz> <20170526122509.GB14849@osiris> <20170529085231.GE19725@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170529085231.GE19725@dhcp22.suse.cz>
Message-Id: <20170529101128.GA12975@osiris>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Michal Hocko
Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Mon, May 29, 2017 at 10:52:31AM +0200, Michal Hocko wrote:
> > Why is it a problem to change the default for 'online'? As far as I can see
> > that doesn't have too much to do with the order of zones, no?
>
> `online' (aka MMOP_ONLINE_KEEP) should always inherit its current zone.
> The previous implementation made an exception to allow to shift to
> another zone if it is on the border of two zones. This is what I wanted
> to get rid of because it is just too ugly to live.
>
> But now I am not really sure what is the usecase here. I assume you know
> how to online the memory. That's why you had to play tricks with the
> zones previously. All you need now is to use the proper MMOP_ONLINE*

Yes, however that implies that existing user space has to be changed to
achieve the same semantics as before. That's the usecase I'm talking about.

On the other hand this change would finally make s390 behave like all other
architectures, which is certainly not a bad thing. So, while thinking again
I think you convinced me to agree with this change.

> > 2) Another oddity is that after a memory block was brought online its
> > association to ZONE_NORMAL or ZONE_MOVABLE seems to be fixed. Even if it
> > is brought offline afterwards:
>
> This is intended behavior because I got rid of the tricky&ugly zone
> shifting code. Ultimately I would like to allow for overlapping zones
> so the explicit online_{movable,kernel} will _always_ work.

Ok, I see. This change (fixed memory block to zone mapping after first
online) is a bit surprising. On the other hand I can't think of a sane
usecase why one wants to change the zone a memory block belongs to.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wm0-f71.google.com (mail-wm0-f71.google.com [74.125.82.71]) by kanga.kvack.org (Postfix) with ESMTP id E7CDE6B0292 for ; Mon, 29 May 2017 06:45:39 -0400 (EDT)
Received: by mail-wm0-f71.google.com with SMTP id 8so12677101wms.11 for ; Mon, 29 May 2017 03:45:39 -0700 (PDT)
Received: from mx1.suse.de (mx2.suse.de.
[195.135.220.15]) by mx.google.com with ESMTPS id i4si9615886edc.294.2017.05.29.03.45.38 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 29 May 2017 03:45:38 -0700 (PDT)
Date: Mon, 29 May 2017 12:45:37 +0200
From: Michal Hocko
Subject: Re: [-next] memory hotplug regression
Message-ID: <20170529104537.GH19725@dhcp22.suse.cz>
References: <20170524082022.GC5427@osiris> <20170524083956.GC14733@dhcp22.suse.cz> <20170526122509.GB14849@osiris> <20170529085231.GE19725@dhcp22.suse.cz> <20170529101128.GA12975@osiris>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170529101128.GA12975@osiris>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Heiko Carstens
Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Mon 29-05-17 12:11:28, Heiko Carstens wrote:
> On Mon, May 29, 2017 at 10:52:31AM +0200, Michal Hocko wrote:
> > > Why is it a problem to change the default for 'online'? As far as I can see
> > > that doesn't have too much to do with the order of zones, no?
> >
> > `online' (aka MMOP_ONLINE_KEEP) should always inherit its current zone.
> > The previous implementation made an exception to allow to shift to
> > another zone if it is on the border of two zones. This is what I wanted
> > to get rid of because it is just too ugly to live.
> >
> > But now I am not really sure what is the usecase here. I assume you know
> > how to online the memory. That's why you had to play tricks with the
> > zones previously. All you need now is to use the proper MMOP_ONLINE*
>
> Yes, however that implies that existing user space has to be changed to
> achieve the same semantics as before. That's the usecase I'm talking about.

Yes that is really unfortunate. It is even more unfortunate how the
original behavior got merged without a deeper consideration.

> On the other hand this change would finally make s390 behave like all other
> architectures, which is certainly not a bad thing. So, while thinking again
> I think you convinced me to agree with this change.

That is definitely good to hear. Btw. I plan to change the semantic even
further. MMOP_ONLINE_KEEP currently ignores the movable_node setting and I
plan to change that. Hopefully this won't break more userspace...

> > > 2) Another oddity is that after a memory block was brought online its
> > > association to ZONE_NORMAL or ZONE_MOVABLE seems to be fixed. Even if it
> > > is brought offline afterwards:
> >
> > This is intended behavior because I got rid of the tricky&ugly zone
> > shifting code. Ultimately I would like to allow for overlapping zones
> > so the explicit online_{movable,kernel} will _always_ work.
>
> Ok, I see. This change (fixed memory block to zone mapping after first
> online) is a bit surprising. On the other hand I can't think of a sane
> usecase why one wants to change the zone a memory block belongs to.

Long term I would really like to remove any constraints on where to online
movable or kernel memory. So even if this turns out to be a problem it
will only be temporary.
--
Michal Hocko
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wm0-f71.google.com (mail-wm0-f71.google.com [74.125.82.71]) by kanga.kvack.org (Postfix) with ESMTP id DFB996B02C3 for ; Tue, 30 May 2017 08:18:09 -0400 (EDT)
Received: by mail-wm0-f71.google.com with SMTP id g13so19384996wmd.9 for ; Tue, 30 May 2017 05:18:09 -0700 (PDT)
Received: from mx1.suse.de (mx2.suse.de.
[195.135.220.15]) by mx.google.com with ESMTPS id z3si13587489eda.205.2017.05.30.05.18.08 for (version=TLS1 cipher=AES128-SHA bits=128/128); Tue, 30 May 2017 05:18:08 -0700 (PDT)
Date: Tue, 30 May 2017 14:18:06 +0200
From: Michal Hocko
Subject: Re: [-next] memory hotplug regression
Message-ID: <20170530121806.GD7969@dhcp22.suse.cz>
References: <20170524082022.GC5427@osiris> <20170524083956.GC14733@dhcp22.suse.cz> <20170526122509.GB14849@osiris>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170526122509.GB14849@osiris>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Heiko Carstens
Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Fri 26-05-17 14:25:09, Heiko Carstens wrote:
[...]
> 1) With the new code I can generate overlapping zones for ZONE_DMA and
> ZONE_NORMAL:
>
> --- new code:
>
> DMA      [mem 0x0000000000000000-0x000000007fffffff]
> Normal   [mem 0x0000000080000000-0x000000017fffffff]
>
> # cat /sys/devices/system/memory/block_size_bytes
> 10000000
> # cat /sys/devices/system/memory/memory5/valid_zones
> DMA
> # echo 0 > /sys/devices/system/memory/memory5/online
> # cat /sys/devices/system/memory/memory5/valid_zones
> Normal
> # echo 1 > /sys/devices/system/memory/memory5/online
> Normal
>
> # cat /proc/zoneinfo
> Node 0, zone      DMA
>         spanned  524288        <-----
>         present  458752
>         managed  455078
>         start_pfn:   0         <-----
>
> Node 0, zone   Normal
>         spanned  720896
>         present  589824
>         managed  571648
>         start_pfn:   327680    <-----
>
> So ZONE_DMA ends within ZONE_NORMAL. This shouldn't be possible, unless
> this restriction is gone?

The patch below should help.
> --- old code:
>
> # echo 0 > /sys/devices/system/memory/memory5/online
> # cat /sys/devices/system/memory/memory5/valid_zones
> DMA
> # echo online_movable > /sys/devices/system/memory/memory5/state
> -bash: echo: write error: Invalid argument
> # echo online_kernel > /sys/devices/system/memory/memory5/state
> -bash: echo: write error: Invalid argument

This error doesn't make any sense, because we want to online kernel
memory and DMA is pretty much kernel memory.

> # echo online > /sys/devices/system/memory/memory5/state
> # cat /sys/devices/system/memory/memory5/valid_zones
> DMA
---

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pf0-f199.google.com (mail-pf0-f199.google.com [209.85.192.199]) by kanga.kvack.org (Postfix) with ESMTP id C5A496B0313 for ; Tue, 30 May 2017 08:37:31 -0400 (EDT)
Received: by mail-pf0-f199.google.com with SMTP id j28so93466905pfk.14 for ; Tue, 30 May 2017 05:37:31 -0700 (PDT)
Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com. [148.163.156.1]) by mx.google.com with ESMTPS id v8si13500494pgb.49.2017.05.30.05.37.30 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 30 May 2017 05:37:31 -0700 (PDT)
Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.20/8.16.0.20) with SMTP id v4UCYst1117003 for ; Tue, 30 May 2017 08:37:30 -0400
Received: from e06smtp10.uk.ibm.com (e06smtp10.uk.ibm.com [195.75.94.106]) by mx0a-001b2d01.pphosted.com with ESMTP id 2as8utg2pc-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Tue, 30 May 2017 08:37:30 -0400
Received: from localhost by e06smtp10.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only!
Violators will be prosecuted for from ; Tue, 30 May 2017 13:37:28 +0100
Date: Tue, 30 May 2017 14:37:24 +0200
From: Heiko Carstens
Subject: Re: [-next] memory hotplug regression
References: <20170524082022.GC5427@osiris> <20170524083956.GC14733@dhcp22.suse.cz> <20170526122509.GB14849@osiris> <20170530121806.GD7969@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170530121806.GD7969@dhcp22.suse.cz>
Message-Id: <20170530123724.GC4874@osiris>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Michal Hocko
Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Tue, May 30, 2017 at 02:18:06PM +0200, Michal Hocko wrote:
> > So ZONE_DMA ends within ZONE_NORMAL. This shouldn't be possible, unless
> > this restriction is gone?
>
> The patch below should help.

It does fix this specific problem, but introduces a new one:

# echo online_movable > /sys/devices/system/memory/memory16/state
# cat /sys/devices/system/memory/memory16/valid_zones
Movable
# echo offline > /sys/devices/system/memory/memory16/state
# cat /sys/devices/system/memory/memory16/valid_zones
       <--- no output

Memory block 16 is the only one I onlined and offlined to ZONE_MOVABLE.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wr0-f198.google.com (mail-wr0-f198.google.com [209.85.128.198]) by kanga.kvack.org (Postfix) with ESMTP id 54EEB6B0279 for ; Tue, 30 May 2017 10:33:01 -0400 (EDT)
Received: by mail-wr0-f198.google.com with SMTP id y43so6966209wrc.11 for ; Tue, 30 May 2017 07:33:01 -0700 (PDT)
Received: from mx1.suse.de (mx2.suse.de.
[195.135.220.15]) by mx.google.com with ESMTPS id x26si12412994edx.74.2017.05.30.07.32.59 for (version=TLS1 cipher=AES128-SHA bits=128/128); Tue, 30 May 2017 07:33:00 -0700 (PDT)
Date: Tue, 30 May 2017 16:32:47 +0200
From: Michal Hocko
Subject: Re: [-next] memory hotplug regression
Message-ID: <20170530143246.GJ7969@dhcp22.suse.cz>
References: <20170524082022.GC5427@osiris> <20170524083956.GC14733@dhcp22.suse.cz> <20170526122509.GB14849@osiris> <20170530121806.GD7969@dhcp22.suse.cz> <20170530123724.GC4874@osiris>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170530123724.GC4874@osiris>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Heiko Carstens
Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Tue 30-05-17 14:37:24, Heiko Carstens wrote:
> On Tue, May 30, 2017 at 02:18:06PM +0200, Michal Hocko wrote:
> > > So ZONE_DMA ends within ZONE_NORMAL. This shouldn't be possible, unless
> > > this restriction is gone?
> >
> > The patch below should help.
>
> It does fix this specific problem, but introduces a new one:
>
> # echo online_movable > /sys/devices/system/memory/memory16/state
> # cat /sys/devices/system/memory/memory16/valid_zones
> Movable
> # echo offline > /sys/devices/system/memory/memory16/state
> # cat /sys/devices/system/memory/memory16/valid_zones
>        <--- no output
>
> Memory block 16 is the only one I onlined and offlined to ZONE_MOVABLE.

Could you test this on top please?
---
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 792c098e0e5f..a26f9f8e6365 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -937,13 +937,18 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
 	set_zone_contiguous(zone);
 }
 
+/*
+ * Returns a default kernel memory zone for the given pfn range.
+ * If no kernel zone covers this pfn range it will automatically go
+ * to the ZONE_NORMAL.
+ */
 struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
 		unsigned long nr_pages)
 {
 	struct pglist_data *pgdat = NODE_DATA(nid);
 	int zid;
 
-	for (zid = 0; zid < MAX_NR_ZONES; zid++) {
+	for (zid = 0; zid <= ZONE_NORMAL; zid++) {
 		struct zone *zone = &pgdat->node_zones[zid];
 
 		if (zone_intersects(zone, start_pfn, nr_pages))
--
Michal Hocko
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wr0-f199.google.com (mail-wr0-f199.google.com [209.85.128.199]) by kanga.kvack.org (Postfix) with ESMTP id 3BDA66B0279 for ; Tue, 30 May 2017 10:55:12 -0400 (EDT)
Received: by mail-wr0-f199.google.com with SMTP id p62so7025803wrc.13 for ; Tue, 30 May 2017 07:55:12 -0700 (PDT)
Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com. [148.163.158.5]) by mx.google.com with ESMTPS id z142si14995340wmc.38.2017.05.30.07.55.09 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 30 May 2017 07:55:09 -0700 (PDT)
Received: from pps.filterd (m0098416.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.20/8.16.0.20) with SMTP id v4UErZwF025428 for ; Tue, 30 May 2017 10:55:08 -0400
Received: from e06smtp15.uk.ibm.com (e06smtp15.uk.ibm.com [195.75.94.111]) by mx0b-001b2d01.pphosted.com with ESMTP id 2as03rtkh0-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Tue, 30 May 2017 10:55:07 -0400
Received: from localhost by e06smtp15.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only!
Violators will be prosecuted for from ; Tue, 30 May 2017 15:55:06 +0100 Date: Tue, 30 May 2017 16:55:01 +0200 From: Heiko Carstens Subject: Re: [-next] memory hotplug regression References: <20170524082022.GC5427@osiris> <20170524083956.GC14733@dhcp22.suse.cz> <20170526122509.GB14849@osiris> <20170530121806.GD7969@dhcp22.suse.cz> <20170530123724.GC4874@osiris> <20170530143246.GJ7969@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170530143246.GJ7969@dhcp22.suse.cz> Message-Id: <20170530145501.GD4874@osiris> Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Tue, May 30, 2017 at 04:32:47PM +0200, Michal Hocko wrote: > On Tue 30-05-17 14:37:24, Heiko Carstens wrote: > > On Tue, May 30, 2017 at 02:18:06PM +0200, Michal Hocko wrote: > > > > So ZONE_DMA ends within ZONE_NORMAL. This shouldn't be possible, unless > > > > this restriction is gone? > > > > > > The patch below should help. > > > > It does fix this specific problem, but introduces a new one: > > > > # echo online_movable > /sys/devices/system/memory/memory16/state > > # cat /sys/devices/system/memory/memory16/valid_zones > > Movable > > # echo offline > /sys/devices/system/memory/memory16/state > > # cat /sys/devices/system/memory/memory16/valid_zones > > <--- no output > > > > Memory block 16 is the only one I onlined and offlineto ZONE_MOVABLE. > > Could you test the this on top please? > --- > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 792c098e0e5f..a26f9f8e6365 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -937,13 +937,18 @@ void __ref move_pfn_range_to_zone(struct zone *zone, > set_zone_contiguous(zone); > } > > +/* > + * Returns a default kernel memory zone for the given pfn range. > + * If no kernel zone covers this pfn range it will automatically go > + * to the ZONE_NORMAL. 
> + */ > struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn, > unsigned long nr_pages) > { > struct pglist_data *pgdat = NODE_DATA(nid); > int zid; > > - for (zid = 0; zid < MAX_NR_ZONES; zid++) { > + for (zid = 0; zid <= ZONE_NORMAL; zid++) { > struct zone *zone = &pgdat->node_zones[zid]; > > if (zone_intersects(zone, start_pfn, nr_pages)) Still broken, but in different way(s): # cat /sys/devices/system/memory/memory16/valid_zones Normal Movable # echo online_movable > /sys/devices/system/memory/memory16/state # cat /sys/devices/system/memory/memory16/valid_zones Movable # cat /sys/devices/system/memory/memory18/valid_zones Movable # echo online > /sys/devices/system/memory/memory18/state # cat /sys/devices/system/memory/memory18/valid_zones Normal <--- should be Movable # cat /sys/devices/system/memory/memory17/valid_zones <--- no output -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f70.google.com (mail-wm0-f70.google.com [74.125.82.70]) by kanga.kvack.org (Postfix) with ESMTP id 23CDC6B0279 for ; Tue, 30 May 2017 11:04:25 -0400 (EDT) Received: by mail-wm0-f70.google.com with SMTP id g13so20053308wmd.9 for ; Tue, 30 May 2017 08:04:25 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. 
[195.135.220.15]) by mx.google.com with ESMTPS id e11si10484793eda.49.2017.05.30.08.04.23 for (version=TLS1 cipher=AES128-SHA bits=128/128); Tue, 30 May 2017 08:04:23 -0700 (PDT) Date: Tue, 30 May 2017 17:04:21 +0200 From: Michal Hocko Subject: Re: [-next] memory hotplug regression Message-ID: <20170530150421.GM7969@dhcp22.suse.cz> References: <20170524082022.GC5427@osiris> <20170524083956.GC14733@dhcp22.suse.cz> <20170526122509.GB14849@osiris> <20170530121806.GD7969@dhcp22.suse.cz> <20170530123724.GC4874@osiris> <20170530143246.GJ7969@dhcp22.suse.cz> <20170530145501.GD4874@osiris> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170530145501.GD4874@osiris> Sender: owner-linux-mm@kvack.org List-ID: To: Heiko Carstens Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Tue 30-05-17 16:55:01, Heiko Carstens wrote: > On Tue, May 30, 2017 at 04:32:47PM +0200, Michal Hocko wrote: > > On Tue 30-05-17 14:37:24, Heiko Carstens wrote: > > > On Tue, May 30, 2017 at 02:18:06PM +0200, Michal Hocko wrote: > > > > > So ZONE_DMA ends within ZONE_NORMAL. This shouldn't be possible, unless > > > > > this restriction is gone? > > > > > > > > The patch below should help. > > > > > > It does fix this specific problem, but introduces a new one: > > > > > > # echo online_movable > /sys/devices/system/memory/memory16/state > > > # cat /sys/devices/system/memory/memory16/valid_zones > > > Movable > > > # echo offline > /sys/devices/system/memory/memory16/state > > > # cat /sys/devices/system/memory/memory16/valid_zones > > > <--- no output > > > > > > Memory block 16 is the only one I onlined and offlineto ZONE_MOVABLE. > > > > Could you test the this on top please? 
> > --- > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > > index 792c098e0e5f..a26f9f8e6365 100644 > > --- a/mm/memory_hotplug.c > > +++ b/mm/memory_hotplug.c > > @@ -937,13 +937,18 @@ void __ref move_pfn_range_to_zone(struct zone *zone, > > set_zone_contiguous(zone); > > } > > > > +/* > > + * Returns a default kernel memory zone for the given pfn range. > > + * If no kernel zone covers this pfn range it will automatically go > > + * to the ZONE_NORMAL. > > + */ > > struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn, > > unsigned long nr_pages) > > { > > struct pglist_data *pgdat = NODE_DATA(nid); > > int zid; > > > > - for (zid = 0; zid < MAX_NR_ZONES; zid++) { > > + for (zid = 0; zid <= ZONE_NORMAL; zid++) { > > struct zone *zone = &pgdat->node_zones[zid]; > > > > if (zone_intersects(zone, start_pfn, nr_pages)) > > Still broken, but in different way(s): > > # cat /sys/devices/system/memory/memory16/valid_zones > Normal Movable > # echo online_movable > /sys/devices/system/memory/memory16/state > # cat /sys/devices/system/memory/memory16/valid_zones > Movable > # cat /sys/devices/system/memory/memory18/valid_zones > Movable > # echo online > /sys/devices/system/memory/memory18/state > # cat /sys/devices/system/memory/memory18/valid_zones > Normal <--- should be Movable > # cat /sys/devices/system/memory/memory17/valid_zones > <--- no output OK, I will sit on this tomorrow with a clean head without doing 10 things at the same time. Sorry about your wasted time! -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f69.google.com (mail-wm0-f69.google.com [74.125.82.69]) by kanga.kvack.org (Postfix) with ESMTP id 0FB626B0279 for ; Wed, 31 May 2017 02:24:44 -0400 (EDT) Received: by mail-wm0-f69.google.com with SMTP id r203so989002wmb.2 for ; Tue, 30 May 2017 23:24:43 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id p12si15941823wrd.273.2017.05.30.23.24.42 for (version=TLS1 cipher=AES128-SHA bits=128/128); Tue, 30 May 2017 23:24:42 -0700 (PDT) Date: Wed, 31 May 2017 08:24:40 +0200 From: Michal Hocko Subject: Re: [-next] memory hotplug regression Message-ID: <20170531062439.GA3853@dhcp22.suse.cz> References: <20170524082022.GC5427@osiris> <20170524083956.GC14733@dhcp22.suse.cz> <20170526122509.GB14849@osiris> <20170530121806.GD7969@dhcp22.suse.cz> <20170530123724.GC4874@osiris> <20170530143246.GJ7969@dhcp22.suse.cz> <20170530145501.GD4874@osiris> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170530145501.GD4874@osiris> Sender: owner-linux-mm@kvack.org List-ID: To: Heiko Carstens Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Tue 30-05-17 16:55:01, Heiko Carstens wrote: > On Tue, May 30, 2017 at 04:32:47PM +0200, Michal Hocko wrote: > > On Tue 30-05-17 14:37:24, Heiko Carstens wrote: > > > On Tue, May 30, 2017 at 02:18:06PM +0200, Michal Hocko wrote: > > > > > So ZONE_DMA ends within ZONE_NORMAL. This shouldn't be possible, unless > > > > > this restriction is gone? > > > > > > > > The patch below should help. 
> > > > > > It does fix this specific problem, but introduces a new one: > > > > > > # echo online_movable > /sys/devices/system/memory/memory16/state > > > # cat /sys/devices/system/memory/memory16/valid_zones > > > Movable > > > # echo offline > /sys/devices/system/memory/memory16/state > > > # cat /sys/devices/system/memory/memory16/valid_zones > > > <--- no output > > > > > > Memory block 16 is the only one I onlined and offlineto ZONE_MOVABLE. > > > > Could you test the this on top please? > > --- > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > > index 792c098e0e5f..a26f9f8e6365 100644 > > --- a/mm/memory_hotplug.c > > +++ b/mm/memory_hotplug.c > > @@ -937,13 +937,18 @@ void __ref move_pfn_range_to_zone(struct zone *zone, > > set_zone_contiguous(zone); > > } > > > > +/* > > + * Returns a default kernel memory zone for the given pfn range. > > + * If no kernel zone covers this pfn range it will automatically go > > + * to the ZONE_NORMAL. > > + */ > > struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn, > > unsigned long nr_pages) > > { > > struct pglist_data *pgdat = NODE_DATA(nid); > > int zid; > > > > - for (zid = 0; zid < MAX_NR_ZONES; zid++) { > > + for (zid = 0; zid <= ZONE_NORMAL; zid++) { > > struct zone *zone = &pgdat->node_zones[zid]; > > > > if (zone_intersects(zone, start_pfn, nr_pages)) > > Still broken, but in different way(s): > > # cat /sys/devices/system/memory/memory16/valid_zones > Normal Movable > # echo online_movable > /sys/devices/system/memory/memory16/state > # cat /sys/devices/system/memory/memory16/valid_zones > Movable > # cat /sys/devices/system/memory/memory18/valid_zones > Movable > # echo online > /sys/devices/system/memory/memory18/state > # cat /sys/devices/system/memory/memory18/valid_zones > Normal <--- should be Movable > # cat /sys/devices/system/memory/memory17/valid_zones > <--- no output OK, so this is an independent problem and an unrelated one to the patch I've posted. 
We need two patches actually. Damn, I hate MMOP_ONLINE_KEEP. I will send 2 patches as a reply to this email. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f71.google.com (mail-wm0-f71.google.com [74.125.82.71]) by kanga.kvack.org (Postfix) with ESMTP id 6EC846B0279 for ; Wed, 31 May 2017 02:25:56 -0400 (EDT) Received: by mail-wm0-f71.google.com with SMTP id 204so992757wmy.1 for ; Tue, 30 May 2017 23:25:56 -0700 (PDT) Received: from mail-wr0-f196.google.com (mail-wr0-f196.google.com. [209.85.128.196]) by mx.google.com with ESMTPS id 68si17296507wra.23.2017.05.30.23.25.55 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 30 May 2017 23:25:55 -0700 (PDT) Received: by mail-wr0-f196.google.com with SMTP id 6so561124wrb.1 for ; Tue, 30 May 2017 23:25:55 -0700 (PDT) From: Michal Hocko Subject: [PATCH 1/2] mm, memory_hotplug: fix MMOP_ONLINE_KEEP behavior Date: Wed, 31 May 2017 08:25:45 +0200 Message-Id: <20170531062545.4122-1-mhocko@kernel.org> In-Reply-To: <20170531062439.GA3853@dhcp22.suse.cz> References: <20170531062439.GA3853@dhcp22.suse.cz> Sender: owner-linux-mm@kvack.org List-ID: To: Heiko Carstens Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko From: Michal Hocko Heiko Carstens has noticed that the MMOP_ONLINE_KEEP is broken currently $ grep . memory3?/valid_zones memory34/valid_zones:Normal Movable memory35/valid_zones:Normal Movable memory36/valid_zones:Normal Movable memory37/valid_zones:Normal Movable $ echo online_movable > memory34/state $ grep . memory3?/valid_zones memory34/valid_zones:Movable memory35/valid_zones:Movable memory36/valid_zones:Movable memory37/valid_zones:Movable $ echo online > memory36/state $ grep . 
memory3?/valid_zones
memory34/valid_zones:Movable
memory36/valid_zones:Normal
memory37/valid_zones:Movable

so we have effectively punched a hole into the movable zone. The problem
is that the move_pfn_range() check for MMOP_ONLINE_KEEP is wrong. It only
checks whether the given range is already part of the movable zone, which
is not the case here as only memory34 is in the zone. Fix this by using
allow_online_pfn_range(..., MMOP_ONLINE_KERNEL): if that returns false
then we can be sure that movable onlining is the right thing to do.

Reported-by: Heiko Carstens
Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
Signed-off-by: Michal Hocko
---
 mm/memory_hotplug.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0a895df2397e..b3895fd609f4 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -950,11 +950,12 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
 	if (online_type == MMOP_ONLINE_KEEP) {
 		struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
 		/*
-		 * MMOP_ONLINE_KEEP inherits the current zone which is
-		 * ZONE_NORMAL by default but we might be within ZONE_MOVABLE
-		 * already.
+		 * MMOP_ONLINE_KEEP defaults to MMOP_ONLINE_KERNEL but use
+		 * movable zone if that is not possible (e.g. we are within
+		 * or past the existing movable zone)
 		 */
-		if (zone_intersects(movable_zone, start_pfn, nr_pages))
+		if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
+					MMOP_ONLINE_KERNEL))
 			zone = movable_zone;
 	} else if (online_type == MMOP_ONLINE_MOVABLE) {
 		zone = &pgdat->node_zones[ZONE_MOVABLE];
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f198.google.com (mail-wr0-f198.google.com [209.85.128.198]) by kanga.kvack.org (Postfix) with ESMTP id 664F06B02C3 for ; Wed, 31 May 2017 02:26:12 -0400 (EDT) Received: by mail-wr0-f198.google.com with SMTP id k30so943422wrc.9 for ; Tue, 30 May 2017 23:26:12 -0700 (PDT) Received: from mail-wm0-f68.google.com (mail-wm0-f68.google.com. [74.125.82.68]) by mx.google.com with ESMTPS id b7si18121403wrd.314.2017.05.30.23.26.11 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 30 May 2017 23:26:11 -0700 (PDT) Received: by mail-wm0-f68.google.com with SMTP id g15so1454610wmc.2 for ; Tue, 30 May 2017 23:26:11 -0700 (PDT) From: Michal Hocko Subject: [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone Date: Wed, 31 May 2017 08:26:05 +0200 Message-Id: <20170531062605.4347-1-mhocko@kernel.org> In-Reply-To: <20170531062439.GA3853@dhcp22.suse.cz> References: <20170531062439.GA3853@dhcp22.suse.cz> Sender: owner-linux-mm@kvack.org List-ID: To: Heiko Carstens Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko From: Michal Hocko Heiko Carstens has noticed that he can generate overlapping zones for ZONE_DMA and ZONE_NORMAL: DMA [mem 0x0000000000000000-0x000000007fffffff] Normal [mem 0x0000000080000000-0x000000017fffffff] $ cat /sys/devices/system/memory/block_size_bytes 10000000 $ cat /sys/devices/system/memory/memory5/valid_zones DMA $ echo 0 > /sys/devices/system/memory/memory5/online $ cat /sys/devices/system/memory/memory5/valid_zones Normal $ echo 1 > /sys/devices/system/memory/memory5/online Normal $ cat /proc/zoneinfo Node 0, zone DMA spanned 524288 <----- present 458752 managed 455078 start_pfn: 0 <----- Node 0, zone Normal spanned 720896 present 589824 managed 571648 start_pfn: 327680 <----- The reason is that we assume that the default zone for kernel onlining is ZONE_NORMAL. 
This was a simplification introduced by the memory hotplug rework and it is easily fixable by checking the range overlap in the zone order and considering the first matching zone as the default one. If there is no such zone then assume ZONE_NORMAL as we have been doing so far. Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online" Reported-by: Heiko Carstens Signed-off-by: Michal Hocko --- drivers/base/memory.c | 2 +- include/linux/memory_hotplug.h | 2 ++ mm/memory_hotplug.c | 27 ++++++++++++++++++++++++--- 3 files changed, 27 insertions(+), 4 deletions(-) diff --git a/drivers/base/memory.c b/drivers/base/memory.c index b86fda30ce62..c7c4e0325cdb 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -419,7 +419,7 @@ static ssize_t show_valid_zones(struct device *dev, nid = pfn_to_nid(start_pfn); if (allow_online_pfn_range(nid, start_pfn, nr_pages, MMOP_ONLINE_KERNEL)) { - strcat(buf, NODE_DATA(nid)->node_zones[ZONE_NORMAL].name); + strcat(buf, default_zone_for_pfn(nid, start_pfn, nr_pages)->name); append = true; } diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 9e0249d0f5e4..ed167541e4fc 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -309,4 +309,6 @@ extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum); extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages, int online_type); +extern struct zone *default_zone_for_pfn(int nid, unsigned long pfn, + unsigned long nr_pages); #endif /* __LINUX_MEMORY_HOTPLUG_H */ diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index b3895fd609f4..a0348de3e18c 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -858,7 +858,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages, { struct pglist_data *pgdat = NODE_DATA(nid); struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE]; - struct zone *normal_zone = 
&pgdat->node_zones[ZONE_NORMAL]; + struct zone *default_zone = default_zone_for_pfn(nid, pfn, nr_pages); /* * TODO there shouldn't be any inherent reason to have ZONE_NORMAL @@ -872,7 +872,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages, return true; return movable_zone->zone_start_pfn >= pfn + nr_pages; } else if (online_type == MMOP_ONLINE_MOVABLE) { - return zone_end_pfn(normal_zone) <= pfn; + return zone_end_pfn(default_zone) <= pfn; } /* MMOP_ONLINE_KEEP will always succeed and inherits the current zone */ @@ -938,6 +938,27 @@ void __ref move_pfn_range_to_zone(struct zone *zone, } /* + * Returns a default kernel memory zone for the given pfn range. + * If no kernel zone covers this pfn range it will automatically go + * to the ZONE_NORMAL. + */ +struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn, + unsigned long nr_pages) +{ + struct pglist_data *pgdat = NODE_DATA(nid); + int zid; + + for (zid = 0; zid <= ZONE_NORMAL; zid++) { + struct zone *zone = &pgdat->node_zones[zid]; + + if (zone_intersects(zone, start_pfn, nr_pages)) + return zone; + } + + return &pgdat->node_zones[ZONE_NORMAL]; +} + +/* * Associates the given pfn range with the given node and the zone appropriate * for the given online type. */ @@ -945,7 +966,7 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid, unsigned long start_pfn, unsigned long nr_pages) { struct pglist_data *pgdat = NODE_DATA(nid); - struct zone *zone = &pgdat->node_zones[ZONE_NORMAL]; + struct zone *zone = default_zone_for_pfn(nid, start_pfn, nr_pages); if (online_type == MMOP_ONLINE_KEEP) { struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE]; -- 2.11.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f71.google.com (mail-wm0-f71.google.com [74.125.82.71]) by kanga.kvack.org (Postfix) with ESMTP id 9BD276B0279 for ; Thu, 1 Jun 2017 02:50:02 -0400 (EDT) Received: by mail-wm0-f71.google.com with SMTP id g143so7834615wme.13 for ; Wed, 31 May 2017 23:50:02 -0700 (PDT) Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com. [148.163.158.5]) by mx.google.com with ESMTPS id l6si19254235ede.337.2017.05.31.23.50.00 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 31 May 2017 23:50:01 -0700 (PDT) Received: from pps.filterd (m0098420.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.20/8.16.0.20) with SMTP id v516nWp8079978 for ; Thu, 1 Jun 2017 02:49:59 -0400 Received: from e06smtp11.uk.ibm.com (e06smtp11.uk.ibm.com [195.75.94.107]) by mx0b-001b2d01.pphosted.com with ESMTP id 2ata1msss3-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Thu, 01 Jun 2017 02:49:59 -0400 Received: from localhost by e06smtp11.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! 
Violators will be prosecuted for from ; Thu, 1 Jun 2017 07:49:57 +0100 Date: Thu, 1 Jun 2017 08:49:54 +0200 From: Heiko Carstens Subject: Re: [-next] memory hotplug regression References: <20170524082022.GC5427@osiris> <20170524083956.GC14733@dhcp22.suse.cz> <20170526122509.GB14849@osiris> <20170530121806.GD7969@dhcp22.suse.cz> <20170530123724.GC4874@osiris> <20170530143246.GJ7969@dhcp22.suse.cz> <20170530145501.GD4874@osiris> <20170531062439.GA3853@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170531062439.GA3853@dhcp22.suse.cz> Message-Id: <20170601064954.GB7593@osiris> Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Wed, May 31, 2017 at 08:24:40AM +0200, Michal Hocko wrote: > > # cat /sys/devices/system/memory/memory16/valid_zones > > Normal Movable > > # echo online_movable > /sys/devices/system/memory/memory16/state > > # cat /sys/devices/system/memory/memory16/valid_zones > > Movable > > # cat /sys/devices/system/memory/memory18/valid_zones > > Movable > > # echo online > /sys/devices/system/memory/memory18/state > > # cat /sys/devices/system/memory/memory18/valid_zones > > Normal <--- should be Movable > > # cat /sys/devices/system/memory/memory17/valid_zones > > <--- no output > > OK, so this is an independent problem and an unrelated one to the > patch I've posted. We need two patches actually. Damn, I hate > MMOP_ONLINE_KEEP. I will send 2 patches as a reply to this email. Tested with your patches on top of linux-next as of yesterday, however starting at commit fa812e869a6fe7495a17150bb2639075081ef709 ("mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare()"), since the "mm: per-lruvec slab stats" patch series breaks everything ;) Tested-by: Heiko Carstens -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f70.google.com (mail-wm0-f70.google.com [74.125.82.70]) by kanga.kvack.org (Postfix) with ESMTP id C07E26B02B4 for ; Thu, 1 Jun 2017 03:13:18 -0400 (EDT) Received: by mail-wm0-f70.google.com with SMTP id a77so7951383wma.12 for ; Thu, 01 Jun 2017 00:13:18 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id t100si6387909wrc.143.2017.06.01.00.13.17 for (version=TLS1 cipher=AES128-SHA bits=128/128); Thu, 01 Jun 2017 00:13:17 -0700 (PDT) Date: Thu, 1 Jun 2017 09:13:10 +0200 From: Michal Hocko Subject: Re: [-next] memory hotplug regression Message-ID: <20170601071310.GA32677@dhcp22.suse.cz> References: <20170524082022.GC5427@osiris> <20170524083956.GC14733@dhcp22.suse.cz> <20170526122509.GB14849@osiris> <20170530121806.GD7969@dhcp22.suse.cz> <20170530123724.GC4874@osiris> <20170530143246.GJ7969@dhcp22.suse.cz> <20170530145501.GD4874@osiris> <20170531062439.GA3853@dhcp22.suse.cz> <20170601064954.GB7593@osiris> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170601064954.GB7593@osiris> Sender: owner-linux-mm@kvack.org List-ID: To: Heiko Carstens Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Thu 01-06-17 08:49:54, Heiko Carstens wrote: > On Wed, May 31, 2017 at 08:24:40AM +0200, Michal Hocko wrote: > > > # cat /sys/devices/system/memory/memory16/valid_zones > > > Normal Movable > > > # echo online_movable > /sys/devices/system/memory/memory16/state > > > # cat /sys/devices/system/memory/memory16/valid_zones > > > Movable > > > # cat /sys/devices/system/memory/memory18/valid_zones > > > Movable > > > # echo online > /sys/devices/system/memory/memory18/state > > > # cat /sys/devices/system/memory/memory18/valid_zones > > > Normal <--- should be Movable > > > # cat 
/sys/devices/system/memory/memory17/valid_zones > > > <--- no output > > > > OK, so this is an independent problem and an unrelated one to the > > patch I've posted. We need two patches actually. Damn, I hate > > MMOP_ONLINE_KEEP. I will send 2 patches as a reply to this email. > > Tested with your patches on top of linux-next as of yesterday, however > starting at commit fa812e869a6fe7495a17150bb2639075081ef709 ("mm/zswap.c: > delete an error message for a failed memory allocation in > zswap_dstmem_prepare()"), since the "mm: per-lruvec slab stats" patch > series breaks everything ;) > > Tested-by: Heiko Carstens Thanks a lot for testing! I will post those patches for wider review later today. -- Michal Hocko SUSE Labs
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1033864AbdEXIkK (ORCPT ); Wed, 24 May 2017 04:40:10 -0400 Received: from mx2.suse.de ([195.135.220.15]:37430 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1761351AbdEXIkE (ORCPT ); Wed, 24 May 2017 04:40:04 -0400 Date: Wed, 24 May 2017 10:39:57 +0200 From: Michal Hocko To: Heiko Carstens Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [-next] memory hotplug regression Message-ID: <20170524083956.GC14733@dhcp22.suse.cz> References: <20170524082022.GC5427@osiris> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170524082022.GC5427@osiris> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed 24-05-17 10:20:22, Heiko Carstens wrote: > Hello Michal, > > I just re-tested linux-next with respect to your memory hotplug changes and > actually (finally) figured out that your patch ("mm, memory_hotplug: do not > associate hotadded memory to zones until online)" changes behaviour on > s390: > > before your patch memory blocks that were offline and located behind the > last online memory block were added by default to ZONE_MOVABLE: > > # cat /sys/devices/system/memory/memory16/valid_zones > Movable Normal > > With your patch this changes, so that they will be added to ZONE_NORMAL by > default instead: > > # cat /sys/devices/system/memory/memory16/valid_zones > Normal Movable > > Sorry, that I didn't realize this earlier! > > Having the ZONE_MOVABLE default was actually the only point why s390's > arch_add_memory() was rather complex compared to other architectures. > > We always had this behaviour, since we always wanted to be able to offline > memory after it was brought online. 
Given that back then "online_movable" > did not exist, the initial s390 memory hotplug support simply added all > additional memory to ZONE_MOVABLE. > > Keeping the default the same would be quite important. Hmm, that is really unfortunate because I would _really_ like to get rid of the previous semantic which was really awkward. The whole point of the rework is to get rid of the nasty zone shifting. Is it an option to use `online_movable' rather than `online' in your setup? Btw. my long term plan is to remove the zone range constrains altogether so you could online each memblock to the type you want. Would that be sufficient for you in general? -- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1764695AbdEZMZT (ORCPT ); Fri, 26 May 2017 08:25:19 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:45802 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753276AbdEZMZR (ORCPT ); Fri, 26 May 2017 08:25:17 -0400 Date: Fri, 26 May 2017 14:25:09 +0200 From: Heiko Carstens To: Michal Hocko Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [-next] memory hotplug regression References: <20170524082022.GC5427@osiris> <20170524083956.GC14733@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170524083956.GC14733@dhcp22.suse.cz> User-Agent: Mutt/1.5.24 (2015-08-30) X-TM-AS-GCONF: 00 x-cbid: 17052612-0008-0000-0000-000004559510 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17052612-0009-0000-0000-00001DD45EAD Message-Id: <20170526122509.GB14849@osiris> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-05-26_08:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 
bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1703280000 definitions=main-1705260227 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, May 24, 2017 at 10:39:57AM +0200, Michal Hocko wrote: > On Wed 24-05-17 10:20:22, Heiko Carstens wrote: > > Having the ZONE_MOVABLE default was actually the only point why s390's > > arch_add_memory() was rather complex compared to other architectures. > > > > We always had this behaviour, since we always wanted to be able to offline > > memory after it was brought online. Given that back then "online_movable" > > did not exist, the initial s390 memory hotplug support simply added all > > additional memory to ZONE_MOVABLE. > > > > Keeping the default the same would be quite important. > > Hmm, that is really unfortunate because I would _really_ like to get rid > of the previous semantic which was really awkward. The whole point of > the rework is to get rid of the nasty zone shifting. > > Is it an option to use `online_movable' rather than `online' in your setup? > Btw. my long term plan is to remove the zone range constrains altogether > so you could online each memblock to the type you want. Would that be > sufficient for you in general? Why is it a problem to change the default for 'online'? As far as I can see that doesn't have too much to do with the order of zones, no? By the way: we played around a bit with the changes wrt memory hotplug. 
There are two odd things:

1) With the new code I can generate overlapping zones for ZONE_DMA and
ZONE_NORMAL:

--- new code:

DMA      [mem 0x0000000000000000-0x000000007fffffff]
Normal   [mem 0x0000000080000000-0x000000017fffffff]

# cat /sys/devices/system/memory/block_size_bytes
10000000
# cat /sys/devices/system/memory/memory5/valid_zones
DMA
# echo 0 > /sys/devices/system/memory/memory5/online
# cat /sys/devices/system/memory/memory5/valid_zones
Normal
# echo 1 > /sys/devices/system/memory/memory5/online
Normal

# cat /proc/zoneinfo
Node 0, zone      DMA
        spanned  524288        <-----
        present  458752
        managed  455078
  start_pfn:     0             <-----

Node 0, zone   Normal
        spanned  720896
        present  589824
        managed  571648
  start_pfn:     327680        <-----

So ZONE_DMA ends within ZONE_NORMAL. This shouldn't be possible, unless
this restriction is gone?

--- old code:

# echo 0 > /sys/devices/system/memory/memory5/online
# cat /sys/devices/system/memory/memory5/valid_zones
DMA
# echo online_movable > /sys/devices/system/memory/memory5/state
-bash: echo: write error: Invalid argument
# echo online_kernel > /sys/devices/system/memory/memory5/state
-bash: echo: write error: Invalid argument
# echo online > /sys/devices/system/memory/memory5/state
# cat /sys/devices/system/memory/memory5/valid_zones
DMA


2) Another oddity is that after a memory block was brought online, its
association with ZONE_NORMAL or ZONE_MOVABLE seems to be fixed, even if it
is brought offline afterwards:

# cat /sys/devices/system/memory/memory16/valid_zones
Normal Movable
# echo online_movable > /sys/devices/system/memory/memory16/state
# echo offline > /sys/devices/system/memory/memory16/state
# cat /sys/devices/system/memory/memory16/valid_zones
Movable  <---- should be "Normal Movable"

I assume this happens because start_pfn and spanned pages of the zones
aren't updated if a memory block at the beginning or end of a zone is
brought offline.
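
The overlap in the zoneinfo output of item 1 can be checked with a little
arithmetic. What follows is a minimal user-space sketch, not kernel code. It
assumes 4 KiB pages and reads block_size_bytes as hexadecimal (0x10000000,
i.e. 256 MiB). The struct and helper names (zone_span, span_end_pfn,
block_start_pfn, overlap_pages) are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, which is consistent with the reported numbers. */
#define PAGE_SHIFT 12

/* Simplified stand-in for the span fields of the kernel's struct zone. */
struct zone_span {
	uint64_t start_pfn;	/* first page frame number of the zone */
	uint64_t spanned;	/* number of pfns the zone spans */
};

/* First pfn after the zone, analogous to the kernel's zone_end_pfn(). */
static uint64_t span_end_pfn(const struct zone_span *z)
{
	return z->start_pfn + z->spanned;
}

/* First pfn of memory block 'nr' for a given block size in bytes. */
static uint64_t block_start_pfn(unsigned int nr, uint64_t block_size_bytes)
{
	assert(block_size_bytes >= (1ULL << PAGE_SHIFT));
	return (uint64_t)nr * (block_size_bytes >> PAGE_SHIFT);
}

/* Number of pfns covered by both spans; 0 means they do not overlap. */
static uint64_t overlap_pages(const struct zone_span *a,
			      const struct zone_span *b)
{
	uint64_t start = a->start_pfn > b->start_pfn ? a->start_pfn : b->start_pfn;
	uint64_t end_a = span_end_pfn(a);
	uint64_t end_b = span_end_pfn(b);
	uint64_t end = end_a < end_b ? end_a : end_b;

	return end > start ? end - start : 0;
}

/* Values copied from the /proc/zoneinfo output in the report. */
static const struct zone_span report_dma = { .start_pfn = 0, .spanned = 524288 };
static const struct zone_span report_normal = { .start_pfn = 327680, .spanned = 720896 };
```

With these numbers block_start_pfn(5, 0x10000000) is 327680, exactly where
ZONE_NORMAL now starts, while ZONE_DMA still spans up to pfn 524288. So
overlap_pages(&report_dma, &report_normal) yields 196608 pfns, which is three
256 MiB memory blocks.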
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750984AbdE2Iwj (ORCPT ); Mon, 29 May 2017 04:52:39 -0400 Received: from mx2.suse.de ([195.135.220.15]:58560 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1750909AbdE2Iwi (ORCPT ); Mon, 29 May 2017 04:52:38 -0400 Date: Mon, 29 May 2017 10:52:31 +0200 From: Michal Hocko To: Heiko Carstens Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [-next] memory hotplug regression Message-ID: <20170529085231.GE19725@dhcp22.suse.cz> References: <20170524082022.GC5427@osiris> <20170524083956.GC14733@dhcp22.suse.cz> <20170526122509.GB14849@osiris> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170526122509.GB14849@osiris> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri 26-05-17 14:25:09, Heiko Carstens wrote: > On Wed, May 24, 2017 at 10:39:57AM +0200, Michal Hocko wrote: > > On Wed 24-05-17 10:20:22, Heiko Carstens wrote: > > > Having the ZONE_MOVABLE default was actually the only point why s390's > > > arch_add_memory() was rather complex compared to other architectures. > > > > > > We always had this behaviour, since we always wanted to be able to offline > > > memory after it was brought online. Given that back then "online_movable" > > > did not exist, the initial s390 memory hotplug support simply added all > > > additional memory to ZONE_MOVABLE. > > > > > > Keeping the default the same would be quite important. > > > > Hmm, that is really unfortunate because I would _really_ like to get rid > > of the previous semantic which was really awkward. The whole point of > > the rework is to get rid of the nasty zone shifting. > > > > Is it an option to use `online_movable' rather than `online' in your setup? > > Btw. 
my long term plan is to remove the zone range constrains altogether
> > so you could online each memblock to the type you want. Would that be
> > sufficient for you in general?
>
> Why is it a problem to change the default for 'online'? As far as I can see
> that doesn't have too much to do with the order of zones, no?

`online' (aka MMOP_ONLINE_KEEP) should always inherit its current zone.
The previous implementation made an exception and allowed shifting to
another zone if the block is on the border of two zones. This is what I
wanted to get rid of because it is just too ugly to live.

But now I am not really sure what the use case is here. I assume you know
how to online the memory. That's why you had to play tricks with the
zones previously. All you need now is to use the proper MMOP_ONLINE*

> By the way: we played around a bit with the changes wrt memory
> hotplug. There are a two odd things:
>
> 1) With the new code I can generate overlapping zones for ZONE_DMA and
> ZONE_NORMAL:
>
> --- new code:
>
> DMA      [mem 0x0000000000000000-0x000000007fffffff]
> Normal   [mem 0x0000000080000000-0x000000017fffffff]
>
> # cat /sys/devices/system/memory/block_size_bytes
> 10000000
> # cat /sys/devices/system/memory/memory5/valid_zones
> DMA
> # echo 0 > /sys/devices/system/memory/memory5/online
> # cat /sys/devices/system/memory/memory5/valid_zones
> Normal
> # echo 1 > /sys/devices/system/memory/memory5/online
> Normal

OK, interesting. I will double check the code.

> # cat /proc/zoneinfo
> Node 0, zone      DMA
>         spanned  524288        <-----
>         present  458752
>         managed  455078
>   start_pfn:     0             <-----
>
> Node 0, zone   Normal
>         spanned  720896
>         present  589824
>         managed  571648
>   start_pfn:     327680        <-----
>
> So ZONE_DMA ends within ZONE_NORMAL. This shouldn't be possible, unless
> this restriction is gone?
> > --- old code: > > # echo 0 > /sys/devices/system/memory/memory5/online > # cat /sys/devices/system/memory/memory5/valid_zones > DMA > # echo online_movable > /sys/devices/system/memory/memory5/state > -bash: echo: write error: Invalid argument > # echo online_kernel > /sys/devices/system/memory/memory5/state > -bash: echo: write error: Invalid argument > # echo online > /sys/devices/system/memory/memory5/state > # cat /sys/devices/system/memory/memory5/valid_zones > DMA > > > 2) Another oddity is that after a memory block was brought online it's > association to ZONE_NORMAL or ZONE_MOVABLE seems to be fixed. Even if it > is brought offline afterwards: This is intended behavior because I got rid of the tricky&ugly zone shifting code. Ultimately I would like to allow for overlapping zones so the explicit online_{movable,kernel} will _always_ work. -- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750983AbdE2KLg (ORCPT ); Mon, 29 May 2017 06:11:36 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:46820 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1750898AbdE2KLf (ORCPT ); Mon, 29 May 2017 06:11:35 -0400 Date: Mon, 29 May 2017 12:11:28 +0200 From: Heiko Carstens To: Michal Hocko Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [-next] memory hotplug regression References: <20170524082022.GC5427@osiris> <20170524083956.GC14733@dhcp22.suse.cz> <20170526122509.GB14849@osiris> <20170529085231.GE19725@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170529085231.GE19725@dhcp22.suse.cz> User-Agent: Mutt/1.5.24 (2015-08-30) X-TM-AS-GCONF: 00 x-cbid: 17052910-0020-0000-0000-00000377171A X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17052910-0021-0000-0000-000041EAE6C4 
Message-Id: <20170529101128.GA12975@osiris> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-05-29_06:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1703280000 definitions=main-1705290192 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 29, 2017 at 10:52:31AM +0200, Michal Hocko wrote: > > Why is it a problem to change the default for 'online'? As far as I can see > > that doesn't have too much to do with the order of zones, no? > > `online' (aka MMOP_ONLINE_KEEP) should always inherit its current zone. > The previous implementation made an exception to allow to shift to > another zone if it is on the border of two zones. This is what I wanted > to get rid of because it is just too ugly to live. > > But now I am not really sure what is the usecase here. I assume you know > how to online the memoery. That's why you had to play tricks with the > zones previously. All you need now is to use the proper MMOP_ONLINE* Yes, however that implies that existing user space has to be changed to achieve the same semantics as before. That's the usecase I'm talking about. On the other hand this change would finally make s390 behave like all other architectures, which is certainly not a bad thing. So, while thinking again I think you convinced me to agree with this change. > > 2) Another oddity is that after a memory block was brought online it's > > association to ZONE_NORMAL or ZONE_MOVABLE seems to be fixed. Even if it > > is brought offline afterwards: > > This is intended behavior because I got rid of the tricky&ugly zone > shifting code. Ultimately I would like to allow for overlapping zones > so the explicit online_{movable,kernel} will _always_ work. Ok, I see. 
This change (fixed memory block to zone mapping after first online) is a bit surprising. On the other hand I can't think of a sane usecase why one wants to change the zone a memory block belongs to. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751088AbdE2Kpk (ORCPT ); Mon, 29 May 2017 06:45:40 -0400 Received: from mx2.suse.de ([195.135.220.15]:39723 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1750886AbdE2Kpj (ORCPT ); Mon, 29 May 2017 06:45:39 -0400 Date: Mon, 29 May 2017 12:45:37 +0200 From: Michal Hocko To: Heiko Carstens Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [-next] memory hotplug regression Message-ID: <20170529104537.GH19725@dhcp22.suse.cz> References: <20170524082022.GC5427@osiris> <20170524083956.GC14733@dhcp22.suse.cz> <20170526122509.GB14849@osiris> <20170529085231.GE19725@dhcp22.suse.cz> <20170529101128.GA12975@osiris> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170529101128.GA12975@osiris> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon 29-05-17 12:11:28, Heiko Carstens wrote: > On Mon, May 29, 2017 at 10:52:31AM +0200, Michal Hocko wrote: > > > Why is it a problem to change the default for 'online'? As far as I can see > > > that doesn't have too much to do with the order of zones, no? > > > > `online' (aka MMOP_ONLINE_KEEP) should always inherit its current zone. > > The previous implementation made an exception to allow to shift to > > another zone if it is on the border of two zones. This is what I wanted > > to get rid of because it is just too ugly to live. > > > > But now I am not really sure what is the usecase here. I assume you know > > how to online the memoery. That's why you had to play tricks with the > > zones previously. 
All you need now is to use the proper MMOP_ONLINE*
>
> Yes, however that implies that existing user space has to be changed to
> achieve the same semantics as before. That's the usecase I'm talking about.

Yes, that is really unfortunate. It is even more unfortunate that the
original behavior got merged without deeper consideration.

> On the other hand this change would finally make s390 behave like all other
> architectures, which is certainly not a bad thing. So, while thinking again
> I think you convinced me to agree with this change.

That is definitely good to hear. Btw. I plan to change the semantics even
further. MMOP_ONLINE_KEEP currently ignores the movable_node setting and I
plan to change that. Hopefully this won't break more userspace...

> > > 2) Another oddity is that after a memory block was brought online it's
> > > association to ZONE_NORMAL or ZONE_MOVABLE seems to be fixed. Even if it
> > > is brought offline afterwards:
> >
> > This is intended behavior because I got rid of the tricky&ugly zone
> > shifting code. Ultimately I would like to allow for overlapping zones
> > so the explicit online_{movable,kernel} will _always_ work.
>
> Ok, I see. This change (fixed memory block to zone mapping after first
> online) is a bit surprising. On the other hand I can't think of a sane
> usecase why one wants to change the zone a memory block belongs to.

Long term I would really like to remove any constraints on where to online
movable or kernel memory. So even if this turns out to be a problem, it will
only be temporary.
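
For user space that relied on the old s390 default, the exchange above means
the zone has to be requested explicitly by writing online_movable instead of
online. A minimal sketch of such a helper follows, assuming the sysfs layout
shown in this thread (memoryN/state below /sys/devices/system/memory). The
function name set_memory_block_state and the sysfs_root parameter are
hypothetical and only exist to keep the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Writes 'state' (for example "online_movable" or "offline") to
 * <sysfs_root>/memory<block>/state.
 * Returns 0 on success or a negative errno value on failure.
 * Hypothetical helper for illustration, not part of any existing tool.
 */
static int set_memory_block_state(const char *sysfs_root, unsigned int block,
				  const char *state)
{
	char path[256];
	FILE *f;
	int n;

	assert(sysfs_root != NULL && state != NULL);

	n = snprintf(path, sizeof(path), "%s/memory%u/state", sysfs_root, block);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	f = fopen(path, "w");
	if (!f)
		return -errno;

	if (fputs(state, f) == EOF) {
		int err = errno;

		fclose(f);
		return -err;
	}

	/* stdio buffers the data, so a rejected write shows up here. */
	if (fclose(f) == EOF)
		return -errno;

	return 0;
}
```

A caller would pass "/sys/devices/system/memory" as sysfs_root, for example
set_memory_block_state("/sys/devices/system/memory", 16, "online_movable").
A return value of -EINVAL would correspond to the "write error: Invalid
argument" that echo reports in the transcripts of this thread.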
-- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751435AbdE3MSL (ORCPT ); Tue, 30 May 2017 08:18:11 -0400 Received: from mx2.suse.de ([195.135.220.15]:53695 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1750946AbdE3MSJ (ORCPT ); Tue, 30 May 2017 08:18:09 -0400 Date: Tue, 30 May 2017 14:18:06 +0200 From: Michal Hocko To: Heiko Carstens Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [-next] memory hotplug regression Message-ID: <20170530121806.GD7969@dhcp22.suse.cz> References: <20170524082022.GC5427@osiris> <20170524083956.GC14733@dhcp22.suse.cz> <20170526122509.GB14849@osiris> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170526122509.GB14849@osiris> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri 26-05-17 14:25:09, Heiko Carstens wrote: [...] > 1) With the new code I can generate overlapping zones for ZONE_DMA and > ZONE_NORMAL: > > --- new code: > > DMA [mem 0x0000000000000000-0x000000007fffffff] > Normal [mem 0x0000000080000000-0x000000017fffffff] > > # cat /sys/devices/system/memory/block_size_bytes > 10000000 > # cat /sys/devices/system/memory/memory5/valid_zones > DMA > # echo 0 > /sys/devices/system/memory/memory5/online > # cat /sys/devices/system/memory/memory5/valid_zones > Normal > # echo 1 > /sys/devices/system/memory/memory5/online > Normal > > # cat /proc/zoneinfo > Node 0, zone DMA > spanned 524288 <----- > present 458752 > managed 455078 > start_pfn: 0 <----- > > Node 0, zone Normal > spanned 720896 > present 589824 > managed 571648 > start_pfn: 327680 <----- > > So ZONE_DMA ends within ZONE_NORMAL. This shouldn't be possible, unless > this restriction is gone? The patch below should help. 
> --- old code:
>
> # echo 0 > /sys/devices/system/memory/memory5/online
> # cat /sys/devices/system/memory/memory5/valid_zones
> DMA
> # echo online_movable > /sys/devices/system/memory/memory5/state
> -bash: echo: write error: Invalid argument
> # echo online_kernel > /sys/devices/system/memory/memory5/state
> -bash: echo: write error: Invalid argument

This error doesn't make any sense, because we want to online kernel
memory and DMA is pretty much kernel memory.

> # echo online > /sys/devices/system/memory/memory5/state
> # cat /sys/devices/system/memory/memory5/valid_zones
> DMA

---
>>From 91a432ceb6af9a8f3791d97b6731d2010cbd5b47 Mon Sep 17 00:00:00 2001
From: Michal Hocko
Date: Tue, 30 May 2017 13:56:23 +0200
Subject: [PATCH] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone

Heiko Carstens has noticed that he can generate overlapping zones for
ZONE_DMA and ZONE_NORMAL:
DMA      [mem 0x0000000000000000-0x000000007fffffff]
Normal   [mem 0x0000000080000000-0x000000017fffffff]

$ cat /sys/devices/system/memory/block_size_bytes
10000000
$ cat /sys/devices/system/memory/memory5/valid_zones
DMA
$ echo 0 > /sys/devices/system/memory/memory5/online
$ cat /sys/devices/system/memory/memory5/valid_zones
Normal
$ echo 1 > /sys/devices/system/memory/memory5/online
Normal

$ cat /proc/zoneinfo
Node 0, zone      DMA
        spanned  524288        <-----
        present  458752
        managed  455078
  start_pfn:     0             <-----

Node 0, zone   Normal
        spanned  720896
        present  589824
        managed  571648
  start_pfn:     327680        <-----

The reason is that we assume that the default zone for kernel onlining
is ZONE_NORMAL. This was a simplification introduced by the memory
hotplug rework and it is easily fixable by checking the range overlap in
the zone order and considering the first matching zone as the default
one. If there is no such zone then assume ZONE_NORMAL as we have been
doing so far.
Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online" Reported-by: Heiko Carstens Signed-off-by: Michal Hocko --- drivers/base/memory.c | 2 +- include/linux/memory_hotplug.h | 2 ++ mm/memory_hotplug.c | 22 +++++++++++++++++++--- 3 files changed, 22 insertions(+), 4 deletions(-) diff --git a/drivers/base/memory.c b/drivers/base/memory.c index b86fda30ce62..c7c4e0325cdb 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -419,7 +419,7 @@ static ssize_t show_valid_zones(struct device *dev, nid = pfn_to_nid(start_pfn); if (allow_online_pfn_range(nid, start_pfn, nr_pages, MMOP_ONLINE_KERNEL)) { - strcat(buf, NODE_DATA(nid)->node_zones[ZONE_NORMAL].name); + strcat(buf, default_zone_for_pfn(nid, start_pfn, nr_pages)->name); append = true; } diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 9e0249d0f5e4..ed167541e4fc 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -309,4 +309,6 @@ extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum); extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages, int online_type); +extern struct zone *default_zone_for_pfn(int nid, unsigned long pfn, + unsigned long nr_pages); #endif /* __LINUX_MEMORY_HOTPLUG_H */ diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 0a895df2397e..792c098e0e5f 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -858,7 +858,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages, { struct pglist_data *pgdat = NODE_DATA(nid); struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE]; - struct zone *normal_zone = &pgdat->node_zones[ZONE_NORMAL]; + struct zone *default_zone = default_zone_for_pfn(nid, pfn, nr_pages); /* * TODO there shouldn't be any inherent reason to have ZONE_NORMAL @@ -872,7 +872,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages, return 
true; return movable_zone->zone_start_pfn >= pfn + nr_pages; } else if (online_type == MMOP_ONLINE_MOVABLE) { - return zone_end_pfn(normal_zone) <= pfn; + return zone_end_pfn(default_zone) <= pfn; } /* MMOP_ONLINE_KEEP will always succeed and inherits the current zone */ @@ -937,6 +937,22 @@ void __ref move_pfn_range_to_zone(struct zone *zone, set_zone_contiguous(zone); } +struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn, + unsigned long nr_pages) +{ + struct pglist_data *pgdat = NODE_DATA(nid); + int zid; + + for (zid = 0; zid < MAX_NR_ZONES; zid++) { + struct zone *zone = &pgdat->node_zones[zid]; + + if (zone_intersects(zone, start_pfn, nr_pages)) + return zone; + } + + return &pgdat->node_zones[ZONE_NORMAL]; +} + /* * Associates the given pfn range with the given node and the zone appropriate * for the given online type. @@ -945,7 +961,7 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid, unsigned long start_pfn, unsigned long nr_pages) { struct pglist_data *pgdat = NODE_DATA(nid); - struct zone *zone = &pgdat->node_zones[ZONE_NORMAL]; + struct zone *zone = default_zone_for_pfn(nid, start_pfn, nr_pages); if (online_type == MMOP_ONLINE_KEEP) { struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE]; -- 2.11.0 -- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751608AbdE3Mhb (ORCPT ); Tue, 30 May 2017 08:37:31 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:41991 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1750946AbdE3Mha (ORCPT ); Tue, 30 May 2017 08:37:30 -0400 Date: Tue, 30 May 2017 14:37:24 +0200 From: Heiko Carstens To: Michal Hocko Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [-next] memory hotplug regression References: <20170524082022.GC5427@osiris> <20170524083956.GC14733@dhcp22.suse.cz> 
<20170526122509.GB14849@osiris> <20170530121806.GD7969@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170530121806.GD7969@dhcp22.suse.cz>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-TM-AS-GCONF: 00
x-cbid: 17053012-0040-0000-0000-0000039B5741
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 17053012-0041-0000-0000-0000258DE58D
Message-Id: <20170530123724.GC4874@osiris>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-05-30_08:,, signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1703280000 definitions=main-1705300237
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 30, 2017 at 02:18:06PM +0200, Michal Hocko wrote:
> > So ZONE_DMA ends within ZONE_NORMAL. This shouldn't be possible, unless
> > this restriction is gone?
>
> The patch below should help.

It does fix this specific problem, but introduces a new one:

# echo online_movable > /sys/devices/system/memory/memory16/state
# cat /sys/devices/system/memory/memory16/valid_zones
Movable
# echo offline > /sys/devices/system/memory/memory16/state
# cat /sys/devices/system/memory/memory16/valid_zones
       <--- no output

Memory block 16 is the only one I onlined to ZONE_MOVABLE and then offlined.
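
One possible reading of the empty valid_zones output can be modeled in user
space. This is a sketch under stated assumptions, not the kernel code: it
assumes that ZONE_MOVABLE still spans block 16 after the block went offline
(as suspected earlier in the thread), that the early "return true" in
allow_online_pfn_range() is taken only for an empty ZONE_MOVABLE (the guarding
condition is not visible in the posted hunk), and a made-up layout with
256 MiB blocks of 0x10000 pfns. All names (zone_model, model_default_zone,
model_allow_online, after_offline) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Zone order as in the thread: kernel zones first, Movable last. */
enum zone_id { Z_DMA, Z_NORMAL, Z_MOVABLE, Z_COUNT };
enum online_type { ONLINE_KERNEL, ONLINE_MOVABLE };

struct zone_model {
	unsigned long start_pfn;
	unsigned long spanned;	/* 0 means the zone is empty */
};

static unsigned long model_end_pfn(const struct zone_model *z)
{
	return z->start_pfn + z->spanned;
}

/* Same idea as zone_intersects(): a non-empty zone that overlaps the range. */
static bool model_intersects(const struct zone_model *z, unsigned long pfn,
			     unsigned long nr_pages)
{
	if (z->spanned == 0)
		return false;
	return pfn < model_end_pfn(z) && z->start_pfn < pfn + nr_pages;
}

/* First intersecting zone among ids 0..last_zid, otherwise Normal. */
static enum zone_id model_default_zone(const struct zone_model *zones,
				       enum zone_id last_zid,
				       unsigned long pfn, unsigned long nr_pages)
{
	int zid;

	assert(last_zid < Z_COUNT);
	for (zid = 0; zid <= (int)last_zid; zid++) {
		if (model_intersects(&zones[zid], pfn, nr_pages))
			return (enum zone_id)zid;
	}
	return Z_NORMAL;
}

/* Mirrors the two checks visible in the posted allow_online_pfn_range(). */
static bool model_allow_online(const struct zone_model *zones,
			       enum zone_id last_zid, enum online_type type,
			       unsigned long pfn, unsigned long nr_pages)
{
	const struct zone_model *movable = &zones[Z_MOVABLE];
	enum zone_id def = model_default_zone(zones, last_zid, pfn, nr_pages);

	if (type == ONLINE_KERNEL) {
		/* Assumption: early success only if Movable is empty. */
		if (movable->spanned == 0)
			return true;
		return movable->start_pfn >= pfn + nr_pages;
	}
	return model_end_pfn(&zones[def]) <= pfn;
}

/* Assumed layout: block 16 (pfn 0x100000) was onlined movable, then offlined. */
static const struct zone_model after_offline[Z_COUNT] = {
	[Z_DMA]     = { .start_pfn = 0x0,      .spanned = 0x80000 },
	[Z_NORMAL]  = { .start_pfn = 0x80000,  .spanned = 0x80000 },
	[Z_MOVABLE] = { .start_pfn = 0x100000, .spanned = 0x10000 },
};
```

With last_zid set to Z_MOVABLE (a scan over all zones, as in the posted patch)
the default zone for block 16 becomes Movable itself, and in this model both
the kernel and the movable check fail, which would leave valid_zones empty.
Limiting the scan to kernel zones (last_zid set to Z_NORMAL) falls back to
Normal, and the movable check passes again.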
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751761AbdE3OdB (ORCPT ); Tue, 30 May 2017 10:33:01 -0400
Received: from mx2.suse.de ([195.135.220.15]:53880 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751127AbdE3OdA (ORCPT ); Tue, 30 May 2017 10:33:00 -0400
Date: Tue, 30 May 2017 16:32:47 +0200
From: Michal Hocko
To: Heiko Carstens
Cc: Gerald Schaefer, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [-next] memory hotplug regression
Message-ID: <20170530143246.GJ7969@dhcp22.suse.cz>
References: <20170524082022.GC5427@osiris> <20170524083956.GC14733@dhcp22.suse.cz> <20170526122509.GB14849@osiris> <20170530121806.GD7969@dhcp22.suse.cz> <20170530123724.GC4874@osiris>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170530123724.GC4874@osiris>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 30-05-17 14:37:24, Heiko Carstens wrote:
> On Tue, May 30, 2017 at 02:18:06PM +0200, Michal Hocko wrote:
> > > So ZONE_DMA ends within ZONE_NORMAL. This shouldn't be possible, unless
> > > this restriction is gone?
> >
> > The patch below should help.
>
> It does fix this specific problem, but introduces a new one:
>
> # echo online_movable > /sys/devices/system/memory/memory16/state
> # cat /sys/devices/system/memory/memory16/valid_zones
> Movable
> # echo offline > /sys/devices/system/memory/memory16/state
> # cat /sys/devices/system/memory/memory16/valid_zones
>        <--- no output
>
> Memory block 16 is the only one I onlined and offlineto ZONE_MOVABLE.

Could you test this on top please?
--- diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 792c098e0e5f..a26f9f8e6365 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -937,13 +937,18 @@ void __ref move_pfn_range_to_zone(struct zone *zone, set_zone_contiguous(zone); } +/* + * Returns a default kernel memory zone for the given pfn range. + * If no kernel zone covers this pfn range it will automatically go + * to the ZONE_NORMAL. + */ struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn, unsigned long nr_pages) { struct pglist_data *pgdat = NODE_DATA(nid); int zid; - for (zid = 0; zid < MAX_NR_ZONES; zid++) { + for (zid = 0; zid <= ZONE_NORMAL; zid++) { struct zone *zone = &pgdat->node_zones[zid]; if (zone_intersects(zone, start_pfn, nr_pages)) -- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751782AbdE3PEZ (ORCPT ); Tue, 30 May 2017 11:04:25 -0400 Received: from mx2.suse.de ([195.135.220.15]:56988 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751582AbdE3PEY (ORCPT ); Tue, 30 May 2017 11:04:24 -0400 Date: Tue, 30 May 2017 17:04:21 +0200 From: Michal Hocko To: Heiko Carstens Cc: Gerald Schaefer , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [-next] memory hotplug regression Message-ID: <20170530150421.GM7969@dhcp22.suse.cz> References: <20170524082022.GC5427@osiris> <20170524083956.GC14733@dhcp22.suse.cz> <20170526122509.GB14849@osiris> <20170530121806.GD7969@dhcp22.suse.cz> <20170530123724.GC4874@osiris> <20170530143246.GJ7969@dhcp22.suse.cz> <20170530145501.GD4874@osiris> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170530145501.GD4874@osiris> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue 30-05-17 16:55:01, Heiko Carstens wrote: > On Tue, May 30, 2017 at 
04:32:47PM +0200, Michal Hocko wrote: > > On Tue 30-05-17 14:37:24, Heiko Carstens wrote: > > > On Tue, May 30, 2017 at 02:18:06PM +0200, Michal Hocko wrote: > > > > > So ZONE_DMA ends within ZONE_NORMAL. This shouldn't be possible, unless > > > > > this restriction is gone? > > > > > > > > The patch below should help. > > > > > > It does fix this specific problem, but introduces a new one: > > > > > > # echo online_movable > /sys/devices/system/memory/memory16/state > > > # cat /sys/devices/system/memory/memory16/valid_zones > > > Movable > > > # echo offline > /sys/devices/system/memory/memory16/state > > > # cat /sys/devices/system/memory/memory16/valid_zones > > > <--- no output > > > > > > Memory block 16 is the only one I onlined and offlineto ZONE_MOVABLE. > > > > Could you test the this on top please? > > --- > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > > index 792c098e0e5f..a26f9f8e6365 100644 > > --- a/mm/memory_hotplug.c > > +++ b/mm/memory_hotplug.c > > @@ -937,13 +937,18 @@ void __ref move_pfn_range_to_zone(struct zone *zone, > > set_zone_contiguous(zone); > > } > > > > +/* > > + * Returns a default kernel memory zone for the given pfn range. > > + * If no kernel zone covers this pfn range it will automatically go > > + * to the ZONE_NORMAL. 
> > + */ > > struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn, > > unsigned long nr_pages) > > { > > struct pglist_data *pgdat = NODE_DATA(nid); > > int zid; > > > > - for (zid = 0; zid < MAX_NR_ZONES; zid++) { > > + for (zid = 0; zid <= ZONE_NORMAL; zid++) { > > struct zone *zone = &pgdat->node_zones[zid]; > > > > if (zone_intersects(zone, start_pfn, nr_pages)) > > Still broken, but in different way(s): > > # cat /sys/devices/system/memory/memory16/valid_zones > Normal Movable > # echo online_movable > /sys/devices/system/memory/memory16/state > # cat /sys/devices/system/memory/memory16/valid_zones > Movable > # cat /sys/devices/system/memory/memory18/valid_zones > Movable > # echo online > /sys/devices/system/memory/memory18/state > # cat /sys/devices/system/memory/memory18/valid_zones > Normal <--- should be Movable > # cat /sys/devices/system/memory/memory17/valid_zones > <--- no output OK, I will sit on this tomorrow with a clean head without doing 10 things at the same time. Sorry about your wasted time! 
-- 
Michal Hocko
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751264AbdE3Sjh (ORCPT ); Tue, 30 May 2017 14:39:37 -0400
Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:58135 "EHLO
	mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750951AbdE3Sje (ORCPT ); Tue, 30 May 2017 14:39:34 -0400
Date: Tue, 30 May 2017 16:55:01 +0200
From: Heiko Carstens
To: Michal Hocko
Cc: Gerald Schaefer, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [-next] memory hotplug regression
References: <20170524082022.GC5427@osiris>
 <20170524083956.GC14733@dhcp22.suse.cz>
 <20170526122509.GB14849@osiris>
 <20170530121806.GD7969@dhcp22.suse.cz>
 <20170530123724.GC4874@osiris>
 <20170530143246.GJ7969@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170530143246.GJ7969@dhcp22.suse.cz>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-TM-AS-GCONF: 00
x-cbid: 17053014-0020-0000-0000-00000378D031
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 17053014-0021-0000-0000-000041EE8C51
Message-Id: <20170530145501.GD4874@osiris>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,,
 definitions=2017-05-30_09:,, signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0
 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.0.1-1703280000 definitions=main-1705300280
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 30, 2017 at 04:32:47PM +0200, Michal Hocko wrote:
> On Tue 30-05-17 14:37:24, Heiko Carstens wrote:
> > On Tue, May 30, 2017 at 02:18:06PM +0200, Michal Hocko wrote:
> > > > So ZONE_DMA ends within ZONE_NORMAL. This shouldn't be possible, unless
> > > > this restriction is gone?
> > > 
> > > The patch below should help.
> > 
> > It does fix this specific problem, but introduces a new one:
> > 
> > # echo online_movable > /sys/devices/system/memory/memory16/state
> > # cat /sys/devices/system/memory/memory16/valid_zones
> > Movable
> > # echo offline > /sys/devices/system/memory/memory16/state
> > # cat /sys/devices/system/memory/memory16/valid_zones
> >           <--- no output
> > 
> > Memory block 16 is the only one I onlined to ZONE_MOVABLE and then offlined.
> 
> Could you test this on top please?
> ---
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 792c098e0e5f..a26f9f8e6365 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -937,13 +937,18 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
>  	set_zone_contiguous(zone);
>  }
>  
> +/*
> + * Returns a default kernel memory zone for the given pfn range.
> + * If no kernel zone covers this pfn range it will automatically go
> + * to the ZONE_NORMAL.
> + */
>  struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
>  		unsigned long nr_pages)
>  {
>  	struct pglist_data *pgdat = NODE_DATA(nid);
>  	int zid;
>  
> -	for (zid = 0; zid < MAX_NR_ZONES; zid++) {
> +	for (zid = 0; zid <= ZONE_NORMAL; zid++) {
>  		struct zone *zone = &pgdat->node_zones[zid];
>  
>  		if (zone_intersects(zone, start_pfn, nr_pages))

Still broken, but in different way(s):

# cat /sys/devices/system/memory/memory16/valid_zones
Normal Movable
# echo online_movable > /sys/devices/system/memory/memory16/state
# cat /sys/devices/system/memory/memory16/valid_zones
Movable
# cat /sys/devices/system/memory/memory18/valid_zones
Movable
# echo online > /sys/devices/system/memory/memory18/state
# cat /sys/devices/system/memory/memory18/valid_zones
Normal    <--- should be Movable
# cat /sys/devices/system/memory/memory17/valid_zones
          <--- no output

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751025AbdEaGYp (ORCPT ); Wed, 31 May 2017
	02:24:45 -0400
Received: from mx2.suse.de ([195.135.220.15]:50567 "EHLO mx1.suse.de"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S1750779AbdEaGYn (ORCPT ); Wed, 31 May 2017 02:24:43 -0400
Date: Wed, 31 May 2017 08:24:40 +0200
From: Michal Hocko
To: Heiko Carstens
Cc: Gerald Schaefer, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [-next] memory hotplug regression
Message-ID: <20170531062439.GA3853@dhcp22.suse.cz>
References: <20170524082022.GC5427@osiris>
 <20170524083956.GC14733@dhcp22.suse.cz>
 <20170526122509.GB14849@osiris>
 <20170530121806.GD7969@dhcp22.suse.cz>
 <20170530123724.GC4874@osiris>
 <20170530143246.GJ7969@dhcp22.suse.cz>
 <20170530145501.GD4874@osiris>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170530145501.GD4874@osiris>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 30-05-17 16:55:01, Heiko Carstens wrote:
> On Tue, May 30, 2017 at 04:32:47PM +0200, Michal Hocko wrote:
> > On Tue 30-05-17 14:37:24, Heiko Carstens wrote:
> > > On Tue, May 30, 2017 at 02:18:06PM +0200, Michal Hocko wrote:
> > > > > So ZONE_DMA ends within ZONE_NORMAL. This shouldn't be possible, unless
> > > > > this restriction is gone?
> > > > 
> > > > The patch below should help.
> > > 
> > > It does fix this specific problem, but introduces a new one:
> > > 
> > > # echo online_movable > /sys/devices/system/memory/memory16/state
> > > # cat /sys/devices/system/memory/memory16/valid_zones
> > > Movable
> > > # echo offline > /sys/devices/system/memory/memory16/state
> > > # cat /sys/devices/system/memory/memory16/valid_zones
> > >           <--- no output
> > > 
> > > Memory block 16 is the only one I onlined to ZONE_MOVABLE and then offlined.
> > 
> > Could you test this on top please?
> > ---
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 792c098e0e5f..a26f9f8e6365 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -937,13 +937,18 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
> >  	set_zone_contiguous(zone);
> >  }
> >  
> > +/*
> > + * Returns a default kernel memory zone for the given pfn range.
> > + * If no kernel zone covers this pfn range it will automatically go
> > + * to the ZONE_NORMAL.
> > + */
> >  struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
> >  		unsigned long nr_pages)
> >  {
> >  	struct pglist_data *pgdat = NODE_DATA(nid);
> >  	int zid;
> >  
> > -	for (zid = 0; zid < MAX_NR_ZONES; zid++) {
> > +	for (zid = 0; zid <= ZONE_NORMAL; zid++) {
> >  		struct zone *zone = &pgdat->node_zones[zid];
> >  
> >  		if (zone_intersects(zone, start_pfn, nr_pages))
> 
> Still broken, but in different way(s):
> 
> # cat /sys/devices/system/memory/memory16/valid_zones
> Normal Movable
> # echo online_movable > /sys/devices/system/memory/memory16/state
> # cat /sys/devices/system/memory/memory16/valid_zones
> Movable
> # cat /sys/devices/system/memory/memory18/valid_zones
> Movable
> # echo online > /sys/devices/system/memory/memory18/state
> # cat /sys/devices/system/memory/memory18/valid_zones
> Normal    <--- should be Movable
> # cat /sys/devices/system/memory/memory17/valid_zones
>           <--- no output

OK, so this is an independent problem and an unrelated one to the
patch I've posted. We need two patches actually. Damn, I hate
MMOP_ONLINE_KEEP. I will send 2 patches as a reply to this email.
-- 
Michal Hocko
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751029AbdEaGZ5 (ORCPT ); Wed, 31 May 2017 02:25:57 -0400
Received: from mail-wr0-f193.google.com ([209.85.128.193]:33126 "EHLO
	mail-wr0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750869AbdEaGZ4 (ORCPT ); Wed, 31 May 2017 02:25:56 -0400
From: Michal Hocko
To: Heiko Carstens
Cc: Gerald Schaefer, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko
Subject: [PATCH 1/2] mm, memory_hotplug: fix MMOP_ONLINE_KEEP behavior
Date: Wed, 31 May 2017 08:25:45 +0200
Message-Id: <20170531062545.4122-1-mhocko@kernel.org>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20170531062439.GA3853@dhcp22.suse.cz>
References: <20170531062439.GA3853@dhcp22.suse.cz>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Michal Hocko

Heiko Carstens has noticed that MMOP_ONLINE_KEEP is currently broken:
$ grep . memory3?/valid_zones
memory34/valid_zones:Normal Movable
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Normal Movable

$ echo online_movable > memory34/state
$ grep . memory3?/valid_zones
memory34/valid_zones:Movable
memory35/valid_zones:Movable
memory36/valid_zones:Movable
memory37/valid_zones:Movable

$ echo online > memory36/state
$ grep . memory3?/valid_zones
memory34/valid_zones:Movable
memory36/valid_zones:Normal
memory37/valid_zones:Movable

so we have effectively punched a hole into the movable zone. The
problem is that the move_pfn_range() check for MMOP_ONLINE_KEEP is wrong.
It only checks whether the given range is already part of the movable
zone, which is not the case here as only memory34 is in the zone. Fix this
by using allow_online_pfn_range(..., MMOP_ONLINE_KERNEL): if that returns
false then we can be sure that movable onlining is the right thing to do.
Reported-by: Heiko Carstens
Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
Signed-off-by: Michal Hocko
---
 mm/memory_hotplug.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0a895df2397e..b3895fd609f4 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -950,11 +950,12 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
 	if (online_type == MMOP_ONLINE_KEEP) {
 		struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
 		/*
-		 * MMOP_ONLINE_KEEP inherits the current zone which is
-		 * ZONE_NORMAL by default but we might be within ZONE_MOVABLE
-		 * already.
+		 * MMOP_ONLINE_KEEP defaults to MMOP_ONLINE_KERNEL but use
+		 * movable zone if that is not possible (e.g. we are within
+		 * or past the existing movable zone)
 		 */
-		if (zone_intersects(movable_zone, start_pfn, nr_pages))
+		if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
+					MMOP_ONLINE_KERNEL))
 			zone = movable_zone;
 	} else if (online_type == MMOP_ONLINE_MOVABLE) {
 		zone = &pgdat->node_zones[ZONE_MOVABLE];
-- 
2.11.0

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751160AbdEaG0N (ORCPT ); Wed, 31 May 2017 02:26:13 -0400
Received: from mail-wm0-f66.google.com ([74.125.82.66]:34791 "EHLO
	mail-wm0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750779AbdEaG0M (ORCPT ); Wed, 31 May 2017 02:26:12 -0400
From: Michal Hocko
To: Heiko Carstens
Cc: Gerald Schaefer, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko
Subject: [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone
Date: Wed, 31 May 2017 08:26:05 +0200
Message-Id: <20170531062605.4347-1-mhocko@kernel.org>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20170531062439.GA3853@dhcp22.suse.cz>
References: <20170531062439.GA3853@dhcp22.suse.cz>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Michal Hocko

Heiko Carstens has noticed that he can generate overlapping zones for
ZONE_DMA and ZONE_NORMAL:
DMA      [mem 0x0000000000000000-0x000000007fffffff]
Normal   [mem 0x0000000080000000-0x000000017fffffff]

$ cat /sys/devices/system/memory/block_size_bytes
10000000
$ cat /sys/devices/system/memory/memory5/valid_zones
DMA
$ echo 0 > /sys/devices/system/memory/memory5/online
$ cat /sys/devices/system/memory/memory5/valid_zones
Normal
$ echo 1 > /sys/devices/system/memory/memory5/online
Normal

$ cat /proc/zoneinfo
Node 0, zone      DMA
  spanned  524288        <-----
  present  458752
  managed  455078
  start_pfn:           0 <-----

Node 0, zone   Normal
  spanned  720896
  present  589824
  managed  571648
  start_pfn:           327680 <-----

The reason is that we assume that the default zone for kernel onlining
is ZONE_NORMAL. This was a simplification introduced by the memory
hotplug rework and it is easily fixable by checking the range overlap in
the zone order and considering the first matching zone as the default
one. If there is no such zone then assume ZONE_NORMAL as we have been
doing so far.
Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
Reported-by: Heiko Carstens
Signed-off-by: Michal Hocko
---
 drivers/base/memory.c          |  2 +-
 include/linux/memory_hotplug.h |  2 ++
 mm/memory_hotplug.c            | 27 ++++++++++++++++++++++++---
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b86fda30ce62..c7c4e0325cdb 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -419,7 +419,7 @@ static ssize_t show_valid_zones(struct device *dev,
 
 	nid = pfn_to_nid(start_pfn);
 	if (allow_online_pfn_range(nid, start_pfn, nr_pages, MMOP_ONLINE_KERNEL)) {
-		strcat(buf, NODE_DATA(nid)->node_zones[ZONE_NORMAL].name);
+		strcat(buf, default_zone_for_pfn(nid, start_pfn, nr_pages)->name);
 		append = true;
 	}
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 9e0249d0f5e4..ed167541e4fc 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -309,4 +309,6 @@ extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
 					  unsigned long pnum);
 extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
 		int online_type);
+extern struct zone *default_zone_for_pfn(int nid, unsigned long pfn,
+		unsigned long nr_pages);
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b3895fd609f4..a0348de3e18c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -858,7 +858,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
 {
 	struct pglist_data *pgdat = NODE_DATA(nid);
 	struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
-	struct zone *normal_zone = &pgdat->node_zones[ZONE_NORMAL];
+	struct zone *default_zone = default_zone_for_pfn(nid, pfn, nr_pages);
 
 	/*
 	 * TODO there shouldn't be any inherent reason to have ZONE_NORMAL
@@ -872,7 +872,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
 			return true;
 		return movable_zone->zone_start_pfn >= pfn + nr_pages;
 	} else if (online_type == MMOP_ONLINE_MOVABLE) {
-		return zone_end_pfn(normal_zone) <= pfn;
+		return zone_end_pfn(default_zone) <= pfn;
 	}
 
 	/* MMOP_ONLINE_KEEP will always succeed and inherits the current zone */
@@ -938,6 +938,27 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
 }
 
 /*
+ * Returns a default kernel memory zone for the given pfn range.
+ * If no kernel zone covers this pfn range it will automatically go
+ * to the ZONE_NORMAL.
+ */
+struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
+		unsigned long nr_pages)
+{
+	struct pglist_data *pgdat = NODE_DATA(nid);
+	int zid;
+
+	for (zid = 0; zid <= ZONE_NORMAL; zid++) {
+		struct zone *zone = &pgdat->node_zones[zid];
+
+		if (zone_intersects(zone, start_pfn, nr_pages))
+			return zone;
+	}
+
+	return &pgdat->node_zones[ZONE_NORMAL];
+}
+
+/*
  * Associates the given pfn range with the given node and the zone appropriate
  * for the given online type.
  */
@@ -945,7 +966,7 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
 		unsigned long start_pfn, unsigned long nr_pages)
 {
 	struct pglist_data *pgdat = NODE_DATA(nid);
-	struct zone *zone = &pgdat->node_zones[ZONE_NORMAL];
+	struct zone *zone = default_zone_for_pfn(nid, start_pfn, nr_pages);
 
 	if (online_type == MMOP_ONLINE_KEEP) {
 		struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
-- 
2.11.0

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751406AbdFAGuB (ORCPT ); Thu, 1 Jun 2017 02:50:01 -0400
Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:36395 "EHLO
	mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org
	with ESMTP id S1751053AbdFAGuA (ORCPT ); Thu, 1 Jun 2017 02:50:00 -0400
Date: Thu, 1 Jun 2017 08:49:54 +0200
From: Heiko Carstens
To: Michal Hocko
Cc: Gerald Schaefer, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [-next] memory hotplug regression
References: <20170524082022.GC5427@osiris>
 <20170524083956.GC14733@dhcp22.suse.cz>
 <20170526122509.GB14849@osiris>
 <20170530121806.GD7969@dhcp22.suse.cz>
 <20170530123724.GC4874@osiris>
 <20170530143246.GJ7969@dhcp22.suse.cz>
 <20170530145501.GD4874@osiris>
 <20170531062439.GA3853@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170531062439.GA3853@dhcp22.suse.cz>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-TM-AS-GCONF: 00
x-cbid: 17060106-0040-0000-0000-000003BE950B
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 17060106-0041-0000-0000-00002052DD93
Message-Id: <20170601064954.GB7593@osiris>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,,
 definitions=2017-06-01_01:,, signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0
 bulkscore=0 classifier=spam adjust=0
 reason=mlx scancount=1 engine=8.0.1-1703280000
 definitions=main-1706010126
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 31, 2017 at 08:24:40AM +0200, Michal Hocko wrote:
> > # cat /sys/devices/system/memory/memory16/valid_zones
> > Normal Movable
> > # echo online_movable > /sys/devices/system/memory/memory16/state
> > # cat /sys/devices/system/memory/memory16/valid_zones
> > Movable
> > # cat /sys/devices/system/memory/memory18/valid_zones
> > Movable
> > # echo online > /sys/devices/system/memory/memory18/state
> > # cat /sys/devices/system/memory/memory18/valid_zones
> > Normal    <--- should be Movable
> > # cat /sys/devices/system/memory/memory17/valid_zones
> >           <--- no output
> 
> OK, so this is an independent problem and an unrelated one to the
> patch I've posted. We need two patches actually. Damn, I hate
> MMOP_ONLINE_KEEP. I will send 2 patches as a reply to this email.

Tested with your patches on top of linux-next as of yesterday, however
starting at commit fa812e869a6fe7495a17150bb2639075081ef709 ("mm/zswap.c:
delete an error message for a failed memory allocation in
zswap_dstmem_prepare()"), since the "mm: per-lruvec slab stats" patch
series breaks everything ;)

Tested-by: Heiko Carstens

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751333AbdFAHNT (ORCPT ); Thu, 1 Jun 2017 03:13:19 -0400
Received: from mx2.suse.de ([195.135.220.15]:54846 "EHLO mx1.suse.de"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S1751013AbdFAHNS (ORCPT ); Thu, 1 Jun 2017 03:13:18 -0400
Date: Thu, 1 Jun 2017 09:13:10 +0200
From: Michal Hocko
To: Heiko Carstens
Cc: Gerald Schaefer, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [-next] memory hotplug regression
Message-ID: <20170601071310.GA32677@dhcp22.suse.cz>
References: <20170524082022.GC5427@osiris>
 <20170524083956.GC14733@dhcp22.suse.cz>
 <20170526122509.GB14849@osiris>
 <20170530121806.GD7969@dhcp22.suse.cz>
 <20170530123724.GC4874@osiris>
 <20170530143246.GJ7969@dhcp22.suse.cz>
 <20170530145501.GD4874@osiris>
 <20170531062439.GA3853@dhcp22.suse.cz>
 <20170601064954.GB7593@osiris>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170601064954.GB7593@osiris>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 01-06-17 08:49:54, Heiko Carstens wrote:
> On Wed, May 31, 2017 at 08:24:40AM +0200, Michal Hocko wrote:
> > > # cat /sys/devices/system/memory/memory16/valid_zones
> > > Normal Movable
> > > # echo online_movable > /sys/devices/system/memory/memory16/state
> > > # cat /sys/devices/system/memory/memory16/valid_zones
> > > Movable
> > > # cat /sys/devices/system/memory/memory18/valid_zones
> > > Movable
> > > # echo online > /sys/devices/system/memory/memory18/state
> > > # cat /sys/devices/system/memory/memory18/valid_zones
> > > Normal    <--- should be Movable
> > > # cat /sys/devices/system/memory/memory17/valid_zones
> > >           <--- no output
> > 
> > OK, so this is an independent problem and an unrelated one to the
> > patch I've posted. We need two patches actually. Damn, I hate
> > MMOP_ONLINE_KEEP. I will send 2 patches as a reply to this email.
> 
> Tested with your patches on top of linux-next as of yesterday, however
> starting at commit fa812e869a6fe7495a17150bb2639075081ef709 ("mm/zswap.c:
> delete an error message for a failed memory allocation in
> zswap_dstmem_prepare()"), since the "mm: per-lruvec slab stats" patch
> series breaks everything ;)
> 
> Tested-by: Heiko Carstens

Thanks a lot for testing! I will post those patches for wider review
later today.
-- 
Michal Hocko
SUSE Labs
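
For readers following the thread outside a kernel tree, the zone selection
that the two patches arrive at can be modelled in plain user-space C. This
is only a rough sketch, not the kernel code: struct zone, struct node, the
ONLINE_* enum and zone_for_online() are simplified stand-ins invented for
this illustration, and the empty-movable-zone shortcut in
allow_online_pfn_range() is an assumption, since the quoted hunk only shows
its "return true".

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel's types; not the real definitions. */
enum zone_id { ZONE_DMA, ZONE_NORMAL, ZONE_MOVABLE, MAX_NR_ZONES };
enum online_type { ONLINE_KERNEL, ONLINE_MOVABLE, ONLINE_KEEP };

struct zone {
	unsigned long start_pfn;
	unsigned long spanned_pages;	/* 0 means the zone is empty */
};

struct node {
	struct zone zones[MAX_NR_ZONES];
};

static unsigned long zone_end_pfn(const struct zone *z)
{
	return z->start_pfn + z->spanned_pages;
}

/* True if [start_pfn, start_pfn + nr_pages) overlaps a non-empty zone. */
static bool zone_intersects(const struct zone *z, unsigned long start_pfn,
			    unsigned long nr_pages)
{
	assert(nr_pages > 0);
	if (z->spanned_pages == 0)
		return false;
	return start_pfn < zone_end_pfn(z) &&
	       start_pfn + nr_pages > z->start_pfn;
}

/* Patch 2/2: first kernel zone overlapping the range, else ZONE_NORMAL. */
static enum zone_id default_zone_for_pfn(const struct node *n,
					 unsigned long start_pfn,
					 unsigned long nr_pages)
{
	int zid;

	for (zid = 0; zid <= ZONE_NORMAL; zid++) {
		if (zone_intersects(&n->zones[zid], start_pfn, nr_pages))
			return (enum zone_id)zid;
	}
	return ZONE_NORMAL;
}

/*
 * Kernel onlining must stay below the movable zone; movable onlining must
 * start at or past the end of the default kernel zone. The empty movable
 * zone shortcut is an assumption of this sketch.
 */
static bool allow_online_pfn_range(const struct node *n, unsigned long pfn,
				   unsigned long nr_pages,
				   enum online_type type)
{
	const struct zone *movable = &n->zones[ZONE_MOVABLE];
	const struct zone *def =
		&n->zones[default_zone_for_pfn(n, pfn, nr_pages)];

	if (type == ONLINE_KERNEL) {
		if (movable->spanned_pages == 0)
			return true;
		return movable->start_pfn >= pfn + nr_pages;
	}
	if (type == ONLINE_MOVABLE)
		return zone_end_pfn(def) <= pfn;
	return true;	/* ONLINE_KEEP always succeeds */
}

/* Patch 1/2: KEEP means "kernel zone if allowed, otherwise movable". */
static enum zone_id zone_for_online(const struct node *n, unsigned long pfn,
				    unsigned long nr_pages,
				    enum online_type type)
{
	enum zone_id zid = default_zone_for_pfn(n, pfn, nr_pages);

	if (type == ONLINE_KEEP) {
		if (!allow_online_pfn_range(n, pfn, nr_pages, ONLINE_KERNEL))
			zid = ZONE_MOVABLE;
	} else if (type == ONLINE_MOVABLE) {
		zid = ZONE_MOVABLE;
	}
	return zid;
}
```

With the layout from the 2/2 changelog (DMA spanning pfns 0x0 to 0x80000,
Normal starting at 0x80000, blocks of 0x10000 pages), memory5 starts at pfn
0x50000 (327680), so default_zone_for_pfn() should pick ZONE_DMA for it
rather than ZONE_NORMAL. Once some block has been onlined movable, a later
"online" (KEEP) of a block at or past the movable zone start should land in
ZONE_MOVABLE instead of punching a hole into it.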