From: Aneesh Kumar K V
Date: Mon, 6 Jun 2022 15:46:36 +0530
Subject: Re: [RFC PATCH v4 4/7] mm/demotion/dax/kmem: Set node's memory tier to MEMORY_TIER_PMEM
To: Bharata B Rao, linux-mm@kvack.org, akpm@linux-foundation.org
Cc: Huang Ying, Greg Thelen, Yang Shi, Davidlohr Bueso, Tim C Chen, Brice Goglin, Michal Hocko, Linux Kernel Mailing List, Hesham Almatary, Dave Hansen, Jonathan Cameron, Alistair Popple, Dan Williams, Feng Tang, Jagdish Gediya, Baolin Wang, David Rientjes

On 6/6/22 3:41 PM, Bharata B Rao wrote:
> On 6/3/2022 2:34 PM, Aneesh Kumar K V wrote:
>> On 6/2/22 12:06 PM, Bharata B Rao wrote:
>>> On 6/1/2022 7:19 PM, Aneesh Kumar K V wrote:
>>>> On 6/1/22 11:59 AM, Bharata B Rao wrote:
>>>>> I was experimenting with this patchset and found this behaviour.
>>>>> Here's what I did:
>>>>>
>>>>> Boot a KVM guest with vNVDIMM device which ends up with device_dax
>>>>> driver by default.
>>>>>
>>>>> Use it as RAM by binding it to dax kmem driver. It now appears as
>>>>> RAM with a new NUMA node that is put to memtier1 (the existing tier
>>>>> where DRAM already exists)
>>>>>
>>>>
>>>> That should have placed it in memtier2.
>>>>
>>>>> I can move it to memtier2 (MEMORY_RANK_PMEM) manually, but isn't
>>>>> that expected to happen automatically when a node with dax kmem
>>>>> device comes up?
>>>>>
>>>>
>>>> This can happen if we have added the same NUMA node to memtier1 before dax kmem driver initialized the pmem memory. Can you check before the above node_set_memory_tier_rank() whether the specific NUMA node is already part of any memory tier?
>>>
>>> When we reach node_set_memory_tier_rank(), node1 (that has the pmem device)
>>> is already part of memtier1 whose nodelist shows 0-1.
>>>
>>
>> can you find out which code path added node1 to memtier1?
>
> node_set_memory_tier_rank+0x63/0x80
> migrate_on_reclaim_callback+0x40/0x4d
> blocking_notifier_call_chain+0x68/0x90
> memory_notify+0x1b/0x20
> online_pages+0x257/0x2f0
> memory_subsys_online+0x99/0x150
> device_online+0x65/0x90
> online_memory_block+0x1b/0x20
> walk_memory_blocks+0x85/0xc0
> ? generic_online_page+0x40/0x40
> add_memory_resource+0x1fa/0x2d0
> add_memory_driver_managed+0x80/0xc0
> dev_dax_kmem_probe+0x1af/0x250
> dax_bus_probe+0x6e/0xa0
>
> After this the explicit call to node_set_memory_tier_rank(numa_node, MEMORY_RANK_PMEM)
> from dev_dax_kmem_probe() finds that the memtier is already set.
>
>> Do you have regular memory also appearing on node1?
>
> No, regular memory is on Node0.
>

Thanks for the stack trace. I was setting up KVM on my laptop to test this.

We should move node_set_mem_tier() earlier. You had automatic onlining enabled for memory hotplug, so add_memory_resource() runs:

	/* online pages if requested */
	if (mhp_default_online_type != MMOP_OFFLINE)
		walk_memory_blocks(start, size, NULL, online_memory_block);

This onlined the memory, and the online notifier put the node in the default tier, before we could call node_set_mem_tier(). That is a bug on my side. I will send you a change after testing.

-aneesh