From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=DUYh=S2=lists.infradead.org=linux-arm-kernel-bounces+infradead-linux-arm-kernel=archiver.kernel.org@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-1.0 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED,
	DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_PASS
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 8FDEFC282CE
	for <infradead-linux-arm-kernel@archiver.kernel.org>; Wed, 24 Apr 2019 05:59:43 +0000 (UTC)
Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 5ABD42148D
	for <infradead-linux-arm-kernel@archiver.kernel.org>; Wed, 24 Apr 2019 05:59:43 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="idYObjpY"
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5ABD42148D
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=arm.com
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+infradead-linux-arm-kernel=archiver.kernel.org@lists.infradead.org
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=lists.infradead.org; s=bombadil.20170209; h=Sender:
	Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post:
	List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:Date:
	Message-ID:References:To:Subject:From:Reply-To:Content-ID:Content-Description
	:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:
	List-Owner; bh=hXU5ph7SaULQi5G02tBM01mJOAmP3eOskR5WB080Vb0=; b=idYObjpYts+lPm
	2jHCk9WN20QpKa9WmFhUZ446BuaY7hQiUkEIq5SujTFlXVWA2f676VPpqvLDpvuVfFW9PKSN5OnVD
	6IuzJHoYGaKveKAtKVsreAinskeYR8QOMEc72jHDXWp8uMhmuN8SmBx4fOIXijx22Qx6fuWexiNln
	HLeC41y0R2rmlNixwJCj+d3qWuEUGBlDxZjthwrja9Ht3UgSVesK2WHWLH+8W94tkH1eKwjbAnMqA
	28kdy/KiaE0XMRFyvBR+ZTMFRS3+Qb4BXiA58B4zBmemQBVewOU/n3zvyhOnZUV88Fv1YPknJgwyw
	6FNI0y13SahlidsAO+Hg==;
Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org)
	by bombadil.infradead.org with esmtp (Exim 4.90_1 #2 (Red Hat Linux))
	id 1hJAwU-0003Oc-VK; Wed, 24 Apr 2019 05:59:38 +0000
Received: from usa-sjc-mx-foss1.foss.arm.com ([217.140.101.70]
 helo=foss.arm.com)
 by bombadil.infradead.org with esmtp (Exim 4.90_1 #2 (Red Hat Linux))
 id 1hJAwR-0003OG-Kv
 for linux-arm-kernel@lists.infradead.org; Wed, 24 Apr 2019 05:59:37 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 3AFDCA78;
 Tue, 23 Apr 2019 22:59:35 -0700 (PDT)
Received: from [10.163.1.68] (unknown [10.163.1.68])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 7260F3F5AF;
 Tue, 23 Apr 2019 22:59:27 -0700 (PDT)
From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: Re: [PATCH V2 2/2] arm64/mm: Enable memory hot remove
To: Mark Rutland <mark.rutland@arm.com>
References: <1555221553-18845-1-git-send-email-anshuman.khandual@arm.com>
 <1555221553-18845-3-git-send-email-anshuman.khandual@arm.com>
 <20190415134841.GC13990@lakrids.cambridge.arm.com>
 <2faba38b-ab79-2dda-1b3c-ada5054d91fa@arm.com>
 <20190417142154.GA393@lakrids.cambridge.arm.com>
 <bba0b71c-2d04-d589-e2bf-5de37806548f@arm.com>
 <20190417173948.GB15589@lakrids.cambridge.arm.com>
 <1bdae67b-fcd6-7868-8a92-c8a306c04ec6@arm.com>
 <97413c39-a4a9-ea1b-7093-eb18f950aad7@arm.com>
 <20190423160525.GD56999@lakrids.cambridge.arm.com>
Message-ID: <ebb9aba0-5ca3-41ed-4183-9d72a354f529@arm.com>
Date: Wed, 24 Apr 2019 11:29:28 +0530
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.9.1
MIME-Version: 1.0
In-Reply-To: <20190423160525.GD56999@lakrids.cambridge.arm.com>
Content-Language: en-US
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 
X-CRM114-CacheID: sfid-20190423_225935_699903_DF548B75 
X-CRM114-Status: GOOD (  30.67  )
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: <linux-arm-kernel.lists.infradead.org>
List-Unsubscribe: <http://lists.infradead.org/mailman/options/linux-arm-kernel>, 
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=unsubscribe>
List-Archive: <http://lists.infradead.org/pipermail/linux-arm-kernel/>
List-Post: <mailto:linux-arm-kernel@lists.infradead.org>
List-Help: <mailto:linux-arm-kernel-request@lists.infradead.org?subject=help>
List-Subscribe: <http://lists.infradead.org/mailman/listinfo/linux-arm-kernel>, 
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=subscribe>
Cc: cai@lca.pw, mhocko@suse.com, ira.weiny@intel.com, david@redhat.com,
 catalin.marinas@arm.com, will.deacon@arm.com, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, logang@deltatee.com, james.morse@arm.com,
 cpandya@codeaurora.org, arunks@codeaurora.org, akpm@linux-foundation.org,
 osalvador@suse.de, mgorman@techsingularity.net, dan.j.williams@intel.com,
 linux-arm-kernel@lists.infradead.org, robin.murphy@arm.com
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
Errors-To: linux-arm-kernel-bounces+infradead-linux-arm-kernel=archiver.kernel.org@lists.infradead.org

On 04/23/2019 09:35 PM, Mark Rutland wrote:
> On Tue, Apr 23, 2019 at 01:01:58PM +0530, Anshuman Khandual wrote:
>> Generic usage for init_mm.pagetable_lock
>>
>> Unless I have missed something else these are the generic init_mm kernel page table
>> modifiers at runtime (at least which uses init_mm.page_table_lock)
>>
>> 	1. ioremap_page_range()		/* Mapped I/O memory area */
>> 	2. apply_to_page_range()	/* Change existing kernel linear map */
>> 	3. vmap_page_range()		/* Vmalloc area */
> 
> Internally, those all use the __p??_alloc() functions to handle racy
> additions by transiently taking the PTL when installing a new table, but
> otherwise walk kernel tables _without_ the PTL held. Note that none of
> these ever free an intermediate level of table.

Right, they don't free intermediate-level page tables, but I was curious about
leaf-level modifications only.

> 
> I believe that the idea is that operations on separate VMAs should never

I guess by 'VMA' you meant a kernel virtual range, not an actual VMA, i.e. a
vm_area_struct, which applies only to user space and not to the kernel.

> conflict at the leaf level, and operations on the same VMA should be
> serialised somehow w.r.t. that VMA.

AFAICT there is nothing other than the hotplug lock, i.e. mem_hotplug_lock, which
prevents concurrent init_mm modifications, and the current situation is only safe
because somehow these VA areas don't overlap with respect to intermediate page
table level spans.

> 
> AFAICT, these functions are _never_ called on the linear/direct map or
> vmemmap VA ranges, and whether or not these can conflict with hot-remove
> is entirely dependent on whether those ranges can share a level of table
> with the vmalloc region.

Right, but all these VA ranges (linear, vmemmap, vmalloc) are wired into init_mm,
hence I wonder whether it is prudent to assume a layout scheme, which varies a lot
across architectures, while deciding on possible race protections. I also wonder
why these users should not call [get|put]_online_mems() to prevent races with
hotplug. Will try this out.

Unless generic MM expects the platform to lay out these VA ranges (linear, vmemmap,
vmalloc) in a certain manner that guarantees non-overlap at intermediate-level
page table spans. Only then would we not need a lock.
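
To make "sharing an intermediate level" concrete, here is a rough userspace
sketch, not kernel code. It assumes 4K pages and 512-entry tables, so one PTE
table spans 2M, one PMD table 1G and one PUD table 512G. The struct, the enum
and the helper name are all made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type, not a kernel structure. */
struct va_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive, must be greater than start */
};

/*
 * Assumed span (as a shift) covered by one table at each level,
 * for 4K pages with 512 entries per table.
 */
enum table_span_shift {
	PTE_TABLE_SHIFT = 21,	/* one PTE table maps 2M   */
	PMD_TABLE_SHIFT = 30,	/* one PMD table maps 1G   */
	PUD_TABLE_SHIFT = 39,	/* one PUD table maps 512G */
};

/*
 * Two ranges can share a table at a given level when at least one
 * naturally aligned span of size (1 << shift) contains addresses
 * from both of them.
 */
static bool ranges_share_table(struct va_range a, struct va_range b,
			       unsigned int shift)
{
	uint64_t a_first = a.start >> shift;
	uint64_t a_last = (a.end - 1) >> shift;
	uint64_t b_first = b.start >> shift;
	uint64_t b_last = (b.end - 1) >> shift;

	/* The index intervals [a_first, a_last] and [b_first, b_last] intersect. */
	return a_first <= b_last && b_first <= a_last;
}
```

With two adjacent, 1G-aligned, 1G-sized ranges this reports no shared PMD table
but a shared PUD table, which is the kind of sharing that would let a hot remove
of one range touch a table still in use by the other.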
 
> 
> Do you know how likely that is to occur? e.g. what proportion of the

TBH I don't know.

> vmalloc region may share a level of table with the linear or vmemmap
> regions in a typical arm64 or x86 configuration? Can we deliberately
> provoke this failure case?

I have not enumerated those yet, but there are multiple configs on arm64, and
probably on x86, which decide the kernel VA space layout and could cause these
potential races. But regardless, it's not right to make assumptions about the
vmalloc range span and not take a lock.

Not sure how to provoke this failure case from user space with simple hotplug,
because vmalloc physical allocation normally cannot be controlled without
hacking the kernel.

> 
> [...]
> 
>> In all of the above.
>>
>> - Page table pages [p4d|pud|pmd|pte]_alloc_[kernel] settings are
>>   protected with init_mm.page_table_lock
> 
> Racy addition is protect in this manner.

Right.

> 
>> - Should not it require init_mm.page_table_lock for all leaf level
>>   (PUD|PMD|PTE) modification as well ?
> 
> As above, I believe that the PTL is assumed to not be necessary there
> since other mutual exclusion should be in effect to prevent racy
> modification of leaf entries.

I wonder what those mutual exclusions are, other than the memory hotplug lock.
Again, if they rest on kernel VA space layout assumptions, that's not a good idea.

> 
>> - Should not this require init_mm.page_table_lock for page table walk
>>   itself ?
>>
>> Not taking an overall lock for all these three operations will
>> potentially race with an ongoing memory hot remove operation which
>> takes an overall lock as proposed. Wondering if this has this been
>> safe till now ?
> 
> I suspect that the answer is that hot-remove is not thoroughly
> stress-tested today, and conflicts are possible but rare.

Will make these generic modifiers call [get|put]_online_mems() in a separate
patch, at least to protect themselves from a memory hot remove operation.
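
The intended exclusion can be modelled in userspace roughly as below. This is
only a sketch under stated assumptions: in the kernel, get_online_mems() sleeps
on mem_hotplug_lock rather than failing, whereas the model uses try-style
helpers so it can be exercised from a single thread. Every model_* name is
hypothetical, and the int standing in for a leaf entry is purely illustrative.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace model only, not the kernel's mem_hotplug_lock.
 * state > 0: number of readers, state == -1: hot remove in progress.
 */
struct model_hotplug_lock {
	atomic_int state;
};

/* Models get_online_mems(), but fails instead of sleeping. */
static bool model_try_get_online_mems(struct model_hotplug_lock *l)
{
	int cur = atomic_load(&l->state);

	while (cur >= 0) {
		/* On failure, cur is reloaded and the loop condition rechecked. */
		if (atomic_compare_exchange_weak(&l->state, &cur, cur + 1))
			return true;
	}
	return false;
}

/* Models put_online_mems(). */
static void model_put_online_mems(struct model_hotplug_lock *l)
{
	atomic_fetch_sub(&l->state, 1);
}

/* Models the start of a hot remove: succeeds only with no readers. */
static bool model_try_mem_hotplug_begin(struct model_hotplug_lock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&l->state, &expected, -1);
}

/* Models the end of a hot remove. */
static void model_mem_hotplug_done(struct model_hotplug_lock *l)
{
	atomic_store(&l->state, 0);
}

/* Stand-in for a generic modifier such as the vmalloc mapping path. */
static bool model_modify_leaf(struct model_hotplug_lock *l, int *leaf, int val)
{
	if (!model_try_get_online_mems(l))
		return false;	/* a hot remove is tearing down tables */
	*leaf = val;		/* the walk and leaf update happen here */
	model_put_online_mems(l);
	return true;
}
```

The point of the model is only the ordering: a modifier never walks or updates
tables while a hot remove holds the lock, and a hot remove cannot start while
any modifier is in flight. Whether taking this lock is acceptable in every
caller's context is a separate question that the model does not answer.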

> 
> As above, can we figure out how likely conflicts are, and try to come up
> with a stress test?

Will try something out by hot-plugging a memory range without actually onlining it
while another vmalloc stress test is running on the system.

> 
> Is it possible to avoid these specific conflicts (ignoring ptdump) by
> aligning VA regions such that they cannot share intermediate levels of
> table?

Kernel VA space layout is platform specific, and core MM does not mandate much.
Hence generic modifiers should not make any assumptions about it but protect
themselves with locks. Doing anything other than that is just pushing the problem
into the future.
