Subject: Re: [RFC 0/7] Support high-order page bulk allocation
To: Minchan Kim
Cc: Andrew Morton, linux-mm, Joonsoo Kim, Vlastimil Babka, John Dias, Suren Baghdasaryan, pullip.cho@samsung.com
References: <20200814173131.2803002-1-minchan@kernel.org> <4e2bd095-b693-9fed-40e0-ab538ec09aaa@redhat.com> <20200817152706.GB3852332@google.com> <20200817163018.GC3852332@google.com>
From: David Hildenbrand
Organization: Red Hat GmbH
Date: Mon, 17 Aug 2020 18:44:50 +0200
In-Reply-To: <20200817163018.GC3852332@google.com>

On 17.08.20 18:30, Minchan Kim wrote:
> On Mon, Aug 17, 2020 at 05:45:59PM +0200, David Hildenbrand wrote:
>> On 17.08.20 17:27, Minchan Kim wrote:
>>> On Sun, Aug 16, 2020 at 02:31:22PM +0200, David Hildenbrand wrote:
>>>> On 14.08.20 19:31, Minchan Kim wrote:
>>>>> There is a need for special HW to require bulk allocation of
>>>>> high-order pages. For example, 4800 * order-4 pages.
>>>>>
>>>>> To meet the requirement, a option is using CMA area because
>>>>> page allocator with compaction under memory pressure is
>>>>> easily failed to meet the requirement and too slow for 4800
>>>>> times. However, CMA has also the following drawbacks:
>>>>>
>>>>> * 4800 of order-4 * cma_alloc is too slow
>>>>>
>>>>> To avoid the slowness, we could try to allocate 300M contiguous
>>>>> memory once and then split them into order-4 chunks.
>>>>> The problem of this approach is CMA allocation fails one of the
>>>>> pages in those range couldn't migrate out, which happens easily
>>>>> with fs write under memory pressure.
>>>>
>>>> Why not choose a value in between? Like try to allocate MAX_ORDER - 1
>>>> chunks and split them. That would already heavily reduce the call frequency.
>>>
>>> I think you meant this:
>>>
>>> alloc_pages(GFP_KERNEL|__GFP_NOWARN, MAX_ORDER - 1)
>>>
>>> It would work if system has lots of non-fragmented free memory.
>>> However, once they are fragmented, it doesn't work. That's why we have
>>> seen even order-4 allocation failure in the field easily and that's why
>>> CMA was there.
>>>
>>> CMA has more logics to isolate the memory during allocation/freeing as
>>> well as fragmentation avoidance so that it has less chance to be stealed
>>> from others and increase high success ratio. That's why I want this API
>>> to be used with CMA or movable zone.
>>
>> I was talking about doing MAX_ORDER - 1 CMA allocations instead of one
>> big 300M allocation.
>> As you correctly note, memory placed into CMA
>> should be movable, except for (short/long) term pinnings. In these
>> cases, doing allocations smaller than 300M and splitting them up should
>> be good enough to reduce the call frequency, no?
>
> I should have written that. The 300M I mentioned is really minimum size.
> In some scenario, we need way bigger than 300M, up to several GB.
> Furthermore, the demand would be increased in near future.

And what will the driver do with that data besides providing it to the
device? Can it be mapped to user space? I think we really need more
information / the actual user.

>>
>>>
>>> A usecase is device can set a exclusive CMA area up when system boots.
>>> When device needs 4800 * order-4 pages, it could call this bulk against
>>> of the area so that it could effectively be guaranteed to allocate
>>> enough fast.
>>
>> Just wondering
>>
>> a) Why does it have to be fast?
>
> That's because it's related to application latency, which ends up
> user feel bad.

Okay, but in theory, your device needs are very similar to
application needs, besides you requiring order-4 pages, correct? Similar
to an application that starts up and pins 300M (or more), just with
order-4 pages.

I don't quite get yet why you need a range allocator for that. Because
you intend to use CMA?

>
>> b) Why does it need that many order-4 pages?
>
> It's HW requirement. I couldn't say much about that.

Hm.

>
>> c) How dynamic is the device need at runtime?
>
> Whenever the application launched. It depends on user's usage pattern.
>
>> d) Would it be reasonable in your setup to mark a CMA region in a way
>> such that it will never be used for other (movable) allocations,
>
> I don't get your point. If we don't want the area to used up for
> other movable allocation, why should we use it as CMA first?
> It sounds like reserved memory and just wasted the memory.

Right, it's just very hard to get what you are trying to achieve without
the actual user at hand.

For example, will the pages you allocate be movable? Does the device
allow for that? If not, then the MOVABLE zone is usually not valid
(similar to gigantic pages not being allocated from the MOVABLE zone).
So you're stuck with the NORMAL zone or CMA. Especially for the NORMAL
zone, alloc_contig_range() is currently not prepared to properly handle
sub-MAX_ORDER - 1 ranges. If any involved pageblock contains an
unmovable page, the allocation will fail (see pageblock isolation /
has_unmovable_pages()). So CMA would be your only option.

-- 
Thanks,

David / dhildenb
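
To put rough numbers on the "allocate MAX_ORDER - 1 chunks and split them"
suggestion in this thread, here is a minimal userspace sketch of the
arithmetic. It assumes 4 KiB base pages and MAX_ORDER = 11 (so the largest
buddy order is 10); neither value is stated in the thread, and the helper
names are made up for illustration, they are not kernel APIs.

```c
#include <assert.h>

/* Assumptions, not taken from the thread: 4 KiB pages, MAX_ORDER = 11. */
#define ASSUMED_PAGE_SIZE 4096UL
#define ASSUMED_MAX_ORDER 11U

/* Size in bytes of one physically contiguous block of the given order. */
unsigned long order_bytes(unsigned int order)
{
    return ASSUMED_PAGE_SIZE << order;
}

/* Total bytes covered by `count` blocks of the given order. */
unsigned long request_bytes(unsigned long count, unsigned int order)
{
    return count * order_bytes(order);
}

/*
 * Number of big_order allocations needed to obtain `count` chunks of
 * small_order when each big block is split into equal smaller chunks.
 * Rounds up, so a partially used last block still costs one call.
 */
unsigned long calls_needed(unsigned long count, unsigned int small_order,
                           unsigned int big_order)
{
    assert(big_order >= small_order);
    unsigned long chunks_per_call = 1UL << (big_order - small_order);
    return (count + chunks_per_call - 1) / chunks_per_call;
}
```

Under those assumptions, request_bytes(4800, 4) is 314572800 bytes, which
matches the 300M figure, and calls_needed(4800, 4, ASSUMED_MAX_ORDER - 1)
is 75, compared with 4800 calls when each order-4 chunk is allocated
separately. Whether each of those larger allocations succeeds is a
separate question and depends on the migration and pageblock constraints
discussed above.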