Date: Thu, 27 Mar 2025 19:16:56 +0800
From: Jinjiang Tu
To: David Hildenbrand
CC: <21cnbao@gmail.com>, Kefeng Wang
Subject: Re: [PATCH V4] mm/gup: Clear the LRU flag of a page before adding to LRU batch
Message-ID: <076babae-9fc6-13f5-36a3-95dde0115f77@huawei.com>
In-Reply-To: <91ac638d-b2d6-4683-ab29-fb647f58af63@redhat.com>
References: <1720075944-27201-1-git-send-email-yangge1116@126.com> <4119c1d0-5010-b2e7-3f1c-edd37f16f1f2@huawei.com> <91ac638d-b2d6-4683-ab29-fb647f58af63@redhat.com>
X-Mailing-List: stable@vger.kernel.org

On 2025/3/26 20:46, David Hildenbrand wrote:
> On 26.03.25 13:42, Jinjiang Tu wrote:
>> Hi,
>
> Hi!
>
>> We noticed a 12.3% performance regression in the LibMicro pwrite
>> testcase caused by commit 33dfe9204f29 ("mm/gup: clear the LRU flag
>> of a page before adding to LRU batch").
>>
>> The testcase is run as follows, with $TFILE on tmpfs:
>>      pwrite -E -C 200 -L -S -W -N "pwrite_t1k" -s 1k -I 500 -f $TFILE
>
> Do we know how much that reflects real workloads? (IOW, how much
> should we care)

No, it's hard to say.

>> This testcase writes 1KB (a single page) to the tmpfs file and
>> repeats this step many times. The flame graph shows that the
>> regression comes from folio_mark_accessed() and
>> workingset_activation().
>>
>> folio_mark_accessed() is called on the same page many times.
>> Before this patch, each call would add the page to
>> cpu_fbatches.activate.
>> When the fbatch is full, it is drained and the page is promoted
>> to the active list. After that, folio_mark_accessed() does nothing
>> in later calls.
>>
>> After this patch, however, the folio's LRU flag is cleared once it
>> has been added to cpu_fbatches.activate. From then on,
>> folio_mark_accessed() never calls folio_activate() again, because
>> the page no longer has the LRU flag set. The fbatch therefore does
>> not fill up, the folio is not marked active, and every later
>> folio_mark_accessed() call ends up in workingset_activation(),
>> causing the performance regression.
>
> Would there be a good place to drain the LRU to effectively get that
> processed? (we can always try draining if the LRU flag is not set)

Maybe we could search the local CPU's cpu_fbatches.activate in
__lru_cache_activate_folio() and drain it? Draining the other fbatches
is pointless.