From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 28E0CC4332F for ; Wed, 5 Oct 2022 07:24:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229734AbiJEHYj (ORCPT ); Wed, 5 Oct 2022 03:24:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34096 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229573AbiJEHYi (ORCPT ); Wed, 5 Oct 2022 03:24:38 -0400 Received: from smtp-fw-6002.amazon.com (smtp-fw-6002.amazon.com [52.95.49.90]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B5449CCE; Wed, 5 Oct 2022 00:24:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1664954676; x=1696490676; h=from:to:cc:date:message-id:references:in-reply-to: content-transfer-encoding:mime-version:subject; bh=sAnwOBup503KhTsZYl8bBrVGrc8jgV4o1uRbH+z5bFA=; b=ma0xgp4812ggD9BvCDWYXiXfq6cjdXO6KgE41RxIA+KS/1QGdcUI65pJ vJg8q7/YP4p+y3el395U+4qJu+7giRQnLRvPUwszu4LRCVe5z5r8kXlUD Tgs6/fVdkcE+fDCEhvxRUJbbsNS4roXK1S5H7YOtY5B0jpHqQYnSbuE7Y U=; X-IronPort-AV: E=Sophos;i="5.95,159,1661817600"; d="scan'208";a="251968327" Subject: RE: significant drop fio IOPS performance on v5.10 Thread-Topic: significant drop fio IOPS performance on v5.10 Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO email-inbound-relay-iad-1d-1c3c2014.us-east-1.amazon.com) ([10.43.8.6]) by smtp-border-fw-6002.iad6.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Oct 2022 07:24:24 +0000 Received: from EX13MTAUWB001.ant.amazon.com (iad12-ws-svc-p26-lb9-vlan2.iad.amazon.com [10.40.163.34]) by email-inbound-relay-iad-1d-1c3c2014.us-east-1.amazon.com (Postfix) with ESMTPS id 393D7E018F; Wed, 5 Oct 2022 07:24:21 +0000 (UTC) Received: from EX19D045UWC004.ant.amazon.com (10.13.139.203) by EX13MTAUWB001.ant.amazon.com (10.43.161.207) with Microsoft SMTP Server (TLS) id 15.0.1497.38; Wed, 5 Oct 2022 07:24:21 +0000 Received: from EX19D017UWC003.ant.amazon.com (10.13.139.227) by EX19D045UWC004.ant.amazon.com (10.13.139.203) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.12; Wed, 5 Oct 2022 07:24:21 +0000 Received: from EX19D017UWC003.ant.amazon.com ([fe80::78e9:1d67:81fd:68c5]) by EX19D017UWC003.ant.amazon.com ([fe80::78e9:1d67:81fd:68c5%6]) with mapi id 15.02.1118.012; Wed, 5 Oct 2022 07:24:21 +0000 From: "Lu, Davina" To: Theodore Ts'o CC: "linux-ext4@vger.kernel.org" , "adilger.kernel@dilger.ca" , "regressions@lists.linux.dev" , "stable@vger.kernel.org" , "Mohamed Abuelfotoh, Hazem" , hazem ahmed mohamed , "Ritesh Harjani (IBM)" , "Kiselev, Oleg" , "Liu, Frank" Thread-Index: AdjO5vTzPItE0he4TsCHXHwBRGK6iAB1NzvQAJElJTAAIpWHgAE+4VbA Date: Wed, 5 Oct 2022 07:24:21 +0000 Message-ID: References: <357ace228adf4e859df5e9f3f4f18b49@amazon.com> <1cdc68e6a98d4e93a95be5d887bcc75d@amazon.com> <5c819c9d6190452f9b10bb78a72cb47f@amazon.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.43.160.95] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org Hi Ted, Thanks for your reply! Sorry for the late reply since I was busy on another= urgent EXT4 issue recently. Understood the concern about the patch here-- the patch is just a simple st= art and good suggestions on to make it safe update in multiple CPUs. 1. Max thread number could not really need 256. From my side, ~30 could be= good enough and this only apply with cpu_num >=3D some_threads (E.g. 16), = we can do this in a sysctl/mount option etc... 2. Definitely agree that we need to make safe way and thanks for clear expl= anation. A quick question, so if we make sure converting unwritten extents= on same inode is in "atomic" way, then it should be a good patch to start= with? About the larger the journal size -- which can help this IOPS issue, I foun= d is, with small journal size with bad IOPS, journal transaction are more f= requently (e.g. from 450 to 700 in a period of time) and each transaction h= as very less handlers (e.g from 1000 to 70). Tracing the code, I found: the reserved block exists in each journal update= in EXT4 which requires more blocks, then in=20 add_transaction_credits() if (jbd2_log_space_left(journal) < journal->j_max_transaction_buffers) { //= since journal size is too small, it has less change to goes into this if to= start an journal update. if (jbd2_log_space_left(journal) < journal->j_max_transaction_= buffers)=20 __jbd2_log_wait_for_space(journal); // good to trigger c= ommit->=20 jbd2_log_do_checkpoint() -> jbd2_log_start_commit() -> __jbd2_log_start_commit -> journal->j_commit_request = =3D target; So most of time, in this func, it has to wait for journal commit... The slo= wer journal update could affect the IO performance from my understanding (f= io test is like: one pwrite64 and then followed one fsync). Therefore if we= larger the journal size from 128M to 2GB, mostly the "if" condition can go= into to ease this problem. Thanks Davina -----Original Message----- From: Theodore Ts'o =20 Sent: Thursday, September 29, 2022 8:36 AM To: Lu, Davina Cc: linux-ext4@vger.kernel.org; adilger.kernel@dilger.ca; regressions@lists= .linux.dev; stable@vger.kernel.org; Mohamed Abuelfotoh, Hazem ; hazem ahmed mohamed ; Ritesh Ha= rjani (IBM) ; Kiselev, Oleg ; L= iu, Frank Subject: RE: [EXTERNAL]significant drop fio IOPS performance on v5.10 CAUTION: This email originated from outside of the organization. Do not cli= ck links or open attachments unless you can confirm the sender and know the= content is safe. On Wed, Sep 28, 2022 at 06:07:34AM +0000, Lu, Davina wrote: > > Hello, > > My analyzing is that the degradation is introduce by commit=20 > {244adf6426ee31a83f397b700d964cff12a247d3} and the issue is the=20 > contention on rsv_conversion_wq. The simplest option is to increase=20 > the journal size, but that introduces more operational complexity. > Another option is to add the following change in attachment "allow=20 > more ext4-rsv-conversion workqueue.patch" Hi, thanks for the patch. However, I don't believe as written it is safe. = That's because your patch will allow multiple CPU's to run ext4_flush_comp= leted_IO(), and this function is not set up to be safe to be run concurrent= ly. That is, I don't see the necessary locking if we have two CPU's trying= to convert unwritten extents on the same inode. It could be made safe, and certainly if the problem is that you are worried= about contention across multiple inodes being written to by different FIO = jobs, then certainly this could be a potential contention issue. However, what doesn't make sense is that increasing the journal size also s= eems to fix the issue for you. That implies the problem is one of the jour= nal being to small, and so you are running into an issue of stalls caused b= y the need to do a synchronous checkpoint to clear space in the journal. T= hat is a different problem than one of there being a contention problem wit= h rsv_conversion_wq. So I want to make sure we understand what you are seeing before we make suc= h a change. One potential concern is that will cause a large number of add= itional kernel threads. Now, if these extra kernel threads are doing usefu= l work, then that's not necessarily an issue. But if not, then it's going to be burning a large amount of kernel memory (= especially for a system with a large number of CPU cores). Regards, - Ted