Date: Thu, 24 Oct 2019 16:18:00 -0400
From: "Theodore Y. Ts'o"
To: Xiaohui1 Li 李晓辉
Cc: "lixiaohui1@xiaomi.corp-partner.google.com", "linux-ext4@vger.kernel.org"
Subject: Re: [PATCH v3 09/13] ext4: fast-commit commit path changes
Message-ID: <20191024201800.GE1124@mit.edu>
References: <1571900042725.99617@xiaomi.com>
In-Reply-To: <1571900042725.99617@xiaomi.com>

On Thu, Oct 24, 2019 at 06:54:44AM +0000, Xiaohui1 Li 李晓辉 wrote:
>
> But i also have an idea which can simplify the fast commit patch.
> because we want to fix fsync cost too much time problems on our
> mobile phone without format the whole ext4 partition , and i found
> current fast commit patch can't do this job as it need to
> readjustment the layout of journal area and will destroy phone
> user's data from my opinion .

That's not correct. The fast commit feature can be added to an
existing ext4 file system. That's because when the ext4 file system
is mounted (or when e2fsck is run), the contents of the file system
journal (if any) are replayed and then discarded. On a clean
shutdown, the journal is empty to begin with. Hence, restructuring
the journal so that a portion of the space can be used for fast
commits can be done without modifying or otherwise destroying the
data on the pre-existing file system.

> so my simplify idea is that:
> when jbd2 thread begin to commit the current transaction , why not
> divide the commiting work into two sub work ? firstly flush metadata
> generated by fsynced handles to disk, and then append a commit end
> block. and then tell the fsync threads that no need to wait, as
> their metadata has already been flush to disk journal area, the
> fsync work is finished.
> and then the second sub work is to
> committing metadata and data generated by left handles in current
> transaction.

The problem, as I stated in my earlier message, is that the handles
that were not involved in the fsync in many cases will have been
started and completed before the changes reflected by the handles
involving the inode to be fsync'ed. We can't just "separate out the
handles" and commit the ones that are necessary, and then do the rest
in a separate transaction. The problem is entangled dependencies.
For example, one of the handles not involved with the fsync may have
modified the inode table or the allocation bitmap that is involved
with the update to the inode to be fsync'ed. We can't just flush the
metadata blocks involved with the "fsync handles", since they will
include the modifications made by other file system operations via
"the rest of the handles."

So no, we can't do what you are suggesting. If it were that easy, we
would have done it a long time ago.

The reason why you can't separate out some of the handles from others
is referenced in the LWN article, "Soft Updates, Hard Problems"[1].
What you are suggesting is not exactly soft updates, but it suffers
from the same problem, namely that of entangled updates, where the
same block is modified by multiple handles. If you track all of the
logical dependencies, you could potentially "roll back" in memory
those changes which are not yet committed, and then after the commit
of the "fsync handles", roll them forward again. But this is
hopelessly complicated to get right.

[1] https://lwn.net/Articles/339337/

So if you implemented your suggestion, and the system were to crash
between the first and second commit, the file system would be
corrupted, and in the worst case, e2fsck might not be able to recover
the file system, and all of the user's data would be lost.
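
To make the entanglement concrete, here is a toy sketch in Python. It
is not jbd2 code and does not model the journal itself: the block
numbers, the Handle and ToyFs names, and the whole-block
"commit_only" step are all invented for illustration. One handle
truncates file B, a later handle allocates a block for file A (the
file being fsync'ed), and both dirty the same allocation bitmap
block. Writing out only the blocks touched by the "fsync handle"
drags half of the other handle's change to disk.

```python
from dataclasses import dataclass

BITMAP_BLK = 10               # hypothetical block holding the allocation bitmap
INODE_TABLE_BLKS = (20, 21)   # hypothetical inode table blocks


@dataclass
class Handle:
    """A named set of metadata updates: {(block_no, key): new_value}."""
    name: str
    updates: dict


class ToyFs:
    def __init__(self, disk):
        # disk and cache both map block number -> dict of that block's contents.
        self.disk = {blk: dict(v) for blk, v in disk.items()}
        self.cache = {blk: dict(v) for blk, v in disk.items()}
        self.dirtied_by = {}  # block number -> names of handles that touched it

    def run(self, handle):
        """Apply a handle's updates to the in-memory copies of the blocks."""
        for (blk, key), value in handle.updates.items():
            self.cache[blk][key] = value
            self.dirtied_by.setdefault(blk, set()).add(handle.name)

    def commit_only(self, handle):
        """Naive partial commit: write out every whole block the handle touched."""
        for blk in sorted({blk for (blk, _key) in handle.updates}):
            self.disk[blk] = dict(self.cache[blk])


def referenced_but_free(disk):
    """Return (inode, data block) pairs where the bitmap says the block is free."""
    bitmap = disk[BITMAP_BLK]
    return sorted(
        (inode, data_blk)
        for tbl in INODE_TABLE_BLKS
        for inode, data_blk in disk[tbl].items()
        if data_blk is not None and not bitmap[data_blk]
    )


# Initial state: file B owns data block 101, file A owns nothing yet.
fs = ToyFs({
    BITMAP_BLK: {100: False, 101: True},
    20: {"A": None},
    21: {"B": 101},
})

# The unrelated handle runs first: truncate B, freeing block 101.
other = Handle("other", {(BITMAP_BLK, 101): False, (21, "B"): None})
# The handle for the fsync'ed inode: allocate block 100 for A.
fsync = Handle("fsync", {(BITMAP_BLK, 100): True, (20, "A"): 100})

fs.run(other)
fs.run(fsync)
fs.commit_only(fsync)   # "first sub work"; imagine a crash right after this

dangling = referenced_but_free(fs.disk)
print(dangling)
```

After the partial commit the on-disk bitmap says block 101 is free,
while inode B (whose inode table block was never written) still
points at it. Avoiding that would mean undoing the "other" handle's
bit in the shared bitmap block before writing it and redoing it
afterwards, which is exactly the roll back / roll forward bookkeeping
described above.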

Of course, if you are sure that your system will never crash, because
the kernel is bug-free(tm), then you could skip using the journalling
altogether.....

						- Ted

P.S. It's actually a little bit more complicated than that; you also
need to worry about power drops, so the battery needs to be embedded,
so there is no chance the battery will come flying out when the phone
is dropped. The EC also has to be able to give a low-battery warning
so that the system can be shut down cleanly before the battery power
goes to zero, and you can't allow the emergency poweroff where the
user pushes and holds the power button for eight seconds. The last,
after all, won't be needed because we are making the hopelessly
unrealistic assumption that the kernel is completely, 100%,
bug-free(tm). :-)