From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20211124192024.2408218-1-catalin.marinas@arm.com> <20211124192024.2408218-4-catalin.marinas@arm.com> <20211127123958.588350-1-agruenba@redhat.com>
From: Andreas Gruenbacher
Date: Mon, 29 Nov 2021 14:33:42 +0100
Subject: Re: [PATCH 3/3] btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults
To: Catalin Marinas
Cc: Matthew Wilcox, Linus Torvalds, Josef Bacik, David Sterba, Al Viro, Andrew Morton, Will Deacon, linux-fsdevel, LKML, Linux ARM, linux-btrfs
Content-Type: text/plain; charset="us-ascii"

On Mon, Nov 29, 2021 at 1:22 PM Catalin Marinas wrote:
> On Sat, Nov 27, 2021 at 07:05:39PM +0100, Andreas Gruenbacher wrote:
> > On Sat, Nov 27, 2021 at 4:21 PM Catalin Marinas wrote:
> > > That's similar, somehow, to the arch-specific probing in one of my
> > > patches: [1]. We could do the above if we can guarantee that the maximum
> > > error margin in copy_to_user() is smaller than SUBPAGE_FAULT_SIZE. For
> > > arm64 copy_to_user(), it is fine, but for copy_from_user(), if we ever
> > > need to handle fault_in_readable(), it isn't (on arm64 up to 64 bytes
> > > even if aligned: reads of large blocks are done in 4 * 16 loads, and if
> > > one of them fails e.g. because of the 16-byte sub-page fault, no write
> > > is done, hence such larger than 16 delta).
> > >
> > > If you want something in the generic fault_in_writeable(), we probably
> > > need a loop over UACCESS_MAX_WRITE_ERROR in SUBPAGE_FAULT_SIZE
> > > increments. But I thought I'd rather keep this in the arch-specific code.
> >
> > I see, that's even crazier than I'd thought. The looping / probing is
> > still pretty generic, so I'd still consider putting it in the generic
> > code.
>
> In the arm64 probe_subpage_user_writeable(), the loop is skipped if
> !system_supports_mte() (static label). It doesn't make much difference
> for search_ioctl() in terms of performance but I'd like the arch code to
> dynamically decide when to probe. An arch_has_subpage_faults() static
> inline function would solve this.
>
> However, the above assumes that the only way of probing is by doing a
> get_user/put_user(). A non-destructive probe with MTE would be to read
> the actual tags in memory and compare them with the top byte of the
> pointer.
>
> There's the CHERI architecture as well. Although very early days for
> arm64, we do have an incipient port (https://www.morello-project.org/).
> The void __user * pointers are propagated inside the kernel as 128-bit
> capabilities. A fault_in() would check whether the address (bottom
> 64-bit) is within the range and permissions specified in the upper
> 64-bit of the capability. There is no notion of sub-page fault
> granularity here and no need to do a put_user() as the check is just
> done on the pointer/capability.
>
> Given the above, my preference is to keep the probing arch-specific.
>
> > We also still have fault_in_safe_writeable which is more difficult to
> > fix, and fault_in_readable which we don't want to leave behind broken,
> > either.
>
> fault_in_safe_writeable() can be done by using get_user() instead of
> put_user() for arm64 MTE and probably SPARC ADI (an alternative is to
> read the in-memory tags and compare them with the pointer).

So we'd keep the existing fault_in_safe_writeable() logic for the
actual fault-in and use get_user() to check for sub-page faults? If
so, then that should probably also be hidden in arch code.

> For CHERI, that's different again since the capability encodes the
> read/write permissions independently.
>
> However, do we actually want to change the fault_in_safe_writeable() and
> fault_in_readable() functions at this stage? I could not get any of them
> to live-lock, though I only tried btrfs, ext4 and gfs2. As per the
> earlier discussion, normal files accesses are guaranteed to make
> progress. The only problematic one was O_DIRECT which seems to be
> alright for the above filesystems (the fs either bails out after several
> attempts or uses GUP to read which skips the uaccess altogether).

Only gfs2 uses fault_in_safe_writeable(). For buffered reads, progress
is guaranteed because failures are at a byte granularity.

O_DIRECT reads and writes happen in device block size granularity, but
the pages are grabbed with get_user_pages() before the copying
happens. So by the time the copying happens, the pages are guaranteed
to be resident, and we don't need to loop around fault_in_*().

You've mentioned before that copying to/from struct page bypasses
sub-page fault checking. If that is the case, then the checking
probably needs to happen in iomap_dio_bio_iter and dio_refill_pages
instead.

> Happy to address them if there is a real concern, I just couldn't trigger it.

Hopefully it should now be clear why you couldn't. One way of
reproducing with fault_in_safe_writeable() would be to use that in
btrfs instead of fault_in_writeable(), of course.

We're not doing any chunked reads from user space with page faults
disabled as far as I'm aware right now, so we probably don't have a
reproducer for fault_in_readable(). It would still be worth fixing
fault_in_readable() to prevent things from blowing up very
unexpectedly later, though.
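
For illustration only, here is a user-space sketch of the kind of
sub-page probing discussed in this thread. It is not kernel code: the
struct sim_user_buf type, the sim_probe_subpage() helper and the
granule_ok[] array are hypothetical stand-ins for a real user mapping,
and the 16-byte SUBPAGE_FAULT_SIZE is assumed from the arm64 MTE case
mentioned above. The helper touches one location per granule and, like
the fault_in_*() helpers, returns the number of bytes it could not
cover.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed granule size; 16 bytes matches the arm64 MTE case in the thread. */
#define SUBPAGE_FAULT_SIZE 16

/*
 * Hypothetical model of a user buffer: granule_ok[i] says whether the
 * i-th SUBPAGE_FAULT_SIZE-sized granule can be accessed. The array must
 * have one entry per granule covered by 'size'.
 */
struct sim_user_buf {
	size_t size;
	const bool *granule_ok;
};

/*
 * Probe [offset, offset + len) one granule at a time.
 * Returns the number of bytes NOT successfully probed, i.e. 0 on success.
 */
static size_t sim_probe_subpage(const struct sim_user_buf *buf,
				size_t offset, size_t len)
{
	size_t end, pos;

	assert(offset <= buf->size && len <= buf->size - offset);

	end = offset + len;
	pos = offset;
	while (pos < end) {
		/* One access per granule is enough to detect a sub-page fault. */
		if (!buf->granule_ok[pos / SUBPAGE_FAULT_SIZE])
			return end - pos;
		/* Advance to the start of the next granule. */
		pos = (pos / SUBPAGE_FAULT_SIZE + 1) * SUBPAGE_FAULT_SIZE;
	}
	return 0;
}
```

With a 64-byte buffer whose third granule is inaccessible, probing the
whole buffer reports 32 bytes left over, while probing only the first
32 bytes succeeds. A caller that retries from the reported position
rather than from the start of the buffer cannot loop forever on the
same granule.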

Thanks,
Andreas

> > > Of course, the above fault_in_writeable() still needs the btrfs
> > > search_ioctl() counterpart to change the probing on the actual fault
> > > address or offset.
> >
> > Yes, but that change is relatively simple and it eliminates the need
> > for probing the entire buffer, so it's a good thing. Maybe you want to
> > add this though:
> >
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -2202,3 +2202,3 @@ static noinline int search_ioctl(struct inode *inode,
> >         unsigned long sk_offset = 0;
> > -       char __user *fault_in_addr;
> > +       char __user *fault_in_addr, *end;
> >
> > @@ -2230,6 +2230,6 @@ static noinline int search_ioctl(struct inode *inode,
> >         fault_in_addr = ubuf;
> > +       end = ubuf + *buf_size;
> >         while (1) {
> >                 ret = -EFAULT;
> > -               if (fault_in_writeable(fault_in_addr,
> > -                                      *buf_size - (fault_in_addr - ubuf)))
> > +               if (fault_in_writeable(fault_in_addr, end - fault_in_addr))
> >                         break;
>
> Thanks, I'll add it.
>
> --
> Catalin
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
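
As a small aside on the quoted search_ioctl() diff: it replaces the
length expression *buf_size - (fault_in_addr - ubuf) with
end - fault_in_addr, where end = ubuf + *buf_size. The user-space
sketch below only checks that the two expressions agree; the helper
names remaining_old() and remaining_new() are made up for this
illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Remaining length as computed before the quoted change. */
static size_t remaining_old(const char *ubuf, size_t buf_size,
			    const char *fault_in_addr)
{
	assert(fault_in_addr >= ubuf && fault_in_addr <= ubuf + buf_size);
	return buf_size - (size_t)(fault_in_addr - ubuf);
}

/* Remaining length as computed after the quoted change, with end = ubuf + buf_size. */
static size_t remaining_new(const char *end, const char *fault_in_addr)
{
	assert(fault_in_addr <= end);
	return (size_t)(end - fault_in_addr);
}
```

Both helpers return the same value for any fault_in_addr inside the
buffer, so the change is a readability cleanup: the end pointer is
computed once before the loop instead of re-deriving the remaining
length from the buffer start on every iteration.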