Date: Mon, 3 Jun 2024 09:30:30 +0900
From: Damien Le Moal
Organization: Western Digital Research
To: Matthew Wilcox
Cc: "Darrick J. Wong", Johannes Thumshirn, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] zonefs: move super block reading from page to folio
References: <20240514152208.26935-1-jth@kernel.org> <20240531011616.GA52973@frogsfrogsfrogs> <5eedc500-5d85-4e41-87b5-61901ca59847@kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

On 6/2/24 02:51, Matthew Wilcox wrote:
> On Fri, May 31, 2024 at 10:28:50AM +0900, Damien Le Moal wrote:
>>>> This will stop working at some point. It'll return NULL once we get
>>>> to the memdesc future (because the memdesc will be a slab, not a folio).
>>>
>>> Hmmm, xfs_buf.c plays a similar trick here for sub-page buffers. I'm
>>> assuming that will get ported to ... whatever the memdesc future holds?
>
> I don't think it does, exactly? Are you referring to kmem_to_page()?
> That will continue to work. You're not trying to get a folio from a
> slab allocation; that will start to fail.
>
>>>> I think the right way to handle this is to call read_mapping_folio().
>>>> That will allocate a folio in the page cache for you (obeying the
>>>> minimum folio size). Then you can examine the contents. It should
>>>> actually remove code from zonefs. Don't forget to call folio_put()
>>>> when you're done with it (either at unmount or at the end of mount if
>>>> you copy what you need elsewhere).
>>>
>>> The downside of using bd_mapping is that userspace can scribble all over
>>> the folio contents. For zonefs that's less of a big deal because it
>>> only reads it once, but for everyone else (e.g. ext4) it's been a huge
>>
>> Yes, and zonefs super block is read-only, we never update it after formatting.
>>
>>> problem. I guess you could always do max(ZONEFS_SUPER_SIZE,
>>> block_size(sb->s_bdev)) if you don't want to use the pagecache.
>>
>> Good point. ZONEFS_SUPER_SIZE is 4K and given that I only know of 512e and 4K
>> zoned block devices, this is not an issue yet. But better safe than sorry, so
>> doing the max() thing you propose is better. Will patch that.
>
> I think you should use read_mapping_folio() for now instead of
> complicating zonefs. Once there's a grand new buffer cache, switch to
> that, but I don't think you're introducing a significant vulnerability
> by using the block device's page cache.

I was not really thinking about vulnerability here, but rather compatibility
with devices having a block size larger than 4K... But given that these are rare
(at best), a fix for a more intelligent ZONEFS_SUPER_SIZE is not urgent, and not
hard at all anyway.

-- 
Damien Le Moal
Western Digital Research
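
For illustration only, the max(ZONEFS_SUPER_SIZE, block_size(sb->s_bdev)) sizing discussed above can be modeled in plain user-space C. This is a rough sketch, not the zonefs patch: SKETCH_ZONEFS_SUPER_SIZE, sketch_super_read_size() and sketch_self_check() are hypothetical names made up for this example, and the block size is passed in as a plain integer instead of being queried from a block device.

```c
#include <assert.h>

/* Assumption: mirrors the 4K ZONEFS_SUPER_SIZE value quoted in the thread. */
#define SKETCH_ZONEFS_SUPER_SIZE 4096u

/*
 * Hypothetical helper: number of bytes to read for the super block.
 * The read is never smaller than the on-disk super block and never
 * smaller than one device block, i.e. max(super size, block size).
 */
static unsigned int sketch_super_read_size(unsigned int bdev_block_size)
{
	if (bdev_block_size > SKETCH_ZONEFS_SUPER_SIZE)
		return bdev_block_size;
	return SKETCH_ZONEFS_SUPER_SIZE;
}

/* Small self-check covering the device types mentioned in the thread. */
static void sketch_self_check(void)
{
	/* 512e device: the 4K super block size dominates. */
	assert(sketch_super_read_size(512u) == 4096u);
	/* 4K device: both sizes are equal. */
	assert(sketch_super_read_size(4096u) == 4096u);
	/* Assumed 16K-block device: the block size dominates. */
	assert(sketch_super_read_size(16384u) == 16384u);
}
```

With 512e and 4K devices the result stays at 4K, so only a device with a block size larger than 4K would change the read size, which matches the point that this case is rare and not urgent.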