Date: Thu, 1 Aug 2019 18:21:47 +0200
From: Christoph Hellwig
To: Matthew Wilcox
Cc: Dave Chinner, linux-fsdevel@vger.kernel.org, hch@lst.de,
	linux-xfs@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 1/2] iomap: Support large pages
Message-ID: <20190801162147.GB25871@lst.de>
References: <20190731171734.21601-1-willy@infradead.org>
	<20190731171734.21601-2-willy@infradead.org>
	<20190731230315.GJ7777@dread.disaster.area>
	<20190801035955.GI4700@bombadil.infradead.org>
In-Reply-To: <20190801035955.GI4700@bombadil.infradead.org>

On Wed, Jul 31, 2019 at 08:59:55PM -0700, Matthew Wilcox wrote:
> -	nbits = BITS_TO_LONGS(page_size(page) / SECTOR_SIZE);
> -	iop = kmalloc(struct_size(iop, uptodate, nbits),
> -			GFP_NOFS | __GFP_NOFAIL);
> -	atomic_set(&iop->read_count, 0);
> -	atomic_set(&iop->write_count, 0);
> -	bitmap_zero(iop->uptodate, nbits);
> +	n = BITS_TO_LONGS(page_size(page) >> inode->i_blkbits);
> +	iop = kmalloc(struct_size(iop, uptodate, n),
> +			GFP_NOFS | __GFP_NOFAIL | __GFP_ZERO);

I am really worried about potentially very large GFP_NOFS | __GFP_NOFAIL
allocations here.

And thinking about this a bit more while walking on the beach, I wonder
if a better option is to just allocate one iomap_page per tail page as
needed, rather than blowing up the one in the head page.  We'd still
always use the read_count and write_count in the head page, but the
bitmaps in the tail pages, which should be pretty easy to do.

Note that we'll also need to do another optimization first, one that I
skipped in the initial iomap writeback path work: we only really need an
iomap_page if the block size is smaller than the page size and there
actually is an extent boundary inside that page.  If a (small or huge)
page is backed by a single extent we can skip the whole iomap_page
thing.

That is at least true for now, because I have a series adding optional
T10 protection information tuples (8 bytes per 512 bytes of data) to the
end of the iomap_page, which would grow it quite a bit for the PI case,
and would also make allocating the uptodate bits dynamically uglier (but
not impossible).
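Going back to the extent boundary check above, a rough sketch of what I
have in mind (iomap_page_needed is a made-up name, written against the
struct iomap fields and the page_size() helper this series uses;
untested):

static bool iomap_page_needed(struct inode *inode, struct page *page,
		struct iomap *iomap)
{
	loff_t page_start = page_offset(page);
	loff_t page_end = page_start + page_size(page);

	/* block size >= page size: no sub-page state to track */
	if (i_blocksize(inode) >= page_size(page))
		return false;

	/*
	 * If the extent returned by ->iomap_begin covers the whole
	 * page, the per-block uptodate bitmap carries no information
	 * beyond PageUptodate, so we can skip the iomap_page entirely.
	 */
	if (iomap->offset <= page_start &&
	    iomap->offset + iomap->length >= page_end)
		return false;

	return true;
}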
Note that we'll also need to remove the line that limits the iomap
allocation size in ->iomap_begin to 1024 times the page size, both to
get a better chance at contiguous allocations for huge page faults and
to generally avoid pointless roundtrips to the allocator.  It might or
might not be time to revisit that limit in general, not just for huge
pages.
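For reference, the cap in question has roughly this shape (sketched
from memory, not a verbatim quote; the exact function and comment are
whatever your tree has):

/*
 * Each ->iomap_begin call is clamped to 1024 pages, so a large write
 * or fault may need several mapping round trips even when one larger
 * contiguous extent allocation would have been possible.
 */
length = min_t(loff_t, length, 1024 * PAGE_SIZE);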