Date: Thu, 6 Mar 2025 10:28:30 +1100
From: Dave Chinner
To: "Darrick J. Wong"
Cc: Christoph Hellwig, Carlos Maiolino, Dave Chinner, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 10/12] xfs: use vmalloc instead of vm_map_area for buffer backing memory
References: <20250305140532.158563-1-hch@lst.de> <20250305140532.158563-11-hch@lst.de> <20250305225407.GM2803749@frogsfrogsfrogs>
In-Reply-To: <20250305225407.GM2803749@frogsfrogsfrogs>
X-Mailing-List: linux-xfs@vger.kernel.org

On Wed, Mar 05, 2025 at 02:54:07PM -0800, Darrick J. Wong wrote:
> On Thu, Mar 06, 2025 at 08:20:08AM +1100, Dave Chinner wrote:
> > On Wed, Mar 05, 2025 at 07:05:27AM -0700, Christoph Hellwig wrote:
> > > The fallback buffer allocation path currently open codes a suboptimal
> > > version of vmalloc to allocate pages that are then mapped into
> > > vmalloc space. Switch to using vmalloc instead, which uses all the
> > > optimizations in the common vmalloc code, and removes the need to
> > > track the backing pages in the xfs_buf structure.
> > >
> > > Signed-off-by: Christoph Hellwig
> > .....
> >
> > > @@ -1500,29 +1373,43 @@ static void
> > >  xfs_buf_submit_bio(
> > >  	struct xfs_buf		*bp)
> > >  {
> > > -	unsigned int		size = BBTOB(bp->b_length);
> > > -	unsigned int		map = 0, p;
> > > +	unsigned int		map = 0;
> > >  	struct blk_plug		plug;
> > >  	struct bio		*bio;
> > >
> > > -	bio = bio_alloc(bp->b_target->bt_bdev, bp->b_page_count,
> > > -			xfs_buf_bio_op(bp), GFP_NOIO);
> > > -	bio->bi_private = bp;
> > > -	bio->bi_end_io = xfs_buf_bio_end_io;
> > > +	if (is_vmalloc_addr(bp->b_addr)) {
> > > +		unsigned int	size = BBTOB(bp->b_length);
> > > +		unsigned int	alloc_size = roundup(size, PAGE_SIZE);
> > > +		void		*data = bp->b_addr;
> > >
> > > -	if (bp->b_page_count == 1) {
> > > -		__bio_add_page(bio, virt_to_page(bp->b_addr), size,
> > > -			offset_in_page(bp->b_addr));
> > > -	} else {
> > > -		for (p = 0; p < bp->b_page_count; p++)
> > > -			__bio_add_page(bio, bp->b_pages[p], PAGE_SIZE, 0);
> > > -		bio->bi_iter.bi_size = size; /* limit to the actual size used */
> > > +		bio = bio_alloc(bp->b_target->bt_bdev, alloc_size >> PAGE_SHIFT,
> > > +				xfs_buf_bio_op(bp), GFP_NOIO);
> > > +
> > > +		do {
> > > +			unsigned int	len = min(size, PAGE_SIZE);
> > >
> > > -		if (is_vmalloc_addr(bp->b_addr))
> > > -			flush_kernel_vmap_range(bp->b_addr,
> > > -					xfs_buf_vmap_len(bp));
> > > +			ASSERT(offset_in_page(data) == 0);
> > > +			__bio_add_page(bio, vmalloc_to_page(data), len, 0);
> > > +			data += len;
> > > +			size -= len;
> > > +		} while (size);
> > > +
> > > +		flush_kernel_vmap_range(bp->b_addr, alloc_size);
> > > +	} else {
> > > +		/*
> > > +		 * Single folio or slab allocation. Must be contiguous and thus
> > > +		 * only a single bvec is needed.
> > > +		 */
> > > +		bio = bio_alloc(bp->b_target->bt_bdev, 1, xfs_buf_bio_op(bp),
> > > +				GFP_NOIO);
> > > +		__bio_add_page(bio, virt_to_page(bp->b_addr),
> > > +				BBTOB(bp->b_length),
> > > +				offset_in_page(bp->b_addr));
> > >  	}
> >
> > How does offset_in_page() work with a high order folio? It can only
> > return a value between 0 and (PAGE_SIZE - 1). i.e. shouldn't this
> > be:
> >
> > 		folio = kmem_to_folio(bp->b_addr);
> >
> > 		bio_add_folio_nofail(bio, folio, BBTOB(bp->b_length),
> > 				offset_in_folio(folio, bp->b_addr));
>
> I think offset_in_folio() returns 0 in the !kmem && !vmalloc case
> because we allocate the folio and set b_addr to folio_address(folio);
> and we never call the kmem alloc code for sizes greater than PAGE_SIZE.

Yes, but that misses my point: this is a folio conversion, whilst
this treats a folio as a page.

We're trying to get rid of this sort of page/folio type confusion
(i.e. stuff like "does offset_in_page() work correctly on large
folios"). New code shouldn't be adding new issues like these,
especially when there are existing folio-based APIs that are
guaranteed to work correctly and won't need fixing in future before
pages and folios can be fully separated.

-Dave.
-- 
Dave Chinner
david@fromorbit.com
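
As a rough illustration of the question raised in the thread ("does
offset_in_page() work correctly on large folios"), here is a minimal
userspace sketch. It is not kernel code: MODEL_PAGE_SIZE,
model_offset_in_page(), model_offset_in_folio() and model_demo() are
hypothetical names that only model the masking done by
offset_in_page() and offset_in_folio(), assuming a 4 KiB base page and
a naturally aligned order-2 folio.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumption: 4 KiB base page */

/* Models offset_in_page(): keeps only the address bits below PAGE_SIZE. */
static unsigned long model_offset_in_page(uintptr_t addr)
{
	return (unsigned long)(addr & (MODEL_PAGE_SIZE - 1));
}

/*
 * Models offset_in_folio() for a folio of folio_size bytes that is
 * naturally aligned (folio_size is a power of two).
 */
static unsigned long model_offset_in_folio(unsigned long folio_size,
					   uintptr_t addr)
{
	return (unsigned long)(addr & (folio_size - 1));
}

/* Compare both offsets for an address in the second page of a folio. */
static void model_demo(void)
{
	unsigned long folio_size = 4 * MODEL_PAGE_SIZE;	/* order-2 folio */
	uintptr_t folio_base = 0x100000;	/* aligned to folio_size */
	uintptr_t addr = folio_base + MODEL_PAGE_SIZE + 512;

	/* The page-based offset wraps at every page boundary. */
	assert(model_offset_in_page(addr) == 512);
	/* The folio-based offset is measured from the start of the folio. */
	assert(model_offset_in_folio(folio_size, addr) == MODEL_PAGE_SIZE + 512);

	printf("offset_in_page:  %lu\n", model_offset_in_page(addr));
	printf("offset_in_folio: %lu\n", model_offset_in_folio(folio_size, addr));
}
```

Calling model_demo() from a main() shows the two results diverging
once the address lies beyond the first page of the folio. When the
buffer address is the start of the folio, as described in the quoted
reply, both offsets are 0, so the sketch only illustrates the type
confusion and does not claim a behavioural bug in the patch.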