From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-qt1-f175.google.com (mail-qt1-f175.google.com [209.85.160.175]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EF7EE7605E for ; Wed, 7 Aug 2024 14:18:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.175 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723040290; cv=none; b=uNSoPNyK8fJkdf/53z9RoqqoRkyAySGnw+ZkhujxQx1vsAqGmA1rSpHV9zlUBwwRo41hoPa5csQ2uQY+/MC6T/oAgBT3seVJosPXsML2aJrRtr8yMIUT0h6YApYn0+65TKhmBt+LiMQpmrJ7dCWAnHbuzfmNOwt3a1BZp+3LmGc= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723040290; c=relaxed/simple; bh=vXoN9bzkzc6JpWrpPMm3dZa1zqs3BJ0W5o2bVXx5cOY=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=AcC9QeKv0fjoyzcP5S8je/iJ4oSlr6BtaMEKwsAbpwJWmAKPNgwCvl4OikTcVyYVEVrSiZNty/C6p1kTGfF967zn08iW099vb5cKvJYW9ImJqAqSSTtS02ro6cOap1KC0KM+RrxI2xHf9u63WRkAZM1nSQYap3Q9Wg97XSsrf2U= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=toxicpanda.com; spf=none smtp.mailfrom=toxicpanda.com; dkim=pass (2048-bit key) header.d=toxicpanda-com.20230601.gappssmtp.com header.i=@toxicpanda-com.20230601.gappssmtp.com header.b=g/RGIXPO; arc=none smtp.client-ip=209.85.160.175 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=toxicpanda.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=toxicpanda.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=toxicpanda-com.20230601.gappssmtp.com header.i=@toxicpanda-com.20230601.gappssmtp.com header.b="g/RGIXPO" Received: by mail-qt1-f175.google.com with SMTP id d75a77b69052e-44fe28eb1bfso9040241cf.0 for ; Wed, 07 Aug 2024 07:18:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=toxicpanda-com.20230601.gappssmtp.com; s=20230601; t=1723040287; x=1723645087; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=xzoGv51qG688e8CkqfOWZi/K63J+LGOYfGSyGwjiq6g=; b=g/RGIXPOIs68m1a/KSj9ZU2k46KmGY+kl6SnjHwjDhabpgWfi67uBG/+1vhUFFiI3k GD3a5r1O2keLkBRAklRxJrdXpZvgKisIMmJLSKNyDa89v9Y3e4HCUSRpkNPlzOq4BnjZ Lm1tFkYKVH0VYiQBJA+iVV9LDNueQG9SJBpDS8D4fCndKISkL9or3R49JyXJ+AJlbIqU YH0dgGjmbCjAckMUd50YaL8H9mN/Dk+ZcvNLNQCJIufbavZmLa6pzBopQdaelRDfRbsK gopKQbv2DTlNrkPWQcfvNXCmhda0pjJd11Y0Up0yftNjCdBCG/DsmNZn9uo8CCw3+Mjg psKA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723040287; x=1723645087; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=xzoGv51qG688e8CkqfOWZi/K63J+LGOYfGSyGwjiq6g=; b=qfAqWBqXlHlRLqphZ74/0JPJV5MRb2pD6IQYuC5jmAZr3ECIyruDEgEwg3+ijFqGbl Gq3XAdUS/mQHZWfVFlODrw0tOsGX+WVPoL66+P2zBbrQlOwdBRzXQH66QUms6WguTvs8 YYmqqI+wCzW0rfFLtARsrZUNsLEfBwXVSUk4o9UxIRWTZj9VJK6EZN9W81+2srOo14oo Rk1Bwn6YsE/nx6i22g42ksnQkFaepyu7gjUazwCjCgP5Rhb0NftfitowIy+ecG82gP/d dQM6jO7D/7Aqn6tC+/jE3mRYncH+IX58+LI1UMb7SQr9B8d6MNrkQFjaaR7dV01HE2D1 P31w== X-Gm-Message-State: AOJu0YyGro+dJORzR4LZD6+7RlKpKVlqzyvqaGRPiVJLxQMPOaYleerP xM3ImR5q1kEdWpLGmmKmdcKSYLQ3vZcTQhSEWXxy7+vODjxov76NRhHoxeriOyGi97WYoatmUMw q X-Google-Smtp-Source: AGHT+IGJSWQFH+R3ifCujrxDZg0GTFUwE4Bo8NI4NS8qjJCXw6vzvQd3fSMz6O5vZ0krh018wm7Org== X-Received: by 2002:ac8:5796:0:b0:446:5568:a6de with SMTP id d75a77b69052e-451893033femr242913661cf.48.1723040286746; Wed, 07 Aug 2024 07:18:06 -0700 (PDT) Received: from localhost (syn-076-182-020-124.res.spectrum.com. [76.182.20.124]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-451c86ff93bsm5287851cf.20.2024.08.07.07.18.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 07 Aug 2024 07:18:06 -0700 (PDT) Date: Wed, 7 Aug 2024 10:18:05 -0400 From: Josef Bacik To: Qu Wenruo Cc: linux-btrfs@vger.kernel.org Subject: Re: [PATCH] btrfs: refactor __extent_writepage_io() to do sector-by-sector submission Message-ID: <20240807141805.GB242945@perftesting> References: <9102c028537fbc1d94c4b092dd4a9940661bc58b.1723020573.git.wqu@suse.com> Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <9102c028537fbc1d94c4b092dd4a9940661bc58b.1723020573.git.wqu@suse.com> On Wed, Aug 07, 2024 at 06:21:00PM +0930, Qu Wenruo wrote: > Unlike the bitmap usage inside raid56, for __extent_writepage_io() we > handle the subpage submission not sector-by-sector, but for each dirty > range we found. > > This is not a big deal normally, as normally the subpage complex code is > already mostly optimized out. > > For the sake of consistency and for the future of subpage sector-perfect > compression support, this patch does: > > - Extract the sector submission code into submit_one_sector() > > - Add the needed code to extract the dirty bitmap for subpage case > > - Use bitmap_and() to calculate the target sectors we need to submit > > For x86_64, the dirty bitmap will be fixed to 1, with the length of 1, > so we're still doing the same workload per sector. > > For larger page sizes, the overhead will be a little larger, as previous > we only need to do one extent_map lookup per-dirty-range, but now it > will be one extent_map lookup per-sector. I'd like this to be a followup, because even in the normal page case (or hell the subpage case) we shouldn't have to look up the extent map over and over again, it could span a much larger area. > > But that is the same frequency as x86_64, so we're just aligning the > behavior to x86_64. > > Signed-off-by: Qu Wenruo > --- > The plan of sector-perfect subpage compression is to introduce another > bitmap, possibly inside bio_ctrl, to indicate which sectors do not need > writeback submission (either inline, or async submission). > > So that __extent_writepage_io() can properly skip those ranges to > support sector-perfect subpage compression. > --- > fs/btrfs/extent_io.c | 188 +++++++++++++++++-------------------------- > fs/btrfs/subpage.c | 17 ++++ > fs/btrfs/subpage.h | 3 + > 3 files changed, 96 insertions(+), 112 deletions(-) > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index 040c92541bc9..6acd763e8f26 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -1333,56 +1333,65 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode, > } > > /* > - * Find the first byte we need to write. > + * Return 0 if we have submitted or queued the sector for submission. > + * Return <0 for critical errors. > * > - * For subpage, one page can contain several sectors, and > - * __extent_writepage_io() will just grab all extent maps in the page > - * range and try to submit all non-inline/non-compressed extents. > - * > - * This is a big problem for subpage, we shouldn't re-submit already written > - * data at all. > - * This function will lookup subpage dirty bit to find which range we really > - * need to submit. > - * > - * Return the next dirty range in [@start, @end). > - * If no dirty range is found, @start will be page_offset(page) + PAGE_SIZE. > + * Caller should make sure filepos < i_size and handle filepos >= i_size case. > */ > -static void find_next_dirty_byte(const struct btrfs_fs_info *fs_info, > - struct folio *folio, u64 *start, u64 *end) > +static int submit_one_sector(struct btrfs_inode *inode, > + struct folio *folio, > + u64 filepos, struct btrfs_bio_ctrl *bio_ctrl, > + loff_t i_size) > { > - struct btrfs_subpage *subpage = folio_get_private(folio); > - struct btrfs_subpage_info *spi = fs_info->subpage_info; > - u64 orig_start = *start; > - /* Declare as unsigned long so we can use bitmap ops */ > - unsigned long flags; > - int range_start_bit; > - int range_end_bit; > + struct btrfs_fs_info *fs_info = inode->root->fs_info; > + struct extent_map *em; > + u64 block_start; > + u64 disk_bytenr; > + u64 extent_offset; > + u64 em_end; > + u32 sectorsize = fs_info->sectorsize; > > - /* > - * For regular sector size == page size case, since one page only > - * contains one sector, we return the page offset directly. > - */ > - if (!btrfs_is_subpage(fs_info, folio->mapping)) { > - *start = folio_pos(folio); > - *end = folio_pos(folio) + folio_size(folio); > - return; > + ASSERT(IS_ALIGNED(filepos, sectorsize)); > + > + /* @filepos >= i_size case should be handled by the caller. */ > + ASSERT(filepos < i_size); > + > + em = btrfs_get_extent(inode, NULL, filepos, sectorsize); > + if (IS_ERR(em)) > + return PTR_ERR_OR_ZERO(em); > + > + extent_offset = filepos - em->start; > + em_end = extent_map_end(em); > + ASSERT(filepos <= em_end); > + ASSERT(IS_ALIGNED(em->start, fs_info->sectorsize)); > + ASSERT(IS_ALIGNED(em->len, fs_info->sectorsize)); > + > + block_start = extent_map_block_start(em); > + disk_bytenr = extent_map_block_start(em) + extent_offset; > + > + ASSERT(!extent_map_is_compressed(em)); > + ASSERT(block_start != EXTENT_MAP_HOLE); > + ASSERT(block_start != EXTENT_MAP_INLINE); > + > + free_extent_map(em); > + em = NULL; > + > + btrfs_set_range_writeback(inode, filepos, filepos + sectorsize - 1); > + if (!folio_test_writeback(folio)) { > + btrfs_err(fs_info, > + "folio %lu not writeback, cur %llu end %llu", > + folio->index, filepos, filepos + sectorsize); > } > - > - range_start_bit = spi->dirty_offset + > - (offset_in_folio(folio, orig_start) >> > - fs_info->sectorsize_bits); > - > - /* We should have the page locked, but just in case */ > - spin_lock_irqsave(&subpage->lock, flags); > - bitmap_next_set_region(subpage->bitmaps, &range_start_bit, &range_end_bit, > - spi->dirty_offset + spi->bitmap_nr_bits); > - spin_unlock_irqrestore(&subpage->lock, flags); > - > - range_start_bit -= spi->dirty_offset; > - range_end_bit -= spi->dirty_offset; > - > - *start = folio_pos(folio) + range_start_bit * fs_info->sectorsize; > - *end = folio_pos(folio) + range_end_bit * fs_info->sectorsize; > + /* > + * Although the PageDirty bit is cleared before entering this > + * function, subpage dirty bit is not cleared. > + * So clear subpage dirty bit here so next time we won't submit > + * folio for range already written to disk. > + */ > + btrfs_folio_clear_dirty(fs_info, folio, filepos, sectorsize); > + submit_extent_folio(bio_ctrl, disk_bytenr, folio, > + sectorsize, filepos - folio_pos(folio)); > + return 0; > } > > /* > @@ -1401,10 +1410,10 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, > { > struct btrfs_fs_info *fs_info = inode->root->fs_info; > u64 cur = start; > - u64 end = start + len - 1; > - u64 extent_offset; > - u64 block_start; > - struct extent_map *em; > + unsigned long dirty_bitmap = 1; > + unsigned long range_bitmap = 0; > + unsigned int bitmap_size = 1; > + int bit; > int ret = 0; > int nr = 0; > > @@ -1419,18 +1428,23 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, > return 1; > } > > + if (btrfs_is_subpage(fs_info, inode->vfs_inode.i_mapping)) { > + ASSERT(fs_info->subpage_info); > + btrfs_get_subpage_dirty_bitmap(fs_info, folio, &dirty_bitmap); > + bitmap_size = fs_info->subpage_info->bitmap_nr_bits; > + } > + for (cur = start; cur < start + len; cur += fs_info->sectorsize) > + set_bit((cur - start) >> fs_info->sectorsize_bits, &range_bitmap); > + bitmap_and(&dirty_bitmap, &dirty_bitmap, &range_bitmap, bitmap_size); I'd rather move this completely under the btrfs_is_subpage() case, we don't need to do this where sectorsize == page size. Other than that this cleans things up nicely, you can add my Reviewed-by: Josef Bacik to v2. Thanks, Josef