Linux Btrfs filesystem development
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
	Christoph Hellwig <hch@lst.de>
Cc: Naohiro Aota <Naohiro.Aota@wdc.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: new scrub code vs zoned file systems
Date: Thu, 1 Jun 2023 06:48:40 +0800
Message-ID: <c90550d4-d3db-318a-01a7-5dbb475b782e@gmx.com>
In-Reply-To: <ea984319-decb-ce86-aed4-d4520bf3ad3d@gmx.com>



On 2023/6/1 06:25, Qu Wenruo wrote:
> 
> 
> On 2023/5/31 22:04, Johannes Thumshirn wrote:
>> On 31.05.23 15:31, Christoph Hellwig wrote:
>>> On Wed, May 31, 2023 at 01:25:14PM +0000, Johannes Thumshirn wrote:
>>>> Hmm at least flush_scrub_stripes() should not go into the simple write
>>>> path at all:
>>>
>>> Except for the dev-replace case, which seems to trigger this
>>> write.
>>>
>>
>> Heh and this has never actually worked IMHO.
>>
>> I did a crude hack to bandaid scrub:
>> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
>> index d7d8faf1978a..b20115bd0675 100644
>> --- a/fs/btrfs/scrub.c
>> +++ b/fs/btrfs/scrub.c
>> @@ -1709,9 +1709,20 @@ static int flush_scrub_stripes(struct scrub_ctx *sctx)
>>
>>                          ASSERT(stripe->dev == fs_info->dev_replace.srcdev);
>>
>> -                       bitmap_andnot(&good, &stripe->extent_sector_bitmap,
>> -                                     &stripe->error_bitmap, stripe->nr_sectors);
>> -                       scrub_write_sectors(sctx, stripe, good, true);
>> +                       if (btrfs_is_zoned(fs_info)) {
>> +                               if (!bitmap_empty(&stripe->extent_sector_bitmap,
>> +                                                 stripe->nr_sectors)) {
>> +                                       btrfs_repair_one_zone(fs_info,
>> +                                                             sctx->stripes[0].bg->start);
>> +                                       break;
> 
> This doesn't look good. Is this a hack to use repair to do the dev-replace?
> 
>> +                               }
>> +                       } else {
>> +                               bitmap_andnot(&good,
>> +                                             &stripe->extent_sector_bitmap,
>> +                                             &stripe->error_bitmap,
>> +                                             stripe->nr_sectors);
>> +                               scrub_write_sectors(sctx, stripe, good, true);
>> +                       }
>>                  }
>>          }
>>
>>
>>
>> But then it doesn't work either, because:
>>
>> static int relocating_repair_kthread(void *data)
>> {
>>     [...]
>>          sb_start_write(fs_info->sb);
>>          if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE)) {
>>                  btrfs_info(fs_info,
>>                             "zoned: skip relocating block group %llu to repair: EBUSY",
>>                             target);
>>                  sb_end_write(fs_info->sb);
>>                  return -EBUSY;
>>
>> That will always fail, because in the case of dev-replace we already have
>> BTRFS_EXCLOP_DEV_REPLACE set.
>>
>> I've just spotted btrfs_exclop_start_try_lock(), that could solve our
>> problem here.
> 
> To me, the problem can be solved in a much simpler way: for dev-replace
> on a zoned device, write the whole stripe to the target device and wait
> for it.
> 
> For btrfs_record_physical_zoned(), we can skip the ordered extent (OE)
> handling if bbio::inode is NULL.
> 
> Would the following change solve the problem?
> 
> Thanks,
> Qu
> 
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index d7d8faf1978a..3fa480cd905e 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -1709,8 +1709,15 @@ static int flush_scrub_stripes(struct scrub_ctx *sctx)
> 
>                          ASSERT(stripe->dev == fs_info->dev_replace.srcdev);
> 
> -                       bitmap_andnot(&good, &stripe->extent_sector_bitmap,
> -                                     &stripe->error_bitmap, stripe->nr_sectors);
> +                       if (btrfs_is_zoned(fs_info))
> +                               /*
> +                                * For zoned case, we need to write the whole
> +                                * stripe back, no gaps allowed.
> +                                */
> +                               bitmap_set(&good, 0, stripe->nr_sectors);

In fact this is not even needed.

scrub_write_sectors() already makes all the fill_writer_pointer_gap()
calls needed for the block group.

That means we can pass the existing @good bitmap, even with gaps, and
scrub_write_sectors() will handle it properly.
Only the NULL inode pointer check is needed.
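
To illustrate why gaps in @good are tolerable on a sequential zone, here
is a minimal userspace sketch in C. It is a model, not the kernel code:
struct zone_model, fill_gap(), write_sector(), write_good_sectors() and
the 16-sector stripe are all hypothetical names and sizes chosen for the
example, and it assumes that filling a gap simply means writing zero
filler up to the next good sector.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SECTORS 16u	/* hypothetical stripe size, in sectors */

/* Hypothetical model of a sequential zone: writes must land on wp. */
struct zone_model {
	unsigned int wp;	/* write pointer, in sectors */
	unsigned int zeroed;	/* sectors written as zero filler */
	unsigned int copied;	/* sectors of real data written */
};

/* Advance the write pointer to @target by accounting zero filler. */
static void fill_gap(struct zone_model *z, unsigned int target)
{
	assert(target <= NR_SECTORS);
	if (target > z->wp) {
		z->zeroed += target - z->wp;
		z->wp = target;
	}
}

/* Write one data sector at @pos; fails unless @pos is the write pointer. */
static int write_sector(struct zone_model *z, unsigned int pos)
{
	if (pos != z->wp)
		return -1;
	z->wp++;
	z->copied++;
	return 0;
}

/*
 * Model of the write-back: good = extent & ~error (what bitmap_andnot()
 * computes in the patch), then every good sector is written after any
 * gap in front of it has been filled.
 */
static int write_good_sectors(struct zone_model *z, uint16_t extent,
			      uint16_t error)
{
	uint16_t good = extent & (uint16_t)~error;

	for (unsigned int i = 0; i < NR_SECTORS; i++) {
		if (!(good & (1u << i)))
			continue;
		fill_gap(z, i);
		if (write_sector(z, i))
			return -1;
	}
	return 0;
}
```

With extents on sectors 2-5 and 9-12 (0x1E3C) and an error on sector 4
(0x0010), this model copies 7 sectors, zero-fills 6, and ends with the
write pointer at 13, so every write stays sequential despite the gaps.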

So the crash we hit in the first place is not really about write gaps.

[   53.691003] nvme3n1: Zone Management Append(0x7d) @ LBA 65536, 4 blocks, Zone Is Full (sct 0x1 / sc 0xb9) DNR
[   53.694996] I/O error, dev nvme3n1, sector 786432 op 0xd:(ZONE_APPEND) flags 0x4000 phys_seg 3 prio class 2

Any clue on why the target zone is full during replace?

Thanks,
Qu
> +                       else
> +                               bitmap_andnot(&good, &stripe->extent_sector_bitmap,
> +                                             &stripe->error_bitmap, stripe->nr_sectors);
>                          scrub_write_sectors(sctx, stripe, good, true);
>                  }
>          }
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index 98d6b8cc3874..cced6aeff8d7 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -1659,6 +1659,13 @@ void btrfs_record_physical_zoned(struct btrfs_bio *bbio)
>          const u64 physical = bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT;
>          struct btrfs_ordered_extent *ordered;
> 
> +       /*
> +        * For scrub case we have no inode, and doesn't need to bother ordered
> +        * extents.
> +        */
> +       if (!bbio->inode)
> +               return;
> +
>          ordered = btrfs_lookup_ordered_extent(bbio->inode, bbio->file_offset);
>          if (WARN_ON(!ordered))
>                  return;
