public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Eric Sandeen <sandeen@redhat.com>, xfs <linux-xfs@vger.kernel.org>
Subject: Re: mkfs is broken due to platform_zero_range
Date: Tue, 4 May 2021 10:51:09 +1000
Message-ID: <20210504005109.GK63242@dread.disaster.area>
In-Reply-To: <20210504002053.GC7448@magnolia>

On Mon, May 03, 2021 at 05:20:53PM -0700, Darrick J. Wong wrote:
> So... I have a machine with an nvme drive manufactured by a certain
> manufacturer who isn't known for the quality of their firmware
> implementation.  I'm pretty sure that this is a result of the use of
> fallocate(FALLOC_FL_ZERO_RANGE) to zero the log during format.
> 
> If I format a device, mounting and repair both fail because the primary
> superblock UUID doesn't match the log UUID:
.....
> And the format works this time too:
> 
> [root@abacus654 ~]# strace -s99 -o /tmp/a mkfs.xfs /dev/nvme0n1  -f
> meta-data=/dev/nvme0n1           isize=512    agcount=6, agsize=268435455 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=1, rmapbt=0
>          =                       reflink=1
> data     =                       bsize=4096   blocks=1542990848, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> log      =internal log           bsize=4096   blocks=521728, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> Discarding blocks...Done.
> (reverse-i-search)`-n': od -tx1 -Ad -c /tmp/badlog3 | head ^C15
> [root@abacus654 ~]# xfs_repair -n /dev/nvme0n1
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
> 
> In conclusion, the drive firmware is broken.
> 
> Question: Should we be doing /some/ kind of re-read after zeroing the
> log to detect these sh*tty firmwares and fall back to a pwrite()?

No, userspace should not have to work around broken hardware. The
kernel needs to blacklist/quirk this device so that it will do
either:

a) redirect to a zeroing mechanism that actually works on that
device; or

b) fail the fallocate() call with -EOPNOTSUPP so that the
application can fall back to manual zeroing.
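
For illustration, the application-side fallback in (b) could look
roughly like the sketch below. This is not the xfsprogs
platform_zero_range() code; the helper names zero_range() and
zero_range_manual() and the 1 MiB buffer size are assumptions made
for the example. It tries fallocate(FALLOC_FL_ZERO_RANGE) first and
only writes zeroes by hand when the kernel reports EOPNOTSUPP:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for fallocate() */
#endif
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>	/* FALLOC_FL_ZERO_RANGE */
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: zero [start, start + len) by writing a buffer
 * of zeroes with pwrite(). Returns 0 or a negative errno.
 * Note: an fd opened with O_DIRECT would need an aligned buffer and
 * aligned offsets/lengths; this sketch assumes buffered I/O.
 */
static int zero_range_manual(int fd, off_t start, off_t len)
{
	const size_t bufsz = 1 << 20;	/* 1 MiB, arbitrary choice */
	char *buf = calloc(1, bufsz);

	if (!buf)
		return -ENOMEM;

	while (len > 0) {
		size_t chunk = len < (off_t)bufsz ? (size_t)len : bufsz;
		ssize_t n = pwrite(fd, buf, chunk, start);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			int err = -errno;
			free(buf);
			return err;
		}
		if (n == 0) {		/* no progress, give up */
			free(buf);
			return -EIO;
		}
		start += n;
		len -= n;
	}
	free(buf);
	return 0;
}

/*
 * Hypothetical wrapper: ask the kernel to zero the range, and fall
 * back to manual zeroing only if it says the operation is not
 * supported. Any other error is passed back to the caller.
 */
int zero_range(int fd, off_t start, off_t len)
{
	if (fallocate(fd, FALLOC_FL_ZERO_RANGE, start, len) == 0)
		return 0;
	if (errno != EOPNOTSUPP)
		return -errno;
	return zero_range_manual(fd, start, len);
}
```

The point of doing it this way is that the application never has to
second-guess a successful fallocate() by re-reading; it is the
kernel's job to return -EOPNOTSUPP for a device that cannot zero
reliably.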

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
