public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Jon Derrick <jonathan.derrick@intel.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	linux-ext4@vger.kernel.org, linux-nvme@lists.infradead.org,
	Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	sagi@grimberg.me, Keith Busch <keith.busch@intel.com>
Subject: Re: BUG: hot removal during writes on ext4 formatted nvme device
Date: Thu, 18 May 2017 09:34:59 +0800	[thread overview]
Message-ID: <20170518013454.GA13864@ming.t460p> (raw)
In-Reply-To: <037dd98d-595e-f96b-6241-2474bd2c3e8f@intel.com>

On Mon, May 22, 2017 at 06:38:12PM -0600, Jon Derrick wrote:
> Hello,
> 
> I've encountered a BUG that I've experienced during hot removal on an
> ext4-formatted nvme device undergoing writes. I have been able to verify
> that 4.5, 4.6, 4.10.12, 4.11, and 4.12-rc1 show similar issues (the v4.6
> trace below shows issues with block that have already been fixed). I'm
> using VMD hardware for my hotplug controller so 4.5 is as far back as I
> can go (maybe someone else can verify on non-VMD hardware?).
> 
> To reproduce:
> 1) mkfs.ext4 <nvme>
> 2) mount <nvme> <mnt>
> 3) dd if=/dev/zero of=<mnt>/file bs=1M count=10000
> 4) Hot remove the drive while above is writing
> 
> From what I can tell, the ext4 sb is trying to be committed in the error
> path. There is supposed to be a check if the device is still alive via
> block_device_ejected(), but my guess is that there is a race between the
> removal/deletion in genhd and this check. I would appreciate any help
> resolving this.
>

Recently I played fio over NVMe partition direclty with hot-remove too, and
found that d3cfb2a0ac0b8487d28(block: block new I/O just after queue is set
as dying) is helpful for this kind of issue.

Also the following patch fixes one issue in remove path.

	http://marc.info/?l=linux-block&m=149498450028434&w=2

So could you test v4.12-rc1(d3cfb2a0 is merged) with the above patch?

With these patches in, block layer & NVMe should make sure that all I/O can
be finished with -EIO before del_gendisk() returns once after hot-remove
is triggered, then the failure handling of fs might need further investigation.


Thanks,
Ming

  reply	other threads:[~2017-05-18  1:35 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-17 20:09 BUG: hot removal during writes on ext4 formatted nvme device Jon Derrick
2017-05-18  1:34 ` Ming Lei [this message]
2017-05-18 14:25 ` Dmitry Monakhov
2017-05-18 23:53   ` Jon Derrick

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170518013454.GA13864@ming.t460p \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=jonathan.derrick@intel.com \
    --cc=keith.busch@intel.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox