From: Ming Lei <ming.lei@redhat.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
Christoph Hellwig <hch@lst.de>, Keith Busch <kbusch@kernel.org>,
Chao Leng <lengchao@huawei.com>, Yi Zhang <yi.zhang@redhat.com>
Subject: Re: [PATCH 3/4] nvme: tcp: fix race between timeout and normal completion
Date: Tue, 20 Oct 2020 17:44:20 +0800
Message-ID: <20201020094420.GD1429635@T590>
In-Reply-To: <e9d2e28e-fb55-358c-3e8c-6f3e9dd91c25@grimberg.me>
On Tue, Oct 20, 2020 at 01:11:11AM -0700, Sagi Grimberg wrote:
>
> > The NVMe TCP timeout handler aborts a request directly when the
> > controller isn't in the LIVE state. nvme_tcp_error_recovery() changes
> > the controller state to RESETTING and schedules the reset work function.
> > If a new timeout fires before the work function runs, the newly timed-out
> > request is aborted directly; however, at that point the controller
> > hasn't been shut down yet, so a race between the timeout abort and
> > normal completion is triggered.
>
> This assertion is incorrect: before completing the request from
> the timeout handler, we call nvme_tcp_stop_queue, which guarantees upon
> return that no more completions will be seen from this queue.
OK, then it looks like the issue can be fixed by patches 1 & 2 alone.
Yi, can you test again and check whether patches 1 & 2 fix the issue?
Thanks,
Ming
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
Thread overview:
2020-10-16 14:28 [PATCH 0/4] blk-mq/nvme-tcp: fix timed out related races Ming Lei
2020-10-16 14:28 ` [PATCH 1/4] blk-mq: check rq->state explicitly in blk_mq_tagset_count_completed_rqs Ming Lei
2020-10-19  0:50   ` Ming Lei
2020-10-16 14:28 ` [PATCH 2/4] blk-mq: think request as completed if it isn't IN_FLIGHT Ming Lei
2020-10-16 14:28 ` [PATCH 3/4] nvme: tcp: fix race between timeout and normal completion Ming Lei
2020-10-20  8:11   ` Sagi Grimberg
2020-10-20  9:44     ` Ming Lei [this message]
2020-10-16 14:28 ` [PATCH 4/4] nvme: tcp: complete non-IO requests atomically Ming Lei
2020-10-20  8:14   ` Sagi Grimberg
2020-10-20  7:32 ` [PATCH 0/4] blk-mq/nvme-tcp: fix timed out related races Yi Zhang