Linux-NVME Archive on lore.kernel.org
From: Nilay Shroff <nilay@linux.ibm.com>
To: Christoph Hellwig <hch@lst.de>
Cc: linux-nvme@lists.infradead.org, kbusch@kernel.org,
	sagi@grimberg.me, axboe@fb.com, chaitanyak@nvidia.com,
	dlemoal@kernel.org, gjoyce@linux.ibm.com
Subject: Re: [PATCH v3 2/3] nvme: make keep-alive synchronous operation
Date: Tue, 8 Oct 2024 23:16:45 +0530
Message-ID: <98154482-8777-42cb-b470-ba0f0db1c59c@linux.ibm.com>
In-Reply-To: <20241008070459.GA22711@lst.de>



On 10/8/24 12:34, Christoph Hellwig wrote:
> On Tue, Oct 08, 2024 at 11:51:51AM +0530, Nilay Shroff wrote:
>> This fix helps avoid race by implementing keep-alive as a synchronous
>> operation so that admin queue-usage ref counter is decremented only
> 
> Please spell out q_usage_counter as requested in the first round.
> 
Yes sure, I will do it in the next patch revision.

>> after keep-alive command finish execution and returns its status. This
>> would ensure that we don't inadvertently destroy the fabric admin queue
>> until we finish processing of nvme keep-alive request and its status and
>> hence it's safe to delete the queue.
> 
> I still fail to see why this requires a synchronous operation vs just
> calling blk_mq_free_request and thus decrementing q_usage_counter
> after checking the controller state.
> 
> Maybe I'm just dumb and missing the obvious even after the last
> explanation, but then the commit log needs to be improved to explain
> it.
>
OK, I will update the commit log in the next patch revision.
 
BTW, I tried your suggestion of removing the blk_mq_free_request() call
from nvme_keep_alive_finish() and returning RQ_END_IO_FREE instead of
RQ_END_IO_NONE, and I could still hit the same issue.

The issue here is that after nvme_keep_alive_finish() returns to the block
layer, the nvme keep-alive thread is still running the queue dispatch
operation (and hence accessing the queue resources), while the queue might
already have been destroyed on another CPU.

nvme_keep_alive_work()
  ->blk_execute_rq_nowait()
    ->blk_mq_run_hw_queue()
      ->blk_mq_sched_dispatch_requests()
        ->__blk_mq_sched_dispatch_requests()
          ->blk_mq_dispatch_rq_list()
            ->nvme_loop_queue_rq()
              ->nvme_fail_nonready_command() 
                ->nvme_complete_rq()
                  ->nvme_end_req()
                    ->blk_mq_end_request()
                      ->__blk_mq_end_request()  -- with your suggestion, we now decrement admin->q_usage_counter here
                        ->nvme_keep_alive_finish() 
 
When the above call stack unwinds to __blk_mq_sched_dispatch_requests(),
the admin queue might already have been destroyed on another CPU. However,
__blk_mq_sched_dispatch_requests() could still access the admin queue
resources, causing the crash reported in the cover letter.
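
To make the ordering concrete, here is a minimal userspace sketch of the
window described above. It is only a model, not kernel code: struct
model_queue and all of the model_*() helpers are hypothetical names made up
for illustration. The real race involves a second CPU; the sketch replaces
that with a simple check of whether teardown would be permitted at the
dispatcher's last access to the queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the admin request queue. */
struct model_queue {
	int usage_counter;	/* models q_usage_counter */
};

/* Teardown on "another CPU" may proceed once no references remain. */
static bool model_teardown_allowed(const struct model_queue *q)
{
	return q->usage_counter == 0;
}

/* Models freeing the request, which drops its queue reference. */
static void model_end_request(struct model_queue *q)
{
	assert(q->usage_counter > 0);
	q->usage_counter--;
}

/*
 * Async model: the command fails fast and the request is freed from the
 * completion path, i.e. while the dispatcher is still on the stack.
 * Returns true if the queue was unprotected at the dispatcher's last use.
 */
static bool model_async_dispatch(struct model_queue *q)
{
	q->usage_counter++;		/* request allocation takes a reference */
	model_end_request(q);		/* completion runs inside the dispatch */
	return model_teardown_allowed(q); /* dispatcher still needs the queue */
}

/*
 * Sync model: the submitter keeps the request, and so the reference,
 * until the dispatcher has returned, and only then frees it.
 */
static bool model_sync_dispatch(struct model_queue *q)
{
	bool exposed;

	q->usage_counter++;		/* request allocation takes a reference */
	exposed = model_teardown_allowed(q); /* dispatcher's last queue access */
	model_end_request(q);		/* freed after the dispatcher returned */
	return exposed;
}
```

Starting from a zero-initialized struct model_queue, model_async_dispatch()
returns true (teardown could have run while the dispatcher still used the
queue) and model_sync_dispatch() returns false.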

> 
>> -static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
>> -						 blk_status_t status)
>> +static void nvme_keep_alive_finish(struct request *rq,
>> +				blk_status_t status,
>> +				struct nvme_ctrl *ctrl)
> 
> And as a nitpick, this should be:
> 
> static void nvme_keep_alive_finish(struct request *rq, blk_status_t status,
> 		struct nvme_ctrl *ctrl)
> 
> 
Yes, I will do it in the next patch revision.

Thanks,
--Nilay


Thread overview: 7+ messages
2024-10-08  6:21 [PATCH v3 0/3] nvme: system fault while shutting down fabric controller Nilay Shroff
2024-10-08  6:21 ` [PATCH v3 1/3] nvme-loop: flush off pending I/O while shutting down loop controller Nilay Shroff
2024-10-08  6:21 ` [PATCH v3 2/3] nvme: make keep-alive synchronous operation Nilay Shroff
2024-10-08  7:04   ` Christoph Hellwig
2024-10-08 17:46     ` Nilay Shroff [this message]
2024-10-08  6:21 ` [PATCH v3 3/3] nvme: use helper nvme_ctrl_state in nvme_keep_alive_finish function Nilay Shroff
2024-10-08  7:06   ` Christoph Hellwig
