public inbox for linux-kernel@vger.kernel.org
From: Jens Axboe <axboe@kernel.dk>
To: Liu Song <liusong@linux.alibaba.com>,
	kbusch@kernel.org, hch@lst.de, sagi@grimberg.me
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] nvme: request remote is usually not involved for nvme devices
Date: Mon, 19 Sep 2022 08:10:31 -0600	[thread overview]
Message-ID: <894e18a4-4504-df48-6429-a04c222ca064@kernel.dk> (raw)
In-Reply-To: <7b28925a-cbee-620f-fde7-d16f256836cc@linux.alibaba.com>

On 9/18/22 10:10 AM, Liu Song wrote:
> 
> On 2022/9/18 00:50, Jens Axboe wrote:
>> On 9/17/22 10:40 AM, Liu Song wrote:
>>> From: Liu Song <liusong@linux.alibaba.com>
>>>
>>> NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
>>> so when "nr_ctx" is equal to 1 there is no possibility of a remote
>>> request, and the completion path can be simplified accordingly.
>> If the worry is the call overhead of blk_mq_complete_request_remote(),
>> why don't we just make that available as an inline instead? That seems
>> vastly superior to providing a random shortcut in a driver to avoid
>> calling it.
> 
> Hi
> 
> This is my thinking: an SSD with only one hw queue will occasionally
> see remote requests, but a true multi-queue device like NVMe normally
> does not, so we don't need to care about them there. Even when
> "blk_mq_complete_request_remote" is called, it will almost always
> return false. Replacing that call with a check of whether
> "req->mq_hctx->nr_ctx" is 1 saves not just the function call but also
> the extra branches inside "blk_mq_complete_request_remote". And if
> "blk_mq_complete_request_remote" were marked inline, whether it
> actually gets inlined still depends on the compiler, so the outcome
> is uncertain.

I'm not disagreeing with any of that, my point is just that you're
hacking around this in the nvme driver. This is problematic whenever
core changes happen, because now we have to touch individual drivers.
While the expectation is that there are no remote IPI completions for
NVMe, queue starved devices do exist and those do see remote
completions.

This optimization belongs in the blk-mq core, not in nvme. I do think it
makes sense, you just need to solve it in blk-mq rather than in the nvme
driver.

-- 
Jens Axboe


Thread overview: 11+ messages
2022-09-17 16:40 [RFC PATCH] nvme: request remote is usually not involved for nvme devices Liu Song
2022-09-17 16:50 ` Jens Axboe
2022-09-18 16:10   ` Liu Song
2022-09-19 14:10     ` Jens Axboe [this message]
2022-09-19 14:35       ` Christoph Hellwig
2022-09-21  3:40         ` Liu Song
2022-09-21  3:32       ` [RFC PATCH] blk-mq: hctx has only one ctx mapping is no need to redirect the completion Liu Song
2022-09-22  6:20         ` Christoph Hellwig
2022-09-22  7:17           ` Liu Song
2022-09-22  8:27           ` [RFC PATCH v2] " Liu Song
2022-09-24 15:03         ` [RFC PATCH] " Jens Axboe
