All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sagi Grimberg <sagig@dev.mellanox.co.il>
To: Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>
Cc: linux-scsi <linux-scsi@vger.kernel.org>,
	Or Gerlitz <ogerlitz@mellanox.com>, Oren Duer <oren@mellanox.com>,
	Max Gurtovoy <maxg@mellanox.com>
Subject: Re: blk-mq queue selection and queue_rq preemption
Date: Wed, 09 Apr 2014 15:10:12 +0300	[thread overview]
Message-ID: <534538A4.6050103@dev.mellanox.co.il> (raw)
In-Reply-To: <53441879.3010909@kernel.dk>

On 4/8/2014 6:40 PM, Jens Axboe wrote:
> On 2014-04-08 05:10, Sagi Grimberg wrote:
>> On 4/7/2014 10:45 PM, Jens Axboe wrote:
>>> On 04/07/2014 06:44 AM, Sagi Grimberg wrote:
>>>> Hey Jens, Christoph & Co,
>>>>
>>>> I raised this question at LSF but didn't get a clear answer on this
>>>> matter.
>>>> So it seems to me that the hctx selection and the actual request
>>>> dispatch (queue_rq) are preemptive:
>>>> (1) blk_mq_get_ctx(q);
>>>> (2) map_queue(q, ctx->cpu);
>>>> ...
>>>> (3) blk_mq_put_ctx(ctx);
>>>> (4) blk_mq_run_hw_queue(hctx, async);
>>>>
>>>> It is possible that an MQ device driver may want to implement a 
>>>> lockless
>>>> scheme counting on (running) CPU <-> hctx attachment.
>>>> Generally speaking, I think that LLDs will be more comfortable knowing
>>>> that they are not preemptive in the dispatch flow.
>>>>
>>>> My question is, is this a must? if so can you please explain why?
>>>>
>>>> Is it possible to put the hctx (restoring preemption) after 
>>>> run_hw_queue
>>>> allowing to LLDs to be sure that the selected queue
>>>> match the running CPU?
>>>
>>> It's a good question, and one I have thought about before. As you
>>> note, in the existing code, the mappings are what I would refer to as
>>> "soft". Generally speaking, CPU X will always map to hardware queue Y,
>>> but there are no specific guarantees made to effect. It would be
>>> trivial to make this mapping hard, and I'd be very open to doing that.
>>> But so far I haven't seen cases where it would improve things. If you
>>> end up being preempted and moved to a different CPU, it doesn't really
>>> matter if this happens before or after you queued the IO - the
>>> completion will end up in the "wrong" location regardless.
>>>
>>> But if drivers can be simplified and improved through relying on hard
>>> mappings (and preempt hence disabled), then I would definitely provide
>>> that possibility as well. If it doesn't hurt by default, we can just
>>> switch to that model.
>>>
>>
>> Hey Jens, thanks for the quick response!
>>
>> So in my driver I would definitely want to rely on hard mappings.
>> The reason is that I maintain a context for IO completions (lets call it
>> "completer") and a submission queue percpu.
>> The completer handles IO completions and also peeks at the submission
>> queue to handle pending IOs.
>> I wish to keep mutual exclusion between the completer and the submission
>> context.
>> The driver is capable of setting the IO submission so that the
>> completion will end up on the same core.
>>
>> Hard mappings providing non-preemptive submission flows will guarantee
>> that the above scheme will work.
>> At the moment I do:
>> (1) queue = hctx->driver_data
>> process request...
>> (2) cpu = get_cpu()
>> (3) if (cpu != queue.id)
>>              queue = queues[cpu]
>> (4) submit_io(queue)
>> (5) put_cpu()
>>
>> So, adding hard_map indicator to blk_mq_reg will be great.
>>
>> As I mentioned in the general case, I think that LLDs will be more
>> comfortable with hard mappings in order to avoid/reduce
>> lock contention in the submission path and also possibly exploiting
>> cache/NUMA locality. Moreover I think that setting the device to 
>> generate
>> IO completion on the submission CPU is common practice for blk-mq
>> implementation. isn't it?
>>
>> I would definitely like to get more input from the driver folks on this.
>
> I've rolled up a set of changes to test this out, I have attached the 
> small series. As I originally stated, it was due to performance 
> concerns that I didn't do this originally. If this works better (or 
> the same) on cases where we don't necessarily care about the hard 
> mappings, then we should just do it. It's a cleaner model, and the 
> hardware that has as many queues as CPUs, it's definitely the right 
> (and obvious) thing to do.
>
> Note that the attached are TOTALLY untested.
>

Thanks Jens!

I'll give it a go...

Sagi.

  reply	other threads:[~2014-04-09 12:10 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-07 12:44 blk-mq queue selection and queue_rq preemption Sagi Grimberg
2014-04-07 19:45 ` Jens Axboe
2014-04-08 11:10   ` Sagi Grimberg
2014-04-08 15:40     ` Jens Axboe
2014-04-09 12:10       ` Sagi Grimberg [this message]
2014-04-09 13:37         ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=534538A4.6050103@dev.mellanox.co.il \
    --to=sagig@dev.mellanox.co.il \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=linux-scsi@vger.kernel.org \
    --cc=maxg@mellanox.com \
    --cc=ogerlitz@mellanox.com \
    --cc=oren@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.