Linux RDMA and InfiniBand development

Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed

* Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()
From: Bart Van Assche @ 2016-10-27  2:04 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <CACVXFVPWL=izFmUPtpi-+RLrhNX0LFfw2m6eurgMC5GvfDWsxQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 10/26/16 18:30, Ming Lei wrote:
> On Thu, Oct 27, 2016 at 6:53 AM, Bart Van Assche
> <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org> wrote:
>> blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
>> have finished. This function does *not* wait until all outstanding
>> requests have finished (this means invocation of request.end_io()).
>> The algorithm used by blk_mq_quiesce_queue() is as follows:
>> * Hold either an RCU read lock or an SRCU read lock around
>>   .queue_rq() calls. The former is used if .queue_rq() does not
>>   block and the latter if .queue_rq() may block.
>> * blk_mq_quiesce_queue() calls synchronize_srcu() or
>>   synchronize_rcu() to wait for .queue_rq() invocations that
>>   started before blk_mq_quiesce_queue() was called.
>> * The blk_mq_hctx_stopped() calls that control whether or not
>>   .queue_rq() will be called are called with the (S)RCU read lock
>>   held. This is necessary to avoid race conditions against
>>   the "blk_mq_stop_hw_queues(q); blk_mq_quiesce_queue(q);"
>>   sequence from another thread.
>>
>> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
>> Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
>> Cc: Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> Cc: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>
>> Cc: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
>> ---
>>  block/Kconfig          |  1 +
>>  block/blk-mq.c         | 69 +++++++++++++++++++++++++++++++++++++++++++++-----
>>  include/linux/blk-mq.h |  3 +++
>>  include/linux/blkdev.h |  1 +
>>  4 files changed, 67 insertions(+), 7 deletions(-)
>>
>> diff --git a/block/Kconfig b/block/Kconfig
>> index 1d4d624..0562ef9 100644
>> --- a/block/Kconfig
>> +++ b/block/Kconfig
>> @@ -5,6 +5,7 @@ menuconfig BLOCK
>>         bool "Enable the block layer" if EXPERT
>>         default y
>>         select SBITMAP
>> +       select SRCU
>>         help
>>          Provide block layer support for the kernel.
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 0cf21c2..4945437 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -115,6 +115,31 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
>>  }
>>  EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
>>
>> +/**
>> + * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
>> + * @q: request queue.
>> + *
>> + * Note: this function does not prevent that the struct request end_io()
>> + * callback function is invoked. Additionally, it is not prevented that
>> + * new queue_rq() calls occur unless the queue has been stopped first.
>> + */
>> +void blk_mq_quiesce_queue(struct request_queue *q)
>> +{
>> +       struct blk_mq_hw_ctx *hctx;
>> +       unsigned int i;
>> +       bool rcu = false;
>
> Before synchronizing SRCU/RCU, we have to set a per-hctx flag
> or per-queue flag to block comming .queue_rq(), seems I mentioned
> that before:
>
>    https://www.spinics.net/lists/linux-rdma/msg41389.html

Hello Ming,

Thanks for having included an URL to an archived version of that 
discussion. What I remember about that discussion is that I proposed to 
use the existing flag BLK_MQ_S_STOPPED instead of introducing a
new QUEUE_FLAG_QUIESCING flag and that you agreed with that proposal. 
See also https://www.spinics.net/lists/linux-rdma/msg41430.html.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [For help] rdma-roce build quesiton
From: oulijun @ 2016-10-27  2:15 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Linuxarm, linux-rdma
In-Reply-To: <20161026160902.GD24898-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

在 2016/10/27 0:09, Jason Gunthorpe 写道:
> On Wed, Oct 26, 2016 at 03:23:38PM +0800, oulijun wrote:
> 
>> the build is fail and the print log as follows:
>>
>> error: size of unnamed array is negative
>> attr->cap.max_recv_wr = min(context->max_qp_wr, attr->cap.max_recv_wr);
> 
> It is telling you the types are not the same, and this is a source of bugs
> as C has some counter intuitive rules regarding type promotion.
> 
> 1) Audit max_qp_wr and max_recv_wr to see if they really should be
>    different types, if not fix context->max_qp_wr to match
> 
> 2) If they are legitimately different then use
> 
>   min_t(<desired type>, context->max_qp_wr, attr->cap.max_recv_wr);
> 
> Think carefully about what common type is used because both arguments
> will be casted, and the goal is to avoid a loss of precision or
> signdedness in the cast.
> 
> Jason
> 
> .
> 
thanks, I see it.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH rdma-core 5/7] libhns: Add verbs of qp support
From: oulijun @ 2016-10-27  2:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161026162321.GF24898-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

在 2016/10/27 0:23, Jason Gunthorpe 写道:
> On Wed, Oct 26, 2016 at 09:04:06PM +0800, Lijun Ou wrote:
> 
>> +#ifndef min
>> +#define min(a, b) \
>> +	({ typeof (a) _a = (a); \
>> +	typeof (b) _b = (b); \
>> +	_a < _b ? _a : _b; })
>> +#endif
> 
> Nope, use the ccan header
> 
> Jason
> 
thanks your reivew. I will fix it in next patch

thanks
Lijun Ou
> 



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()
From: Ming Lei @ 2016-10-27  2:31 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <7690b469-f5b7-ab04-4b6f-fa0d679e0f18@acm.org>

On Thu, Oct 27, 2016 at 10:04 AM, Bart Van Assche <bvanassche@acm.org> wrote:
> On 10/26/16 18:30, Ming Lei wrote:
>>
>> On Thu, Oct 27, 2016 at 6:53 AM, Bart Van Assche
>> <bart.vanassche@sandisk.com> wrote:
>>>
>>> blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
>>> have finished. This function does *not* wait until all outstanding
>>> requests have finished (this means invocation of request.end_io()).
>>> The algorithm used by blk_mq_quiesce_queue() is as follows:
>>> * Hold either an RCU read lock or an SRCU read lock around
>>>   .queue_rq() calls. The former is used if .queue_rq() does not
>>>   block and the latter if .queue_rq() may block.
>>> * blk_mq_quiesce_queue() calls synchronize_srcu() or
>>>   synchronize_rcu() to wait for .queue_rq() invocations that
>>>   started before blk_mq_quiesce_queue() was called.
>>> * The blk_mq_hctx_stopped() calls that control whether or not
>>>   .queue_rq() will be called are called with the (S)RCU read lock
>>>   held. This is necessary to avoid race conditions against
>>>   the "blk_mq_stop_hw_queues(q); blk_mq_quiesce_queue(q);"
>>>   sequence from another thread.
>>>
>>> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
>>> Cc: Christoph Hellwig <hch@lst.de>
>>> Cc: Ming Lei <tom.leiming@gmail.com>
>>> Cc: Hannes Reinecke <hare@suse.com>
>>> Cc: Johannes Thumshirn <jthumshirn@suse.de>
>>> ---
>>>  block/Kconfig          |  1 +
>>>  block/blk-mq.c         | 69
>>> +++++++++++++++++++++++++++++++++++++++++++++-----
>>>  include/linux/blk-mq.h |  3 +++
>>>  include/linux/blkdev.h |  1 +
>>>  4 files changed, 67 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/block/Kconfig b/block/Kconfig
>>> index 1d4d624..0562ef9 100644
>>> --- a/block/Kconfig
>>> +++ b/block/Kconfig
>>> @@ -5,6 +5,7 @@ menuconfig BLOCK
>>>         bool "Enable the block layer" if EXPERT
>>>         default y
>>>         select SBITMAP
>>> +       select SRCU
>>>         help
>>>          Provide block layer support for the kernel.
>>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 0cf21c2..4945437 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -115,6 +115,31 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
>>>  }
>>>  EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
>>>
>>> +/**
>>> + * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have
>>> finished
>>> + * @q: request queue.
>>> + *
>>> + * Note: this function does not prevent that the struct request end_io()
>>> + * callback function is invoked. Additionally, it is not prevented that
>>> + * new queue_rq() calls occur unless the queue has been stopped first.
>>> + */
>>> +void blk_mq_quiesce_queue(struct request_queue *q)
>>> +{
>>> +       struct blk_mq_hw_ctx *hctx;
>>> +       unsigned int i;
>>> +       bool rcu = false;
>>
>>
>> Before synchronizing SRCU/RCU, we have to set a per-hctx flag
>> or per-queue flag to block comming .queue_rq(), seems I mentioned
>> that before:
>>
>>    https://www.spinics.net/lists/linux-rdma/msg41389.html
>
>
> Hello Ming,
>
> Thanks for having included an URL to an archived version of that discussion.
> What I remember about that discussion is that I proposed to use the existing
> flag BLK_MQ_S_STOPPED instead of introducing a
> new QUEUE_FLAG_QUIESCING flag and that you agreed with that proposal. See
> also https://www.spinics.net/lists/linux-rdma/msg41430.html.

Yes, I am fine with either one, but the flag need to set in
blk_mq_quiesce_queue(), doesnt't it?


Thanks,
Ming Lei

^ permalink raw reply

* Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()
From: Bart Van Assche @ 2016-10-27  2:40 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <CACVXFVNvdNXm4JXsXo2p1_pZLqDpCos0MTTR+svR98p_vw5PCA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 10/26/16 19:31, Ming Lei wrote:
> On Thu, Oct 27, 2016 at 10:04 AM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:
>> On 10/26/16 18:30, Ming Lei wrote:
>>>
>>> On Thu, Oct 27, 2016 at 6:53 AM, Bart Van Assche
>>> <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org> wrote:
>>>>
>>>> blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
>>>> have finished. This function does *not* wait until all outstanding
>>>> requests have finished (this means invocation of request.end_io()).
>>>> The algorithm used by blk_mq_quiesce_queue() is as follows:
>>>> * Hold either an RCU read lock or an SRCU read lock around
>>>>   .queue_rq() calls. The former is used if .queue_rq() does not
>>>>   block and the latter if .queue_rq() may block.
>>>> * blk_mq_quiesce_queue() calls synchronize_srcu() or
>>>>   synchronize_rcu() to wait for .queue_rq() invocations that
>>>>   started before blk_mq_quiesce_queue() was called.
>>>> * The blk_mq_hctx_stopped() calls that control whether or not
>>>>   .queue_rq() will be called are called with the (S)RCU read lock
>>>>   held. This is necessary to avoid race conditions against
>>>>   the "blk_mq_stop_hw_queues(q); blk_mq_quiesce_queue(q);"
>>>>   sequence from another thread.
>>>>
>>>> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
>>>> Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
>>>> Cc: Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>>>> Cc: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>
>>>> Cc: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
>>>> ---
>>>>  block/Kconfig          |  1 +
>>>>  block/blk-mq.c         | 69
>>>> +++++++++++++++++++++++++++++++++++++++++++++-----
>>>>  include/linux/blk-mq.h |  3 +++
>>>>  include/linux/blkdev.h |  1 +
>>>>  4 files changed, 67 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/block/Kconfig b/block/Kconfig
>>>> index 1d4d624..0562ef9 100644
>>>> --- a/block/Kconfig
>>>> +++ b/block/Kconfig
>>>> @@ -5,6 +5,7 @@ menuconfig BLOCK
>>>>         bool "Enable the block layer" if EXPERT
>>>>         default y
>>>>         select SBITMAP
>>>> +       select SRCU
>>>>         help
>>>>          Provide block layer support for the kernel.
>>>>
>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>> index 0cf21c2..4945437 100644
>>>> --- a/block/blk-mq.c
>>>> +++ b/block/blk-mq.c
>>>> @@ -115,6 +115,31 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
>>>>
>>>> +/**
>>>> + * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have
>>>> finished
>>>> + * @q: request queue.
>>>> + *
>>>> + * Note: this function does not prevent that the struct request end_io()
>>>> + * callback function is invoked. Additionally, it is not prevented that
>>>> + * new queue_rq() calls occur unless the queue has been stopped first.
>>>> + */
>>>> +void blk_mq_quiesce_queue(struct request_queue *q)
>>>> +{
>>>> +       struct blk_mq_hw_ctx *hctx;
>>>> +       unsigned int i;
>>>> +       bool rcu = false;
>>>
>>>
>>> Before synchronizing SRCU/RCU, we have to set a per-hctx flag
>>> or per-queue flag to block comming .queue_rq(), seems I mentioned
>>> that before:
>>>
>>>    https://www.spinics.net/lists/linux-rdma/msg41389.html
>>
>>
>> Hello Ming,
>>
>> Thanks for having included an URL to an archived version of that discussion.
>> What I remember about that discussion is that I proposed to use the existing
>> flag BLK_MQ_S_STOPPED instead of introducing a
>> new QUEUE_FLAG_QUIESCING flag and that you agreed with that proposal. See
>> also https://www.spinics.net/lists/linux-rdma/msg41430.html.
>
> Yes, I am fine with either one, but the flag need to set in
> blk_mq_quiesce_queue(), doesnt't it?

Hello Ming,

If you have a look at the later patches in this series then you will see 
that the dm core and the NVMe driver have been modified such that
blk_mq_stop_hw_queues(q) is called immediately before 
blk_mq_quiesce_queue(q) is called.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()
From: Ming Lei @ 2016-10-27  2:48 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <16074653-5e31-27b0-90db-33d36a9df0bc@acm.org>

On Thu, Oct 27, 2016 at 10:40 AM, Bart Van Assche <bvanassche@acm.org> wrote:
> On 10/26/16 19:31, Ming Lei wrote:
>>
>> On Thu, Oct 27, 2016 at 10:04 AM, Bart Van Assche <bvanassche@acm.org>
>> wrote:
>>>
>>> On 10/26/16 18:30, Ming Lei wrote:
>>>>
>>>>
>>>> On Thu, Oct 27, 2016 at 6:53 AM, Bart Van Assche
>>>> <bart.vanassche@sandisk.com> wrote:
>>>>>
>>>>>
>>>>> blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
>>>>> have finished. This function does *not* wait until all outstanding
>>>>> requests have finished (this means invocation of request.end_io()).
>>>>> The algorithm used by blk_mq_quiesce_queue() is as follows:
>>>>> * Hold either an RCU read lock or an SRCU read lock around
>>>>>   .queue_rq() calls. The former is used if .queue_rq() does not
>>>>>   block and the latter if .queue_rq() may block.
>>>>> * blk_mq_quiesce_queue() calls synchronize_srcu() or
>>>>>   synchronize_rcu() to wait for .queue_rq() invocations that
>>>>>   started before blk_mq_quiesce_queue() was called.
>>>>> * The blk_mq_hctx_stopped() calls that control whether or not
>>>>>   .queue_rq() will be called are called with the (S)RCU read lock
>>>>>   held. This is necessary to avoid race conditions against
>>>>>   the "blk_mq_stop_hw_queues(q); blk_mq_quiesce_queue(q);"
>>>>>   sequence from another thread.
>>>>>
>>>>> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
>>>>> Cc: Christoph Hellwig <hch@lst.de>
>>>>> Cc: Ming Lei <tom.leiming@gmail.com>
>>>>> Cc: Hannes Reinecke <hare@suse.com>
>>>>> Cc: Johannes Thumshirn <jthumshirn@suse.de>
>>>>> ---
>>>>>  block/Kconfig          |  1 +
>>>>>  block/blk-mq.c         | 69
>>>>> +++++++++++++++++++++++++++++++++++++++++++++-----
>>>>>  include/linux/blk-mq.h |  3 +++
>>>>>  include/linux/blkdev.h |  1 +
>>>>>  4 files changed, 67 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/block/Kconfig b/block/Kconfig
>>>>> index 1d4d624..0562ef9 100644
>>>>> --- a/block/Kconfig
>>>>> +++ b/block/Kconfig
>>>>> @@ -5,6 +5,7 @@ menuconfig BLOCK
>>>>>         bool "Enable the block layer" if EXPERT
>>>>>         default y
>>>>>         select SBITMAP
>>>>> +       select SRCU
>>>>>         help
>>>>>          Provide block layer support for the kernel.
>>>>>
>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>>> index 0cf21c2..4945437 100644
>>>>> --- a/block/blk-mq.c
>>>>> +++ b/block/blk-mq.c
>>>>> @@ -115,6 +115,31 @@ void blk_mq_unfreeze_queue(struct request_queue
>>>>> *q)
>>>>>  }
>>>>>  EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
>>>>>
>>>>> +/**
>>>>> + * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have
>>>>> finished
>>>>> + * @q: request queue.
>>>>> + *
>>>>> + * Note: this function does not prevent that the struct request
>>>>> end_io()
>>>>> + * callback function is invoked. Additionally, it is not prevented
>>>>> that
>>>>> + * new queue_rq() calls occur unless the queue has been stopped first.
>>>>> + */
>>>>> +void blk_mq_quiesce_queue(struct request_queue *q)
>>>>> +{
>>>>> +       struct blk_mq_hw_ctx *hctx;
>>>>> +       unsigned int i;
>>>>> +       bool rcu = false;
>>>>
>>>>
>>>>
>>>> Before synchronizing SRCU/RCU, we have to set a per-hctx flag
>>>> or per-queue flag to block comming .queue_rq(), seems I mentioned
>>>> that before:
>>>>
>>>>    https://www.spinics.net/lists/linux-rdma/msg41389.html
>>>
>>>
>>>
>>> Hello Ming,
>>>
>>> Thanks for having included an URL to an archived version of that
>>> discussion.
>>> What I remember about that discussion is that I proposed to use the
>>> existing
>>> flag BLK_MQ_S_STOPPED instead of introducing a
>>> new QUEUE_FLAG_QUIESCING flag and that you agreed with that proposal. See
>>> also https://www.spinics.net/lists/linux-rdma/msg41430.html.
>>
>>
>> Yes, I am fine with either one, but the flag need to set in
>> blk_mq_quiesce_queue(), doesnt't it?
>
>
> Hello Ming,
>
> If you have a look at the later patches in this series then you will see
> that the dm core and the NVMe driver have been modified such that
> blk_mq_stop_hw_queues(q) is called immediately before
> blk_mq_quiesce_queue(q) is called.

Cause any current and future users of blk_mq_quiesce_queue(q)
have to set the flag via blk_mq_stop_hw_queues(q), why not set
the flag explicitly in blk_mq_quiesce_queue(q)?


thanks,
Ming Lei

^ permalink raw reply

* Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()
From: Bart Van Assche @ 2016-10-27  3:05 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <CACVXFVNyRnJFnsSO9TNksE5GNy7TJKL=iZnqrv38=u0XCrW7jA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 10/26/16 19:48, Ming Lei wrote:
> On Thu, Oct 27, 2016 at 10:40 AM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:
>> If you have a look at the later patches in this series then you will see
>> that the dm core and the NVMe driver have been modified such that
>> blk_mq_stop_hw_queues(q) is called immediately before
>> blk_mq_quiesce_queue(q) is called.
>
> Cause any current and future users of blk_mq_quiesce_queue(q)
> have to set the flag via blk_mq_stop_hw_queues(q), why not set
> the flag explicitly in blk_mq_quiesce_queue(q)?

Hello Ming,

I'll leave it to Jens to decide whether I should repost the patch series 
with this change integrated or whether to realize this change with a 
follow-up patch.

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH rdma-core 1/7] libhns: Add initial main frame
From: oulijun @ 2016-10-27  3:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161026162053.GE24898-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

在 2016/10/27 0:20, Jason Gunthorpe 写道:
> On Wed, Oct 26, 2016 at 09:04:02PM +0800, Lijun Ou wrote:
>> +static struct ibv_device *hns_roce_driver_init(const char *uverbs_sys_path,
>> +					       int abi_version)
>> +{
>> +	struct hns_roce_device  *dev;
>> +	char			 value[128];
>> +	int			 i;
>> +
>> +	if (ibv_read_sysfs_file(uverbs_sys_path, "device/modalias",
>> +				value, sizeof(value)) > 0)
>> +		for (i = 0; i < sizeof(acpi_table) / sizeof(acpi_table[0]); ++i)
>> +			if (!strcmp(value, acpi_table[i].hid))
>> +				goto found;
> 
> You shouldn't need to do both modalias and compatible, there should be
> an acceptable modalias for the DT version too.
> 
when startup by DT, the content of device/modalias is of:NinfinibandT<NULL>Chisilicon,hns-roce-v1.
it is long and complex. the content of device/of_node/compatible is hisilicon,hns-roce-v1
when startup by APCI, the content of device/modalias is acpi:HISI00D1:
Hence, we decide to adopt the above approach to  distinguish the device. when adding a device of pcie
in v2, we will add a condition branch to distinguish it, as follows:
   if (ibv_read_sysfs_file(uverbs_sys_path, "device/modalias",
			value, sizeof(value)) > 0)
		for (i = 0; i < sizeof(acpi_table) / sizeof(acpi_table[0]); ++i)
		if (!strcmp(value, acpi_table[i].hid))
			goto found;

....
  if (ibv_read_sysfs-file(uverbs_sys_path, "device/vendor", value, sizeof(value)) > 0)
	...
  if (ibv_read_sysfs-file(uverbs_sys_path, "device/vendor", value, sizeof(value)) > 0)
 	...

> But I wonder if this isn't generically better to be
> 
>  last_dir(readlink("device/driver")) == "hns"
> 
> instead?
> Jason
> 
I think it is not insteaded. because it will be find the hns in the path(device/driver)
As follows:
when startup by DT, the content of device/driver is:
root@(none)$ cat /sys/class/infiniband/hns_0/device/driver/
bind module/ unbind
c4000000.infiniband/ uevent
root@(none)$ cat /sys/class/infiniband/hns_0/device/driver/
>

when startup by ACPI, the content of device/driver is:
root@(none)$ cat /sys/class/infiniband/hns_0/device/driver/
HISI00D1:00/ bind module/ uevent unbind
root@(none)$ cat /sys/class/infiniband/hns_0/device/driver/
cat: read error: Is a directory

thanks
Lijun Ou
> .
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 01/12] blk-mq: Do not invoke .queue_rq() for a stopped queue
From: Hannes Reinecke @ 2016-10-27  5:47 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lin,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <1debcf7f-c950-308b-d297-3e48a77e08d7@sandisk.com>

On 10/27/2016 12:50 AM, Bart Van Assche wrote:
> The meaning of the BLK_MQ_S_STOPPED flag is "do not call
> .queue_rq()". Hence modify blk_mq_make_request() such that requests
> are queued instead of issued if a queue has been stopped.
> 
> Reported-by: Ming Lei <tom.leiming@gmail.com>
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Ming Lei <tom.leiming@gmail.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: <stable@vger.kernel.org>
> ---
>  block/blk-mq.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index ddc2eed..b5dcafb 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1332,9 +1332,9 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>  		blk_mq_put_ctx(data.ctx);
>  		if (!old_rq)
>  			goto done;
> -		if (!blk_mq_direct_issue_request(old_rq, &cookie))
> -			goto done;
> -		blk_mq_insert_request(old_rq, false, true, true);
> +		if (test_bit(BLK_MQ_S_STOPPED, &data.hctx->state) ||
> +		    blk_mq_direct_issue_request(old_rq, &cookie) != 0)
> +			blk_mq_insert_request(old_rq, false, true, true);
>  		goto done;
>  	}
>  
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply

* Re: [PATCH 03/12] blk-mq: Introduce blk_mq_queue_stopped()
From: Hannes Reinecke @ 2016-10-27  5:49 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <f68b2997-8b0d-7aea-2859-5fbda4f6bf71@sandisk.com>

On 10/27/2016 12:52 AM, Bart Van Assche wrote:
> The function blk_queue_stopped() allows to test whether or not a
> traditional request queue has been stopped. Introduce a helper
> function that allows block drivers to query easily whether or not
> one or more hardware contexts of a blk-mq queue have been stopped.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Reviewed-by: Hannes Reinecke <hare@suse.com>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/blk-mq.c         | 20 ++++++++++++++++++++
>  include/linux/blk-mq.h |  1 +
>  2 files changed, 21 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply

* Re: [PATCH 04/12] blk-mq: Move more code into blk_mq_direct_issue_request()
From: Hannes Reinecke @ 2016-10-27  5:50 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <32b0bd88-cb8e-754a-89fc-b1825778b05a-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

On 10/27/2016 12:52 AM, Bart Van Assche wrote:
> Move the "hctx stopped" test and the insert request calls into
> blk_mq_direct_issue_request(). Rename that function into
> blk_mq_try_issue_directly() to reflect its new semantics. Pass
> the hctx pointer to that function instead of looking it up a
> second time. These changes avoid that code has to be duplicated
> in the next patch.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> Cc: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>
> Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> Cc: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
> ---
>  block/blk-mq.c | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare-l3A5Bk7waGM@public.gmane.org			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()
From: Hannes Reinecke @ 2016-10-27  5:52 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <5143c240-39af-9fe2-d3e6-ed69f9c20531-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

On 10/27/2016 12:53 AM, Bart Van Assche wrote:
> blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
> have finished. This function does *not* wait until all outstanding
> requests have finished (this means invocation of request.end_io()).
> The algorithm used by blk_mq_quiesce_queue() is as follows:
> * Hold either an RCU read lock or an SRCU read lock around
>   .queue_rq() calls. The former is used if .queue_rq() does not
>   block and the latter if .queue_rq() may block.
> * blk_mq_quiesce_queue() calls synchronize_srcu() or
>   synchronize_rcu() to wait for .queue_rq() invocations that
>   started before blk_mq_quiesce_queue() was called.
> * The blk_mq_hctx_stopped() calls that control whether or not
>   .queue_rq() will be called are called with the (S)RCU read lock
>   held. This is necessary to avoid race conditions against
>   the "blk_mq_stop_hw_queues(q); blk_mq_quiesce_queue(q);"
>   sequence from another thread.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> Cc: Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Cc: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>
> Cc: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
> ---
>  block/Kconfig          |  1 +
>  block/blk-mq.c         | 69 +++++++++++++++++++++++++++++++++++++++++++++-----
>  include/linux/blk-mq.h |  3 +++
>  include/linux/blkdev.h |  1 +
>  4 files changed, 67 insertions(+), 7 deletions(-)
> 
Hmm. Can't say I like having to have two RCU structure for the same
purpose, but I guess that can't be helped.

Reviewed-by: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare-l3A5Bk7waGM@public.gmane.org			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2 8/8] IB/mlx5: Add helper mlx5_ib_post_send_wait
From: Leon Romanovsky @ 2016-10-27  6:05 UTC (permalink / raw)
  To: Binoy Jayan
  Cc: Doug Ledford, Sean Hefty, Hal Rosenstock, Arnd Bergmann,
	linux-rdma, Linux kernel mailing list
In-Reply-To: <CAHv-k__yPiQOeZxDjvaSzG6A8Qs32to9MnKiPpHAvbXMxqE5EA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 659 bytes --]

On Tue, Oct 25, 2016 at 06:46:58PM +0530, Binoy Jayan wrote:
> On 25 October 2016 at 17:56, Leon Romanovsky <leon@kernel.org> wrote:
> > On Tue, Oct 25, 2016 at 05:31:59PM +0530, Binoy Jayan wrote:
>
> > In case of success (err == 0), you will call to unmap_dma instead of
> > normal flow.
> >
> > NAK,
> > Leon Romanovsky <leonro@mellanox.com>
>
> Hi Loen,
>
> Even in the original code, the regular flow seems to reach 'unmap_dma' after
> returning from 'wait_for_completion'().

In original flow, the code executed assignments to mr->mmkey. In you
code, it is skipped.

http://lxr.free-electrons.com/source/drivers/infiniband/hw/mlx5/mr.c#L900

>
> -Binoy

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH v3 8/9] IB/mlx5: Add helper mlx5_ib_post_send_wait
From: Leon Romanovsky @ 2016-10-27  6:09 UTC (permalink / raw)
  To: Binoy Jayan
  Cc: Doug Ledford, Sean Hefty, Hal Rosenstock, Arnd Bergmann,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1477468874-16328-9-git-send-email-binoy.jayan-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 3204 bytes --]

On Wed, Oct 26, 2016 at 01:31:13PM +0530, Binoy Jayan wrote:
> Clean up the following common code (to post a list of work requests to the
> send queue of the specified QP) at various places and add a helper function
> 'mlx5_ib_post_send_wait' to implement the same.
>
>  - Initialize 'mlx5_ib_umr_context' on stack
>  - Assign "mlx5_umr_wr:wr:wr_cqe to umr_context.cqe
>  - Acquire the semaphore
>  - call ib_post_send with a single ib_send_wr
>  - wait_for_completion()
>  - Check for umr_context.status
>  - Release the semaphore
>
> As semaphores are going away in the future, moving all of these into the
> shared helper leaves only a single function using the semaphore, which
> can then be rewritten to use something else.
>
> Signed-off-by: Binoy Jayan <binoy.jayan-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> ---
>  drivers/infiniband/hw/mlx5/mr.c | 115 +++++++++++-----------------------------
>  1 file changed, 32 insertions(+), 83 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> index d4ad672..19c292a 100644
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -856,16 +856,40 @@ static inline void mlx5_ib_init_umr_context(struct mlx5_ib_umr_context *context)
>  	init_completion(&context->done);
>  }
>
> +static inline int mlx5_ib_post_send_wait(struct mlx5_ib_dev *dev,
> +					 struct mlx5_umr_wr *umrwr)
> +{
> +	struct umr_common *umrc = &dev->umrc;
> +	struct ib_send_wr *bad;
> +	int err;
> +	struct mlx5_ib_umr_context umr_context;
> +
> +	mlx5_ib_init_umr_context(&umr_context);
> +	umrwr->wr.wr_cqe = &umr_context.cqe;
> +
> +	down(&umrc->sem);
> +	err = ib_post_send(umrc->qp, &umrwr->wr, &bad);
> +	if (err) {
> +		mlx5_ib_warn(dev, "UMR post send failed, err %d\n", err);
> +	} else {
> +		wait_for_completion(&umr_context.done);
> +		if (umr_context.status != IB_WC_SUCCESS) {
> +			mlx5_ib_warn(dev, "reg umr failed (%u)\n",
> +				     umr_context.status);
> +			err = -EFAULT;
> +		}
> +	}
> +	up(&umrc->sem);
> +	return err;
> +}
> +
>  static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, struct ib_umem *umem,
>  				  u64 virt_addr, u64 len, int npages,
>  				  int page_shift, int order, int access_flags)
>  {
>  	struct mlx5_ib_dev *dev = to_mdev(pd->device);
>  	struct device *ddev = dev->ib_dev.dma_device;
> -	struct umr_common *umrc = &dev->umrc;
> -	struct mlx5_ib_umr_context umr_context;
>  	struct mlx5_umr_wr umrwr = {};
> -	struct ib_send_wr *bad;
>  	struct mlx5_ib_mr *mr;
>  	struct ib_sge sg;
>  	int size;
> @@ -894,24 +918,12 @@ static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, struct ib_umem *umem,
>  	if (err)
>  		goto free_mr;
>
> -	mlx5_ib_init_umr_context(&umr_context);
> -
> -	umrwr.wr.wr_cqe = &umr_context.cqe;
>  	prep_umr_reg_wqe(pd, &umrwr.wr, &sg, dma, npages, mr->mmkey.key,
>  			 page_shift, virt_addr, len, access_flags);
>
> -	down(&umrc->sem);
> -	err = ib_post_send(umrc->qp, &umrwr.wr, &bad);
> -	if (err) {
> -		mlx5_ib_warn(dev, "post send failed, err %d\n", err);
> +	err = mlx5_ib_post_send_wait(dev, &umrwr);
> +	if (err != -EFAULT)
>  		goto unmap_dma;

NAK,
You are breaking driver.

it should be "if (err && err != -EFAULT)"

Thanks

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH v2 8/8] IB/mlx5: Add helper mlx5_ib_post_send_wait
From: Binoy Jayan @ 2016-10-27  6:23 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Sean Hefty, Hal Rosenstock, Arnd Bergmann,
	linux-rdma, Linux kernel mailing list
In-Reply-To: <20161027060546.GA3617@leon.nu>

On 27 October 2016 at 11:35, Leon Romanovsky <leon@kernel.org> wrote:
> On Tue, Oct 25, 2016 at 06:46:58PM +0530, Binoy Jayan wrote:
>> On 25 October 2016 at 17:56, Leon Romanovsky <leon@kernel.org> wrote:
>> > On Tue, Oct 25, 2016 at 05:31:59PM +0530, Binoy Jayan wrote:
>>
>> > In case of success (err == 0), you will call to unmap_dma instead of
>> > normal flow.
>> >
>> > NAK,
>> > Leon Romanovsky <leonro@mellanox.com>
>>
>> Hi Loen,
>>
>> Even in the original code, the regular flow seems to reach 'unmap_dma' after
>> returning from 'wait_for_completion'().
>
> In original flow, the code executed assignments to mr->mmkey. In you
> code, it is skipped.
>

Yes you are right, I just noted it. My bad. I've changed it now.

Thanks,
Binoy

^ permalink raw reply

* Re: A question regarding "multiple SGL"
From: Christoph Hellwig @ 2016-10-27  6:41 UTC (permalink / raw)
  To: 鑫愿
  Cc: Bart Van Assche, Jens Axboe,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	James Bottomley, Martin K. Petersen, Mike Snitzer,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Ming Lei,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Keith Busch, Doug Ledford,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Laurence Oberman, Christoph Hellwig, tiger.zhao, qiuxin
In-Reply-To: <20161027005230.9904DC00097-2RFepEojUI2gQzYKMK1YzK/p1tWXv8elb9TvmfFkwKk@public.gmane.org>

Hi Robert,

There is no feature called "Multiple SGL in one NVMe capsule".  The
NVMe over Fabrics specification allows a controller to advertise how
many SGL descriptors it supports using the MSDBD Identify field:

"Maximum SGL Data Block Descriptors (MSDBD): This field indicates the
maximum number of (Keyed) SGL Data Block descriptors that a host is allowed to
place in a capsule. A value of 0h indicates no limit."

Setting this value to 1 is perfectly valid.  Similarly a host is free
to chose any number of SGL descriptors between 0 (only for command that
don't transfer data) to the limit imposed by the controller using the
MSDBD field.

There are no plans to support a MSDBD value larger than 1 in the Linux
NVMe target, and there are no plans to ever submit commands with multiple
SGLs from the host driver either.

Cheers,
	Christoph
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: A question regarding "multiple SGL"
From: Qiuxin (robert) @ 2016-10-27  6:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bart Van Assche, Jens Axboe, linux-block@vger.kernel.org,
	James Bottomley, Martin K. Petersen, Mike Snitzer,
	linux-rdma@vger.kernel.org, Ming Lei,
	linux-nvme@lists.infradead.org, Keith Busch, Doug Ledford,
	linux-scsi@vger.kernel.org, Laurence Oberman, Tiger zhao
In-Reply-To: <20161027064115.GA5864@lst.de>

Hi Christoph,

Thanks , got it.

Could you please do me favor to let me know the background why we ONLY support " MSDBD ==1"?   I am NOT trying to resist or oppose anything , I just want to know the reason.  You know,  it is a little wired for me, as  "MSDBD ==1" does not fulfill all the use cases which is depicted in the spec.

Best,
Robert Qiuxin
________________________________________
Robert Qiuxin
华为技术有限公司 Huawei Technologies Co., Ltd.
Phone: +86-755-28420357
Fax: 
Mobile: +86 15986638429
Email: qiuxin@huawei.com
地址：深圳市龙岗区坂田华为基地 邮编：518129
Huawei Technologies Co., Ltd.
Bantian, Longgang District,Shenzhen 518129, P.R.China
http://www.huawei.com 
________________________________________
本邮件及其附件含有华为公司的保密信息，仅限于发送给上面地址中列出的个人或群组。禁
止任何其他人以任何形式使用（包括但不限于全部或部分地泄露、复制、或散发）本邮件中
的信息。如果您错收了本邮件，请您立即电话或邮件通知发件人并删除本邮件！
This e-mail and its attachments contain confidential information from HUAWEI, which 
is intended only for the person or entity whose address is listed above. Any use of the 
information contained herein in any way (including, but not limited to, total or partial 
disclosure, reproduction, or dissemination) by persons other than the intended 
recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by 
phone or email immediately and delete it!
-----邮件原件-----
发件人: Christoph Hellwig [mailto:hch@lst.de] 
发送时间: 2016年10月27日 14:41
收件人: 鑫愿
抄送: Bart Van Assche; Jens Axboe; linux-block@vger.kernel.org; James Bottomley; Martin K. Petersen; Mike Snitzer; linux-rdma@vger.kernel.org; Ming Lei; linux-nvme@lists.infradead.org; Keith Busch; Doug Ledford; linux-scsi@vger.kernel.org; Laurence Oberman; Christoph Hellwig; Tiger zhao; Qiuxin (robert)
主题: Re: A question regarding "multiple SGL"

Hi Robert,

There is no feature called "Multiple SGL in one NVMe capsule".  The NVMe over Fabrics specification allows a controller to advertise how many SGL descriptors it supports using the MSDBD Identify field:

"Maximum SGL Data Block Descriptors (MSDBD): This field indicates the maximum number of (Keyed) SGL Data Block descriptors that a host is allowed to place in a capsule. A value of 0h indicates no limit."

Setting this value to 1 is perfectly valid.  Similarly a host is free to chose any number of SGL descriptors between 0 (only for command that don't transfer data) to the limit imposed by the controller using the MSDBD field.

There are no plans to support a MSDBD value larger than 1 in the Linux NVMe target, and there are no plans to ever submit commands with multiple SGLs from the host driver either.

Cheers,
	Christoph

^ permalink raw reply

* [PATCH v4 00/10] infiniband: Remove semaphores
From: Binoy Jayan @ 2016-10-27  6:59 UTC (permalink / raw)
  To: Doug Ledford, Sean Hefty, Hal Rosenstock
  Cc: Arnd Bergmann, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Binoy Jayan

Hi,

These are a set of patches [v4] which removes semaphores from infiniband.
These are part of a bigger effort to eliminate all semaphores from the
linux kernel.

v3 -> v4:

IB/mlx5: Added patch - Replace semaphore umr_common:sem with wait_event
IB/mlx5: Fixed a bug pointed out by Leon Romanovsky

Thanks,
Binoy

Binoy Jayan (10):
  IB/core: iwpm_nlmsg_request: Replace semaphore with completion
  IB/core: Replace semaphore sm_sem with an atomic wait
  IB/hns: Replace semaphore poll_sem with mutex
  IB/mthca: Replace semaphore poll_sem with mutex
  IB/isert: Replace semaphore sem with completion
  IB/hns: Replace counting semaphore event_sem with wait_event
  IB/mthca: Replace counting semaphore event_sem with wait_event
  IB/mlx5: Add helper mlx5_ib_post_send_wait
  IB/mlx5: Replace semaphore umr_common:sem with wait_event
  IB/mlx5: Simplify completion into a wait_event

 drivers/infiniband/core/iwpm_msg.c          |   8 +-
 drivers/infiniband/core/iwpm_util.c         |   7 +-
 drivers/infiniband/core/iwpm_util.h         |   3 +-
 drivers/infiniband/core/user_mad.c          |  20 +++--
 drivers/infiniband/hw/hns/hns_roce_cmd.c    |  57 ++++++++-----
 drivers/infiniband/hw/hns/hns_roce_device.h |   5 +-
 drivers/infiniband/hw/mlx5/main.c           |   6 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h        |   9 +-
 drivers/infiniband/hw/mlx5/mr.c             | 124 +++++++++-------------------
 drivers/infiniband/hw/mthca/mthca_cmd.c     |  57 ++++++++-----
 drivers/infiniband/hw/mthca/mthca_cmd.h     |   1 +
 drivers/infiniband/hw/mthca/mthca_dev.h     |   5 +-
 drivers/infiniband/ulp/isert/ib_isert.c     |   6 +-
 drivers/infiniband/ulp/isert/ib_isert.h     |   3 +-
 include/rdma/ib_verbs.h                     |   1 +
 15 files changed, 153 insertions(+), 159 deletions(-)

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v4 01/10] IB/core: iwpm_nlmsg_request: Replace semaphore with completion
From: Binoy Jayan @ 2016-10-27  6:59 UTC (permalink / raw)
  To: Doug Ledford, Sean Hefty, Hal Rosenstock
  Cc: Arnd Bergmann, linux-rdma, linux-kernel, Binoy Jayan
In-Reply-To: <1477551554-30349-1-git-send-email-binoy.jayan@linaro.org>

Semaphore sem in iwpm_nlmsg_request is used as completion, so
convert it to a struct completion type. Semaphores are going
away in the future.

Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
---
 drivers/infiniband/core/iwpm_msg.c  | 8 ++++----
 drivers/infiniband/core/iwpm_util.c | 7 +++----
 drivers/infiniband/core/iwpm_util.h | 3 ++-
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/iwpm_msg.c b/drivers/infiniband/core/iwpm_msg.c
index 1c41b95..761358f 100644
--- a/drivers/infiniband/core/iwpm_msg.c
+++ b/drivers/infiniband/core/iwpm_msg.c
@@ -394,7 +394,7 @@ int iwpm_register_pid_cb(struct sk_buff *skb, struct netlink_callback *cb)
 	/* always for found nlmsg_request */
 	kref_put(&nlmsg_request->kref, iwpm_free_nlmsg_request);
 	barrier();
-	up(&nlmsg_request->sem);
+	complete(&nlmsg_request->comp);
 	return 0;
 }
 EXPORT_SYMBOL(iwpm_register_pid_cb);
@@ -463,7 +463,7 @@ int iwpm_add_mapping_cb(struct sk_buff *skb, struct netlink_callback *cb)
 	/* always for found request */
 	kref_put(&nlmsg_request->kref, iwpm_free_nlmsg_request);
 	barrier();
-	up(&nlmsg_request->sem);
+	complete(&nlmsg_request->comp);
 	return 0;
 }
 EXPORT_SYMBOL(iwpm_add_mapping_cb);
@@ -555,7 +555,7 @@ int iwpm_add_and_query_mapping_cb(struct sk_buff *skb,
 	/* always for found request */
 	kref_put(&nlmsg_request->kref, iwpm_free_nlmsg_request);
 	barrier();
-	up(&nlmsg_request->sem);
+	complete(&nlmsg_request->comp);
 	return 0;
 }
 EXPORT_SYMBOL(iwpm_add_and_query_mapping_cb);
@@ -749,7 +749,7 @@ int iwpm_mapping_error_cb(struct sk_buff *skb, struct netlink_callback *cb)
 	/* always for found request */
 	kref_put(&nlmsg_request->kref, iwpm_free_nlmsg_request);
 	barrier();
-	up(&nlmsg_request->sem);
+	complete(&nlmsg_request->comp);
 	return 0;
 }
 EXPORT_SYMBOL(iwpm_mapping_error_cb);
diff --git a/drivers/infiniband/core/iwpm_util.c b/drivers/infiniband/core/iwpm_util.c
index ade71e7..08ddd2e 100644
--- a/drivers/infiniband/core/iwpm_util.c
+++ b/drivers/infiniband/core/iwpm_util.c
@@ -323,8 +323,7 @@ struct iwpm_nlmsg_request *iwpm_get_nlmsg_request(__u32 nlmsg_seq,
 	nlmsg_request->nl_client = nl_client;
 	nlmsg_request->request_done = 0;
 	nlmsg_request->err_code = 0;
-	sema_init(&nlmsg_request->sem, 1);
-	down(&nlmsg_request->sem);
+	init_completion(&nlmsg_request->comp);
 	return nlmsg_request;
 }
 
@@ -368,8 +367,8 @@ int iwpm_wait_complete_req(struct iwpm_nlmsg_request *nlmsg_request)
 {
 	int ret;
 
-	ret = down_timeout(&nlmsg_request->sem, IWPM_NL_TIMEOUT);
-	if (ret) {
+	ret = wait_for_completion_timeout(&nlmsg_request->comp, IWPM_NL_TIMEOUT);
+	if (!ret) {
 		ret = -EINVAL;
 		pr_info("%s: Timeout %d sec for netlink request (seq = %u)\n",
 			__func__, (IWPM_NL_TIMEOUT/HZ), nlmsg_request->nlmsg_seq);
diff --git a/drivers/infiniband/core/iwpm_util.h b/drivers/infiniband/core/iwpm_util.h
index af1fc14..ea6c299 100644
--- a/drivers/infiniband/core/iwpm_util.h
+++ b/drivers/infiniband/core/iwpm_util.h
@@ -43,6 +43,7 @@
 #include <linux/delay.h>
 #include <linux/workqueue.h>
 #include <linux/mutex.h>
+#include <linux/completion.h>
 #include <linux/jhash.h>
 #include <linux/kref.h>
 #include <net/netlink.h>
@@ -69,7 +70,7 @@ struct iwpm_nlmsg_request {
 	u8	            nl_client;
 	u8                  request_done;
 	u16                 err_code;
-	struct semaphore    sem;
+	struct completion   comp;
 	struct kref         kref;
 };
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related

* [PATCH v4 02/10] IB/core: Replace semaphore sm_sem with an atomic wait
From: Binoy Jayan @ 2016-10-27  6:59 UTC (permalink / raw)
  To: Doug Ledford, Sean Hefty, Hal Rosenstock
  Cc: Arnd Bergmann, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Binoy Jayan
In-Reply-To: <1477551554-30349-1-git-send-email-binoy.jayan-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>

The semaphore 'sm_sem' is used for an exclusive ownership of the device
so model the same as an atomic variable with an associated wait_event.
Semaphores are going away in the future.

Signed-off-by: Binoy Jayan <binoy.jayan-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/core/user_mad.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 415a318..6101c0a 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -67,6 +67,8 @@ enum {
 	IB_UMAD_MINOR_BASE = 0
 };
 
+#define UMAD_F_CLAIM	0x01
+
 /*
  * Our lifetime rules for these structs are the following:
  * device special file is opened, we take a reference on the
@@ -87,7 +89,8 @@ struct ib_umad_port {
 
 	struct cdev           sm_cdev;
 	struct device	      *sm_dev;
-	struct semaphore       sm_sem;
+	wait_queue_head_t     wq;
+	unsigned long         flags;
 
 	struct mutex	       file_mutex;
 	struct list_head       file_list;
@@ -1030,12 +1033,14 @@ static int ib_umad_sm_open(struct inode *inode, struct file *filp)
 	port = container_of(inode->i_cdev, struct ib_umad_port, sm_cdev);
 
 	if (filp->f_flags & O_NONBLOCK) {
-		if (down_trylock(&port->sm_sem)) {
+		if (test_and_set_bit(UMAD_F_CLAIM, &port->flags)) {
 			ret = -EAGAIN;
 			goto fail;
 		}
 	} else {
-		if (down_interruptible(&port->sm_sem)) {
+		if (wait_event_interruptible(port->wq,
+					     !test_and_set_bit(UMAD_F_CLAIM,
+					     &port->flags))) {
 			ret = -ERESTARTSYS;
 			goto fail;
 		}
@@ -1060,7 +1065,8 @@ static int ib_umad_sm_open(struct inode *inode, struct file *filp)
 	ib_modify_port(port->ib_dev, port->port_num, 0, &props);
 
 err_up_sem:
-	up(&port->sm_sem);
+	clear_bit(UMAD_F_CLAIM, &port->flags);
+	wake_up(&port->wq);
 
 fail:
 	return ret;
@@ -1079,7 +1085,8 @@ static int ib_umad_sm_close(struct inode *inode, struct file *filp)
 		ret = ib_modify_port(port->ib_dev, port->port_num, 0, &props);
 	mutex_unlock(&port->file_mutex);
 
-	up(&port->sm_sem);
+	clear_bit(UMAD_F_CLAIM, &port->flags);
+	wake_up(&port->wq);
 
 	kobject_put(&port->umad_dev->kobj);
 
@@ -1177,7 +1184,8 @@ static int ib_umad_init_port(struct ib_device *device, int port_num,
 
 	port->ib_dev   = device;
 	port->port_num = port_num;
-	sema_init(&port->sm_sem, 1);
+	init_waitqueue_head(&port->wq);
+	__clear_bit(UMAD_F_CLAIM, &port->flags);
 	mutex_init(&port->file_mutex);
 	INIT_LIST_HEAD(&port->file_list);
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v4 03/10] IB/hns: Replace semaphore poll_sem with mutex
From: Binoy Jayan @ 2016-10-27  6:59 UTC (permalink / raw)
  To: Doug Ledford, Sean Hefty, Hal Rosenstock
  Cc: Arnd Bergmann, linux-rdma, linux-kernel, Binoy Jayan
In-Reply-To: <1477551554-30349-1-git-send-email-binoy.jayan@linaro.org>

The semaphore 'poll_sem' is a simple mutex, so it should be written as one.
Semaphores are going away in the future. So replace it with a mutex. Also,
remove mutex_[un]lock from mthca_cmd_use_events and mthca_cmd_use_polling
respectively.

Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
---
 drivers/infiniband/hw/hns/hns_roce_cmd.c    | 11 ++++-------
 drivers/infiniband/hw/hns/hns_roce_device.h |  3 ++-
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.c b/drivers/infiniband/hw/hns/hns_roce_cmd.c
index 2a0b6c0..51a0675 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cmd.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.c
@@ -119,7 +119,7 @@ static int hns_roce_cmd_mbox_post_hw(struct hns_roce_dev *hr_dev, u64 in_param,
 	return ret;
 }
 
-/* this should be called with "poll_sem" */
+/* this should be called with "poll_mutex" */
 static int __hns_roce_cmd_mbox_poll(struct hns_roce_dev *hr_dev, u64 in_param,
 				    u64 out_param, unsigned long in_modifier,
 				    u8 op_modifier, u16 op,
@@ -167,10 +167,10 @@ static int hns_roce_cmd_mbox_poll(struct hns_roce_dev *hr_dev, u64 in_param,
 {
 	int ret;
 
-	down(&hr_dev->cmd.poll_sem);
+	mutex_lock(&hr_dev->cmd.poll_mutex);
 	ret = __hns_roce_cmd_mbox_poll(hr_dev, in_param, out_param, in_modifier,
 				       op_modifier, op, timeout);
-	up(&hr_dev->cmd.poll_sem);
+	mutex_unlock(&hr_dev->cmd.poll_mutex);
 
 	return ret;
 }
@@ -275,7 +275,7 @@ int hns_roce_cmd_init(struct hns_roce_dev *hr_dev)
 	struct device *dev = &hr_dev->pdev->dev;
 
 	mutex_init(&hr_dev->cmd.hcr_mutex);
-	sema_init(&hr_dev->cmd.poll_sem, 1);
+	mutex_init(&hr_dev->cmd.poll_mutex);
 	hr_dev->cmd.use_events = 0;
 	hr_dev->cmd.toggle = 1;
 	hr_dev->cmd.max_cmds = CMD_MAX_NUM;
@@ -319,8 +319,6 @@ int hns_roce_cmd_use_events(struct hns_roce_dev *hr_dev)
 	hr_cmd->token_mask = CMD_TOKEN_MASK;
 	hr_cmd->use_events = 1;
 
-	down(&hr_cmd->poll_sem);
-
 	return 0;
 }
 
@@ -335,7 +333,6 @@ void hns_roce_cmd_use_polling(struct hns_roce_dev *hr_dev)
 		down(&hr_cmd->event_sem);
 
 	kfree(hr_cmd->context);
-	up(&hr_cmd->poll_sem);
 }
 
 struct hns_roce_cmd_mailbox
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 3417315..2afe075 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -34,6 +34,7 @@
 #define _HNS_ROCE_DEVICE_H
 
 #include <rdma/ib_verbs.h>
+#include <linux/mutex.h>
 
 #define DRV_NAME "hns_roce"
 
@@ -358,7 +359,7 @@ struct hns_roce_cmdq {
 	struct dma_pool		*pool;
 	u8 __iomem		*hcr;
 	struct mutex		hcr_mutex;
-	struct semaphore	poll_sem;
+	struct mutex		poll_mutex;
 	/*
 	* Event mode: cmd register mutex protection,
 	* ensure to not exceed max_cmds and user use limit region
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related

* [PATCH v4 04/10] IB/mthca: Replace semaphore poll_sem with mutex
From: Binoy Jayan @ 2016-10-27  6:59 UTC (permalink / raw)
  To: Doug Ledford, Sean Hefty, Hal Rosenstock
  Cc: Arnd Bergmann, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Binoy Jayan
In-Reply-To: <1477551554-30349-1-git-send-email-binoy.jayan-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>

The semaphore 'poll_sem' is a simple mutex, so it should be written as one.
Semaphores are going away in the future. So replace it with a mutex. Also,
remove mutex_[un]lock from mthca_cmd_use_events and mthca_cmd_use_polling
respectively.

Signed-off-by: Binoy Jayan <binoy.jayan-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mthca/mthca_cmd.c | 10 +++-------
 drivers/infiniband/hw/mthca/mthca_cmd.h |  1 +
 drivers/infiniband/hw/mthca/mthca_dev.h |  2 +-
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index c7f49bb..49c6e19 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -347,7 +347,7 @@ static int mthca_cmd_poll(struct mthca_dev *dev,
 	unsigned long end;
 	u8 status;
 
-	down(&dev->cmd.poll_sem);
+	mutex_lock(&dev->cmd.poll_mutex);
 
 	err = mthca_cmd_post(dev, in_param,
 			     out_param ? *out_param : 0,
@@ -382,7 +382,7 @@ static int mthca_cmd_poll(struct mthca_dev *dev,
 	}
 
 out:
-	up(&dev->cmd.poll_sem);
+	mutex_unlock(&dev->cmd.poll_mutex);
 	return err;
 }
 
@@ -520,7 +520,7 @@ static int mthca_cmd_imm(struct mthca_dev *dev,
 int mthca_cmd_init(struct mthca_dev *dev)
 {
 	mutex_init(&dev->cmd.hcr_mutex);
-	sema_init(&dev->cmd.poll_sem, 1);
+	mutex_init(&dev->cmd.poll_mutex);
 	dev->cmd.flags = 0;
 
 	dev->hcr = ioremap(pci_resource_start(dev->pdev, 0) + MTHCA_HCR_BASE,
@@ -582,8 +582,6 @@ int mthca_cmd_use_events(struct mthca_dev *dev)
 
 	dev->cmd.flags |= MTHCA_CMD_USE_EVENTS;
 
-	down(&dev->cmd.poll_sem);
-
 	return 0;
 }
 
@@ -600,8 +598,6 @@ void mthca_cmd_use_polling(struct mthca_dev *dev)
 		down(&dev->cmd.event_sem);
 
 	kfree(dev->cmd.context);
-
-	up(&dev->cmd.poll_sem);
 }
 
 struct mthca_mailbox *mthca_alloc_mailbox(struct mthca_dev *dev,
diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.h b/drivers/infiniband/hw/mthca/mthca_cmd.h
index d2e5b19..a7f197e 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.h
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.h
@@ -35,6 +35,7 @@
 #ifndef MTHCA_CMD_H
 #define MTHCA_CMD_H
 
+#include <linux/mutex.h>
 #include <rdma/ib_verbs.h>
 
 #define MTHCA_MAILBOX_SIZE 4096
diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h
index 4393a02..87ab964 100644
--- a/drivers/infiniband/hw/mthca/mthca_dev.h
+++ b/drivers/infiniband/hw/mthca/mthca_dev.h
@@ -120,7 +120,7 @@ enum {
 struct mthca_cmd {
 	struct pci_pool          *pool;
 	struct mutex              hcr_mutex;
-	struct semaphore 	  poll_sem;
+	struct mutex		  poll_mutex;
 	struct semaphore 	  event_sem;
 	int              	  max_cmds;
 	spinlock_t                context_lock;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v4 05/10] IB/isert: Replace semaphore sem with completion
From: Binoy Jayan @ 2016-10-27  6:59 UTC (permalink / raw)
  To: Doug Ledford, Sean Hefty, Hal Rosenstock
  Cc: Arnd Bergmann, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Binoy Jayan
In-Reply-To: <1477551554-30349-1-git-send-email-binoy.jayan-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>

The semaphore 'sem' in isert_device is used as completion, so convert
it to struct completion. Semaphores are going away in the future.

Signed-off-by: Binoy Jayan <binoy.jayan-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/ulp/isert/ib_isert.c | 6 +++---
 drivers/infiniband/ulp/isert/ib_isert.h | 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 6dd43f6..de80f56 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -619,7 +619,7 @@
 	mutex_unlock(&isert_np->mutex);
 
 	isert_info("np %p: Allow accept_np to continue\n", isert_np);
-	up(&isert_np->sem);
+	complete(&isert_np->comp);
 }
 
 static void
@@ -2311,7 +2311,7 @@ struct rdma_cm_id *
 		isert_err("Unable to allocate struct isert_np\n");
 		return -ENOMEM;
 	}
-	sema_init(&isert_np->sem, 0);
+	init_completion(&isert_np->comp);
 	mutex_init(&isert_np->mutex);
 	INIT_LIST_HEAD(&isert_np->accepted);
 	INIT_LIST_HEAD(&isert_np->pending);
@@ -2427,7 +2427,7 @@ struct rdma_cm_id *
 	int ret;
 
 accept_wait:
-	ret = down_interruptible(&isert_np->sem);
+	ret = wait_for_completion_interruptible(&isert_np->comp);
 	if (ret)
 		return -ENODEV;
 
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index c02ada5..a1277c0 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -3,6 +3,7 @@
 #include <linux/in6.h>
 #include <rdma/ib_verbs.h>
 #include <rdma/rdma_cm.h>
+#include <linux/completion.h>
 #include <rdma/rw.h>
 #include <scsi/iser.h>
 
@@ -190,7 +191,7 @@ struct isert_device {
 
 struct isert_np {
 	struct iscsi_np         *np;
-	struct semaphore	sem;
+	struct completion	comp;
 	struct rdma_cm_id	*cm_id;
 	struct mutex		mutex;
 	struct list_head	accepted;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v4 06/10] IB/hns: Replace counting semaphore event_sem with wait_event
From: Binoy Jayan @ 2016-10-27  6:59 UTC (permalink / raw)
  To: Doug Ledford, Sean Hefty, Hal Rosenstock
  Cc: Arnd Bergmann, linux-rdma, linux-kernel, Binoy Jayan
In-Reply-To: <1477551554-30349-1-git-send-email-binoy.jayan@linaro.org>

Counting semaphores are going away in the future, so replace the semaphore
mthca_cmd::event_sem with a conditional wait_event.

Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
---
 drivers/infiniband/hw/hns/hns_roce_cmd.c    | 46 ++++++++++++++++++++---------
 drivers/infiniband/hw/hns/hns_roce_device.h |  2 +-
 2 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.c b/drivers/infiniband/hw/hns/hns_roce_cmd.c
index 51a0675..12ef3d8 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cmd.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.c
@@ -189,6 +189,34 @@ void hns_roce_cmd_event(struct hns_roce_dev *hr_dev, u16 token, u8 status,
 	complete(&context->done);
 }
 
+static inline struct hns_roce_cmd_context *
+hns_roce_try_get_context(struct hns_roce_cmdq *cmd)
+{
+	struct hns_roce_cmd_context *context = NULL;
+
+	spin_lock(&cmd->context_lock);
+
+	if (cmd->free_head < 0)
+		goto out;
+
+	context = &cmd->context[cmd->free_head];
+	context->token += cmd->token_mask + 1;
+	cmd->free_head = context->next;
+out:
+	spin_unlock(&cmd->context_lock);
+	return context;
+}
+
+/* wait for and acquire a free context */
+static inline struct hns_roce_cmd_context *
+hns_roce_get_free_context(struct hns_roce_cmdq *cmd)
+{
+	struct hns_roce_cmd_context *context;
+
+	wait_event(cmd->wq, (context = hns_roce_try_get_context(cmd)));
+	return context;
+}
+
 /* this should be called with "use_events" */
 static int __hns_roce_cmd_mbox_wait(struct hns_roce_dev *hr_dev, u64 in_param,
 				    u64 out_param, unsigned long in_modifier,
@@ -200,13 +228,7 @@ static int __hns_roce_cmd_mbox_wait(struct hns_roce_dev *hr_dev, u64 in_param,
 	struct hns_roce_cmd_context *context;
 	int ret = 0;
 
-	spin_lock(&cmd->context_lock);
-	WARN_ON(cmd->free_head < 0);
-	context = &cmd->context[cmd->free_head];
-	context->token += cmd->token_mask + 1;
-	cmd->free_head = context->next;
-	spin_unlock(&cmd->context_lock);
-
+	context = hns_roce_get_free_context(cmd);
 	init_completion(&context->done);
 
 	ret = hns_roce_cmd_mbox_post_hw(hr_dev, in_param, out_param,
@@ -238,6 +260,7 @@ static int __hns_roce_cmd_mbox_wait(struct hns_roce_dev *hr_dev, u64 in_param,
 	context->next = cmd->free_head;
 	cmd->free_head = context - cmd->context;
 	spin_unlock(&cmd->context_lock);
+	wake_up(&cmd->wq);
 
 	return ret;
 }
@@ -248,10 +271,8 @@ static int hns_roce_cmd_mbox_wait(struct hns_roce_dev *hr_dev, u64 in_param,
 {
 	int ret = 0;
 
-	down(&hr_dev->cmd.event_sem);
 	ret = __hns_roce_cmd_mbox_wait(hr_dev, in_param, out_param,
 				       in_modifier, op_modifier, op, timeout);
-	up(&hr_dev->cmd.event_sem);
 
 	return ret;
 }
@@ -313,7 +334,7 @@ int hns_roce_cmd_use_events(struct hns_roce_dev *hr_dev)
 	hr_cmd->context[hr_cmd->max_cmds - 1].next = -1;
 	hr_cmd->free_head = 0;
 
-	sema_init(&hr_cmd->event_sem, hr_cmd->max_cmds);
+	init_waitqueue_head(&hr_cmd->wq);
 	spin_lock_init(&hr_cmd->context_lock);
 
 	hr_cmd->token_mask = CMD_TOKEN_MASK;
@@ -325,12 +346,9 @@ int hns_roce_cmd_use_events(struct hns_roce_dev *hr_dev)
 void hns_roce_cmd_use_polling(struct hns_roce_dev *hr_dev)
 {
 	struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
-	int i;
 
 	hr_cmd->use_events = 0;
-
-	for (i = 0; i < hr_cmd->max_cmds; ++i)
-		down(&hr_cmd->event_sem);
+	hr_cmd->free_head = -1;
 
 	kfree(hr_cmd->context);
 }
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 2afe075..ac95f52 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -364,7 +364,7 @@ struct hns_roce_cmdq {
 	* Event mode: cmd register mutex protection,
 	* ensure to not exceed max_cmds and user use limit region
 	*/
-	struct semaphore	event_sem;
+	wait_queue_head_t       wq;
 	int			max_cmds;
 	spinlock_t		context_lock;
 	int			free_head;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related

* [PATCH v4 07/10] IB/mthca: Replace counting semaphore event_sem with wait_event
From: Binoy Jayan @ 2016-10-27  6:59 UTC (permalink / raw)
  To: Doug Ledford, Sean Hefty, Hal Rosenstock
  Cc: Arnd Bergmann, linux-rdma, linux-kernel, Binoy Jayan
In-Reply-To: <1477551554-30349-1-git-send-email-binoy.jayan@linaro.org>

Counting semaphores are going away in the future, so replace the semaphore
mthca_cmd::event_sem with a conditional wait_event.

Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
---
 drivers/infiniband/hw/mthca/mthca_cmd.c | 47 ++++++++++++++++++++++-----------
 drivers/infiniband/hw/mthca/mthca_dev.h |  3 ++-
 2 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index 49c6e19..d6a048a 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -405,6 +405,34 @@ void mthca_cmd_event(struct mthca_dev *dev,
 	complete(&context->done);
 }
 
+static inline struct mthca_cmd_context *
+mthca_try_get_context(struct mthca_cmd *cmd)
+{
+	struct mthca_cmd_context *context = NULL;
+
+	spin_lock(&cmd->context_lock);
+
+	if (cmd->free_head < 0)
+		goto out;
+
+	context = &cmd->context[cmd->free_head];
+	context->token += cmd->token_mask + 1;
+	cmd->free_head = context->next;
+out:
+	spin_unlock(&cmd->context_lock);
+	return context;
+}
+
+/* wait for and acquire a free context */
+static inline struct mthca_cmd_context *
+mthca_get_free_context(struct mthca_cmd *cmd)
+{
+	struct mthca_cmd_context *context;
+
+	wait_event(cmd->wq, (context = mthca_try_get_context(cmd)));
+	return context;
+}
+
 static int mthca_cmd_wait(struct mthca_dev *dev,
 			  u64 in_param,
 			  u64 *out_param,
@@ -417,15 +445,7 @@ static int mthca_cmd_wait(struct mthca_dev *dev,
 	int err = 0;
 	struct mthca_cmd_context *context;
 
-	down(&dev->cmd.event_sem);
-
-	spin_lock(&dev->cmd.context_lock);
-	BUG_ON(dev->cmd.free_head < 0);
-	context = &dev->cmd.context[dev->cmd.free_head];
-	context->token += dev->cmd.token_mask + 1;
-	dev->cmd.free_head = context->next;
-	spin_unlock(&dev->cmd.context_lock);
-
+	context = mthca_get_free_context(&dev->cmd);
 	init_completion(&context->done);
 
 	err = mthca_cmd_post(dev, in_param,
@@ -458,8 +478,8 @@ static int mthca_cmd_wait(struct mthca_dev *dev,
 	context->next = dev->cmd.free_head;
 	dev->cmd.free_head = context - dev->cmd.context;
 	spin_unlock(&dev->cmd.context_lock);
+	wake_up(&dev->cmd.wq);
 
-	up(&dev->cmd.event_sem);
 	return err;
 }
 
@@ -571,7 +591,7 @@ int mthca_cmd_use_events(struct mthca_dev *dev)
 	dev->cmd.context[dev->cmd.max_cmds - 1].next = -1;
 	dev->cmd.free_head = 0;
 
-	sema_init(&dev->cmd.event_sem, dev->cmd.max_cmds);
+	init_waitqueue_head(&dev->cmd.wq);
 	spin_lock_init(&dev->cmd.context_lock);
 
 	for (dev->cmd.token_mask = 1;
@@ -590,12 +610,9 @@ int mthca_cmd_use_events(struct mthca_dev *dev)
  */
 void mthca_cmd_use_polling(struct mthca_dev *dev)
 {
-	int i;
-
 	dev->cmd.flags &= ~MTHCA_CMD_USE_EVENTS;
 
-	for (i = 0; i < dev->cmd.max_cmds; ++i)
-		down(&dev->cmd.event_sem);
+	dev->cmd.free_head = -1;
 
 	kfree(dev->cmd.context);
 }
diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h
index 87ab964..2fc86db 100644
--- a/drivers/infiniband/hw/mthca/mthca_dev.h
+++ b/drivers/infiniband/hw/mthca/mthca_dev.h
@@ -46,6 +46,7 @@
 #include <linux/list.h>
 #include <linux/semaphore.h>
 
+#include <rdma/ib_sa.h>
 #include "mthca_provider.h"
 #include "mthca_doorbell.h"
 
@@ -121,7 +122,7 @@ struct mthca_cmd {
 	struct pci_pool          *pool;
 	struct mutex              hcr_mutex;
 	struct mutex		  poll_mutex;
-	struct semaphore 	  event_sem;
+	wait_queue_head_t	  wq;
 	int              	  max_cmds;
 	spinlock_t                context_lock;
 	int                       free_head;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox