Linux RDMA and InfiniBand development

Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed

* Re: [PATCH rdma-rc] IB/IPoIB: Remove can't use GFP_NOIO warning
From: Kamal Heib @ 2016-11-18 10:34 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Leon Romanovsky, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Mel Gorman,
	Jiri Kosina
In-Reply-To: <CAJ3xEMgCFmSMA5F-7D89n248K2HovvwHnnMgke7-FSDbROFA4Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Fri, Nov 18, 2016 at 12:30 AM, Or Gerlitz <gerlitz.or-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Thu, Nov 10, 2016 at 10:16 AM, Leon Romanovsky wrote:
>> From: Kamal Heib <kamalh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>
>> Remove the warning print of "can't use of GFP_NOIO" to avoid prints in
>> each QP creation when devices aren't supporting IB_QP_CREATE_USE_GFP_NOIO.
>>
>> This print become more annoying when the IPoIB interface is configured
>> to work in connected mode.
>>
>> Fixes: 09b93088d750 ('IB: Add a QP creation flag to use GFP_NOIO allocations')
>
> Hi Kamal,

Hi Or :)

>
> I don't think you're fixing a bug in this commit... you find the print
> annoying and you remove it, okay, maybe we can do that. Why not leave
> it in rate-limited manner? can you elaborate what was that deeply
> annoying with getting the warning?
>

It's really annoying when you have multiple ipoib interfaces and
pkeys, so the dmesg will look like the following:

[933236.858739] mlx5_ib0: can't use GFP_NOIO for QPs on device mlx5_0,
using GFP_KERNEL
[933407.571911] mlx5_ib0.8002: can't use GFP_NOIO for QPs on device
mlx5_0, using GFP_KERNEL
[933634.121955] mlx5_ib0.8006: can't use GFP_NOIO for QPs on device
mlx5_0, using GFP_KERNEL
[934167.851164] mlx5_ib0.8004: can't use GFP_NOIO for QPs on device
mlx5_0, using GFP_KERNEL
[934936.645002] mlx5_ib0.8002: can't use GFP_NOIO for QPs on device
mlx5_0, using GFP_KERNEL

>> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
>> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
>> @@ -1053,8 +1053,6 @@ static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_
>>
>>         tx_qp = ib_create_qp(priv->pd, &attr);
>>         if (PTR_ERR(tx_qp) == -EINVAL) {
>> -               ipoib_warn(priv, "can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
>> -                          priv->ca->name);
>>                 attr.create_flags &= ~IB_QP_CREATE_USE_GFP_NOIO;
>>                 tx_qp = ib_create_qp(priv->pd, &attr);
>>         }
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v4 05/10] IB/isert: Replace semaphore sem with completion
From: Binoy Jayan @ 2016-11-18  9:10 UTC (permalink / raw)
  To: Arnd Bergmann, Mark Brown
  Cc: Sagi Grimberg, Doug Ledford, Sean Hefty, Hal Rosenstock,
	linux-rdma, Linux kernel mailing list
In-Reply-To: <3477625.cZS9dUaHfE@wuerfel>

Hi Arnd,

On 18 November 2016 at 14:28, Arnd Bergmann <arnd@arndb.de> wrote:
> On Friday, November 18, 2016 12:27:32 PM CET Binoy Jayan wrote:
>> Hi Sagi,

> I think you are right. This is behavior is actuallly documented in
> Documentation/scheduler/completion.txt:

Thanking for having a look.

> However, this is fairly unusual behavior and I wasn't immediately aware
> of it either when I read Sagi's reply. While your patch looks correct,
> it's probably a good idea to point out the counting behavior of this
> completion as explicitly as possible, in the changelog text of the patch
> as well as in a code comment and perhaps in the naming of the completion.

Will mention this and resend the patch series.

Thanks,
Binoy

^ permalink raw reply

* Re: [PATCH v4 05/10] IB/isert: Replace semaphore sem with completion
From: Arnd Bergmann @ 2016-11-18  8:58 UTC (permalink / raw)
  To: Binoy Jayan
  Cc: Sagi Grimberg, Doug Ledford, Sean Hefty, Hal Rosenstock,
	linux-rdma, Linux kernel mailing list
In-Reply-To: <CAHv-k__HHRzK7+AZ912GVw9ToxZWMw3OccPqgGt_NBf6+RXdzA@mail.gmail.com>

On Friday, November 18, 2016 12:27:32 PM CET Binoy Jayan wrote:
> Hi Sagi,
> 
> On 31 October 2016 at 02:42, Sagi Grimberg <sagi@grimberg.me> wrote:
> >> The semaphore 'sem' in isert_device is used as completion, so convert
> >> it to struct completion. Semaphores are going away in the future.
> >
> >
> > Umm, this is 100% *not* true. np->sem is designed as a counting to
> > sync the iscsi login thread with the connect requests coming from the
> > initiators. So this is actually a reliable bug insertion :(
> >
> > NAK from me...
> 
> Sorry for the late reply as I was held up in other activities.
> 
> I converted this to a wait_event() implementation but as I was doing it,
> I was wondering how it would have been different if it was a completion
> and not a semaphore.
> 
> File: drivers/infiniband/ulp/isert/ib_isert.c
> 
> If isert_connected_handler() is called multiple times, adding an entry to the
> list, and if that happens while we use completion, 'done' (part of struct
> completion) would be incremented by 1 each time 'complete' is called from
> isert_connected_handler. After 'n' iterations, done will be equal to 'n'. If we
> call wait_for_completion now from isert_accept_np, it would just decrement
> 'done' by one and continue without blocking, consuming one node at a time
> from the list 'isert_np->pending'.
> 
> Alternatively if "done" becomes zero, and the next time wait_for_completion is
> called, the API would add a node at the end of the wait queue 'wait' in 'struct
> completion' and block until "done" is nonzero. (Ref: do_wait_for_common)
> It exists the wait when a call to 'complete' turns 'done' back to 1.
> But if there
> are multiple waits called before calling complete, all the tasks
> calling the wait
> gets queued up and they will all would see "done" set to zero. When complete
> is called now, done turns 1 again and the first task in the queue is woken up
> as it is serialized as FIFO. Now the first wait returns and the done is
> decremented by 1 just before the return.
> 
> Am I missing something here?

I think you are right. This is behavior is actuallly documented in
Documentation/scheduler/completion.txt:

  If complete() is called multiple times then this will allow for that number
  of waiters to continue - each call to complete() will simply increment the
  done element. Calling complete_all() multiple times is a bug though. Both
  complete() and complete_all() can be called in hard-irq/atomic context
  safely.

However, this is fairly unusual behavior and I wasn't immediately aware
of it either when I read Sagi's reply. While your patch looks correct,
it's probably a good idea to point out the counting behavior of this
completion as explicitly as possible, in the changelog text of the patch
as well as in a code comment and perhaps in the naming of the completion.

	Arnd

^ permalink raw reply

* Re: [PATCH v4 05/10] IB/isert: Replace semaphore sem with completion
From: Binoy Jayan @ 2016-11-18  6:57 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Doug Ledford, Sean Hefty, Hal Rosenstock, Arnd Bergmann,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Linux kernel mailing list
In-Reply-To: <a267cc5e-4d4f-368e-a159-3eecc2187969-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>

Hi Sagi,

On 31 October 2016 at 02:42, Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> wrote:
>> The semaphore 'sem' in isert_device is used as completion, so convert
>> it to struct completion. Semaphores are going away in the future.
>
>
> Umm, this is 100% *not* true. np->sem is designed as a counting to
> sync the iscsi login thread with the connect requests coming from the
> initiators. So this is actually a reliable bug insertion :(
>
> NAK from me...

Sorry for the late reply as I was held up in other activities.

I converted this to a wait_event() implementation but as I was doing it,
I was wondering how it would have been different if it was a completion
and not a semaphore.

File: drivers/infiniband/ulp/isert/ib_isert.c

If isert_connected_handler() is called multiple times, adding an entry to the
list, and if that happens while we use completion, 'done' (part of struct
completion) would be incremented by 1 each time 'complete' is called from
isert_connected_handler. After 'n' iterations, done will be equal to 'n'. If we
call wait_for_completion now from isert_accept_np, it would just decrement
'done' by one and continue without blocking, consuming one node at a time
from the list 'isert_np->pending'.

Alternatively if "done" becomes zero, and the next time wait_for_completion is
called, the API would add a node at the end of the wait queue 'wait' in 'struct
completion' and block until "done" is nonzero. (Ref: do_wait_for_common)
It exists the wait when a call to 'complete' turns 'done' back to 1.
But if there
are multiple waits called before calling complete, all the tasks
calling the wait
gets queued up and they will all would see "done" set to zero. When complete
is called now, done turns 1 again and the first task in the queue is woken up
as it is serialized as FIFO. Now the first wait returns and the done is
decremented by 1 just before the return.

Am I missing something here?

Thanks,
Binoy
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Целевые клиентские базы Skype: prodawez390 Whatsapp: +79139230330 Viber:  +79139230330 Telegram: +79139230330 Email: prodawez391@gmail.com
From: crowther.heine-KK0ffGbhmjU@public.gmane.org @ 2016-11-18  2:05 UTC (permalink / raw)
  To: testuser-O5WfVfzUwx8@public.gmane.org

Целевые клиентские базы Skype: prodawez390 Whatsapp: +79139230330 Viber:  +79139230330 Telegram: +79139230330 Email: prodawez391-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PULL REQUEST] Please pull rdma.git
From: Doug Ledford @ 2016-11-18  2:01 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: linux-rdma, Linux Kernel
In-Reply-To: <CAJ3xEMjXYYnhS6qUzM9F+yjtq8Aahn08MjsSU4OnLS66Cu7mgw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

[-- Attachment #1.1: Type: text/plain, Size: 4002 bytes --]

On 11/17/2016 5:24 PM, Or Gerlitz wrote:
> On Thu, Nov 17, 2016 at 9:44 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>> On 11/17/16 1:49 PM, Leon Romanovsky wrote:
>>> On Thu, Nov 17, 2016 at 07:13:54AM -0500, Doug Ledford wrote:
>>>> Hi Linus,
>>>>
>>>> Due to various issues, I've been away and couldn't send a pull request
>>>> for about three weeks.  There were a number of -rc patches that built up
>>>> in the meantime (some where there already from the early -rc stages).
>>>> Obviously, there were way too many to send now, so I tried to pare the
>>>> list down to the more important patches for the -rc cycle.
> 
> Hi Doug,
> 
> Except for the hfi1 patches, all the rest (core, mlx5, mlx4 and rxe)
> are marked now as only 21 hours old in your 4.9-rc branch and they
> seems be made from you picking partial subsets of multiple series,

Correct.

> with none of them acked by you on the list.

I had an all day meeting today and had to get out the door early.  The
patches will be responded to.

> If you agree that I am describing things correctly - how are we
> expected to follow on your patch picking? I find it sort of impossible
> and error prone.

When I started this I said the official, canonical source of information
on patches like this is patchworks.  That still holds true.  In this
case, I pulled the full series of patches into a single bundle, then
reviewed every patch individually.  I checked for importance and
dependence on other patches.  Those that I thought could be moved to
4.10 were moved into a new bundle and then removed from the existing
bundle.  In this way, the patches were always in one or the other.  When
I was done, I used git am on the two bundles and one into the 4.9-rc and
the other into a -next branch.  In that way I made sure I didn't miss
any from the four series that I pulled.  Finally, I used the bundles to
mark the patches as accepted in patchworks.  By marking the entire
bundles as accepted, and not individual patches, it makes sure that what
I mark accepted is the same as what I ran git am on.  So, if the patch
shows in patchworks as accepted, then I got it.  If it doesn't, then I
missed it.

>>> Are you adding the rest to your for-next branch? We would like to have
>>> enough time to check that nothing is lost.
> 
>> Yes, it's already there in the mlx-next branch on github.
> 
> Re the patches there, this one
> 
> IB/mlx4: Set traffic class in AH
> 
> "Set traffic class within sl_tclass_flowlabel when create iboe AH.
> Without this the TOS value will be empty when running VLAN tagged
> traffic, because the TOS value is taken from the traffic class in the
> address handle attributes.
> 
> Fixes: 9106c41 ('IB/mlx4: Fix SL to 802.1Q priority-bits mapping for IBoE')"
> 
> claims to fix my commit, I have approached Leon and Co for
> clarifications/questions over the list on the patch and nothing was
> answered.

I agree with you.  It doesn't fix your patch.  The commit message can
still be fixed up.

> Please do not send it to Linus and wait for them to respond. I
> disagree that it fixes my commit b/c my commit was prior to when
> route-able RoCE  was introduced and on that time TOS had no relation.

I agree.  A better fix tag would be the commit that added RoCEv2 support.

> and this one
> 
> "IB/mlx4: Put non zero value in max_ah device attribute
> 
> Use INT_MAX since this is the max value the attribute can hold, though
> hardware capability is unlimited.
> 
> Fixes: 225c7b1 ('IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters')"
> 
> does a tiny enhancement for a 10y old commit of Roland, why you think
> we need it in 4.9-rc6 or 7??

I don't, it's in the mlx-next branch which means I'll queue it up for
the 4.10 merge window.  I have no plan on sending that branch for 4.9-rc.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply

* Re: [PATCH rdma-rc 5/6] IB/core: Save QP in ib_flow structure
From: Or Gerlitz @ 2016-11-17 22:41 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <CAJ3xEMg=3eOqsKcrfcOoBkdpjiCmEAcPfNixHeLfVy45v_uvnQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Fri, Nov 18, 2016 at 12:40 AM, Or Gerlitz <gerlitz.or-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Fixes: 319a441d1361 ("IB/core: Add receive flow steering support")
>
> same comment as the other two, probably applies some, most or all
> patches on your git hub branch, lets use what was submitted and not
> something else, okay?

Also, as I commented and you didn't respond, this fix protects against
**future** bug and has nothing to do with intree code. As such there's
no point to put it
into rc fix, it's a -next thing
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH rdma-rc 5/6] IB/core: Save QP in ib_flow structure
From: Or Gerlitz @ 2016-11-17 22:40 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1477575391-20134-6-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

> Fixes: 319a441d1361 ("IB/core: Add receive flow steering support")

same comment as the other two, probably applies some, most or all
patches on your git hub branch, lets use what was submitted and not
something else, okay?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH rdma-rc 12/12] IB/mlx5: Limit mkey page size to 2GB
From: Or Gerlitz @ 2016-11-17 22:38 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1477575407-20562-13-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

> Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters')

the above line is different @ your git hub copy, bug as well?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH rdma-rc 10/12] IB/mlx5: Fix reported max SGE calculation
From: Or Gerlitz @ 2016-11-17 22:36 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1477575407-20562-11-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

On Thu, Oct 27, 2016 at 3:36 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> From: Eli Cohen <eli-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>
> Add the 512 bytes limit of RDMA READ and the size of remote
> address to the max SGE calculation.
>
> Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters')

Doug, for some reason the above line is different in the commit
present @ your git hub tree, bug somewhere?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH rdma-rc] IB/IPoIB: Remove can't use GFP_NOIO warning
From: Or Gerlitz @ 2016-11-17 22:30 UTC (permalink / raw)
  To: Kamal Heib, Leon Romanovsky
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Mel Gorman, Jiri Kosina
In-Reply-To: <1478765808-12517-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

On Thu, Nov 10, 2016 at 10:16 AM, Leon Romanovsky wrote:
> From: Kamal Heib <kamalh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>
> Remove the warning print of "can't use of GFP_NOIO" to avoid prints in
> each QP creation when devices aren't supporting IB_QP_CREATE_USE_GFP_NOIO.
>
> This print become more annoying when the IPoIB interface is configured
> to work in connected mode.
>
> Fixes: 09b93088d750 ('IB: Add a QP creation flag to use GFP_NOIO allocations')

Hi Kamal,

I don't think you're fixing a bug in this commit... you find the print
annoying and you remove it, okay, maybe we can do that. Why not leave
it in rate-limited manner? can you elaborate what was that deeply
annoying with getting the warning?

> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> @@ -1053,8 +1053,6 @@ static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_
>
>         tx_qp = ib_create_qp(priv->pd, &attr);
>         if (PTR_ERR(tx_qp) == -EINVAL) {
> -               ipoib_warn(priv, "can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
> -                          priv->ca->name);
>                 attr.create_flags &= ~IB_QP_CREATE_USE_GFP_NOIO;
>                 tx_qp = ib_create_qp(priv->pd, &attr);
>         }
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PULL REQUEST] Please pull rdma.git
From: Or Gerlitz @ 2016-11-17 22:24 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma, Linux Kernel
In-Reply-To: <20161117200203.GQ4240-2ukJVAZIZ/Y@public.gmane.org>

On Thu, Nov 17, 2016 at 9:44 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On 11/17/16 1:49 PM, Leon Romanovsky wrote:
>> On Thu, Nov 17, 2016 at 07:13:54AM -0500, Doug Ledford wrote:
>>> Hi Linus,
>>>
>>> Due to various issues, I've been away and couldn't send a pull request
>>> for about three weeks.  There were a number of -rc patches that built up
>>> in the meantime (some where there already from the early -rc stages).
>>> Obviously, there were way too many to send now, so I tried to pare the
>>> list down to the more important patches for the -rc cycle.

Hi Doug,

Except for the hfi1 patches, all the rest (core, mlx5, mlx4 and rxe)
are marked now as only 21 hours old in your 4.9-rc branch and they
seems be made from you picking partial subsets of multiple series,
with none of them acked by you on the list.

If you agree that I am describing things correctly - how are we
expected to follow on your patch picking? I find it sort of impossible
and error prone.

>> Are you adding the rest to your for-next branch? We would like to have
>> enough time to check that nothing is lost.

> Yes, it's already there in the mlx-next branch on github.

Re the patches there, this one

IB/mlx4: Set traffic class in AH

"Set traffic class within sl_tclass_flowlabel when create iboe AH.
Without this the TOS value will be empty when running VLAN tagged
traffic, because the TOS value is taken from the traffic class in the
address handle attributes.

Fixes: 9106c41 ('IB/mlx4: Fix SL to 802.1Q priority-bits mapping for IBoE')"

claims to fix my commit, I have approached Leon and Co for
clarifications/questions over the list on the patch and nothing was
answered.

Please do not send it to Linus and wait for them to respond. I
disagree that it fixes my commit b/c my commit was prior to when
route-able RoCE  was introduced and on that time TOS had no relation.

and this one

"IB/mlx4: Put non zero value in max_ah device attribute

Use INT_MAX since this is the max value the attribute can hold, though
hardware capability is unlimited.

Fixes: 225c7b1 ('IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters')"

does a tiny enhancement for a 10y old commit of Roland, why you think
we need it in 4.9-rc6 or 7??

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: NFSD generic R/W API (sendto path) performance results
From: Chuck Lever @ 2016-11-17 20:42 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Christoph Hellwig, Steve Wise, List Linux RDMA Mailing
In-Reply-To: <c6190e4c-9b8e-3937-ba38-7861eebeaaae-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>

> On Nov 17, 2016, at 3:20 PM, Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> wrote:
> 
> 
>>>> Also did you try to always register for > max_sge
>>>> calls?  The code can already register all segments with the
>>>> rdma_rw_force_mr module option, so it would only need a small tweak for
>>>> that behavior.
>>> 
>>> For various reasons I decided the design should build one WR chain for
>>> each RDMA segment provided by the client. Good clients expose just
>>> one RDMA segment for the whole NFS READ payload.
>>> 
>>> Does force_mr make the generic API use FRWR with RDMA Write? I had
>>> assumed it changed only the behavior with RDMA Read. I'll try that
>>> too, if RDMA Write can easily be made to use FRWR.
>> 
>> Unfortunately, some RPC replies are formed from two or three
>> discontiguous buffers. The gap test in ib_sg_to_pages returns
>> a smaller number than sg_nents in this case, and rdma_rw_init_ctx
>> fails.
>> 
>> Thus with my current prototype I'm not able to test with FRWR.
>> 
>> I could fix this in my prototype, but it would be nicer for me if
>> rdma_rw_init_ctx handled this case the same for FRWR as it does
>> for physical addressing, which doesn't seem to have any problem
>> with a discontiguous SGL.
>> 
>> 
>>> But I'd like a better explanation for this result. Could be a bug
>>> in my implementation, my design, or in the driver. Continuing to
>>> investigate.
> 
> Hi Chuck, sorry for the late reply (have been busy lately..)
> 
> I think that the Call-to-first-Write phenomenon you are seeing makes
> perfect sense, the question is, is a QD=1 1M transfers latency that
> interesting? Did you see a positive effect on small (say 4k) transfers?
> both latency and iops scalability should be able to improve especially
> when serving multiple clients.
> 
> If indeed you feel that this is an interesting workload to optimize, I
> think we can come up with something.

I believe 1MB transfers are interesting: NFS is frequently used in
back-up scenarios, for example, and believe it or not, also for
non-linear editing and animation (4K video).

QD=1 exposes the individual components of latency. In this case, we
can clearly see the cost of preparing the data payload for transfer.
It's basically a tweak so we can debug the problem.

In the "Real World" I expect to see larger transfers, where several
1MB I/Os are dispatched in parallel. I don't reach fabric bandwidth
until 10 or more are in flight, which I think should be improved.

Wrt 4KB, I didn't see much change there, though I admit I didn't
expect much change since both cases have to DMA map a page, and post
one Write WR, so I haven't looked too closely. I'm already down
around 32us for a 4KB NFS READ, even without the server changes,
and 30us for 4KB NFS READ all-inline.

I agree, though, that there is a 3:1 reduction in ib_post_send calls
with the generic API in my test harness, and that can only be a good
thing.

> About the First-to-last-Write, thats weird, and sound like a bug
> somewhere. Maybe Mellanox folks can tell us if splitting 1M to multiple
> writes works better (although I cannot comprehend why).
> 
> Question, are the send and receive cqs still in IB_POLL_SOFTIRQ mode?

Yes, both are still in SOFTIRQ. A while back I played with using the
Work Queue mode, and noticed some slowdowns when the Send CQ was
changed to use Work Queue. I didn't explore it further.

Someday, these will need to be changed, so that the server runs
entirely in process context (as it does for TCP, AFAICT). That gets
rid of BH flapping, but adds heavy-weight context switching latencies.

--
Chuck Lever

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: NFSD generic R/W API (sendto path) performance results
From: Chuck Lever @ 2016-11-17 20:20 UTC (permalink / raw)
  To: Steve Wise; +Cc: Christoph Hellwig, Sagi Grimberg, List Linux RDMA Mailing
In-Reply-To: <01f001d2410d$b16c97f0$1445c7d0$@opengridcomputing.com>


> On Nov 17, 2016, at 3:03 PM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:
> 
>>> 
>>> On Nov 17, 2016, at 10:04 AM, Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>>> 
>>>> On Nov 17, 2016, at 7:46 AM, Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org> wrote:
>>>> Also did you try to always register for > max_sge
>>>> calls?  The code can already register all segments with the
>>>> rdma_rw_force_mr module option, so it would only need a small tweak for
>>>> that behavior.
>>> 
>>> For various reasons I decided the design should build one WR chain for
>>> each RDMA segment provided by the client. Good clients expose just
>>> one RDMA segment for the whole NFS READ payload.
>>> 
>>> Does force_mr make the generic API use FRWR with RDMA Write? I had
>>> assumed it changed only the behavior with RDMA Read. I'll try that
>>> too, if RDMA Write can easily be made to use FRWR.
>> 
>> Unfortunately, some RPC replies are formed from two or three
>> discontiguous buffers. The gap test in ib_sg_to_pages returns
>> a smaller number than sg_nents in this case, and rdma_rw_init_ctx
>> fails.
>> 
>> Thus with my current prototype I'm not able to test with FRWR.
>> 
>> I could fix this in my prototype, but it would be nicer for me if
>> rdma_rw_init_ctx handled this case the same for FRWR as it does
>> for physical addressing, which doesn't seem to have any problem
>> with a discontiguous SGL.
> 
> Just to make sure I'm understanding you, for rdma-rw to handle this,  it would
> have to use multiple REG_MR registrations, one for each contiguous area in the
> scatter list.  
> 
> Right?

Right, that's the approach the NFS client takes. See
net/sunrpc/xprtrdma/frwr_ops.c :: frwr_op_map.

If the passed-in memory list isn't contiguous, frwr_op_map stops
registering and returns to the caller, who allocates another
MR and calls in again with the remaining part of the list.

I think this would not apply to SG_GAP MRs, which should
already be able to handle discontiguous SGLs?

Note this doesn't apply to most NFS READs, where just the data
payload is going via RDMA Write, and the payload is already in
a contiguous piece of memory. But Reply chunks, which are used
for READDIRs and other requests, can be built from discontiguous
memory.

I haven't looked closely at the RDMA Read logic, but I think
it always reads into a contiguous set of pages, then builds
the xdr_buf out of that. It shouldn't have the same problem
(and it is already known to work with FRWR ;-).


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: NFSD generic R/W API (sendto path) performance results
From: Sagi Grimberg @ 2016-11-17 20:20 UTC (permalink / raw)
  To: Chuck Lever, Christoph Hellwig; +Cc: Steve Wise, List Linux RDMA Mailing
In-Reply-To: <676323E9-2F30-4DB0-AEF8-CDE38E8A0715-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>


>>> Also did you try to always register for > max_sge
>>> calls?  The code can already register all segments with the
>>> rdma_rw_force_mr module option, so it would only need a small tweak for
>>> that behavior.
>>
>> For various reasons I decided the design should build one WR chain for
>> each RDMA segment provided by the client. Good clients expose just
>> one RDMA segment for the whole NFS READ payload.
>>
>> Does force_mr make the generic API use FRWR with RDMA Write? I had
>> assumed it changed only the behavior with RDMA Read. I'll try that
>> too, if RDMA Write can easily be made to use FRWR.
>
> Unfortunately, some RPC replies are formed from two or three
> discontiguous buffers. The gap test in ib_sg_to_pages returns
> a smaller number than sg_nents in this case, and rdma_rw_init_ctx
> fails.
>
> Thus with my current prototype I'm not able to test with FRWR.
>
> I could fix this in my prototype, but it would be nicer for me if
> rdma_rw_init_ctx handled this case the same for FRWR as it does
> for physical addressing, which doesn't seem to have any problem
> with a discontiguous SGL.
>
>
>> But I'd like a better explanation for this result. Could be a bug
>> in my implementation, my design, or in the driver. Continuing to
>> investigate.

Hi Chuck, sorry for the late reply (have been busy lately..)

I think that the Call-to-first-Write phenomenon you are seeing makes
perfect sense, the question is, is a QD=1 1M transfers latency that
interesting? Did you see a positive effect on small (say 4k) transfers?
both latency and iops scalability should be able to improve especially
when serving multiple clients.

If indeed you feel that this is an interesting workload to optimize, I
think we can come up with something.

About the First-to-last-Write, thats weird, and sound like a bug
somewhere. Maybe Mellanox folks can tell us if splitting 1M to multiple
writes works better (although I cannot comprehend why).

Question, are the send and receive cqs still in IB_POLL_SOFTIRQ mode?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: NFSD generic R/W API (sendto path) performance results
From: Steve Wise @ 2016-11-17 20:03 UTC (permalink / raw)
  To: 'Chuck Lever', 'Christoph Hellwig'
  Cc: 'Sagi Grimberg', 'List Linux RDMA Mailing'
In-Reply-To: <676323E9-2F30-4DB0-AEF8-CDE38E8A0715-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

> 
> > On Nov 17, 2016, at 10:04 AM, Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> >
> >> On Nov 17, 2016, at 7:46 AM, Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org> wrote:
> >> Also did you try to always register for > max_sge
> >> calls?  The code can already register all segments with the
> >> rdma_rw_force_mr module option, so it would only need a small tweak for
> >> that behavior.
> >
> > For various reasons I decided the design should build one WR chain for
> > each RDMA segment provided by the client. Good clients expose just
> > one RDMA segment for the whole NFS READ payload.
> >
> > Does force_mr make the generic API use FRWR with RDMA Write? I had
> > assumed it changed only the behavior with RDMA Read. I'll try that
> > too, if RDMA Write can easily be made to use FRWR.
> 
> Unfortunately, some RPC replies are formed from two or three
> discontiguous buffers. The gap test in ib_sg_to_pages returns
> a smaller number than sg_nents in this case, and rdma_rw_init_ctx
> fails.
> 
> Thus with my current prototype I'm not able to test with FRWR.
> 
> I could fix this in my prototype, but it would be nicer for me if
> rdma_rw_init_ctx handled this case the same for FRWR as it does
> for physical addressing, which doesn't seem to have any problem
> with a discontiguous SGL.

Just to make sure I'm understanding you, for rdma-rw to handle this,  it would
have to use multiple REG_MR registrations, one for each contiguous area in the
scatter list.  

Right?

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PULL REQUEST] Please pull rdma.git
From: Leon Romanovsky @ 2016-11-17 20:02 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma
In-Reply-To: <582E089A.3040106-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1237 bytes --]

On Thu, Nov 17, 2016 at 02:44:26PM -0500, Doug Ledford wrote:
> On 11/17/16 1:49 PM, Leon Romanovsky wrote:
> > On Thu, Nov 17, 2016 at 07:13:54AM -0500, Doug Ledford wrote:
> >> Hi Linus,
> >>
> >> Due to various issues, I've been away and couldn't send a pull request
> >> for about three weeks.  There were a number of -rc patches that built up
> >> in the meantime (some where there already from the early -rc stages).
> >> Obviously, there were way too many to send now, so I tried to pare the
> >> list down to the more important patches for the -rc cycle.
> >
> > Hi Doug,
> > Are you adding the rest to your for-next branch? We would like to have
> > enough time to check that nothing is lost.
> >
> > Thanks
> >
>
> Yes, it's already there in the mlx-next branch on github.

Thanks,
Do I understand you correctly that it is not final version and you
didn't upload general branch yet?

There is patch for the MAD which will be great to see in the list too.
https://patchwork.kernel.org/patch/9382745/
http://marc.info/?l=linux-rdma&m=147681123708587&w=2

Thanks

>
> --
> Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>    GPG Key ID: 0E572FDD
>   Red Hat, Inc.
>   100 E. Davie St
>   Raleigh, NC 27601 USA
>




[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PULL REQUEST] Please pull rdma.git
From: Doug Ledford @ 2016-11-17 19:44 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: linux-rdma
In-Reply-To: <20161117184950.GP4240-2ukJVAZIZ/Y@public.gmane.org>


[-- Attachment #1.1: Type: text/plain, Size: 866 bytes --]

On 11/17/16 1:49 PM, Leon Romanovsky wrote:
> On Thu, Nov 17, 2016 at 07:13:54AM -0500, Doug Ledford wrote:
>> Hi Linus,
>>
>> Due to various issues, I've been away and couldn't send a pull request
>> for about three weeks.  There were a number of -rc patches that built up
>> in the meantime (some where there already from the early -rc stages).
>> Obviously, there were way too many to send now, so I tried to pare the
>> list down to the more important patches for the -rc cycle.
> 
> Hi Doug,
> Are you adding the rest to your for-next branch? We would like to have
> enough time to check that nothing is lost.
> 
> Thanks
> 

Yes, it's already there in the mlx-next branch on github.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>    GPG Key ID: 0E572FDD
  Red Hat, Inc.
  100 E. Davie St
  Raleigh, NC 27601 USA


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 907 bytes --]

^ permalink raw reply

* Re: NFSD generic R/W API (sendto path) performance results
From: Chuck Lever @ 2016-11-17 19:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Steve Wise, Sagi Grimberg, List Linux RDMA Mailing
In-Reply-To: <84B43CFF-EBF7-4758-8751-8C97102C5BCF-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>


> On Nov 17, 2016, at 10:04 AM, Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> 
>> On Nov 17, 2016, at 7:46 AM, Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org> wrote:
>> Also did you try to always register for > max_sge
>> calls?  The code can already register all segments with the
>> rdma_rw_force_mr module option, so it would only need a small tweak for
>> that behavior.
> 
> For various reasons I decided the design should build one WR chain for
> each RDMA segment provided by the client. Good clients expose just
> one RDMA segment for the whole NFS READ payload.
> 
> Does force_mr make the generic API use FRWR with RDMA Write? I had
> assumed it changed only the behavior with RDMA Read. I'll try that
> too, if RDMA Write can easily be made to use FRWR.

Unfortunately, some RPC replies are formed from two or three
discontiguous buffers. The gap test in ib_sg_to_pages returns
a smaller number than sg_nents in this case, and rdma_rw_init_ctx
fails.

Thus with my current prototype I'm not able to test with FRWR.

I could fix this in my prototype, but it would be nicer for me if
rdma_rw_init_ctx handled this case the same for FRWR as it does
for physical addressing, which doesn't seem to have any problem
with a discontiguous SGL.


> But I'd like a better explanation for this result. Could be a bug
> in my implementation, my design, or in the driver. Continuing to
> investigate.


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PULL REQUEST] Please pull rdma.git
From: Leon Romanovsky @ 2016-11-17 18:49 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma
In-Reply-To: <58466423-c87e-3921-101e-bffab8989fd8-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 562 bytes --]

On Thu, Nov 17, 2016 at 07:13:54AM -0500, Doug Ledford wrote:
> Hi Linus,
>
> Due to various issues, I've been away and couldn't send a pull request
> for about three weeks.  There were a number of -rc patches that built up
> in the meantime (some where there already from the early -rc stages).
> Obviously, there were way too many to send now, so I tried to pare the
> list down to the more important patches for the -rc cycle.

Hi Doug,
Are you adding the rest to your for-next branch? We would like to have
enough time to check that nothing is lost.

Thanks

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* RE: rdma-core release process questions
From: Nikolova, Tatyana E @ 2016-11-17 18:21 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jason Gunthorpe, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20161117102846.GJ4240-2ukJVAZIZ/Y@public.gmane.org>

Hi Leon and Jason,

Thank you for the information.

Tatyana

-----Original Message-----
From: Leon Romanovsky [mailto:leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org] 
Sent: Thursday, November 17, 2016 4:29 AM
To: Nikolova, Tatyana E <tatyana.e.nikolova-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>; Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: rdma-core release process questions

On Thu, Nov 17, 2016 at 04:56:45AM +0000, Nikolova, Tatyana E wrote:
> Hi,
>
> We are submitting patches to the kernel space driver i40iw and to the user space plugin libi40iw, which is currently part of rdma-core. Some of the changes need to be coordinated so that they appear in both kernel space and user space in corresponding releases. We have some questions regarding the process about submitting patches to rdma-core which have dependencies on kernel patches.
>
> 1) Can user space patches target a for-next rdma-core release, if the corresponding kernel patches are queued for the next kernel?

I don't see any problem with that, once the patches accepted for the -next by Doug, they can be accepted to the rdma-core too. Anyway these changes should be compatible with old kernel without such new feature.

> 2) How are ABI changes handled in rdma-core?

Do you have specific thing in mind?
Generally speaking, send to ML pass review and we will apply.

>
> 3) Could you explain the release process for rdma-core?

The process as agreed will be something like that:
a. Review/accept/decline patches in 1-2 weeks time frame.
b. Once kernel released, stop accepting new features.
c. Wait for 1-2 weeks to see no one complains. It is just to be on safe side, because the library is always ready for release and checked constantly.
d. Create new tag and push release.

>
> 4) Is each rdma-core release going to correspond to a specific kernel version?

As Jason wrote, It will be aligned in release time to the kernel, but library should remain backward compatible.

>
> Thank you,
> Tatyana
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH rdma-next 0/4] Add packet pacing support for IB verbs
From: Leon Romanovsky @ 2016-11-17 18:15 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1477909297-14491-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 2661 bytes --]

On Mon, Oct 31, 2016 at 12:21:33PM +0200, Leon Romanovsky wrote:
> When sending from a 10G host to a 1G host, it is easy to overrun the receiver,
> leading to packet loss and traffic backing off. Similar problems occur when
> a 10G host sends data to a sub-10G virtual circuit, or a 40G host sending
> to a 10G host. Packet pacing could control packet injection rate and reduces
> network congestion to maximize throughput & minimize network latency.
>
> Packet pacing is a rate limiting and shaping for a QP (SQ for RAW QP), set
> and change the rate is done by modifying QP. This series of patch made the
> following high level changes:
>  1. Report rate limit capabilities through user data. Reported capabilities
>     include: The maximum and minimum rate limit in kbps supported by packet
>     pacing; Bitmap showing which QP types are supported by packet pacing
>     operation.
>  2. Extend modify QP interface for growing attributes. Add rate limit support
>     to the extended interface.
>  3. Enable mlx5-based hardware to be able to update the rate limit for
>     RAW QP packet.
>
> Available in the "topic/packet_pacing" topic branch of this git repo:
> git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git
>
> Or for browsing:
> https://git.kernel.org/cgit/linux/kernel/git/leon/linux-rdma.git/log/?h=topic/packet_pacing

Hi Doug,

Please drop this patch series, we discovered an issue with proposed
modify_qp implementation and will respin it.

Sorry for the inconvenience.

Thanks

>
> Thanks,
>   Bodong & Leon
>
> Bodong Wang (4):
>   IB/mlx5: Report mlx5 packet pacing capabilities when querying device
>   IB/core: Support rate limit for packet pacing
>   IB/uverbs: Extend modify_qp and support packet pacing
>   IB/mlx5: Update the rate limit according to user setting for RAW QP
>
>  drivers/infiniband/core/uverbs.h      |   1 +
>  drivers/infiniband/core/uverbs_cmd.c  | 178 +++++++++++++++++++++-------------
>  drivers/infiniband/core/uverbs_main.c |   1 +
>  drivers/infiniband/core/verbs.c       |   2 +
>  drivers/infiniband/hw/mlx5/main.c     |  16 ++-
>  drivers/infiniband/hw/mlx5/mlx5_ib.h  |   1 +
>  drivers/infiniband/hw/mlx5/qp.c       |  71 ++++++++++++--
>  include/rdma/ib_verbs.h               |   2 +
>  include/uapi/rdma/ib_user_verbs.h     |  12 +++
>  include/uapi/rdma/mlx5-abi.h          |  13 +++
>  10 files changed, 219 insertions(+), 78 deletions(-)
>
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [patch] IB/rxe: Remove unneeded cast in rxe_srq_from_attr()
From: Yuval Shaia @ 2016-11-17 16:38 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Moni Shoua, Doug Ledford, Sean Hefty, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	kernel-janitors-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161117121554.GA4292-Hxa29pjIrETlQW142y8m19+IiqhCXseY@public.gmane.org>

Besides the soft-aggressive commit message -:)
Reviewed-by: Yuval Shaia <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
 
On Thu, Nov 17, 2016 at 02:00:05PM +0300, Dan Carpenter wrote:
> It makes me nervous when we cast pointer parameters.  I would estimate
> that around 50% of the time, it indicates a bug.  Here the cast is not
> needed becaue u32 and and unsigned int are the same thing.  Removing the
> cast makes the code more robust and future proof in case any of the
> types change.
> 
> Signed-off-by: Dan Carpenter <dan.carpenter-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_srq.c b/drivers/infiniband/sw/rxe/rxe_srq.c
> index 2a6e3cd..efc832a 100644
> --- a/drivers/infiniband/sw/rxe/rxe_srq.c
> +++ b/drivers/infiniband/sw/rxe/rxe_srq.c
> @@ -169,7 +169,7 @@ int rxe_srq_from_attr(struct rxe_dev *rxe, struct rxe_srq *srq,
>  			}
>  		}
>  
> -		err = rxe_queue_resize(q, (unsigned int *)&attr->max_wr,
> +		err = rxe_queue_resize(q, &attr->max_wr,
>  				       rcv_wqe_size(srq->rq.max_sge),
>  				       srq->rq.queue->ip ?
>  						srq->rq.queue->ip->context :
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: NFSD generic R/W API (sendto path) performance results
From: Chuck Lever @ 2016-11-17 15:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Steve Wise, Sagi Grimberg, List Linux RDMA Mailing
In-Reply-To: <20161117124602.GA25821-jcswGhMUV9g@public.gmane.org>


> On Nov 17, 2016, at 7:46 AM, Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org> wrote:
> 
> On Wed, Nov 16, 2016 at 02:45:33PM -0500, Chuck Lever wrote:
>> Out of curiosity, I hacked up my NFS client to limit the size of RDMA
>> segments to 30 pages (the server HCA's max_sge).
>> 
>> A 1MB NFS READ now takes 9 segments. That forces the after-conversion
>> server to build single-Write chains and use 9 post_send calls to
>> transmit the READ payload, just like the before-conversion server.
>> 
>> Performance of before- and after-conversion servers is now equivalent.
>> 
>>              kB  reclen    write  rewrite    read    reread
>>         2097152    1024  1061237  1141614  1961410  2000223                                                                                  
> 
> What HCA is this, btw?

ConnectX-3 Pro, f/w 2.31.5050


> Also did you try to always register for > max_sge
> calls?  The code can already register all segments with the
> rdma_rw_force_mr module option, so it would only need a small tweak for
> that behavior.

For various reasons I decided the design should build one WR chain for
each RDMA segment provided by the client. Good clients expose just
one RDMA segment for the whole NFS READ payload.

Does force_mr make the generic API use FRWR with RDMA Write? I had
assumed it changed only the behavior with RDMA Read. I'll try that
too, if RDMA Write can easily be made to use FRWR.

But I'd like a better explanation for this result. Could be a bug
in my implementation, my design, or in the driver. Continuing to
investigate.


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [patch] IB/rxe: Remove unneeded cast in rxe_srq_from_attr()
From: Moni Shoua @ 2016-11-17 13:51 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Doug Ledford, Sean Hefty, Hal Rosenstock, linux-rdma,
	kernel-janitors
In-Reply-To: <20161117110005.GB32143@mwanda>

On Thu, Nov 17, 2016 at 1:00 PM, Dan Carpenter <dan.carpenter@oracle.com> wrote:
> It makes me nervous when we cast pointer parameters.  I would estimate
> that around 50% of the time, it indicates a bug.  Here the cast is not
> needed becaue u32 and and unsigned int are the same thing.  Removing the
> cast makes the code more robust and future proof in case any of the
> types change.
>
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Moni Shoua <monis@mellanox.com>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox