Linux RDMA and InfiniBand development

Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed

* Re: NFSD generic R/W API (sendto path) performance results
From: Chuck Lever @ 2016-11-17 20:20 UTC (permalink / raw)
  To: Steve Wise; +Cc: Christoph Hellwig, Sagi Grimberg, List Linux RDMA Mailing
In-Reply-To: <01f001d2410d$b16c97f0$1445c7d0$@opengridcomputing.com>


> On Nov 17, 2016, at 3:03 PM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:
> 
>>> 
>>> On Nov 17, 2016, at 10:04 AM, Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>>> 
>>>> On Nov 17, 2016, at 7:46 AM, Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org> wrote:
>>>> Also did you try to always register for > max_sge
>>>> calls?  The code can already register all segments with the
>>>> rdma_rw_force_mr module option, so it would only need a small tweak for
>>>> that behavior.
>>> 
>>> For various reasons I decided the design should build one WR chain for
>>> each RDMA segment provided by the client. Good clients expose just
>>> one RDMA segment for the whole NFS READ payload.
>>> 
>>> Does force_mr make the generic API use FRWR with RDMA Write? I had
>>> assumed it changed only the behavior with RDMA Read. I'll try that
>>> too, if RDMA Write can easily be made to use FRWR.
>> 
>> Unfortunately, some RPC replies are formed from two or three
>> discontiguous buffers. The gap test in ib_sg_to_pages returns
>> a smaller number than sg_nents in this case, and rdma_rw_init_ctx
>> fails.
>> 
>> Thus with my current prototype I'm not able to test with FRWR.
>> 
>> I could fix this in my prototype, but it would be nicer for me if
>> rdma_rw_init_ctx handled this case the same for FRWR as it does
>> for physical addressing, which doesn't seem to have any problem
>> with a discontiguous SGL.
> 
> Just to make sure I'm understanding you, for rdma-rw to handle this,  it would
> have to use multiple REG_MR registrations, one for each contiguous area in the
> scatter list.  
> 
> Right?

Right, that's the approach the NFS client takes. See
net/sunrpc/xprtrdma/frwr_ops.c :: frwr_op_map.

If the passed-in memory list isn't contiguous, frwr_op_map stops
registering and returns to the caller, who allocates another
MR and calls in again with the remaining part of the list.

I think this would not apply to SG_GAP MRs, which should
already be able to handle discontiguous SGLs?

Note this doesn't apply to most NFS READs, where just the data
payload is going via RDMA Write, and the payload is already in
a contiguous piece of memory. But Reply chunks, which are used
for READDIRs and other requests, can be built from discontiguous
memory.

I haven't looked closely at the RDMA Read logic, but I think
it always reads into a contiguous set of pages, then builds
the xdr_buf out of that. It shouldn't have the same problem
(and it is already known to work with FRWR ;-).


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: NFSD generic R/W API (sendto path) performance results
From: Chuck Lever @ 2016-11-17 20:42 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Christoph Hellwig, Steve Wise, List Linux RDMA Mailing
In-Reply-To: <c6190e4c-9b8e-3937-ba38-7861eebeaaae-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>

> On Nov 17, 2016, at 3:20 PM, Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> wrote:
> 
> 
>>>> Also did you try to always register for > max_sge
>>>> calls?  The code can already register all segments with the
>>>> rdma_rw_force_mr module option, so it would only need a small tweak for
>>>> that behavior.
>>> 
>>> For various reasons I decided the design should build one WR chain for
>>> each RDMA segment provided by the client. Good clients expose just
>>> one RDMA segment for the whole NFS READ payload.
>>> 
>>> Does force_mr make the generic API use FRWR with RDMA Write? I had
>>> assumed it changed only the behavior with RDMA Read. I'll try that
>>> too, if RDMA Write can easily be made to use FRWR.
>> 
>> Unfortunately, some RPC replies are formed from two or three
>> discontiguous buffers. The gap test in ib_sg_to_pages returns
>> a smaller number than sg_nents in this case, and rdma_rw_init_ctx
>> fails.
>> 
>> Thus with my current prototype I'm not able to test with FRWR.
>> 
>> I could fix this in my prototype, but it would be nicer for me if
>> rdma_rw_init_ctx handled this case the same for FRWR as it does
>> for physical addressing, which doesn't seem to have any problem
>> with a discontiguous SGL.
>> 
>> 
>>> But I'd like a better explanation for this result. Could be a bug
>>> in my implementation, my design, or in the driver. Continuing to
>>> investigate.
> 
> Hi Chuck, sorry for the late reply (have been busy lately..)
> 
> I think that the Call-to-first-Write phenomenon you are seeing makes
> perfect sense, the question is, is a QD=1 1M transfers latency that
> interesting? Did you see a positive effect on small (say 4k) transfers?
> both latency and iops scalability should be able to improve especially
> when serving multiple clients.
> 
> If indeed you feel that this is an interesting workload to optimize, I
> think we can come up with something.

I believe 1MB transfers are interesting: NFS is frequently used in
back-up scenarios, for example, and believe it or not, also for
non-linear editing and animation (4K video).

QD=1 exposes the individual components of latency. In this case, we
can clearly see the cost of preparing the data payload for transfer.
It's basically a tweak so we can debug the problem.

In the "Real World" I expect to see larger transfers, where several
1MB I/Os are dispatched in parallel. I don't reach fabric bandwidth
until 10 or more are in flight, which I think should be improved.

Wrt 4KB, I didn't see much change there, though I admit I didn't
expect much change since both cases have to DMA map a page, and post
one Write WR, so I haven't looked too closely. I'm already down
around 32us for a 4KB NFS READ, even without the server changes,
and 30us for 4KB NFS READ all-inline.

I agree, though, that there is a 3:1 reduction in ib_post_send calls
with the generic API in my test harness, and that can only be a good
thing.

> About the First-to-last-Write, thats weird, and sound like a bug
> somewhere. Maybe Mellanox folks can tell us if splitting 1M to multiple
> writes works better (although I cannot comprehend why).
> 
> Question, are the send and receive cqs still in IB_POLL_SOFTIRQ mode?

Yes, both are still in SOFTIRQ. A while back I played with using the
Work Queue mode, and noticed some slowdowns when the Send CQ was
changed to use Work Queue. I didn't explore it further.

Someday, these will need to be changed, so that the server runs
entirely in process context (as it does for TCP, AFAICT). That gets
rid of BH flapping, but adds heavy-weight context switching latencies.

--
Chuck Lever

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PULL REQUEST] Please pull rdma.git
From: Or Gerlitz @ 2016-11-17 22:24 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma, Linux Kernel
In-Reply-To: <20161117200203.GQ4240-2ukJVAZIZ/Y@public.gmane.org>

On Thu, Nov 17, 2016 at 9:44 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On 11/17/16 1:49 PM, Leon Romanovsky wrote:
>> On Thu, Nov 17, 2016 at 07:13:54AM -0500, Doug Ledford wrote:
>>> Hi Linus,
>>>
>>> Due to various issues, I've been away and couldn't send a pull request
>>> for about three weeks.  There were a number of -rc patches that built up
>>> in the meantime (some where there already from the early -rc stages).
>>> Obviously, there were way too many to send now, so I tried to pare the
>>> list down to the more important patches for the -rc cycle.

Hi Doug,

Except for the hfi1 patches, all the rest (core, mlx5, mlx4 and rxe)
are marked now as only 21 hours old in your 4.9-rc branch and they
seems be made from you picking partial subsets of multiple series,
with none of them acked by you on the list.

If you agree that I am describing things correctly - how are we
expected to follow on your patch picking? I find it sort of impossible
and error prone.

>> Are you adding the rest to your for-next branch? We would like to have
>> enough time to check that nothing is lost.

> Yes, it's already there in the mlx-next branch on github.

Re the patches there, this one

IB/mlx4: Set traffic class in AH

"Set traffic class within sl_tclass_flowlabel when create iboe AH.
Without this the TOS value will be empty when running VLAN tagged
traffic, because the TOS value is taken from the traffic class in the
address handle attributes.

Fixes: 9106c41 ('IB/mlx4: Fix SL to 802.1Q priority-bits mapping for IBoE')"

claims to fix my commit, I have approached Leon and Co for
clarifications/questions over the list on the patch and nothing was
answered.

Please do not send it to Linus and wait for them to respond. I
disagree that it fixes my commit b/c my commit was prior to when
route-able RoCE  was introduced and on that time TOS had no relation.

and this one

"IB/mlx4: Put non zero value in max_ah device attribute

Use INT_MAX since this is the max value the attribute can hold, though
hardware capability is unlimited.

Fixes: 225c7b1 ('IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters')"

does a tiny enhancement for a 10y old commit of Roland, why you think
we need it in 4.9-rc6 or 7??

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH rdma-rc] IB/IPoIB: Remove can't use GFP_NOIO warning
From: Or Gerlitz @ 2016-11-17 22:30 UTC (permalink / raw)
  To: Kamal Heib, Leon Romanovsky
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Mel Gorman, Jiri Kosina
In-Reply-To: <1478765808-12517-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

On Thu, Nov 10, 2016 at 10:16 AM, Leon Romanovsky wrote:
> From: Kamal Heib <kamalh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>
> Remove the warning print of "can't use of GFP_NOIO" to avoid prints in
> each QP creation when devices aren't supporting IB_QP_CREATE_USE_GFP_NOIO.
>
> This print become more annoying when the IPoIB interface is configured
> to work in connected mode.
>
> Fixes: 09b93088d750 ('IB: Add a QP creation flag to use GFP_NOIO allocations')

Hi Kamal,

I don't think you're fixing a bug in this commit... you find the print
annoying and you remove it, okay, maybe we can do that. Why not leave
it in rate-limited manner? can you elaborate what was that deeply
annoying with getting the warning?

> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> @@ -1053,8 +1053,6 @@ static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_
>
>         tx_qp = ib_create_qp(priv->pd, &attr);
>         if (PTR_ERR(tx_qp) == -EINVAL) {
> -               ipoib_warn(priv, "can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
> -                          priv->ca->name);
>                 attr.create_flags &= ~IB_QP_CREATE_USE_GFP_NOIO;
>                 tx_qp = ib_create_qp(priv->pd, &attr);
>         }
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH rdma-rc 10/12] IB/mlx5: Fix reported max SGE calculation
From: Or Gerlitz @ 2016-11-17 22:36 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1477575407-20562-11-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

On Thu, Oct 27, 2016 at 3:36 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> From: Eli Cohen <eli-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>
> Add the 512 bytes limit of RDMA READ and the size of remote
> address to the max SGE calculation.
>
> Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters')

Doug, for some reason the above line is different in the commit
present @ your git hub tree, bug somewhere?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH rdma-rc 12/12] IB/mlx5: Limit mkey page size to 2GB
From: Or Gerlitz @ 2016-11-17 22:38 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1477575407-20562-13-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

> Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters')

the above line is different @ your git hub copy, bug as well?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH rdma-rc 5/6] IB/core: Save QP in ib_flow structure
From: Or Gerlitz @ 2016-11-17 22:40 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1477575391-20134-6-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

> Fixes: 319a441d1361 ("IB/core: Add receive flow steering support")

same comment as the other two, probably applies some, most or all
patches on your git hub branch, lets use what was submitted and not
something else, okay?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH rdma-rc 5/6] IB/core: Save QP in ib_flow structure
From: Or Gerlitz @ 2016-11-17 22:41 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <CAJ3xEMg=3eOqsKcrfcOoBkdpjiCmEAcPfNixHeLfVy45v_uvnQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Fri, Nov 18, 2016 at 12:40 AM, Or Gerlitz <gerlitz.or-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Fixes: 319a441d1361 ("IB/core: Add receive flow steering support")
>
> same comment as the other two, probably applies some, most or all
> patches on your git hub branch, lets use what was submitted and not
> something else, okay?

Also, as I commented and you didn't respond, this fix protects against
**future** bug and has nothing to do with intree code. As such there's
no point to put it
into rc fix, it's a -next thing
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PULL REQUEST] Please pull rdma.git
From: Doug Ledford @ 2016-11-18  2:01 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: linux-rdma, Linux Kernel
In-Reply-To: <CAJ3xEMjXYYnhS6qUzM9F+yjtq8Aahn08MjsSU4OnLS66Cu7mgw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

[-- Attachment #1.1: Type: text/plain, Size: 4002 bytes --]

On 11/17/2016 5:24 PM, Or Gerlitz wrote:
> On Thu, Nov 17, 2016 at 9:44 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>> On 11/17/16 1:49 PM, Leon Romanovsky wrote:
>>> On Thu, Nov 17, 2016 at 07:13:54AM -0500, Doug Ledford wrote:
>>>> Hi Linus,
>>>>
>>>> Due to various issues, I've been away and couldn't send a pull request
>>>> for about three weeks.  There were a number of -rc patches that built up
>>>> in the meantime (some where there already from the early -rc stages).
>>>> Obviously, there were way too many to send now, so I tried to pare the
>>>> list down to the more important patches for the -rc cycle.
> 
> Hi Doug,
> 
> Except for the hfi1 patches, all the rest (core, mlx5, mlx4 and rxe)
> are marked now as only 21 hours old in your 4.9-rc branch and they
> seems be made from you picking partial subsets of multiple series,

Correct.

> with none of them acked by you on the list.

I had an all day meeting today and had to get out the door early.  The
patches will be responded to.

> If you agree that I am describing things correctly - how are we
> expected to follow on your patch picking? I find it sort of impossible
> and error prone.

When I started this I said the official, canonical source of information
on patches like this is patchworks.  That still holds true.  In this
case, I pulled the full series of patches into a single bundle, then
reviewed every patch individually.  I checked for importance and
dependence on other patches.  Those that I thought could be moved to
4.10 were moved into a new bundle and then removed from the existing
bundle.  In this way, the patches were always in one or the other.  When
I was done, I used git am on the two bundles and one into the 4.9-rc and
the other into a -next branch.  In that way I made sure I didn't miss
any from the four series that I pulled.  Finally, I used the bundles to
mark the patches as accepted in patchworks.  By marking the entire
bundles as accepted, and not individual patches, it makes sure that what
I mark accepted is the same as what I ran git am on.  So, if the patch
shows in patchworks as accepted, then I got it.  If it doesn't, then I
missed it.

>>> Are you adding the rest to your for-next branch? We would like to have
>>> enough time to check that nothing is lost.
> 
>> Yes, it's already there in the mlx-next branch on github.
> 
> Re the patches there, this one
> 
> IB/mlx4: Set traffic class in AH
> 
> "Set traffic class within sl_tclass_flowlabel when create iboe AH.
> Without this the TOS value will be empty when running VLAN tagged
> traffic, because the TOS value is taken from the traffic class in the
> address handle attributes.
> 
> Fixes: 9106c41 ('IB/mlx4: Fix SL to 802.1Q priority-bits mapping for IBoE')"
> 
> claims to fix my commit, I have approached Leon and Co for
> clarifications/questions over the list on the patch and nothing was
> answered.

I agree with you.  It doesn't fix your patch.  The commit message can
still be fixed up.

> Please do not send it to Linus and wait for them to respond. I
> disagree that it fixes my commit b/c my commit was prior to when
> route-able RoCE  was introduced and on that time TOS had no relation.

I agree.  A better fix tag would be the commit that added RoCEv2 support.

> and this one
> 
> "IB/mlx4: Put non zero value in max_ah device attribute
> 
> Use INT_MAX since this is the max value the attribute can hold, though
> hardware capability is unlimited.
> 
> Fixes: 225c7b1 ('IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters')"
> 
> does a tiny enhancement for a 10y old commit of Roland, why you think
> we need it in 4.9-rc6 or 7??

I don't, it's in the mlx-next branch which means I'll queue it up for
the 4.10 merge window.  I have no plan on sending that branch for 4.9-rc.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply

* Целевые клиентские базы Skype: prodawez390 Whatsapp: +79139230330 Viber:  +79139230330 Telegram: +79139230330 Email: prodawez391@gmail.com
From: crowther.heine-KK0ffGbhmjU@public.gmane.org @ 2016-11-18  2:05 UTC (permalink / raw)
  To: testuser-O5WfVfzUwx8@public.gmane.org

Целевые клиентские базы Skype: prodawez390 Whatsapp: +79139230330 Viber:  +79139230330 Telegram: +79139230330 Email: prodawez391-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v4 05/10] IB/isert: Replace semaphore sem with completion
From: Binoy Jayan @ 2016-11-18  6:57 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Doug Ledford, Sean Hefty, Hal Rosenstock, Arnd Bergmann,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Linux kernel mailing list
In-Reply-To: <a267cc5e-4d4f-368e-a159-3eecc2187969-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>

Hi Sagi,

On 31 October 2016 at 02:42, Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> wrote:
>> The semaphore 'sem' in isert_device is used as completion, so convert
>> it to struct completion. Semaphores are going away in the future.
>
>
> Umm, this is 100% *not* true. np->sem is designed as a counting to
> sync the iscsi login thread with the connect requests coming from the
> initiators. So this is actually a reliable bug insertion :(
>
> NAK from me...

Sorry for the late reply as I was held up in other activities.

I converted this to a wait_event() implementation but as I was doing it,
I was wondering how it would have been different if it was a completion
and not a semaphore.

File: drivers/infiniband/ulp/isert/ib_isert.c

If isert_connected_handler() is called multiple times, adding an entry to the
list, and if that happens while we use completion, 'done' (part of struct
completion) would be incremented by 1 each time 'complete' is called from
isert_connected_handler. After 'n' iterations, done will be equal to 'n'. If we
call wait_for_completion now from isert_accept_np, it would just decrement
'done' by one and continue without blocking, consuming one node at a time
from the list 'isert_np->pending'.

Alternatively if "done" becomes zero, and the next time wait_for_completion is
called, the API would add a node at the end of the wait queue 'wait' in 'struct
completion' and block until "done" is nonzero. (Ref: do_wait_for_common)
It exists the wait when a call to 'complete' turns 'done' back to 1.
But if there
are multiple waits called before calling complete, all the tasks
calling the wait
gets queued up and they will all would see "done" set to zero. When complete
is called now, done turns 1 again and the first task in the queue is woken up
as it is serialized as FIFO. Now the first wait returns and the done is
decremented by 1 just before the return.

Am I missing something here?

Thanks,
Binoy
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v4 05/10] IB/isert: Replace semaphore sem with completion
From: Arnd Bergmann @ 2016-11-18  8:58 UTC (permalink / raw)
  To: Binoy Jayan
  Cc: Sagi Grimberg, Doug Ledford, Sean Hefty, Hal Rosenstock,
	linux-rdma, Linux kernel mailing list
In-Reply-To: <CAHv-k__HHRzK7+AZ912GVw9ToxZWMw3OccPqgGt_NBf6+RXdzA@mail.gmail.com>

On Friday, November 18, 2016 12:27:32 PM CET Binoy Jayan wrote:
> Hi Sagi,
> 
> On 31 October 2016 at 02:42, Sagi Grimberg <sagi@grimberg.me> wrote:
> >> The semaphore 'sem' in isert_device is used as completion, so convert
> >> it to struct completion. Semaphores are going away in the future.
> >
> >
> > Umm, this is 100% *not* true. np->sem is designed as a counting to
> > sync the iscsi login thread with the connect requests coming from the
> > initiators. So this is actually a reliable bug insertion :(
> >
> > NAK from me...
> 
> Sorry for the late reply as I was held up in other activities.
> 
> I converted this to a wait_event() implementation but as I was doing it,
> I was wondering how it would have been different if it was a completion
> and not a semaphore.
> 
> File: drivers/infiniband/ulp/isert/ib_isert.c
> 
> If isert_connected_handler() is called multiple times, adding an entry to the
> list, and if that happens while we use completion, 'done' (part of struct
> completion) would be incremented by 1 each time 'complete' is called from
> isert_connected_handler. After 'n' iterations, done will be equal to 'n'. If we
> call wait_for_completion now from isert_accept_np, it would just decrement
> 'done' by one and continue without blocking, consuming one node at a time
> from the list 'isert_np->pending'.
> 
> Alternatively if "done" becomes zero, and the next time wait_for_completion is
> called, the API would add a node at the end of the wait queue 'wait' in 'struct
> completion' and block until "done" is nonzero. (Ref: do_wait_for_common)
> It exists the wait when a call to 'complete' turns 'done' back to 1.
> But if there
> are multiple waits called before calling complete, all the tasks
> calling the wait
> gets queued up and they will all would see "done" set to zero. When complete
> is called now, done turns 1 again and the first task in the queue is woken up
> as it is serialized as FIFO. Now the first wait returns and the done is
> decremented by 1 just before the return.
> 
> Am I missing something here?

I think you are right. This is behavior is actuallly documented in
Documentation/scheduler/completion.txt:

  If complete() is called multiple times then this will allow for that number
  of waiters to continue - each call to complete() will simply increment the
  done element. Calling complete_all() multiple times is a bug though. Both
  complete() and complete_all() can be called in hard-irq/atomic context
  safely.

However, this is fairly unusual behavior and I wasn't immediately aware
of it either when I read Sagi's reply. While your patch looks correct,
it's probably a good idea to point out the counting behavior of this
completion as explicitly as possible, in the changelog text of the patch
as well as in a code comment and perhaps in the naming of the completion.

	Arnd

^ permalink raw reply

* Re: [PATCH v4 05/10] IB/isert: Replace semaphore sem with completion
From: Binoy Jayan @ 2016-11-18  9:10 UTC (permalink / raw)
  To: Arnd Bergmann, Mark Brown
  Cc: Sagi Grimberg, Doug Ledford, Sean Hefty, Hal Rosenstock,
	linux-rdma, Linux kernel mailing list
In-Reply-To: <3477625.cZS9dUaHfE@wuerfel>

Hi Arnd,

On 18 November 2016 at 14:28, Arnd Bergmann <arnd@arndb.de> wrote:
> On Friday, November 18, 2016 12:27:32 PM CET Binoy Jayan wrote:
>> Hi Sagi,

> I think you are right. This is behavior is actuallly documented in
> Documentation/scheduler/completion.txt:

Thanking for having a look.

> However, this is fairly unusual behavior and I wasn't immediately aware
> of it either when I read Sagi's reply. While your patch looks correct,
> it's probably a good idea to point out the counting behavior of this
> completion as explicitly as possible, in the changelog text of the patch
> as well as in a code comment and perhaps in the naming of the completion.

Will mention this and resend the patch series.

Thanks,
Binoy

^ permalink raw reply

* Re: [PATCH rdma-rc] IB/IPoIB: Remove can't use GFP_NOIO warning
From: Kamal Heib @ 2016-11-18 10:34 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Leon Romanovsky, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Mel Gorman,
	Jiri Kosina
In-Reply-To: <CAJ3xEMgCFmSMA5F-7D89n248K2HovvwHnnMgke7-FSDbROFA4Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Fri, Nov 18, 2016 at 12:30 AM, Or Gerlitz <gerlitz.or-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Thu, Nov 10, 2016 at 10:16 AM, Leon Romanovsky wrote:
>> From: Kamal Heib <kamalh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>
>> Remove the warning print of "can't use of GFP_NOIO" to avoid prints in
>> each QP creation when devices aren't supporting IB_QP_CREATE_USE_GFP_NOIO.
>>
>> This print become more annoying when the IPoIB interface is configured
>> to work in connected mode.
>>
>> Fixes: 09b93088d750 ('IB: Add a QP creation flag to use GFP_NOIO allocations')
>
> Hi Kamal,

Hi Or :)

>
> I don't think you're fixing a bug in this commit... you find the print
> annoying and you remove it, okay, maybe we can do that. Why not leave
> it in rate-limited manner? can you elaborate what was that deeply
> annoying with getting the warning?
>

It's really annoying when you have multiple ipoib interfaces and
pkeys, so the dmesg will look like the following:

[933236.858739] mlx5_ib0: can't use GFP_NOIO for QPs on device mlx5_0,
using GFP_KERNEL
[933407.571911] mlx5_ib0.8002: can't use GFP_NOIO for QPs on device
mlx5_0, using GFP_KERNEL
[933634.121955] mlx5_ib0.8006: can't use GFP_NOIO for QPs on device
mlx5_0, using GFP_KERNEL
[934167.851164] mlx5_ib0.8004: can't use GFP_NOIO for QPs on device
mlx5_0, using GFP_KERNEL
[934936.645002] mlx5_ib0.8002: can't use GFP_NOIO for QPs on device
mlx5_0, using GFP_KERNEL

>> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
>> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
>> @@ -1053,8 +1053,6 @@ static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_
>>
>>         tx_qp = ib_create_qp(priv->pd, &attr);
>>         if (PTR_ERR(tx_qp) == -EINVAL) {
>> -               ipoib_warn(priv, "can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
>> -                          priv->ca->name);
>>                 attr.create_flags &= ~IB_QP_CREATE_USE_GFP_NOIO;
>>                 tx_qp = ib_create_qp(priv->pd, &attr);
>>         }
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net-next] cxgb4: Allocate Tx queues dynamically
From: Atul Gupta @ 2016-11-18 11:07 UTC (permalink / raw)
  To: netdev, linux-scsi, target-devel, linux-rdma, linux-crypto
  Cc: davem, nab, jejb, martin.petersen, dledford, herbert, leedom,
	nirranjan, varun, swise, hariprasad, Atul Gupta

From: Hariprasad Shenai <hariprasad@chelsio.com>

Allocate resources dynamically for Upper layer driver's (ULD) like
cxgbit, iw_cxgb4, cxgb4i and chcr. The resources allocated include Tx
queues which are allocated when ULD register with cxgb4 driver and freed
while un-registering. The Tx queues which are shared by ULD shall be
allocated by first registering driver and un-allocated by last
unregistering driver.

Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
---
 drivers/crypto/chelsio/chcr_algo.c                 |  16 +--
 drivers/crypto/chelsio/chcr_core.c                 |   3 +-
 drivers/infiniband/hw/cxgb4/device.c               |   1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h         |  19 +++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |  12 --
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    |  64 +++++++----
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c     | 114 +++++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h     |  17 +++
 drivers/net/ethernet/chelsio/cxgb4/sge.c           | 121 +++++++++++++++------
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c                 |   1 +
 drivers/target/iscsi/cxgbit/cxgbit_main.c          |   1 +
 11 files changed, 287 insertions(+), 82 deletions(-)

diff --git a/drivers/crypto/chelsio/chcr_algo.c b/drivers/crypto/chelsio/chcr_algo.c
index e4ddb921d7b3..56b153805462 100644
--- a/drivers/crypto/chelsio/chcr_algo.c
+++ b/drivers/crypto/chelsio/chcr_algo.c
@@ -592,16 +592,18 @@ static int chcr_aes_cbc_setkey(struct crypto_ablkcipher *tfm, const u8 *key,
 
 static int cxgb4_is_crypto_q_full(struct net_device *dev, unsigned int idx)
 {
-	int ret = 0;
-	struct sge_ofld_txq *q;
 	struct adapter *adap = netdev2adap(dev);
+	struct sge_uld_txq_info *txq_info =
+		adap->sge.uld_txq_info[CXGB4_TX_CRYPTO];
+	struct sge_uld_txq *txq;
+	int ret = 0;
 
 	local_bh_disable();
-	q = &adap->sge.ofldtxq[idx];
-	spin_lock(&q->sendq.lock);
-	if (q->full)
+	txq = &txq_info->uldtxq[idx];
+	spin_lock(&txq->sendq.lock);
+	if (txq->full)
 		ret = -1;
-	spin_unlock(&q->sendq.lock);
+	spin_unlock(&txq->sendq.lock);
 	local_bh_enable();
 	return ret;
 }
@@ -674,11 +676,11 @@ static int chcr_device_init(struct chcr_context *ctx)
 		}
 		u_ctx = ULD_CTX(ctx);
 		rxq_perchan = u_ctx->lldi.nrxq / u_ctx->lldi.nchan;
-		ctx->dev->tx_channel_id = 0;
 		rxq_idx = ctx->dev->tx_channel_id * rxq_perchan;
 		rxq_idx += id % rxq_perchan;
 		spin_lock(&ctx->dev->lock_chcr_dev);
 		ctx->tx_channel_id = rxq_idx;
+		ctx->dev->tx_channel_id = !ctx->dev->tx_channel_id;
 		spin_unlock(&ctx->dev->lock_chcr_dev);
 	}
 out:
diff --git a/drivers/crypto/chelsio/chcr_core.c b/drivers/crypto/chelsio/chcr_core.c
index fb5f9bbfa09c..4d7f6700fd7e 100644
--- a/drivers/crypto/chelsio/chcr_core.c
+++ b/drivers/crypto/chelsio/chcr_core.c
@@ -42,6 +42,7 @@ static chcr_handler_func work_handlers[NUM_CPL_CMDS] = {
 static struct cxgb4_uld_info chcr_uld_info = {
 	.name = DRV_MODULE_NAME,
 	.nrxq = MAX_ULD_QSETS,
+	.ntxq = MAX_ULD_QSETS,
 	.rxq_size = 1024,
 	.add = chcr_uld_add,
 	.state_change = chcr_uld_state_change,
@@ -126,7 +127,7 @@ static int cpl_fw6_pld_handler(struct chcr_dev *dev,
 
 int chcr_send_wr(struct sk_buff *skb)
 {
-	return cxgb4_ofld_send(skb->dev, skb);
+	return cxgb4_crypto_send(skb->dev, skb);
 }
 
 static void *chcr_uld_add(const struct cxgb4_lld_info *lld)
diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
index 93e3d270a98a..4e5baf4fe15e 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -1481,6 +1481,7 @@ static int c4iw_uld_control(void *handle, enum cxgb4_control control, ...)
 static struct cxgb4_uld_info c4iw_uld_info = {
 	.name = DRV_NAME,
 	.nrxq = MAX_ULD_QSETS,
+	.ntxq = MAX_ULD_QSETS,
 	.rxq_size = 511,
 	.ciq = true,
 	.lro = false,
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 2125903043fb..0bce1bf9ca0f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -635,6 +635,7 @@ struct tx_sw_desc;
 
 struct sge_txq {
 	unsigned int  in_use;       /* # of in-use Tx descriptors */
+	unsigned int  q_type;	    /* Q type Eth/Ctrl/Ofld */
 	unsigned int  size;         /* # of descriptors */
 	unsigned int  cidx;         /* SW consumer index */
 	unsigned int  pidx;         /* producer index */
@@ -665,7 +666,7 @@ struct sge_eth_txq {                /* state for an SGE Ethernet Tx queue */
 	unsigned long mapping_err;  /* # of I/O MMU packet mapping errors */
 } ____cacheline_aligned_in_smp;
 
-struct sge_ofld_txq {               /* state for an SGE offload Tx queue */
+struct sge_uld_txq {               /* state for an SGE offload Tx queue */
 	struct sge_txq q;
 	struct adapter *adap;
 	struct sk_buff_head sendq;  /* list of backpressured packets */
@@ -693,14 +694,20 @@ struct sge_uld_rxq_info {
 	u8 uld;			/* uld type */
 };
 
+struct sge_uld_txq_info {
+	struct sge_uld_txq *uldtxq; /* Txq's for ULD */
+	atomic_t users;		/* num users */
+	u16 ntxq;		/* # of egress uld queues */
+};
+
 struct sge {
 	struct sge_eth_txq ethtxq[MAX_ETH_QSETS];
-	struct sge_ofld_txq ofldtxq[MAX_OFLD_QSETS];
 	struct sge_ctrl_txq ctrlq[MAX_CTRL_QUEUES];
 
 	struct sge_eth_rxq ethrxq[MAX_ETH_QSETS];
 	struct sge_rspq fw_evtq ____cacheline_aligned_in_smp;
 	struct sge_uld_rxq_info **uld_rxq_info;
+	struct sge_uld_txq_info **uld_txq_info;
 
 	struct sge_rspq intrq ____cacheline_aligned_in_smp;
 	spinlock_t intrq_lock;
@@ -1298,8 +1305,9 @@ int t4_sge_alloc_ctrl_txq(struct adapter *adap, struct sge_ctrl_txq *txq,
 			  unsigned int cmplqid);
 int t4_sge_mod_ctrl_txq(struct adapter *adap, unsigned int eqid,
 			unsigned int cmplqid);
-int t4_sge_alloc_ofld_txq(struct adapter *adap, struct sge_ofld_txq *txq,
-			  struct net_device *dev, unsigned int iqid);
+int t4_sge_alloc_uld_txq(struct adapter *adap, struct sge_uld_txq *txq,
+			 struct net_device *dev, unsigned int iqid,
+			 unsigned int uld_type);
 irqreturn_t t4_sge_intr_msix(int irq, void *cookie);
 int t4_sge_init(struct adapter *adap);
 void t4_sge_start(struct adapter *adap);
@@ -1661,4 +1669,7 @@ int t4_uld_mem_alloc(struct adapter *adap);
 void t4_uld_clean_up(struct adapter *adap);
 void t4_register_netevent_notifier(void);
 void free_rspq_fl(struct adapter *adap, struct sge_rspq *rq, struct sge_fl *fl);
+void free_tx_desc(struct adapter *adap, struct sge_txq *q,
+		  unsigned int n, bool unmap);
+void free_txq(struct adapter *adap, struct sge_txq *q);
 #endif /* __CXGB4_H__ */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index 20455d082cb8..acc231293e4d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -2512,18 +2512,6 @@ do { \
 		RL("FLLow:", fl.low);
 		RL("FLStarving:", fl.starving);
 
-	} else if (ofld_idx < ofld_entries) {
-		const struct sge_ofld_txq *tx =
-			&adap->sge.ofldtxq[ofld_idx * 4];
-		int n = min(4, adap->sge.ofldqsets - 4 * ofld_idx);
-
-		S("QType:", "OFLD-Txq");
-		T("TxQ ID:", q.cntxt_id);
-		T("TxQ size:", q.size);
-		T("TxQ inuse:", q.in_use);
-		T("TxQ CIDX:", q.cidx);
-		T("TxQ PIDX:", q.pidx);
-
 	} else if (ctrl_idx < ctrl_entries) {
 		const struct sge_ctrl_txq *tx = &adap->sge.ctrlq[ctrl_idx * 4];
 		int n = min(4, adap->params.nports - 4 * ctrl_idx);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index c0cc2ee77be7..449884f8dd67 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -530,15 +530,15 @@ static int fwevtq_handler(struct sge_rspq *q, const __be64 *rsp,
 
 		txq = q->adap->sge.egr_map[qid - q->adap->sge.egr_start];
 		txq->restarts++;
-		if ((u8 *)txq < (u8 *)q->adap->sge.ofldtxq) {
+		if (txq->q_type == CXGB4_TXQ_ETH) {
 			struct sge_eth_txq *eq;
 
 			eq = container_of(txq, struct sge_eth_txq, q);
 			netif_tx_wake_queue(eq->txq);
 		} else {
-			struct sge_ofld_txq *oq;
+			struct sge_uld_txq *oq;
 
-			oq = container_of(txq, struct sge_ofld_txq, q);
+			oq = container_of(txq, struct sge_uld_txq, q);
 			tasklet_schedule(&oq->qresume_tsk);
 		}
 	} else if (opcode == CPL_FW6_MSG || opcode == CPL_FW4_MSG) {
@@ -885,15 +885,6 @@ static int setup_sge_queues(struct adapter *adap)
 		}
 	}
 
-	j = s->ofldqsets / adap->params.nports; /* iscsi queues per channel */
-	for_each_ofldtxq(s, i) {
-		err = t4_sge_alloc_ofld_txq(adap, &s->ofldtxq[i],
-					    adap->port[i / j],
-					    s->fw_evtq.cntxt_id);
-		if (err)
-			goto freeout;
-	}
-
 	for_each_port(adap, i) {
 		/* Note that cmplqid below is 0 if we don't
 		 * have RDMA queues, and that's the right value.
@@ -1922,8 +1913,18 @@ static void disable_dbs(struct adapter *adap)
 
 	for_each_ethrxq(&adap->sge, i)
 		disable_txq_db(&adap->sge.ethtxq[i].q);
-	for_each_ofldtxq(&adap->sge, i)
-		disable_txq_db(&adap->sge.ofldtxq[i].q);
+	if (is_offload(adap)) {
+		struct sge_uld_txq_info *txq_info =
+			adap->sge.uld_txq_info[CXGB4_TX_OFLD];
+
+		if (txq_info) {
+			for_each_ofldtxq(&adap->sge, i) {
+				struct sge_uld_txq *txq = &txq_info->uldtxq[i];
+
+				disable_txq_db(&txq->q);
+			}
+		}
+	}
 	for_each_port(adap, i)
 		disable_txq_db(&adap->sge.ctrlq[i].q);
 }
@@ -1934,8 +1935,18 @@ static void enable_dbs(struct adapter *adap)
 
 	for_each_ethrxq(&adap->sge, i)
 		enable_txq_db(adap, &adap->sge.ethtxq[i].q);
-	for_each_ofldtxq(&adap->sge, i)
-		enable_txq_db(adap, &adap->sge.ofldtxq[i].q);
+	if (is_offload(adap)) {
+		struct sge_uld_txq_info *txq_info =
+			adap->sge.uld_txq_info[CXGB4_TX_OFLD];
+
+		if (txq_info) {
+			for_each_ofldtxq(&adap->sge, i) {
+				struct sge_uld_txq *txq = &txq_info->uldtxq[i];
+
+				enable_txq_db(adap, &txq->q);
+			}
+		}
+	}
 	for_each_port(adap, i)
 		enable_txq_db(adap, &adap->sge.ctrlq[i].q);
 }
@@ -2006,8 +2017,17 @@ static void recover_all_queues(struct adapter *adap)
 
 	for_each_ethrxq(&adap->sge, i)
 		sync_txq_pidx(adap, &adap->sge.ethtxq[i].q);
-	for_each_ofldtxq(&adap->sge, i)
-		sync_txq_pidx(adap, &adap->sge.ofldtxq[i].q);
+	if (is_offload(adap)) {
+		struct sge_uld_txq_info *txq_info =
+			adap->sge.uld_txq_info[CXGB4_TX_OFLD];
+		if (txq_info) {
+			for_each_ofldtxq(&adap->sge, i) {
+				struct sge_uld_txq *txq = &txq_info->uldtxq[i];
+
+				sync_txq_pidx(adap, &txq->q);
+			}
+		}
+	}
 	for_each_port(adap, i)
 		sync_txq_pidx(adap, &adap->sge.ctrlq[i].q);
 }
@@ -3991,7 +4011,7 @@ static inline bool is_x_10g_port(const struct link_config *lc)
 static void cfg_queues(struct adapter *adap)
 {
 	struct sge *s = &adap->sge;
-	int i, n10g = 0, qidx = 0;
+	int i = 0, n10g = 0, qidx = 0;
 #ifndef CONFIG_CHELSIO_T4_DCB
 	int q10g = 0;
 #endif
@@ -4006,8 +4026,7 @@ static void cfg_queues(struct adapter *adap)
 		adap->params.crypto = 0;
 	}
 
-	for_each_port(adap, i)
-		n10g += is_x_10g_port(&adap2pinfo(adap, i)->link_cfg);
+	n10g += is_x_10g_port(&adap2pinfo(adap, i)->link_cfg);
 #ifdef CONFIG_CHELSIO_T4_DCB
 	/* For Data Center Bridging support we need to be able to support up
 	 * to 8 Traffic Priorities; each of which will be assigned to its
@@ -4075,9 +4094,6 @@ static void cfg_queues(struct adapter *adap)
 	for (i = 0; i < ARRAY_SIZE(s->ctrlq); i++)
 		s->ctrlq[i].q.size = 512;
 
-	for (i = 0; i < ARRAY_SIZE(s->ofldtxq); i++)
-		s->ofldtxq[i].q.size = 1024;
-
 	init_rspq(adap, &s->fw_evtq, 0, 1, 1024, 64);
 	init_rspq(adap, &s->intrq, 0, 1, 512, 64);
 }
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
index 2471ff465d5c..565a6c6bfeaf 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
@@ -447,6 +447,106 @@ static void quiesce_rx_uld(struct adapter *adap, unsigned int uld_type)
 		quiesce_rx(adap, &rxq_info->uldrxq[idx].rspq);
 }
 
+static void
+free_sge_txq_uld(struct adapter *adap, struct sge_uld_txq_info *txq_info)
+{
+	int nq = txq_info->ntxq;
+	int i;
+
+	for (i = 0; i < nq; i++) {
+		struct sge_uld_txq *txq = &txq_info->uldtxq[i];
+
+		if (txq && txq->q.desc) {
+			tasklet_kill(&txq->qresume_tsk);
+			t4_ofld_eq_free(adap, adap->mbox, adap->pf, 0,
+					txq->q.cntxt_id);
+			free_tx_desc(adap, &txq->q, txq->q.in_use, false);
+			kfree(txq->q.sdesc);
+			__skb_queue_purge(&txq->sendq);
+			free_txq(adap, &txq->q);
+		}
+	}
+}
+
+static int
+alloc_sge_txq_uld(struct adapter *adap, struct sge_uld_txq_info *txq_info,
+		  unsigned int uld_type)
+{
+	struct sge *s = &adap->sge;
+	int nq = txq_info->ntxq;
+	int i, j, err;
+
+	j = nq / adap->params.nports;
+	for (i = 0; i < nq; i++) {
+		struct sge_uld_txq *txq = &txq_info->uldtxq[i];
+
+		txq->q.size = 1024;
+		err = t4_sge_alloc_uld_txq(adap, txq, adap->port[i / j],
+					   s->fw_evtq.cntxt_id, uld_type);
+		if (err)
+			goto freeout;
+	}
+	return 0;
+freeout:
+	free_sge_txq_uld(adap, txq_info);
+	return err;
+}
+
+static void
+release_sge_txq_uld(struct adapter *adap, unsigned int uld_type)
+{
+	struct sge_uld_txq_info *txq_info = NULL;
+	int tx_uld_type = TX_ULD(uld_type);
+
+	txq_info = adap->sge.uld_txq_info[tx_uld_type];
+
+	if (txq_info && atomic_dec_and_test(&txq_info->users)) {
+		free_sge_txq_uld(adap, txq_info);
+		kfree(txq_info->uldtxq);
+		kfree(txq_info);
+		adap->sge.uld_txq_info[tx_uld_type] = NULL;
+	}
+}
+
+static int
+setup_sge_txq_uld(struct adapter *adap, unsigned int uld_type,
+		  const struct cxgb4_uld_info *uld_info)
+{
+	struct sge_uld_txq_info *txq_info = NULL;
+	int tx_uld_type, i;
+
+	tx_uld_type = TX_ULD(uld_type);
+	txq_info = adap->sge.uld_txq_info[tx_uld_type];
+
+	if ((tx_uld_type == CXGB4_TX_OFLD) && txq_info &&
+	    (atomic_inc_return(&txq_info->users) > 1))
+		return 0;
+
+	txq_info = kzalloc(sizeof(*txq_info), GFP_KERNEL);
+	if (!txq_info)
+		return -ENOMEM;
+
+	i = min_t(int, uld_info->ntxq, num_online_cpus());
+	txq_info->ntxq = roundup(i, adap->params.nports);
+
+	txq_info->uldtxq = kcalloc(txq_info->ntxq, sizeof(struct sge_uld_txq),
+				   GFP_KERNEL);
+	if (!txq_info->uldtxq) {
+		kfree(txq_info->uldtxq);
+		return -ENOMEM;
+	}
+
+	if (alloc_sge_txq_uld(adap, txq_info, tx_uld_type)) {
+		kfree(txq_info->uldtxq);
+		kfree(txq_info);
+		return -ENOMEM;
+	}
+
+	atomic_inc(&txq_info->users);
+	adap->sge.uld_txq_info[tx_uld_type] = txq_info;
+	return 0;
+}
+
 static void uld_queue_init(struct adapter *adap, unsigned int uld_type,
 			   struct cxgb4_lld_info *lli)
 {
@@ -472,7 +572,15 @@ int t4_uld_mem_alloc(struct adapter *adap)
 	if (!s->uld_rxq_info)
 		goto err_uld;
 
+	s->uld_txq_info = kzalloc(CXGB4_TX_MAX *
+				  sizeof(struct sge_uld_txq_info *),
+				  GFP_KERNEL);
+	if (!s->uld_txq_info)
+		goto err_uld_rx;
 	return 0;
+
+err_uld_rx:
+	kfree(s->uld_rxq_info);
 err_uld:
 	kfree(adap->uld);
 	return -ENOMEM;
@@ -482,6 +590,7 @@ void t4_uld_mem_free(struct adapter *adap)
 {
 	struct sge *s = &adap->sge;
 
+	kfree(s->uld_txq_info);
 	kfree(s->uld_rxq_info);
 	kfree(adap->uld);
 }
@@ -616,6 +725,9 @@ int cxgb4_register_uld(enum cxgb4_uld type,
 			ret = -EBUSY;
 			goto free_irq;
 		}
+		ret = setup_sge_txq_uld(adap, type, p);
+		if (ret)
+			goto free_irq;
 		adap->uld[type] = *p;
 		uld_attach(adap, type);
 		adap_idx++;
@@ -644,6 +756,7 @@ int cxgb4_register_uld(enum cxgb4_uld type,
 			break;
 		adap->uld[type].handle = NULL;
 		adap->uld[type].add = NULL;
+		release_sge_txq_uld(adap, type);
 		if (adap->flags & FULL_INIT_DONE)
 			quiesce_rx_uld(adap, type);
 		if (adap->flags & USING_MSIX)
@@ -679,6 +792,7 @@ int cxgb4_unregister_uld(enum cxgb4_uld type)
 			continue;
 		adap->uld[type].handle = NULL;
 		adap->uld[type].add = NULL;
+		release_sge_txq_uld(adap, type);
 		if (adap->flags & FULL_INIT_DONE)
 			quiesce_rx_uld(adap, type);
 		if (adap->flags & USING_MSIX)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
index 2996793b1aaa..4c856605fdfa 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
@@ -77,6 +77,8 @@ enum {
 
 /* Special asynchronous notification message */
 #define CXGB4_MSG_AN ((void *)1)
+#define TX_ULD(uld)(((uld) != CXGB4_ULD_CRYPTO) ? CXGB4_TX_OFLD :\
+		      CXGB4_TX_CRYPTO)
 
 struct serv_entry {
 	void *data;
@@ -223,6 +225,19 @@ enum cxgb4_uld {
 	CXGB4_ULD_MAX
 };
 
+enum cxgb4_tx_uld {
+	CXGB4_TX_OFLD,
+	CXGB4_TX_CRYPTO,
+	CXGB4_TX_MAX
+};
+
+enum cxgb4_txq_type {
+	CXGB4_TXQ_ETH,
+	CXGB4_TXQ_ULD,
+	CXGB4_TXQ_CTRL,
+	CXGB4_TXQ_MAX
+};
+
 enum cxgb4_state {
 	CXGB4_STATE_UP,
 	CXGB4_STATE_START_RECOVERY,
@@ -316,6 +331,7 @@ struct cxgb4_uld_info {
 	void *handle;
 	unsigned int nrxq;
 	unsigned int rxq_size;
+	unsigned int ntxq;
 	bool ciq;
 	bool lro;
 	void *(*add)(const struct cxgb4_lld_info *p);
@@ -333,6 +349,7 @@ struct cxgb4_uld_info {
 int cxgb4_register_uld(enum cxgb4_uld type, const struct cxgb4_uld_info *p);
 int cxgb4_unregister_uld(enum cxgb4_uld type);
 int cxgb4_ofld_send(struct net_device *dev, struct sk_buff *skb);
+int cxgb4_crypto_send(struct net_device *dev, struct sk_buff *skb);
 unsigned int cxgb4_dbfifo_count(const struct net_device *dev, int lpfifo);
 unsigned int cxgb4_port_chan(const struct net_device *dev);
 unsigned int cxgb4_port_viid(const struct net_device *dev);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 1e74fd6085df..b7d0753b9242 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -377,8 +377,8 @@ unmap:			dma_unmap_page(dev, be64_to_cpu(p->addr[0]),
  *	Reclaims Tx descriptors from an SGE Tx queue and frees the associated
  *	Tx buffers.  Called with the Tx queue lock held.
  */
-static void free_tx_desc(struct adapter *adap, struct sge_txq *q,
-			 unsigned int n, bool unmap)
+void free_tx_desc(struct adapter *adap, struct sge_txq *q,
+		  unsigned int n, bool unmap)
 {
 	struct tx_sw_desc *d;
 	unsigned int cidx = q->cidx;
@@ -1543,7 +1543,7 @@ static inline unsigned int calc_tx_flits_ofld(const struct sk_buff *skb)
  *	inability to map packets.  A periodic timer attempts to restart
  *	queues so marked.
  */
-static void txq_stop_maperr(struct sge_ofld_txq *q)
+static void txq_stop_maperr(struct sge_uld_txq *q)
 {
 	q->mapping_err++;
 	q->q.stops++;
@@ -1559,7 +1559,7 @@ static void txq_stop_maperr(struct sge_ofld_txq *q)
  *	Stops an offload Tx queue that has become full and modifies the packet
  *	being written to request a wakeup.
  */
-static void ofldtxq_stop(struct sge_ofld_txq *q, struct sk_buff *skb)
+static void ofldtxq_stop(struct sge_uld_txq *q, struct sk_buff *skb)
 {
 	struct fw_wr_hdr *wr = (struct fw_wr_hdr *)skb->data;
 
@@ -1586,7 +1586,7 @@ static void ofldtxq_stop(struct sge_ofld_txq *q, struct sk_buff *skb)
  *	boolean "service_ofldq_running" to make sure that only one instance
  *	is ever running at a time ...
  */
-static void service_ofldq(struct sge_ofld_txq *q)
+static void service_ofldq(struct sge_uld_txq *q)
 {
 	u64 *pos, *before, *end;
 	int credits;
@@ -1706,7 +1706,7 @@ static void service_ofldq(struct sge_ofld_txq *q)
  *
  *	Send an offload packet through an SGE offload queue.
  */
-static int ofld_xmit(struct sge_ofld_txq *q, struct sk_buff *skb)
+static int ofld_xmit(struct sge_uld_txq *q, struct sk_buff *skb)
 {
 	skb->priority = calc_tx_flits_ofld(skb);       /* save for restart */
 	spin_lock(&q->sendq.lock);
@@ -1735,7 +1735,7 @@ static int ofld_xmit(struct sge_ofld_txq *q, struct sk_buff *skb)
  */
 static void restart_ofldq(unsigned long data)
 {
-	struct sge_ofld_txq *q = (struct sge_ofld_txq *)data;
+	struct sge_uld_txq *q = (struct sge_uld_txq *)data;
 
 	spin_lock(&q->sendq.lock);
 	q->full = 0;            /* the queue actually is completely empty now */
@@ -1767,17 +1767,23 @@ static inline unsigned int is_ctrl_pkt(const struct sk_buff *skb)
 	return skb->queue_mapping & 1;
 }
 
-static inline int ofld_send(struct adapter *adap, struct sk_buff *skb)
+static inline int uld_send(struct adapter *adap, struct sk_buff *skb,
+			   unsigned int tx_uld_type)
 {
+	struct sge_uld_txq_info *txq_info;
+	struct sge_uld_txq *txq;
 	unsigned int idx = skb_txq(skb);
 
+	txq_info = adap->sge.uld_txq_info[tx_uld_type];
+	txq = &txq_info->uldtxq[idx];
+
 	if (unlikely(is_ctrl_pkt(skb))) {
 		/* Single ctrl queue is a requirement for LE workaround path */
 		if (adap->tids.nsftids)
 			idx = 0;
 		return ctrl_xmit(&adap->sge.ctrlq[idx], skb);
 	}
-	return ofld_xmit(&adap->sge.ofldtxq[idx], skb);
+	return ofld_xmit(txq, skb);
 }
 
 /**
@@ -1794,7 +1800,7 @@ int t4_ofld_send(struct adapter *adap, struct sk_buff *skb)
 	int ret;
 
 	local_bh_disable();
-	ret = ofld_send(adap, skb);
+	ret = uld_send(adap, skb, CXGB4_TX_OFLD);
 	local_bh_enable();
 	return ret;
 }
@@ -1813,6 +1819,39 @@ int cxgb4_ofld_send(struct net_device *dev, struct sk_buff *skb)
 }
 EXPORT_SYMBOL(cxgb4_ofld_send);
 
+/**
+ *	t4_crypto_send - send crypto packet
+ *	@adap: the adapter
+ *	@skb: the packet
+ *
+ *	Sends crypto packet.  We use the packet queue_mapping to select the
+ *	appropriate Tx queue as follows: bit 0 indicates whether the packet
+ *	should be sent as regular or control, bits 1-15 select the queue.
+ */
+static int t4_crypto_send(struct adapter *adap, struct sk_buff *skb)
+{
+	int ret;
+
+	local_bh_disable();
+	ret = uld_send(adap, skb, CXGB4_TX_CRYPTO);
+	local_bh_enable();
+	return ret;
+}
+
+/**
+ *	cxgb4_crypto_send - send crypto packet
+ *	@dev: the net device
+ *	@skb: the packet
+ *
+ *	Sends crypto packet.  This is an exported version of @t4_crypto_send,
+ *	intended for ULDs.
+ */
+int cxgb4_crypto_send(struct net_device *dev, struct sk_buff *skb)
+{
+	return t4_crypto_send(netdev2adap(dev), skb);
+}
+EXPORT_SYMBOL(cxgb4_crypto_send);
+
 static inline void copy_frags(struct sk_buff *skb,
 			      const struct pkt_gl *gl, unsigned int offset)
 {
@@ -2479,7 +2518,7 @@ static void sge_tx_timer_cb(unsigned long data)
 	for (i = 0; i < BITS_TO_LONGS(s->egr_sz); i++)
 		for (m = s->txq_maperr[i]; m; m &= m - 1) {
 			unsigned long id = __ffs(m) + i * BITS_PER_LONG;
-			struct sge_ofld_txq *txq = s->egr_map[id];
+			struct sge_uld_txq *txq = s->egr_map[id];
 
 			clear_bit(id, s->txq_maperr);
 			tasklet_schedule(&txq->qresume_tsk);
@@ -2799,6 +2838,7 @@ int t4_sge_alloc_eth_txq(struct adapter *adap, struct sge_eth_txq *txq,
 		return ret;
 	}
 
+	txq->q.q_type = CXGB4_TXQ_ETH;
 	init_txq(adap, &txq->q, FW_EQ_ETH_CMD_EQID_G(ntohl(c.eqid_pkd)));
 	txq->txq = netdevq;
 	txq->tso = txq->tx_cso = txq->vlan_ins = 0;
@@ -2852,6 +2892,7 @@ int t4_sge_alloc_ctrl_txq(struct adapter *adap, struct sge_ctrl_txq *txq,
 		return ret;
 	}
 
+	txq->q.q_type = CXGB4_TXQ_CTRL;
 	init_txq(adap, &txq->q, FW_EQ_CTRL_CMD_EQID_G(ntohl(c.cmpliqid_eqid)));
 	txq->adap = adap;
 	skb_queue_head_init(&txq->sendq);
@@ -2872,13 +2913,15 @@ int t4_sge_mod_ctrl_txq(struct adapter *adap, unsigned int eqid,
 	return t4_set_params(adap, adap->mbox, adap->pf, 0, 1, &param, &val);
 }
 
-int t4_sge_alloc_ofld_txq(struct adapter *adap, struct sge_ofld_txq *txq,
-			  struct net_device *dev, unsigned int iqid)
+int t4_sge_alloc_uld_txq(struct adapter *adap, struct sge_uld_txq *txq,
+			 struct net_device *dev, unsigned int iqid,
+			 unsigned int uld_type)
 {
 	int ret, nentries;
 	struct fw_eq_ofld_cmd c;
 	struct sge *s = &adap->sge;
 	struct port_info *pi = netdev_priv(dev);
+	int cmd = FW_EQ_OFLD_CMD;
 
 	/* Add status entries */
 	nentries = txq->q.size + s->stat_len / sizeof(struct tx_desc);
@@ -2891,7 +2934,9 @@ int t4_sge_alloc_ofld_txq(struct adapter *adap, struct sge_ofld_txq *txq,
 		return -ENOMEM;
 
 	memset(&c, 0, sizeof(c));
-	c.op_to_vfn = htonl(FW_CMD_OP_V(FW_EQ_OFLD_CMD) | FW_CMD_REQUEST_F |
+	if (unlikely(uld_type == CXGB4_TX_CRYPTO))
+		cmd = FW_EQ_CTRL_CMD;
+	c.op_to_vfn = htonl(FW_CMD_OP_V(cmd) | FW_CMD_REQUEST_F |
 			    FW_CMD_WRITE_F | FW_CMD_EXEC_F |
 			    FW_EQ_OFLD_CMD_PFN_V(adap->pf) |
 			    FW_EQ_OFLD_CMD_VFN_V(0));
@@ -2919,6 +2964,7 @@ int t4_sge_alloc_ofld_txq(struct adapter *adap, struct sge_ofld_txq *txq,
 		return ret;
 	}
 
+	txq->q.q_type = CXGB4_TXQ_ULD;
 	init_txq(adap, &txq->q, FW_EQ_OFLD_CMD_EQID_G(ntohl(c.eqid_pkd)));
 	txq->adap = adap;
 	skb_queue_head_init(&txq->sendq);
@@ -2928,7 +2974,7 @@ int t4_sge_alloc_ofld_txq(struct adapter *adap, struct sge_ofld_txq *txq,
 	return 0;
 }
 
-static void free_txq(struct adapter *adap, struct sge_txq *q)
+void free_txq(struct adapter *adap, struct sge_txq *q)
 {
 	struct sge *s = &adap->sge;
 
@@ -3026,21 +3072,6 @@ void t4_free_sge_resources(struct adapter *adap)
 		}
 	}
 
-	/* clean up offload Tx queues */
-	for (i = 0; i < ARRAY_SIZE(adap->sge.ofldtxq); i++) {
-		struct sge_ofld_txq *q = &adap->sge.ofldtxq[i];
-
-		if (q->q.desc) {
-			tasklet_kill(&q->qresume_tsk);
-			t4_ofld_eq_free(adap, adap->mbox, adap->pf, 0,
-					q->q.cntxt_id);
-			free_tx_desc(adap, &q->q, q->q.in_use, false);
-			kfree(q->q.sdesc);
-			__skb_queue_purge(&q->sendq);
-			free_txq(adap, &q->q);
-		}
-	}
-
 	/* clean up control Tx queues */
 	for (i = 0; i < ARRAY_SIZE(adap->sge.ctrlq); i++) {
 		struct sge_ctrl_txq *cq = &adap->sge.ctrlq[i];
@@ -3093,12 +3124,34 @@ void t4_sge_stop(struct adapter *adap)
 	if (s->tx_timer.function)
 		del_timer_sync(&s->tx_timer);
 
-	for (i = 0; i < ARRAY_SIZE(s->ofldtxq); i++) {
-		struct sge_ofld_txq *q = &s->ofldtxq[i];
+	if (is_offload(adap)) {
+		struct sge_uld_txq_info *txq_info;
+
+		txq_info = adap->sge.uld_txq_info[CXGB4_TX_OFLD];
+		if (txq_info) {
+			struct sge_uld_txq *txq = txq_info->uldtxq;
 
-		if (q->q.desc)
-			tasklet_kill(&q->qresume_tsk);
+			for_each_ofldtxq(&adap->sge, i) {
+				if (txq->q.desc)
+					tasklet_kill(&txq->qresume_tsk);
+			}
+		}
 	}
+
+	if (is_pci_uld(adap)) {
+		struct sge_uld_txq_info *txq_info;
+
+		txq_info = adap->sge.uld_txq_info[CXGB4_TX_CRYPTO];
+		if (txq_info) {
+			struct sge_uld_txq *txq = txq_info->uldtxq;
+
+			for_each_ofldtxq(&adap->sge, i) {
+				if (txq->q.desc)
+					tasklet_kill(&txq->qresume_tsk);
+			}
+		}
+	}
+
 	for (i = 0; i < ARRAY_SIZE(s->ctrlq); i++) {
 		struct sge_ctrl_txq *cq = &s->ctrlq[i];
 
diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
index 0039bebaa9e2..4655a9f9dcea 100644
--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
+++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
@@ -85,6 +85,7 @@ static inline int send_tx_flowc_wr(struct cxgbi_sock *);
 static const struct cxgb4_uld_info cxgb4i_uld_info = {
 	.name = DRV_MODULE_NAME,
 	.nrxq = MAX_ULD_QSETS,
+	.ntxq = MAX_ULD_QSETS,
 	.rxq_size = 1024,
 	.lro = false,
 	.add = t4_uld_add,
diff --git a/drivers/target/iscsi/cxgbit/cxgbit_main.c b/drivers/target/iscsi/cxgbit/cxgbit_main.c
index ad26b9372f10..96eedfc49c94 100644
--- a/drivers/target/iscsi/cxgbit/cxgbit_main.c
+++ b/drivers/target/iscsi/cxgbit/cxgbit_main.c
@@ -653,6 +653,7 @@ static struct iscsit_transport cxgbit_transport = {
 static struct cxgb4_uld_info cxgbit_uld_info = {
 	.name		= DRV_NAME,
 	.nrxq		= MAX_ULD_QSETS,
+	.ntxq		= MAX_ULD_QSETS,
 	.rxq_size	= 1024,
 	.lro		= true,
 	.add		= cxgbit_uld_add,
-- 
2.3.4

^ permalink raw reply related

* [PATCH] infiniband: remove WARN that is not kernel bug
From: Dmitry Vyukov @ 2016-11-18 11:24 UTC (permalink / raw)
  To: dledford, sean.hefty, hal.rosenstock, leon
  Cc: Dmitry Vyukov, linux-rdma, linux-kernel, syzkaller

WARNINGs mean kernel bugs.
The one in ucma_write() points to user programming error
or a malicious attempt. This is not a kernel bug, remove it.

BUG/WARNs that are not kernel bugs hinder automated testing effots.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: linux-rdma@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: syzkaller@googlegroups.com
---
 drivers/infiniband/core/ucma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 9520154..ff60373 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -1584,7 +1584,7 @@ static ssize_t ucma_write(struct file *filp, const char __user *buf,
 	struct rdma_ucm_cmd_hdr hdr;
 	ssize_t ret;
 
-	if (WARN_ON_ONCE(!ib_safe_file_access(filp)))
+	if (!ib_safe_file_access(filp))
 		return -EACCES;
 
 	if (len < sizeof(hdr))
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH 1/7] IB/rxe: Allocate enough space for an IPv6 addr
From: Andrew Boyer @ 2016-11-18 14:36 UTC (permalink / raw)
  To: monis-VPRAkNaXOzVWk0Htik3J/w, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Andrew Boyer

Avoid smashing the stack when an ICRC error occurs on an IPv6 network.

Signed-off-by: Andrew Boyer <andrew.boyer-8PEkshWhKlo@public.gmane.org>
---
 drivers/infiniband/sw/rxe/rxe_recv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 46f0628..b40ab8d 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -391,7 +391,7 @@ int rxe_rcv(struct sk_buff *skb)
 			     payload_size(pkt));
 	calc_icrc = cpu_to_be32(~calc_icrc);
 	if (unlikely(calc_icrc != pack_icrc)) {
-		char saddr[sizeof(struct in6_addr)];
+		char saddr[64];
 
 		if (skb->protocol == htons(ETH_P_IPV6))
 			sprintf(saddr, "%pI6", &ipv6_hdr(skb)->saddr);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 2/7] IB/rxe: Advance the consumer pointer before posting the CQE
From: Andrew Boyer @ 2016-11-18 14:36 UTC (permalink / raw)
  To: monis-VPRAkNaXOzVWk0Htik3J/w, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Andrew Boyer
In-Reply-To: <1479479809-10798-1-git-send-email-andrew.boyer-8PEkshWhKlo@public.gmane.org>

A simple userspace application might poll the CQ, find a completion,
and then attempt to post a new WQE to the SQ. A spurious error can
occur if the userspace application detects a full SQ in the instant
before the kernel is able to advance the SQ consumer pointer.

This is noticeable when using single-entry SQs with ibv_rc_pingpong()
if lots of kernel and userspace library debugging is enabled.

Signed-off-by: Andrew Boyer <andrew.boyer-8PEkshWhKlo@public.gmane.org>
---
 drivers/infiniband/sw/rxe/rxe_comp.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 6c5e29d..d46c49b 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -420,11 +420,12 @@ static void do_complete(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
 	    (wqe->wr.send_flags & IB_SEND_SIGNALED) ||
 	    (qp->req.state == QP_STATE_ERROR)) {
 		make_send_cqe(qp, wqe, &cqe);
+		advance_consumer(qp->sq.queue);
 		rxe_cq_post(qp->scq, &cqe, 0);
+	} else {
+		advance_consumer(qp->sq.queue);
 	}
 
-	advance_consumer(qp->sq.queue);
-
 	/*
 	 * we completed something so let req run again
 	 * if it is trying to fence
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 3/7] IB/rxe: Don't update the response PSN unless it's going forwards
From: Andrew Boyer @ 2016-11-18 14:36 UTC (permalink / raw)
  To: monis-VPRAkNaXOzVWk0Htik3J/w, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Andrew Boyer
In-Reply-To: <1479479809-10798-1-git-send-email-andrew.boyer-8PEkshWhKlo@public.gmane.org>

A client might post a read followed by a send. The partner receives
and acknowledges both transactions, posting an RCQ entry for the
send, but something goes wrong with the read ACK. When the client
retries the read, the partner's responder processes the duplicate
read but incorrectly resets the PSN to the value preceding the
original send. When the duplicate send arrives, the responder cannot
tell that it is a duplicate, so the responder generates a duplicate
RCQ entry, confusing the client.

Signed-off-by: Andrew Boyer <andrew.boyer-8PEkshWhKlo@public.gmane.org>
---
 drivers/infiniband/sw/rxe/rxe_resp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index dd3d88a..cb3fd4c 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -742,7 +742,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 	} else {
 		qp->resp.res = NULL;
 		qp->resp.opcode = -1;
-		qp->resp.psn = res->cur_psn;
+		if (psn_compare(res->cur_psn, qp->resp.psn) >= 0)
+			qp->resp.psn = res->cur_psn;
 		state = RESPST_CLEANUP;
 	}
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 4/7] IB/rxe: Unblock loopback by moving skb_out increment
From: Andrew Boyer @ 2016-11-18 14:36 UTC (permalink / raw)
  To: monis-VPRAkNaXOzVWk0Htik3J/w, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Andrew Boyer
In-Reply-To: <1479479809-10798-1-git-send-email-andrew.boyer-8PEkshWhKlo@public.gmane.org>

skb_out is decremented in rxe_skb_tx_dtor(), which is not called in the
loopback() path. Move the increment to the send() path rather than
rxe_xmit_packet().

Signed-off-by: Andrew Boyer <andrew.boyer-8PEkshWhKlo@public.gmane.org>
---
 drivers/infiniband/sw/rxe/rxe_loc.h | 2 --
 drivers/infiniband/sw/rxe/rxe_net.c | 2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 73849a5a..efe4c6a 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -266,8 +266,6 @@ static inline int rxe_xmit_packet(struct rxe_dev *rxe, struct rxe_qp *qp,
 		return err;
 	}
 
-	atomic_inc(&qp->skb_out);
-
 	if ((qp_type(qp) != IB_QPT_RC) &&
 	    (pkt->mask & RXE_END_MASK)) {
 		pkt->wqe->state = wqe_state_done;
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index ffff5a5..332ce52 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -455,6 +455,8 @@ static int send(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
 		return -EAGAIN;
 	}
 
+	if (pkt->qp)
+		atomic_inc(&pkt->qp->skb_out);
 	kfree_skb(skb);
 
 	return 0;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 5/7] IB/rxe: Add support for zero-byte operations
From: Andrew Boyer @ 2016-11-18 14:36 UTC (permalink / raw)
  To: monis-VPRAkNaXOzVWk0Htik3J/w, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Andrew Boyer
In-Reply-To: <1479479809-10798-1-git-send-email-andrew.boyer-8PEkshWhKlo@public.gmane.org>

The last_psn algorithm fails in the zero-byte case: it calculates
first_psn = N, last_psn = N-1. This makes the operation unretryable since
the res structure will fail the (first_psn <= psn <= last_psn) test in
find_resource().

While here, use BTH_PSN_MASK to mask the calculated last_psn.

Signed-off-by: Andrew Boyer <andrew.boyer-8PEkshWhKlo@public.gmane.org>
---
 drivers/infiniband/sw/rxe/rxe_mr.c   |  3 +++
 drivers/infiniband/sw/rxe/rxe_resp.c | 18 +++++++++++++++---
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 1869152..d0faca2 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -355,6 +355,9 @@ int rxe_mem_copy(struct rxe_mem *mem, u64 iova, void *addr, int length,
 	size_t			offset;
 	u32			crc = crcp ? (*crcp) : 0;
 
+	if (length == 0)
+		return 0;
+
 	if (mem->type == RXE_MEM_TYPE_DMA) {
 		u8 *src, *dest;
 
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index cb3fd4c..a5e9ce3 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -444,6 +444,13 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
 		return RESPST_EXECUTE;
 	}
 
+	/* A zero-byte op is not required to set an addr or rkey. */
+	if ((pkt->mask & (RXE_READ_MASK | RXE_WRITE_OR_SEND)) &&
+	    (pkt->mask & RXE_RETH_MASK) &&
+	    reth_len(pkt) == 0) {
+		return RESPST_EXECUTE;
+	}
+
 	va	= qp->resp.va;
 	rkey	= qp->resp.rkey;
 	resid	= qp->resp.resid;
@@ -680,9 +687,14 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 		res->read.va_org	= qp->resp.va;
 
 		res->first_psn		= req_pkt->psn;
-		res->last_psn		= req_pkt->psn +
-					  (reth_len(req_pkt) + mtu - 1) /
-					  mtu - 1;
+
+		if (reth_len(req_pkt)) {
+			res->last_psn	= (req_pkt->psn +
+					   (reth_len(req_pkt) + mtu - 1) /
+					   mtu - 1) & BTH_PSN_MASK;
+		} else {
+			res->last_psn	= res->first_psn;
+		}
 		res->cur_psn		= req_pkt->psn;
 
 		res->read.resid		= qp->resp.resid;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 6/7] IB/rxe: Avoid missed completions in the CM/MAD
From: Andrew Boyer @ 2016-11-18 14:36 UTC (permalink / raw)
  To: monis-VPRAkNaXOzVWk0Htik3J/w, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Andrew Boyer
In-Reply-To: <1479479809-10798-1-git-send-email-andrew.boyer-8PEkshWhKlo@public.gmane.org>

The MAD code uses the IB_CQ_REPORT_MISSED_EVENTS flag to avoid a
race between posting CQEs and arming the CQ. Without this fix, the
last completion might be left on the CQ, hanging the kthread
waiting on MAD to complete.
See ib_cq_poll_work().

The console backtraces look like this:

[ 4199.911284] Call Trace:
[ 4199.911401]  [<ffffffff9657fe95>] schedule+0x35/0x80
[ 4199.911556]  [<ffffffff965830df>] schedule_timeout+0x22f/0x2c0
[ 4199.911727]  [<ffffffff9657f7a8>] ? __schedule+0x368/0xa20
[ 4199.911891]  [<ffffffff96580903>] wait_for_completion+0xb3/0x130
[ 4199.912067]  [<ffffffff960a17e0>] ? wake_up_q+0x70/0x70
[ 4199.912243]  [<ffffffffc074a06d>] cm_destroy_id+0x13d/0x450 [ib_cm]
[ 4199.912422]  [<ffffffff961615d5>] ? printk+0x57/0x73
[ 4199.912578]  [<ffffffffc074a390>] ib_destroy_cm_id+0x10/0x20 [ib_cm]
[ 4199.912759]  [<ffffffffc076098c>] rdma_destroy_id+0xac/0x340 [rdma_cm]
[ 4199.912941]  [<ffffffffc076f2cc>] 0xffffffffc076f2cc

Peek at the CQ after arming it so that we can return a hint.

Signed-off-by: Andrew Boyer <andrew.boyer-8PEkshWhKlo@public.gmane.org>
---
 drivers/infiniband/sw/rxe/rxe_verbs.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 19841c8..de39b0a 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1007,11 +1007,19 @@ static int rxe_peek_cq(struct ib_cq *ibcq, int wc_cnt)
 static int rxe_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags)
 {
 	struct rxe_cq *cq = to_rcq(ibcq);
+	unsigned long irq_flags;
+	int ret = 0;
 
+	spin_lock_irqsave(&cq->cq_lock, irq_flags);
 	if (cq->notify != IB_CQ_NEXT_COMP)
 		cq->notify = flags & IB_CQ_SOLICITED_MASK;
 
-	return 0;
+	if ((flags & IB_CQ_REPORT_MISSED_EVENTS) && !queue_empty(cq->queue))
+		ret = 1;
+
+	spin_unlock_irqrestore(&cq->cq_lock, irq_flags);
+
+	return ret;
 }
 
 static struct ib_mr *rxe_get_dma_mr(struct ib_pd *ibpd, int access)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 7/7] IB/rxe: Fix ref leaks in duplicate_request() and rxe_create_qp()
From: Andrew Boyer @ 2016-11-18 14:36 UTC (permalink / raw)
  To: monis-VPRAkNaXOzVWk0Htik3J/w, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Andrew Boyer
In-Reply-To: <1479479809-10798-1-git-send-email-andrew.boyer-8PEkshWhKlo@public.gmane.org>

In duplicate_request(), a ref was added after the call to skb_clone().

In rxe_create_qp(), the udata->inlen error path needs to clean up the ref
added by rxe_alloc().

Signed-off-by: Andrew Boyer <andrew.boyer-8PEkshWhKlo@public.gmane.org>
---
 drivers/infiniband/sw/rxe/rxe_resp.c  | 1 +
 drivers/infiniband/sw/rxe/rxe_verbs.c | 7 ++++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index a5e9ce3..8643797 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -1145,6 +1145,7 @@ static enum resp_states duplicate_request(struct rxe_qp *qp,
 					     pkt, skb_copy);
 			if (rc) {
 				pr_err("Failed resending result. This flow is not handled - skb ignored\n");
+				rxe_drop_ref(qp);
 				kfree_skb(skb_copy);
 				rc = RESPST_CLEANUP;
 				goto out;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index de39b0a..071430c 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -564,7 +564,7 @@ static struct ib_qp *rxe_create_qp(struct ib_pd *ibpd,
 	if (udata) {
 		if (udata->inlen) {
 			err = -EINVAL;
-			goto err1;
+			goto err2;
 		}
 		qp->is_user = 1;
 	}
@@ -573,12 +573,13 @@ static struct ib_qp *rxe_create_qp(struct ib_pd *ibpd,
 
 	err = rxe_qp_from_init(rxe, qp, pd, init, udata, ibpd);
 	if (err)
-		goto err2;
+		goto err3;
 
 	return &qp->ibqp;
 
-err2:
+err3:
 	rxe_drop_index(qp);
+err2:
 	rxe_drop_ref(qp);
 err1:
 	return ERR_PTR(err);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 00/12] client-side NFS/RDMA patches proposed for v4.10
From: Chuck Lever @ 2016-11-18 14:58 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA

The following patch series makes these changes:

- Support for devices that support SG_GAP
- Several bug fixes
- A number of clean-ups and optimizations

SG_GAP devices allow the client transport implementation to register
non-contiguous memory regions using one offset and handle. Scatter
and gather from these regions are then managed entirely in hardware.

Normally a Reply chunk for a READDIR or GETACL is built of multiple
RDMA segments. Now, when our client wants to build a Reply chunk out
of the xdr_buf's head, page list, and tail, it appears to the server
as a single contiguous RDMA segment if the device supports SG_GAP.

Instead of a separate Write WR for each of several RDMA segments,
the server can then post a single Write WR to transmit the whole RPC
Reply (assuming the Reply is smaller than the server RNIC's max_sge)
and can invalidate the whole Reply chunk remotely.

SG_GAP is currently supported by mlx5 devices when using the FRWR
registration mechanism.

Available in the "nfs-rdma-for-4.10" topic branch of this git repo:

git://git.linux-nfs.org/projects/cel/cel-2.6.git

Or for browsing:

http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=log;h=refs/heads/nfs-rdma-for-4.10

Client changes since v1:
 - Rebase on v4.9-rc5
 - "xprtrdma: Fix DMAR failure in frwr_op_map() after reconnect" was merged
 - "xprtrdma: Simplify synopsis of rpcrdma_ep_connect()" was postponed
 - Clean-ups re-ordered to the end of the series
 - Editorial changes to patch descriptions

---

Chuck Lever (12):
      xprtrdma: Cap size of callback buffer resources
      xprtrdma: Make FRWR send queue entry accounting more accurate
      xprtrdma: Support for SG_GAP devices
      SUNRPC: Proper metric accounting when RPC is not transmitted
      xprtrdma: Address coverity complaint about wait_for_completion()
      xprtrdma: Avoid calls to ro_unmap_safe()
      xprtrdma: Refactor FRMR invalidation
      xprtrdma: Update documenting comment
      xprtrdma: Squelch "max send, max recv" messages at connect time
      xprtrdma: Shorten QP access error message
      xprtrdma: Update dprintk in rpcrdma_count_chunks
      xprtrdma: Relocate connection helper functions

 net/sunrpc/stats.c                |   10 ++--
 net/sunrpc/xprtrdma/backchannel.c |    4 +-
 net/sunrpc/xprtrdma/frwr_ops.c    |   94 +++++++++++++++++--------------------
 net/sunrpc/xprtrdma/rpc_rdma.c    |   36 --------------
 net/sunrpc/xprtrdma/transport.c   |   34 +++++++++++++
 net/sunrpc/xprtrdma/verbs.c       |   37 ++++++++-------
 net/sunrpc/xprtrdma/xprt_rdma.h   |   31 +++++++++---
 7 files changed, 128 insertions(+), 118 deletions(-)

--
Chuck Lever
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v2 01/12] xprtrdma: Cap size of callback buffer resources
From: Chuck Lever @ 2016-11-18 14:59 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161118145547.10592.66470.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>

When the inline threshold size is set to large values (say, 32KB)
any NFSv4.1 CB request from the server gets a reply with status
NFS4ERR_RESOURCE.

Looks like there are some upper layer assumptions about the maximum
size of a reply (for example, in process_op). Cap the size of the
NFSv4 client's reply resources at a page.

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 net/sunrpc/xprtrdma/backchannel.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/backchannel.c b/net/sunrpc/xprtrdma/backchannel.c
index 2c472e1..24fedd4 100644
--- a/net/sunrpc/xprtrdma/backchannel.c
+++ b/net/sunrpc/xprtrdma/backchannel.c
@@ -55,7 +55,8 @@ static int rpcrdma_bc_setup_rqst(struct rpcrdma_xprt *r_xprt,
 	if (IS_ERR(rb))
 		goto out_fail;
 	req->rl_sendbuf = rb;
-	xdr_buf_init(&rqst->rq_snd_buf, rb->rg_base, size);
+	xdr_buf_init(&rqst->rq_snd_buf, rb->rg_base,
+		     min_t(size_t, size, PAGE_SIZE));
 	rpcrdma_set_xprtdata(rqst, req);
 	return 0;
 
@@ -191,6 +192,7 @@ size_t xprt_rdma_bc_maxpayload(struct rpc_xprt *xprt)
 	size_t maxmsg;
 
 	maxmsg = min_t(unsigned int, cdata->inline_rsize, cdata->inline_wsize);
+	maxmsg = min_t(unsigned int, maxmsg, PAGE_SIZE);
 	return maxmsg - RPCRDMA_HDRLEN_MIN;
 }
 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox