Re: [PATCH v3 25/25] IB/mlx4: Workaround for mlx4_alloc_priv_pages() array allocator

From: Leon Romanovsky <leon@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Sagi Grimberg <sagi@grimberg.me>,
	linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: [PATCH v3 25/25] IB/mlx4: Workaround for mlx4_alloc_priv_pages() array allocator
Date: Wed, 22 Jun 2016 18:50:03 +0300
Message-ID: <20160622155003.GI9762@leon.nu>
In-Reply-To: <A9F49204-8E84-4B58-BAA4-5B4B360FD22F@oracle.com>

On Wed, Jun 22, 2016 at 10:47:27AM -0400, Chuck Lever wrote:
> 
> > On Jun 22, 2016, at 10:04 AM, Sagi Grimberg <sagi@grimberg.me> wrote:
> > 
> > 
> >> +    /* This is overkill, but hardware requires that the
> >> +     * PBL array begins at a properly aligned address and
> >> +     * never occupies the last 8 bytes of a page.
> >> +     */
> >> +    mr->pages = (__be64 *)get_zeroed_page(GFP_KERNEL);
> >> +    if (!mr->pages)
> >>          return -ENOMEM;
> > 
> > Again, I'm not convinced that this is a better choice than allocating
> > the exact needed size as DMA coherent, but given that DMA coherent
> > allocations are always page aligned, I wonder if it's not the same
> > effect...
> 
> My concerns with DMA coherent were:
> 
> 1. That pool may be a somewhat limited resource?
> 
> 2. IMO DMA-API.txt suggests DMA coherent will perform less
> well in some cases. Macro benchmarks I ran seemed to show
> there was a slight performance hit with that approach, though
> it was nearly in the noise.
> 
> I agree that the over-allocation in the streaming solution is a
> concern. But as you say, there may be little we can do about it.

According to [1], dma_alloc_coherent() doesn't allocate from a pool;
it calls __get_free_page().

"A DMA pool is an allocation mechanism for small, coherent DMA mappings.
Mappings obtained from dma_alloc_coherent may have a minimum size of one
page."
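
To illustrate the quoted point, here is a rough sketch in user-space C,
not kernel code. It assumes a 4 KiB page and an allocator that only
hands out whole pages, in which case an exact-size request for a small
array ends up with the same one-page footprint as get_zeroed_page().
SKETCH_PAGE_SIZE and page_granular_footprint() are names made up for
this example.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/*
 * Hypothetical helper: bytes actually consumed when a request is
 * served by an allocator that only hands out whole pages.
 * A request of 0 bytes is treated as costing one page.
 */
static size_t page_granular_footprint(size_t request)
{
    size_t pages = (request + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

    if (pages == 0)
        pages = 1;
    return pages * SKETCH_PAGE_SIZE;
}
```

Under these assumptions a 64-byte array and a 4088-byte array both cost
one full page, so the over-allocation is the same either way.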

> 
> Wrt Or's comment, the device's maximum page list depth
> is advertised to consumers via the device's attributes. However,
> it would be defensive if there were a sanity check added in
> mlx4_alloc_priv_pages to ensure that the max_pages argument
> is a reasonable value (i.e., that the calculated array size does
> indeed fit into a page).
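
A minimal sketch of such a check, in user-space C rather than driver
code. It assumes 4 KiB pages, 8-byte PBL entries and, following the
patch comment quoted above, that the last 8 bytes of the page must stay
unused. pbl_fits_in_page() and SKETCH_PAGE_SIZE are hypothetical names,
not part of mlx4.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/*
 * Hypothetical check: returns 0 when an array of max_pages 64-bit
 * entries, placed at the start of a page, leaves the last 8 bytes
 * of that page untouched; returns -EINVAL otherwise.
 */
static int pbl_fits_in_page(unsigned int max_pages)
{
    /* 64-bit arithmetic so the product cannot wrap. */
    uint64_t size = (uint64_t)max_pages * sizeof(uint64_t);

    if (max_pages == 0)
        return -EINVAL;
    if (size > SKETCH_PAGE_SIZE - sizeof(uint64_t))
        return -EINVAL;
    return 0;
}
```

With these assumptions the largest accepted value is 511 entries; a
real check would use PAGE_SIZE and whatever limit the device attributes
advertise.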
> 
> > In any event, we can move forward with this for now:
> > 
> > Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> 
> Thanks, I'll add that! Though as before, I'm happy to drop this
> patch if there is a different preferred official fix.

We submitted your version of the patch, with minor changes to the
comments and commit message, together with Sagi's Reviewed-by tag [2].

[1] http://www.makelinux.net/ldd3/chp-15-sect-4
[2] https://patchwork.kernel.org/patch/9193075/
