From: Tom Talpey <tom@talpey.com>
To: Wendy Cheng <s.wendy.cheng@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
	Yan Burman <yanb@mellanox.com>,
	"Atchley, Scott" <atchleyes@ornl.gov>,
	Tom Tucker <tom@opengridcomputing.com>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	Or Gerlitz <ogerlitz@mellanox.com>
Subject: Re: NFS over RDMA benchmark
Date: Wed, 24 Apr 2013 14:26:40 -0400
Message-ID: <517823E0.4000402@talpey.com>
In-Reply-To: <CABgxfbHpNgQyEjd2OVNMgJoLpt_VyLiOL5hMCLwotMd5kincwg@mail.gmail.com>

On 4/24/2013 2:04 PM, Wendy Cheng wrote:
> On Wed, Apr 24, 2013 at 9:27 AM, Wendy Cheng <s.wendy.cheng@gmail.com> wrote:
>> On Wed, Apr 24, 2013 at 8:26 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
>>> On Wed, Apr 24, 2013 at 11:05:40AM -0400, J. Bruce Fields wrote:
>>>> On Wed, Apr 24, 2013 at 12:35:03PM +0000, Yan Burman wrote:
>>>>>
>>>>>
>>>>>
>>>>> Perf top for the CPU with high tasklet count gives:
>>>>>
>>>>>               samples  pcnt         RIP        function                    DSO
>>>>>               _______ _____ ________________ ___________________________ ___________________________________________________________________
>>>>>
>>>>>               2787.00 24.1% ffffffff81062a00 mutex_spin_on_owner         /root/vmlinux
>>>>
>>>> I guess that means lots of contention on some mutex?  If only we knew
>>>> which one.... perf should also be able to collect stack statistics, I
>>>> forget how.
>>>
>>> Googling around....  I think we want:
>>>
>>>          perf record -a --call-graph
>>>          (give it a chance to collect some samples, then ^C)
>>>          perf report --call-graph --stdio
>>>
>>
>> I have not looked at the NFS RDMA (and 3.x kernel) source yet. But see
>> that "rb_prev" up in the #7 spot? Do we have a red-black tree somewhere
>> in the paths? Trees like that require extensive locking.
>>
>
> So I did a quick read on sunrpc/xprtrdma source (based on OFA 1.5.4.1
> tar ball) ... Here is a random thought (not related to the rb tree
> comment).....
>
> The inflight packet count seems to be controlled by
> xprt_rdma_slot_table_entries, which is currently hard-coded as
> RPCRDMA_DEF_SLOT_TABLE (32) (?).  I'm wondering whether it could help
> with the bandwidth number if we pump it up, say to 64 instead? Not sure
> whether the FMR pool size needs to be adjusted accordingly though.

1)

The client slot count is not hard-coded; it can easily be changed by
writing a value to /proc and initiating a new mount. But I doubt that
increasing the slot table will improve performance much, unless this is
a small-random-read, spindle-limited workload.
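
A minimal sketch of that procedure, for anyone who wants to try it. It
assumes the xprtrdma module exposes the tunable as
/proc/sys/sunrpc/rdma_slot_table_entries (please verify on your kernel).
set_slot_table is a hypothetical helper, and the server name, export,
mount point and port in the usage comment are placeholders.

```shell
#!/bin/sh
# Assumed location of the client slot table tunable; override if your
# kernel exposes it elsewhere.
SLOT_SYSCTL=${SLOT_SYSCTL:-/proc/sys/sunrpc/rdma_slot_table_entries}

# Hypothetical helper: validate the requested slot count, then write it.
set_slot_table() {
    entries=$1

    # Reject anything that is not a plain non-negative integer.
    case $entries in
        ''|*[!0-9]*)
            echo "not a number: $entries" >&2
            return 1
            ;;
    esac

    # Bounds from RPCRDMA_MIN_SLOT_TABLE (2) and RPCRDMA_MAX_SLOT_TABLE (256)
    # in the quoted xprtrdma.h.
    if [ "$entries" -lt 2 ] || [ "$entries" -gt 256 ]; then
        echo "out of range (2..256): $entries" >&2
        return 1
    fi

    echo "$entries" > "$SLOT_SYSCTL"
}

# Usage, as root with xprtrdma loaded. The new value only applies to
# mounts made after the write, so remount afterwards:
#   set_slot_table 64 && mount -o rdma,port=20049 server:/export /mnt
```

The range check only mirrors the limits in the header; the kernel may
enforce its own bounds on the write.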

2)

The observation appears to be that the bandwidth is limited by server
CPU. Increasing the load offered by the client probably won't move the
needle until that's addressed.


>
> In short, if anyone has a benchmark setup handy, bumping up the slot
> table size as follows might be interesting:
>
> --- ofa_kernel-1.5.4.1.orig/include/linux/sunrpc/xprtrdma.h 2013-03-21 09:19:36.233006570 -0700
> +++ ofa_kernel-1.5.4.1/include/linux/sunrpc/xprtrdma.h  2013-04-24 10:52:20.934781304 -0700
> @@ -59,7 +59,7 @@
>    * a single chunk type per message is supported currently.
>    */
>   #define RPCRDMA_MIN_SLOT_TABLE (2U)
> -#define RPCRDMA_DEF_SLOT_TABLE (32U)
> +#define RPCRDMA_DEF_SLOT_TABLE (64U)
>   #define RPCRDMA_MAX_SLOT_TABLE (256U)
>
>   #define RPCRDMA_DEF_INLINE  (1024)     /* default inline max */
>
> -- Wendy
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
