From: "J. Bruce Fields" <bfields@fieldses.org>
To: Yan Burman <yanb@mellanox.com>
Cc: Wendy Cheng <s.wendy.cheng@gmail.com>,
"Atchley, Scott" <atchleyes@ornl.gov>,
Tom Tucker <tom@opengridcomputing.com>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
Or Gerlitz <ogerlitz@mellanox.com>
Subject: Re: NFS over RDMA benchmark
Date: Tue, 23 Apr 2013 17:06:07 -0400
Message-ID: <20130423210607.GJ3676@fieldses.org>
In-Reply-To: <0EE9A1CDC8D6434DB00095CD7DB873462CF9715B@MTLDAG01.mtl.com>
On Thu, Apr 18, 2013 at 12:47:09PM +0000, Yan Burman wrote:
>
>
> > -----Original Message-----
> > From: Wendy Cheng [mailto:s.wendy.cheng@gmail.com]
> > Sent: Wednesday, April 17, 2013 21:06
> > To: Atchley, Scott
> > Cc: Yan Burman; J. Bruce Fields; Tom Tucker; linux-rdma@vger.kernel.org;
> > linux-nfs@vger.kernel.org
> > Subject: Re: NFS over RDMA benchmark
> >
> > On Wed, Apr 17, 2013 at 10:32 AM, Atchley, Scott <atchleyes@ornl.gov>
> > wrote:
> > > On Apr 17, 2013, at 1:15 PM, Wendy Cheng <s.wendy.cheng@gmail.com>
> > wrote:
> > >
> > >> On Wed, Apr 17, 2013 at 7:36 AM, Yan Burman <yanb@mellanox.com>
> > wrote:
> > >>> Hi.
> > >>>
> > >>> I've been trying to do some benchmarks for NFS over RDMA and I seem to
> > only get about half of the bandwidth that the HW can give me.
> > >>> My setup consists of 2 servers each with 16 cores, 32Gb of memory, and
> > Mellanox ConnectX3 QDR card over PCI-e gen3.
> > >>> These servers are connected to a QDR IB switch. The backing storage on
> > the server is tmpfs mounted with noatime.
> > >>> I am running kernel 3.5.7.
> > >>>
> > >>> When running ib_send_bw, I get 4.3-4.5 GB/sec for block sizes 4-512K.
> > >>> When I run fio over rdma mounted nfs, I get 260-2200MB/sec for the
> > same block sizes (4-512K). running over IPoIB-CM, I get 200-980MB/sec.
> > >
> > > Yan,
> > >
> > > Are you trying to optimize single client performance or server performance
> > with multiple clients?
> > >
>
> I am trying to get maximum performance from a single server - I used 2 processes in fio test - more than 2 did not show any performance boost.
> I tried running fio from 2 different PCs on 2 different files, but the sum of the two is more or less the same as running from single client PC.
>
> What I did see is that server is sweating a lot more than the clients and more than that, it has 1 core (CPU5) in 100% softirq tasklet:
> cat /proc/softirqs
Would any profiling help figure out which code it's spending time in?
(E.g. something as simple as "perf top" might have useful output.)
--b.
>                  CPU0    CPU1    CPU2    CPU3    CPU4    CPU5    CPU6    CPU7    CPU8    CPU9   CPU10   CPU11   CPU12   CPU13   CPU14   CPU15
> HI:                 0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
> TIMER:         418767   46596   43515   44547   50099   34815   40634   40337   39551   93442   73733   42631   42509   41592   40351   61793
> NET_TX:         28719     309    1421    1294    1730    1243     832     937      11      44      41      20      26      19      15      29
> NET_RX:        612070      19      22      21       6     235       3       2       9       6      17      16      20      13      16      10
> BLOCK:           5941       0       0       0       0       0       0       0     519     259    1238     272     253     174     215    2618
> BLOCK_IOPOLL:       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
> TASKLET:           28       1       1       1       1 1540653       1       1      29       1       1       1       1       1       1       2
> SCHED:         364965   26547   16807   18403   22919    8678   14358   14091   16981   64903   47141   18517   19179   18036   17037   38261
> HRTIMER:           13       0       1       1       0       0       0       0       0       0       0       0       1       1       0       1
> RCU:           945823  841546  715281  892762  823564   42663  863063  841622  333577  389013  393501  239103  221524  258159  313426  234030
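
The per-CPU skew in the softirq counts above can be checked mechanically. The following is only a minimal sketch, not something posted in the thread: it assumes the /proc/softirqs text has been saved without the "> " quote prefix, and the helper names parse_softirqs and hottest_cpu are hypothetical. Only the TASKLET and NET_RX rows from the quoted output are reproduced as sample input.

```python
def parse_softirqs(text):
    """Parse /proc/softirqs-style text into (cpu_names, {softirq: [counts]})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()
    table = {}
    for ln in lines[1:]:
        name, _, rest = ln.partition(":")
        table[name.strip()] = [int(v) for v in rest.split()]
    return cpus, table


def hottest_cpu(cpus, counts):
    """Return the CPU with the largest count and its share of the row total."""
    total = sum(counts)
    idx = max(range(len(counts)), key=counts.__getitem__)
    share = counts[idx] / total if total else 0.0
    return cpus[idx], share


# Two rows copied from the quoted output; the remaining rows parse the same way.
SAMPLE = """\
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15
NET_RX: 612070 19 22 21 6 235 3 2 9 6 17 16 20 13 16 10
TASKLET: 28 1 1 1 1 1540653 1 1 29 1 1 1 1 1 1 2
"""

cpus, table = parse_softirqs(SAMPLE)
for name, counts in table.items():
    cpu, share = hottest_cpu(cpus, counts)
    print(f"{name:8s} busiest={cpu} share={share:.2%}")
```

On this sample nearly all TASKLET work lands on CPU5 and nearly all NET_RX work on CPU0. These are cumulative counts since boot, so they show where the softirqs ran but not which code consumed the time; that still needs a profiler.
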
> > >
> > >> Remember there are always gaps between wire speed (that ib_send_bw
> > >> measures) and real world applications.
>
> I realize that, but I don't expect the difference to be more than twice.
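
The "about half" figure follows directly from the numbers quoted earlier in the thread. A short worked sketch, assuming 1 GB = 1000 MB (the thread does not say which convention ib_send_bw or fio used) and using a hypothetical helper name, fraction_of_wire:

```python
# Figures quoted in the thread.
WIRE_GB_PER_S = (4.3, 4.5)      # ib_send_bw, block sizes 4-512K
NFS_RDMA_MB_PER_S = (260, 2200)  # fio over an RDMA-mounted NFS export
IPOIB_CM_MB_PER_S = (200, 980)   # fio over IPoIB-CM


def fraction_of_wire(mb_per_s, wire_gb_per_s):
    """Throughput as a fraction of the measured wire speed (1 GB = 1000 MB assumed)."""
    return mb_per_s / (wire_gb_per_s * 1000.0)


# Best NFS/RDMA result against the low and high ends of the ib_send_bw range.
rdma_best = [fraction_of_wire(NFS_RDMA_MB_PER_S[1], w) for w in WIRE_GB_PER_S]
ipoib_best = [fraction_of_wire(IPOIB_CM_MB_PER_S[1], w) for w in WIRE_GB_PER_S]

print("NFS/RDMA best case: " + ", ".join(f"{f:.0%}" for f in rdma_best))
print("IPoIB-CM best case: " + ", ".join(f"{f:.0%}" for f in ipoib_best))
```

Under that assumption the best NFS/RDMA result is roughly half of the ib_send_bw figure, and the best IPoIB-CM result is a bit under a quarter of it.
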
>
> > >>
> > >> That being said, does your server use default export (sync) option ?
> > >> Export the share with "async" option can bring you closer to wire
> > >> speed. However, the practice (async) is generally not recommended in
> > >> a real production system - as it can cause data integrity issues, e.g.
> > >> you have more chances to lose data when the boxes crash.
>
> I am running with async export option, but that should not matter too much, since my backing storage is tmpfs mounted with noatime.
>
> > >>
> > >> -- Wendy
> > >
> > >
> > > Wendy,
> > >
> > > It has been a few years since I looked at RPCRDMA, but I seem to
> > remember that RPCs were limited to 32KB which means that you have to
> > pipeline them to get linerate. In addition to requiring pipelining, the
> > argument from the authors was that the goal was to maximize server
> > performance and not single client performance.
> > >
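
To put the pipelining remark in numbers, here is a rough sketch. It takes the 32KB RPC limit as Scott recalls it (not verified here) and a target of 4.4 GB/s, the middle of the ib_send_bw range. The 50 microsecond per-RPC round-trip time is purely a made-up illustration value, and the function names are hypothetical.

```python
import math


def rpcs_per_second(target_bytes_per_s, rpc_bytes):
    """RPCs that must complete each second to sustain the target throughput."""
    return target_bytes_per_s / rpc_bytes


def rpcs_in_flight(target_bytes_per_s, rpc_bytes, round_trip_s):
    """Concurrent RPCs needed (rate x latency) to keep the link full."""
    return math.ceil(rpcs_per_second(target_bytes_per_s, rpc_bytes) * round_trip_s)


TARGET = 4.4e9          # bytes/s, middle of the quoted ib_send_bw range
RPC_SIZE = 32 * 1024    # bytes, the limit as recalled in the thread
ASSUMED_RTT = 50e-6     # seconds; hypothetical, not measured in the thread

print(f"{rpcs_per_second(TARGET, RPC_SIZE):,.0f} RPCs/s needed")
print(f"{rpcs_in_flight(TARGET, RPC_SIZE, ASSUMED_RTT)} RPCs in flight at the assumed RTT")
```

The required depth scales linearly with the real round-trip time, so a measured latency should be substituted before drawing any conclusion from the second figure.
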
>
> What I see is that performance increases almost linearly up to block size 256K and falls a little at block size 512K
>
> > > Scott
> > >
> >
> > That (client count) brings up a good point ...
> >
> > FIO is really not a good benchmark for NFS. Does anyone have SPECsfs
> > numbers on NFS over RDMA to share ?
> >
> > -- Wendy
>
> What do you suggest for benchmarking NFS?
>
> Yan