From: Mark Nelson <mark.nelson@inktank.com>
To: "Atchley, Scott" <atchleyes@ornl.gov>
Cc: Gandalf Corvotempesta <gandalf.corvotempesta@gmail.com>,
	"martin@tuxadero.com" <martin@tuxadero.com>,
	Sage Weil <sage@inktank.com>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: SSD journal suggestion
Date: Thu, 08 Nov 2012 08:39:03 -0600
Message-ID: <509BC407.4030406@inktank.com>
In-Reply-To: <FB2218CC-8F8F-480A-A5E5-DEF933100612@ornl.gov>

On 11/08/2012 07:55 AM, Atchley, Scott wrote:
> On Nov 8, 2012, at 3:22 AM, Gandalf Corvotempesta <gandalf.corvotempesta@gmail.com> wrote:
>
>> 2012/11/8 Mark Nelson <mark.nelson@inktank.com>:
>>> I haven't done much with IPoIB (just RDMA), but my understanding is that it
>>> tends to top out at like 15Gb/s.  Some others on this mailing list can
>>> probably speak more authoritatively.  Even with RDMA you are going to top
>>> out at around 3.1-3.2GB/s.
>>
>> 15Gb/s is still faster than 10Gbe
>> But this speed limit seems to be kernel-related and should be the same
>> even in a 10Gbe environment, or not?
>
> We have a test cluster with Mellanox QDR HCAs (i.e. NICs). When using Verbs (the native IB API), I see ~27 Gb/s between two hosts. When running Sockets over these devices using IPoIB, I see 13-22 Gb/s depending on whether I use interrupt affinity and process binding.
>
> For our Ceph testing, we will set the affinity of two of the mlx4 interrupt handlers to cores 0 and 1 and we will not use process binding. For single-stream Netperf, we do use process binding and bind it to the same core (i.e. 0) and we see ~22 Gb/s. For multiple, concurrent Netperf runs, we do not use process binding but we still see ~22 Gb/s.
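
Since the numbers in this thread mix Gb/s (gigabits) and GB/s (gigabytes), here is a small sketch for putting them on one scale. It only divides or multiplies by 8 and ignores any protocol or encoding overhead; the function names are mine, purely for illustration.

```python
def gbit_to_gbyte(gbit_per_s):
    """Convert a rate in gigabits per second to gigabytes per second."""
    return gbit_per_s / 8.0


def gbyte_to_gbit(gbyte_per_s):
    """Convert a rate in gigabytes per second to gigabits per second."""
    return gbyte_per_s * 8.0


if __name__ == "__main__":
    # Figures quoted in this thread, expressed in both units.
    for label, gbit in [("IPoIB, untuned", 13.0),
                        ("IPoIB, tuned", 22.0),
                        ("Verbs", 27.0)]:
        print(f"{label}: {gbit:.1f} Gb/s = {gbit_to_gbyte(gbit):.2f} GB/s")
    for gbyte in (3.1, 3.2):
        print(f"RDMA: {gbyte:.1f} GB/s = {gbyte_to_gbit(gbyte):.1f} Gb/s")
```

On that scale, ~27 Gb/s over Verbs is a bit under 3.4 GB/s, which is in the same neighborhood as the 3.1-3.2 GB/s I mentioned for RDMA.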

Scott, this is very interesting!  Does setting the interrupt affinity
make the biggest difference when you have concurrent netperf processes
going?  For some reason I thought that interrupt affinity settings
weren't even guaranteed to be honored in Linux any more, but this is
just a half-remembered recollection from a year or two ago.
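
For anyone following along, here is a rough sketch of what pinning an interrupt to cores 0 and 1 amounts to, assuming the usual Linux interface where /proc/irq/<N>/smp_affinity takes a hexadecimal CPU bitmask in comma-separated 32-bit groups. This is not Scott's script, and the helper names are hypothetical; it only computes the mask and does not write anything to /proc.

```python
def cpus_to_affinity_mask(cpus):
    """Build the hex bitmask string for a collection of CPU numbers.

    Bit i of the mask is set when CPU i is allowed. The result is split
    into comma-separated groups of at most 8 hex digits (32 bits each).
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    digits = format(mask, "x")
    groups = []
    while digits:
        groups.insert(0, digits[-8:])  # peel off 32 bits from the right
        digits = digits[:-8]
    return ",".join(groups)


def affinity_mask_to_cpus(mask_hex):
    """Inverse of the above: list the CPU numbers set in a hex mask."""
    mask = int(mask_hex.replace(",", ""), 16)
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


if __name__ == "__main__":
    print(cpus_to_affinity_mask([0]))      # one handler on core 0
    print(cpus_to_affinity_mask([1]))      # another handler on core 1
    print(affinity_mask_to_cpus("3"))      # which cores does mask 3 allow?
```

The resulting string is what would be written (as root) to the smp_affinity file of the relevant IRQ. If a balancing daemon such as irqbalance is running, it may rewrite these masks afterwards, which could be where my "not guaranteed" impression came from.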

>
> We used all of the Mellanox tuning recommendations for IPoIB available in their tuning pdf:
>
> http://www.mellanox.com/related-docs/prod_software/Performance_Tuning_Guide_for_Mellanox_Network_Adapters.pdf
>
> We looked at their interrupt affinity setting scripts and then wrote our own.
>
> Our testing is with IPoIB in "connected" mode, not "datagram" mode. Connected mode is less scalable, but currently I only get ~3 Gb/s with datagram mode. Mellanox claims that we should get identical performance with both modes and we are looking into it.
>
> We are getting a new test cluster with FDR HCAs and I will look into those as well.

Nice!  At some point I'll probably try to justify getting some FDR cards 
in house.  I'd definitely like to hear how FDR ends up working for you.

>
> Scott
>

Mark
