From: Dave Chinner <david@fromorbit.com>
To: David Oostdyk <daveo@ll.mit.edu>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"stan@hardwarefreak.com" <stan@hardwarefreak.com>,
"xfs@oss.sgi.com" <xfs@oss.sgi.com>
Subject: Re: high-speed disk I/O is CPU-bound?
Date: Fri, 17 May 2013 08:56:56 +1000
Message-ID: <20130516225656.GG24635@dastard>
In-Reply-To: <5194FCAC.1010300@ll.mit.edu>
On Thu, May 16, 2013 at 11:35:08AM -0400, David Oostdyk wrote:
> On 05/16/13 07:36, Stan Hoeppner wrote:
> >On 5/15/2013 7:59 PM, Dave Chinner wrote:
> >>[cc xfs list, seeing as that's where all the people who use XFS in
> >>these sorts of configurations hang out. ]
> >>
> >>On Fri, May 10, 2013 at 10:04:44AM -0400, David Oostdyk wrote:
> >>>As a basic benchmark, I have an application
> >>>that simply writes the same buffer (say, 128MB) to disk repeatedly.
> >>>Alternatively you could use the "dd" utility. (For these
> >>>benchmarks, I set /proc/sys/vm/dirty_bytes to 512M or lower, since
> >>>these systems have a lot of RAM.)
> >>>
> >>>The basic observations are:
> >>>
> >>>1. "single-threaded" writes, either a file on the mounted
> >>>filesystem or with a "dd" to the raw RAID device, seem to be limited
> >>>to 1200-1400MB/sec. These numbers vary slightly based on whether
> >>>TurboBoost is affecting the writing process or not. "top" will show
> >>>this process running at 100% CPU.
> >>Expected. You are using buffered IO. Write speed is limited by the
> >>rate at which your user process can memcpy data into the page cache.
> >>
> >>>2. With two benchmarks running on the same device, I see aggregate
> >>>write speeds of up to ~2.4GB/sec, which is closer to what I'd expect
> >>>the drives of being able to deliver. This can either be with two
> >>>applications writing to separate files on the same mounted file
> >>>system, or two separate "dd" applications writing to distinct
> >>>locations on the raw device.
> >2.4GB/s is the interface limit of quad lane 6G SAS. Coincidence? If
> >you've daisy chained the SAS expander backplanes within a server chassis
> >(9266-8i/72405), or between external enclosures (9285-8e/71685), and
> >have a single 4 lane cable (SFF-8087/8088/8643/8644) connected to your
> >RAID card, this would fully explain the 2.4GB/s wall, regardless of how
> >many parallel processes are writing, or any other software factor.
> >
> >But surely you already know this, and you're using more than one 4 lane
> >cable. Just covering all the bases here, due to seeing 2.4 GB/s as the
> >stated wall. This number is just too coincidental to ignore.
>
> We definitely have two 4-lane cables being used, but this is an
> interesting coincidence. I'd be surprised if anyone could really
> achieve the theoretical throughput on one cable, though. We have
> one JBOD that only takes a single 4-lane cable, and we seem to cap
> out at closer to 1450MB/sec on that unit. (This is just a single
> point of reference, and I don't have many tests where only one
> 4-lane cable was in use.)
You can get pretty close to the theoretical limit on the back end
SAS cables - just like you can with FC.
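
A rough sketch of where the 2.4GB/s figure for a 4-lane 6G SAS cable
comes from, assuming 8b/10b line encoding (8 payload bits carried per 10
bits on the wire) and ignoring protocol overhead. The function name and
its parameters are made up for illustration:

```python
def sas_wide_port_limit_mb_s(lanes=4, line_rate_mbit=6000):
    """Approximate payload limit of a SAS wide port in MB/s.

    Assumes 8b/10b encoding and ignores SAS protocol overhead, so real
    throughput will be somewhat lower.
    """
    # 8b/10b: only 8 of every 10 line bits carry payload.
    payload_mbit = line_rate_mbit * 8 // 10
    # Convert megabits to megabytes per lane.
    per_lane_mb = payload_mbit // 8
    return lanes * per_lane_mb


limit = sas_wide_port_limit_mb_s()
print(limit)                       # 2400
# The single-cable JBOD quoted above capped out near 1450MB/sec:
print(round(100 * 1450 / limit))   # 60 (percent of the nominal limit)
```

By this estimate the single-cable unit is reaching roughly 60% of the
nominal limit, so the cable itself should have headroom.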
What I'd suggest you do is look at the RAID card configuration -
often they default to active/passive failover configurations when
there are multiple channels to the same storage. Then they only use
one of the cables for all traffic. Some RAID cards offer
active/active or "load balanced" options where all back end paths are
used in redundant configurations rather than just one....
> You guys hit the nail on the head! With O_DIRECT I can use a single
> writer thread and easily see the same throughput that I _ever_ saw
> in the multiple-writer case (~2.4GB/sec), and "top" shows the writer
> at 10% CPU usage. I've modified my application to use O_DIRECT and
> it makes a world of difference.
Be aware that O_DIRECT is not a magic bullet. It can make your IO
go a lot slower on some workloads and storage configs....
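
For readers who want to try the single-writer direct IO case, here is a
minimal sketch, not the application discussed in this thread. It assumes
Linux (os.O_DIRECT), a filesystem that accepts O_DIRECT, and that 4096
byte alignment is sufficient for the device; the names write_direct and
BLOCK are hypothetical:

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; some devices need a different block size


def write_direct(path, chunk_bytes, count):
    """Write `count` chunks of `chunk_bytes` to `path` using O_DIRECT.

    Returns the number of bytes written. Raises OSError (commonly EINVAL)
    if the filesystem or device rejects direct IO.
    """
    if chunk_bytes <= 0 or chunk_bytes % BLOCK:
        raise ValueError("chunk size must be a positive multiple of BLOCK")
    # An anonymous mmap is page aligned, which covers the buffer
    # alignment that direct IO typically requires.
    buf = mmap.mmap(-1, chunk_bytes)
    buf.write(b"\xa5" * chunk_bytes)
    written = 0
    try:
        flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DIRECT
        fd = os.open(path, flags, 0o644)
        try:
            for _ in range(count):
                n = os.write(fd, buf)
                if n != chunk_bytes:
                    # A short write would leave the next offset unaligned.
                    raise OSError("short direct write: %d bytes" % n)
                written += n
        finally:
            os.close(fd)
    finally:
        buf.close()
    return written
```

Something like write_direct("/mnt/test/out.bin", 128 << 20, 64) would
mimic the "same 128MB buffer, repeatedly" benchmark; the path is only a
placeholder. Because every write bypasses the page cache, small or
unaligned IO patterns lose the batching that buffered IO provides, which
is one way direct IO ends up slower.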
> [It's interesting that you see performance benefits for O_DIRECT
> even with a single SATA drive. The reason it took me so long to
> test O_DIRECT in this case, is that I never saw any significant
> benefit from using it in the past. But that is when I didn't have
> such fast storage, so I probably wasn't hitting the bottleneck with
> buffered I/O?]
Right - for applications not designed to use direct IO from the
ground up, this is typically the case - buffered IO is faster right
up to the point where you run out of CPU....
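
One way to check whether a buffered writer has hit that point is to
compare the CPU time the writer consumes with the wall-clock time of the
run. The sketch below is only an illustration; the function name and the
returned field names are invented, and it measures just the writing
process, not kernel writeback threads:

```python
import os
import time


def buffered_write_profile(path, chunk_bytes, count):
    """Write `count` chunks with buffered IO and report throughput and
    the fraction of wall-clock time the writer spent on the CPU."""
    buf = b"\0" * chunk_bytes
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system time of this process
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for _ in range(count):
            view = memoryview(buf)
            while view:  # handle short writes
                n = os.write(fd, view)
                view = view[n:]
    finally:
        os.close(fd)
    wall = max(time.perf_counter() - wall_start, 1e-9)
    cpu = time.process_time() - cpu_start
    total = chunk_bytes * count
    return {
        "bytes": total,
        "mb_per_s": total / wall / 1e6,
        "cpu_fraction": cpu / wall,
    }
```

A cpu_fraction close to 1.0 suggests the writer is limited by copying
into the page cache rather than by the storage, which matches the 100%
CPU that "top" reported for the single-threaded buffered case.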
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com