* recommended bcache setup
@ 2012-12-22 3:06 Thomas Fjellstrom
From: Thomas Fjellstrom @ 2012-12-22 3:06 UTC (permalink / raw)
To: linux-bcache-u79uwXL29TY76Z2rM5mHXA
I'm setting up a little home NAS here, and I've been thinking about using
bcache to speed up the random access bits on the "big" raid6 array (7x2TB).
How does one get started using bcache (custom patched kernel?), and what is
the recommended setup for use with mdraid? I remember reading ages ago that it
was recommended that each component device was attached directly to the cache,
and then mdraid put on top, but a quick google suggests putting the cache on
top of the raid instead.
Also, is it possible to add a cache to an existing volume yet? I have a
smaller array (7x1TB) that I wouldn't mind adding the cache layer to.
--
Thomas Fjellstrom
thomas-T/OQ2aoscs6U4IzBdx3r/Q@public.gmane.org
* Re: recommended bcache setup
From: Thomas Fjellstrom @ 2012-12-24 8:20 UTC (permalink / raw)
To: linux-bcache-u79uwXL29TY76Z2rM5mHXA
On Fri Dec 21, 2012, Thomas Fjellstrom wrote:
> I'm setting up a little home NAS here, and I've been thinking about using
> bcache to speed up the random access bits on the "big" raid6 array (7x2TB).
>
> How does one get started using bcache (custom patched kernel?), and what is
> the recommended setup for use with mdraid? I remember reading ages ago that
> it was recommended that each component device was attached directly to the
> cache, and then mdraid put on top, but a quick google suggests putting the
> cache on top of the raid instead.
>
> Also, is it possible to add a cache to an existing volume yet? I have a
> smaller array (7x1TB) that I wouldn't mind adding the cache layer to.
I just tried a basic setup with the cache on top of the raid6. I ran a quick
iozone test with the default Debian sid (3.2.35) kernel, the bcache (3.2.28)
kernel without bcache enabled, and with bcache enabled (see below).
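For reference, the setup commands I used looked roughly like this. Device names here are illustrative placeholders for my array and SSD, and note that make-bcache writes a superblock, so it will clobber whatever is on the device you point it at:

```shell
# Format the backing device (the md raid6) and the cache device (the SSD)
make-bcache -B /dev/md0     # backing device
make-bcache -C /dev/sde     # cache device

# Register both with the kernel (udev may do this for you on newer setups)
echo /dev/md0 > /sys/fs/bcache/register
echo /dev/sde > /sys/fs/bcache/register

# Attach the backing device to the cache set, using the cache set's UUID
# (placeholder below; make-bcache prints it, or see /sys/fs/bcache/)
echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach

# Then make a filesystem on the composite device and mount it as usual
mkfs.xfs /dev/bcache0
```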
Here's a little information:
system info:
Intel S1200KP Motherboard
Intel Core i3 2120 CPU
16GB DDR3 1333 ECC
IBM M1015 in IT mode
7 x 2TB Seagate Barracuda HDDs
1 x 240 GB Samsung 470 SSD
kernel: fresh git checkout of the bcache repo, 3.2.28
Raid Info:
/dev/md0:
Version : 1.2
Creation Time : Sat Dec 22 03:38:05 2012
Raid Level : raid6
Array Size : 9766914560 (9314.46 GiB 10001.32 GB)
Used Dev Size : 1953382912 (1862.89 GiB 2000.26 GB)
Raid Devices : 7
Total Devices : 7
Persistence : Superblock is persistent
Update Time : Mon Dec 24 00:22:28 2012
State : clean
Active Devices : 7
Working Devices : 7
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : mrbig:0 (local to host mrbig)
UUID : 547c30d1:3af4b2ec:14712d0b:88e4337a
Events : 10591
Number Major Minor RaidDevice State
0 8 0 0 active sync /dev/sda
1 8 16 1 active sync /dev/sdb
2 8 32 2 active sync /dev/sdc
3 8 48 3 active sync /dev/sdd
4 8 80 4 active sync /dev/sdf
5 8 96 5 active sync /dev/sdg
6 8 112 6 active sync /dev/sdh
Fs info:
root@mrbig:~/build/bcache-tools# xfs_info /dev/bcache0
meta-data=/dev/bcache0 isize=256 agcount=10, agsize=268435328 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=2441728638, imaxpct=5
= sunit=128 swidth=640 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=521728, version=2
= sectsz=512 sunit=8 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
iozone -a -s 32G -r 8M
random random bkwd record stride
KB reclen write rewrite read reread read write read rewrite read fwrite frewrite fread freread
w/o cache (debian kernel 3.2.35-1)
33554432 8192 212507 210382 630327 630852 372807 161710 388319 4922757 617347 210642 217122 717279 716150
w/ cache (bcache git kernel 3.2.28):
33554432 8192 248376 231717 268560 269966 123718 132210 148030 4888983 152240 230099 238223 276254 282441
w/o cache (bcache git kernel 3.2.28):
33554432 8192 277607 259159 709837 702192 399889 151629 399779 4846688 655210 251297 245953 783930 778595
Note: I disabled the cache before the last test, unregistered the device and
"stop"ed the cache. I also changed the config slightly for the bcache kernel:
I started out with the Debian config, and then switched the preemption option
to server, which may be the reason for the performance difference between the
two non-cached tests.
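For the curious, disabling things was just the usual sysfs pokes, roughly like the following (the UUID is a placeholder for the actual cache set UUID):

```shell
# Detach the backing device from the cache set
echo 1 > /sys/block/bcache0/bcache/detach

# Stop/unregister the cache set itself
echo 1 > /sys/fs/bcache/<cache-set-uuid>/stop
```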
I probably messed up the setup somehow. If anyone has some tips or suggestions
I'd appreciate some input.
--
Thomas Fjellstrom
thomas-T/OQ2aoscs6U4IzBdx3r/Q@public.gmane.org
* Re: recommended bcache setup
From: Kent Overstreet @ 2012-12-26 18:47 UTC (permalink / raw)
To: Thomas Fjellstrom; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA
On Mon, Dec 24, 2012 at 01:20:34AM -0700, Thomas Fjellstrom wrote:
> On Fri Dec 21, 2012, Thomas Fjellstrom wrote:
> > I'm setting up a little home NAS here, and I've been thinking about using
> > bcache to speed up the random access bits on the "big" raid6 array (7x2TB).
> >
> > How does one get started using bcache (custom patched kernel?), and what is
> > the recommended setup for use with mdraid? I remember reading ages ago that
> > it was recommended that each component device was attached directly to the
> > cache, and then mdraid put on top, but a quick google suggests putting the
> > cache on top of the raid instead.
> >
> > Also, is it possible to add a cache to an existing volume yet? I have a
> > smaller array (7x1TB) that I wouldn't mind adding the cache layer to.
>
> I just tried a basic setup with the cache on top of the raid6. I ran a quick
> iozone test with the default Debian sid (3.2.35) kernel, the bcache (3.2.28)
> kernel without bcache enabled, and with bcache enabled (see below).
>
> Here's a little information:
>
> system info:
> Intel S1200KP Motherboard
> Intel Core i3 2120 CPU
> 16GB DDR3 1333 ECC
> IBM M1015 in IT mode
> 7 x 2TB Seagate Barracuda HDDs
> 1 x 240 GB Samsung 470 SSD
>
>
> kernel: fresh git checkout of the bcache repo, 3.2.28
>
>
>
> Raid Info:
> /dev/md0:
> Version : 1.2
> Creation Time : Sat Dec 22 03:38:05 2012
> Raid Level : raid6
> Array Size : 9766914560 (9314.46 GiB 10001.32 GB)
> Used Dev Size : 1953382912 (1862.89 GiB 2000.26 GB)
> Raid Devices : 7
> Total Devices : 7
> Persistence : Superblock is persistent
>
> Update Time : Mon Dec 24 00:22:28 2012
> State : clean
> Active Devices : 7
> Working Devices : 7
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : left-symmetric
> Chunk Size : 512K
>
> Name : mrbig:0 (local to host mrbig)
> UUID : 547c30d1:3af4b2ec:14712d0b:88e4337a
> Events : 10591
>
> Number Major Minor RaidDevice State
> 0 8 0 0 active sync /dev/sda
> 1 8 16 1 active sync /dev/sdb
> 2 8 32 2 active sync /dev/sdc
> 3 8 48 3 active sync /dev/sdd
> 4 8 80 4 active sync /dev/sdf
> 5 8 96 5 active sync /dev/sdg
> 6 8 112 6 active sync /dev/sdh
>
>
>
>
> Fs info:
> root@mrbig:~/build/bcache-tools# xfs_info /dev/bcache0
> meta-data=/dev/bcache0 isize=256 agcount=10, agsize=268435328 blks
> = sectsz=512 attr=2
> data = bsize=4096 blocks=2441728638, imaxpct=5
> = sunit=128 swidth=640 blks
> naming =version 2 bsize=4096 ascii-ci=0
> log =internal bsize=4096 blocks=521728, version=2
> = sectsz=512 sunit=8 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
>
>
>
>
> iozone -a -s 32G -r 8M
> random random bkwd record stride
> KB reclen write rewrite read reread read write read rewrite read fwrite frewrite fread freread
> w/o cache (debian kernel 3.2.35-1)
> 33554432 8192 212507 210382 630327 630852 372807 161710 388319 4922757 617347 210642 217122 717279 716150
> w/ cache (bcache git kernel 3.2.28):
> 33554432 8192 248376 231717 268560 269966 123718 132210 148030 4888983 152240 230099 238223 276254 282441
> w/o cache (bcache git kernel 3.2.28):
> 33554432 8192 277607 259159 709837 702192 399889 151629 399779 4846688 655210 251297 245953 783930 778595
>
> Note: I disabled the cache before the last test, unregistered the device and
> "stop"ed the cache. I also changed the config slightly for the bcache kernel,
> I started out with the debian config, and then switched the preemption option
> to server, which may be the reason for the performance difference between the
> two non cached tests.
>
> I probably messed up the setup somehow. If anyone has some tips or suggestions
> I'd appreciate some input.
So you probably didn't put bcache in writeback mode, which would explain
the write numbers being slightly worse.
Something I noticed myself with bcache on top of a raid6 is that in
writeback mode sequential write throughput was significantly worse - due
to the ssd not having as much write bandwidth as the raid6 and bcache's
writeback having no knowledge of the stripe layout.
This is something I'd like to fix, if I ever get time. Normal operation
(i.e. with mostly random writes) was vastly improved, though.
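For the record, cache modes can be switched at runtime through sysfs, something like this (standard bcache sysfs path, assuming your device came up as bcache0):

```shell
# Show the available modes; the active one is in [brackets]
cat /sys/block/bcache0/bcache/cache_mode

# Switch to writeback: writes land on the SSD first and are
# flushed to the backing raid6 in the background
echo writeback > /sys/block/bcache0/bcache/cache_mode
```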
Not sure why your read numbers are worse, though - I haven't used iozone
myself so I'm not sure what exactly it's doing.
It'd be useful to know what iozone's reads look like - how many are in flight
at a time, how big they are, etc.
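One low-effort way to find out is to watch the block layer while iozone runs - iostat from sysstat, or blktrace if you want the gory details (device names illustrative; exact iostat column names vary by sysstat version):

```shell
# Per-device request size, queue depth and throughput, 1-second intervals:
# avgrq-sz ~ average request size (sectors), avgqu-sz ~ requests in flight
iostat -x 1 /dev/bcache0 /dev/md0 /dev/sde

# Or capture a full trace for offline analysis
blktrace -d /dev/bcache0 -o trace
blkparse trace | less
```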
I suppose it'd be informative to have a benchmark where bcache is
enabled but all the reads are cache misses, and bcache isn't writing any
of the cache misses to the cache. I think I'd need to add another cache
mode for that, though (call it "readaround" I suppose).
I wouldn't worry _too_ much about iozone's numbers; I suspect whatever
it's doing differently to get such bad read numbers isn't terribly
representative. I'd benchmark whatever you're using the server for, if
you can. It'd still be good to know what's going on, though - there's
certainly something that ought to be fixed.
Oh, one thing that comes to mind - there's an issue with pure read
workloads in the current stable branch, where inserting data from a
cache miss will fail to update the index if the btree node is full (but
after the data has been written to the cache). This shows up in
benchmarks, because they tend to test reads and writes separately, but
it's not an issue in any real world workload I know of because any
amount of write traffic keeps it from showing up, as the btree nodes
will split when necessary on writes.
I have a fix for this in the dev branch, and I think it's stable but the
dev branch needs more testing.
* Re: recommended bcache setup
From: Thomas Fjellstrom @ 2012-12-26 19:01 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA
On Wed Dec 26, 2012, you wrote:
> On Mon, Dec 24, 2012 at 01:20:34AM -0700, Thomas Fjellstrom wrote:
> > On Fri Dec 21, 2012, Thomas Fjellstrom wrote:
> > > I'm setting up a little home NAS here, and I've been thinking about
> > > using bcache to speed up the random access bits on the "big" raid6
> > > array (7x2TB).
> > >
> > > How does one get started using bcache (custom patched kernel?), and
> > > what is the recommended setup for use with mdraid? I remember reading
> > > ages ago that it was recommended that each component device was
> > > attached directly to the cache, and then mdraid put on top, but a
> > > quick google suggests putting the cache on top of the raid instead.
> > >
> > > Also, is it possible to add a cache to an existing volume yet? I have a
> > > smaller array (7x1TB) that I wouldn't mind adding the cache layer to.
> >
> > I just tried a basic setup with the cache on top of the raid6. I ran a
> > quick iozone test with the default Debian sid (3.2.35) kernel, the
> > bcache (3.2.28) kernel without bcache enabled, and with bcache enabled
> > (see below).
> >
> > Here's a little information:
> >
> > system info:
> > Intel S1200KP Motherboard
> > Intel Core i3 2120 CPU
> > 16GB DDR3 1333 ECC
> > IBM M1015 in IT mode
> > 7 x 2TB Seagate Barracuda HDDs
> > 1 x 240 GB Samsung 470 SSD
> >
> > kernel: fresh git checkout of the bcache repo, 3.2.28
> >
> >
> >
> > Raid Info:
> > /dev/md0:
> > Version : 1.2
> > Creation Time : Sat Dec 22 03:38:05 2012
> > Raid Level : raid6
> > Array Size : 9766914560 (9314.46 GiB 10001.32 GB)
> > Used Dev Size : 1953382912 (1862.89 GiB 2000.26 GB)
> > Raid Devices : 7
> > Total Devices : 7
> > Persistence : Superblock is persistent
> >
> > Update Time : Mon Dec 24 00:22:28 2012
> > State : clean
> > Active Devices : 7
> > Working Devices : 7
> > Failed Devices : 0
> > Spare Devices : 0
> >
> > Layout : left-symmetric
> > Chunk Size : 512K
> >
> > Name : mrbig:0 (local to host mrbig)
> > UUID : 547c30d1:3af4b2ec:14712d0b:88e4337a
> > Events : 10591
> >
> > Number Major Minor RaidDevice State
> > 0 8 0 0 active sync /dev/sda
> > 1 8 16 1 active sync /dev/sdb
> > 2 8 32 2 active sync /dev/sdc
> > 3 8 48 3 active sync /dev/sdd
> > 4 8 80 4 active sync /dev/sdf
> > 5 8 96 5 active sync /dev/sdg
> > 6 8 112 6 active sync /dev/sdh
> >
> > Fs info:
> > root@mrbig:~/build/bcache-tools# xfs_info /dev/bcache0
> > meta-data=/dev/bcache0 isize=256 agcount=10, agsize=268435328 blks
> > = sectsz=512 attr=2
> > data = bsize=4096 blocks=2441728638, imaxpct=5
> > = sunit=128 swidth=640 blks
> > naming =version 2 bsize=4096 ascii-ci=0
> > log =internal bsize=4096 blocks=521728, version=2
> > = sectsz=512 sunit=8 blks, lazy-count=1
> > realtime =none extsz=4096 blocks=0, rtextents=0
> >
> > iozone -a -s 32G -r 8M
> > random random bkwd record stride
> > KB reclen write rewrite read reread read write read rewrite read fwrite frewrite fread freread
> > w/o cache (debian kernel 3.2.35-1)
> > 33554432 8192 212507 210382 630327 630852 372807 161710 388319 4922757 617347 210642 217122 717279 716150
> > w/ cache (bcache git kernel 3.2.28):
> > 33554432 8192 248376 231717 268560 269966 123718 132210 148030 4888983 152240 230099 238223 276254 282441
> > w/o cache (bcache git kernel 3.2.28):
> > 33554432 8192 277607 259159 709837 702192 399889 151629 399779 4846688 655210 251297 245953 783930 778595
> >
> > Note: I disabled the cache before the last test, unregistered the device
> > and "stop"ed the cache. I also changed the config slightly for the
> > bcache kernel, I started out with the debian config, and then switched
> > the preemption option to server, which may be the reason for the
> > performance difference between the two non cached tests.
> >
> > I probably messed up the setup somehow. If anyone has some tips or
> > suggestions I'd appreciate some input.
>
> So you probably didn't put bcache in writeback mode, which would explain
> the write numbers being slightly worse.
Yeah, I didn't test in writeback.
> Something I noticed myself with bcache on top of a raid6 is that in
> writeback mode sequential write throughput was significantly worse - due
> to the ssd not having as much write bandwidth as the raid6 and bcache's
> writeback having no knowledge of the stripe layout.
Yeah, I think the SSD I'm using is limited to about 200MB/s sequential write,
which is probably half of what the raid may be capable of. I could pick up
another, more modern SATA III SSD to remedy that, and I might, but it isn't
terribly important. It's random writes I really want to take care of, since
those really kill performance.
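Relatedly, my understanding is that bcache's sequential_cutoff knob already sends large sequential I/O straight to the backing device, which fits what I want. Something like this, if I'm reading the sysfs layout right (values illustrative):

```shell
# Show the current cutoff (default is 4M, I believe)
cat /sys/block/bcache0/bcache/sequential_cutoff

# Send anything detected as sequential beyond 1MB straight to the
# raid6, keeping the SSD for the random I/O it's actually good at
echo 1M > /sys/block/bcache0/bcache/sequential_cutoff
```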
> This is something I'd like to fix, if I ever get time. Normal operation
> (i.e. with mostly random writes) was vastly improved, though.
>
> Not sure why your read numbers are worse, though - I haven't used iozone
> myself so I'm not sure what exactly it's doing.
>
> It'd be useful to know what iozone's reads look like - how many are in
> flight at a time, how big they are, etc.
>
> I suppose it'd be informative to have a benchmark where bcache is
> enabled but all the reads are cache misses, and bcache isn't writing any
> of the cache misses to the cache. I think I'd need to add another cache
> mode for that, though (call it "readaround" I suppose).
>
> I wouldn't worry _too_ much about iozone's numbers, I suspect whatever
> it's doing differently to get such bad read numbers isn't terribly
> representative. I'd benchmark whatever you're using the server for, if
> you can. It'd still be good to know what's going on - there's certainly
> something that ought to be fixed.
I'm betting most of it is pure un-cached reads. I don't know whether it
keeps the same file around between tests or not. If it does, things should
improve significantly after the random write test, I'd think, but they
don't really. I wasn't too concerned about the initial read tests, but I'd
have thought they'd at least match or come in slightly under the numbers
without the cache. I remember reading before that bcache has been shown to
add at most a very slight amount of overhead, if any at all.
> Oh, one thing that comes to mind - there's an issue with pure read
> workloads in the current stable branch, where inserting data from a
> cache miss will fail to update the index if the btree node is full (but
> after the data has been written to the cache). This shows up in
> benchmarks, because they tend to test reads and writes separately, but
> it's not an issue in any real world workload I know of because any
> amount of write traffic keeps it from showing up, as the btree nodes
> will split when necessary on writes.
>
> I have a fix for this in the dev branch, and I think it's stable but the
> dev branch needs more testing.
Ah ok, I remember reading about that on the list here. I'll test the dev
branch and get back to you.
--
Thomas Fjellstrom
thomas-T/OQ2aoscs6U4IzBdx3r/Q@public.gmane.org