From: Christoph Anton Mitterer <calestyo@scientia.net>
To: linux-raid@vger.kernel.org
Subject: unexpected speed difference at RAID initialisation
Date: Sat, 09 May 2015 00:47:19 +0200 [thread overview]
Message-ID: <1431125239.6206.11.camel@scientia.net> (raw)
[-- Attachment #1: Type: text/plain, Size: 10451 bytes --]
Hey.
I'm just deploying some new servers at the faculty, where I made the
following strange observation, which I cannot explain.
All nodes have exactly the same hardware (all recent stuff, some 15k€
Dell servers with 16 disks á 6 TB, plenty of CPU), the same
BIOS/firmware config, the same OS (Debian jessie, except the kernel
4.0.0 from experimental and btrfs-tools 4.0 from sid) with identical
config.
The discs are connected via some Dell PERC RAID controller but for
testing they're exported as JBODs.
Nothing except some standard daemons (haveged, irqbalance and that like)
are running on these nodes.
I created an MD RAID6 over all disks via:
mdadm --create /dev/md/data-test-raid --verbose --metadata=1.2
--size=max --chunk=512K --level=raid6 --bitmap=internal
--name=data-test-raid --raid-devices=16
--spare-devices=0 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp
basically at the same time (few seconds difference) on both nodes.
But looking at the initial rebuild on two nodes, one can see substantial
speed differences:
node A:
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : active raid6 sdp[15] sdo[14] sdn[13] sdm[12] sdl[11] sdk[10] sdj[9] sdi[8] sdh[7] sdg[6] sdf[5] sde[4] sdd[3] sdc[2] sdb[1] sda[0]
82045479936 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
[=====>...............] resync = 28.8% (1691996416/5860391424) finish=841.8min speed=82526K/sec
bitmap: 32/44 pages [128KB], 65536KB chunk
unused devices: <none>
node B:
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : active raid6 sdp[15] sdo[14] sdn[13] sdm[12] sdl[11] sdk[10] sdj[9] sdi[8] sdh[7] sdg[6] sdf[5] sde[4] sdd[3] sdc[2] sdb[1] sda[0]
82045479936 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
[====>................] resync = 20.1% (1180137984/5860391424) finish=1496.2min speed=52132K/sec
bitmap: 36/44 pages [144KB], 65536KB chunk
unused devices: <none>
(again taken with only few seconds in between).
As you can see it already shows different speed (~80000K/s for node A
and ~50000K/s for node B).
dmesg on A gives:
[ 8333.919976] md: bind<sda>
[ 8333.920285] md: bind<sdb>
[ 8333.920553] md: bind<sdc>
[ 8333.920831] md: bind<sdd>
[ 8333.921092] md: bind<sde>
[ 8333.921344] md: bind<sdf>
[ 8333.921593] md: bind<sdg>
[ 8333.921853] md: bind<sdh>
[ 8333.922093] md: bind<sdi>
[ 8333.922377] md: bind<sdj>
[ 8333.922694] md: bind<sdk>
[ 8333.922973] md: bind<sdl>
[ 8333.923235] md: bind<sdm>
[ 8333.923502] md: bind<sdn>
[ 8333.923744] md: bind<sdo>
[ 8333.923941] md: bind<sdp>
[ 8334.002228] raid6: sse2x1 2651 MB/s
[ 8334.070159] raid6: sse2x2 7676 MB/s
[ 8334.138090] raid6: sse2x4 13690 MB/s
[ 8334.206027] raid6: avx2x1 18736 MB/s
[ 8334.273958] raid6: avx2x2 21984 MB/s
[ 8334.341893] raid6: avx2x4 24764 MB/s
[ 8334.341904] raid6: using algorithm avx2x4 (24764 MB/s)
[ 8334.341923] raid6: using avx2x2 recovery algorithm
[ 8334.342962] async_tx: api initialized (async)
[ 8334.343648] xor: automatically using best checksumming function:
[ 8334.381855] avx : 30433.000 MB/sec
[ 8334.386213] md: raid6 personality registered for level 6
[ 8334.386235] md: raid5 personality registered for level 5
[ 8334.386255] md: raid4 personality registered for level 4
[ 8334.428030] md/raid:md127: not clean -- starting background reconstruction
[ 8334.428133] md/raid:md127: device sdp operational as raid disk 15
[ 8334.428157] md/raid:md127: device sdo operational as raid disk 14
[ 8334.428180] md/raid:md127: device sdn operational as raid disk 13
[ 8334.428203] md/raid:md127: device sdm operational as raid disk 12
[ 8334.428226] md/raid:md127: device sdl operational as raid disk 11
[ 8334.428248] md/raid:md127: device sdk operational as raid disk 10
[ 8334.428272] md/raid:md127: device sdj operational as raid disk 9
[ 8334.428294] md/raid:md127: device sdi operational as raid disk 8
[ 8334.428756] md/raid:md127: device sdh operational as raid disk 7
[ 8334.429108] md/raid:md127: device sdg operational as raid disk 6
[ 8334.429450] md/raid:md127: device sdf operational as raid disk 5
[ 8334.429754] md/raid:md127: device sde operational as raid disk 4
[ 8334.430062] md/raid:md127: device sdd operational as raid disk 3
[ 8334.430329] md/raid:md127: device sdc operational as raid disk 2
[ 8334.430595] md/raid:md127: device sdb operational as raid disk 1
[ 8334.430845] md/raid:md127: device sda operational as raid disk 0
[ 8334.432415] md/raid:md127: allocated 0kB
[ 8334.432749] md/raid:md127: raid level 6 active with 16 out of 16 devices, algorithm 2
[ 8334.433021] RAID conf printout:
[ 8334.433022] --- level:6 rd:16 wd:16
[ 8334.433023] disk 0, o:1, dev:sda
[ 8334.433024] disk 1, o:1, dev:sdb
[ 8334.433025] disk 2, o:1, dev:sdc
[ 8334.433026] disk 3, o:1, dev:sdd
[ 8334.433027] disk 4, o:1, dev:sde
[ 8334.433028] disk 5, o:1, dev:sdf
[ 8334.433029] disk 6, o:1, dev:sdg
[ 8334.433030] disk 7, o:1, dev:sdh
[ 8334.433031] disk 8, o:1, dev:sdi
[ 8334.433032] disk 9, o:1, dev:sdj
[ 8334.433032] disk 10, o:1, dev:sdk
[ 8334.433033] disk 11, o:1, dev:sdl
[ 8334.433034] disk 12, o:1, dev:sdm
[ 8334.433035] disk 13, o:1, dev:sdn
[ 8334.433036] disk 14, o:1, dev:sdo
[ 8334.433037] disk 15, o:1, dev:sdp
[ 8334.433257] created bitmap (44 pages) for device md127
[ 8334.437094] md127: bitmap initialized from disk: read 3 pages, set 89423 of 89423 bits
[ 8334.492393] md127: detected capacity change from 0 to 84014571454464
[ 8334.492438] md: md127 switched to read-write mode.
[ 8334.518862] md: resync of RAID array md127
[ 8334.519677] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 8334.520329] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[ 8334.521357] md: using 128k window, over a total of 5860391424k.
and on B:
[ 8269.297090] md: bind<sda>
[ 8269.297414] md: bind<sdb>
[ 8269.297689] md: bind<sdc>
[ 8269.297957] md: bind<sdd>
[ 8269.298230] md: bind<sde>
[ 8269.298487] md: bind<sdf>
[ 8269.298753] md: bind<sdg>
[ 8269.299006] md: bind<sdh>
[ 8269.299260] md: bind<sdi>
[ 8269.299522] md: bind<sdj>
[ 8269.299778] md: bind<sdk>
[ 8269.300061] md: bind<sdl>
[ 8269.300326] md: bind<sdm>
[ 8269.300543] md: bind<sdn>
[ 8269.300912] md: bind<sdo>
[ 8269.301123] md: bind<sdp>
[ 8269.372959] raid6: sse2x1 2622 MB/s
[ 8269.440892] raid6: sse2x2 4685 MB/s
[ 8269.508815] raid6: sse2x4 9777 MB/s
[ 8269.576747] raid6: avx2x1 16486 MB/s
[ 8269.644680] raid6: avx2x2 21875 MB/s
[ 8269.712615] raid6: avx2x4 25242 MB/s
[ 8269.712632] raid6: using algorithm avx2x4 (25242 MB/s)
[ 8269.712652] raid6: using avx2x2 recovery algorithm
[ 8269.713686] async_tx: api initialized (async)
[ 8269.714400] xor: automatically using best checksumming function:
[ 8269.752579] avx : 30506.000 MB/sec
[ 8269.757308] md: raid6 personality registered for level 6
[ 8269.757339] md: raid5 personality registered for level 5
[ 8269.757365] md: raid4 personality registered for level 4
[ 8269.782660] md/raid:md127: not clean -- starting background reconstruction
[ 8269.782767] md/raid:md127: device sdp operational as raid disk 15
[ 8269.782797] md/raid:md127: device sdo operational as raid disk 14
[ 8269.782827] md/raid:md127: device sdn operational as raid disk 13
[ 8269.782855] md/raid:md127: device sdm operational as raid disk 12
[ 8269.782884] md/raid:md127: device sdl operational as raid disk 11
[ 8269.782912] md/raid:md127: device sdk operational as raid disk 10
[ 8269.782941] md/raid:md127: device sdj operational as raid disk 9
[ 8269.782969] md/raid:md127: device sdi operational as raid disk 8
[ 8269.783539] md/raid:md127: device sdh operational as raid disk 7
[ 8269.783965] md/raid:md127: device sdg operational as raid disk 6
[ 8269.784395] md/raid:md127: device sdf operational as raid disk 5
[ 8269.784776] md/raid:md127: device sde operational as raid disk 4
[ 8269.785106] md/raid:md127: device sdd operational as raid disk 3
[ 8269.785425] md/raid:md127: device sdc operational as raid disk 2
[ 8269.785718] md/raid:md127: device sdb operational as raid disk 1
[ 8269.786012] md/raid:md127: device sda operational as raid disk 0
[ 8269.787672] md/raid:md127: allocated 0kB
[ 8269.788026] md/raid:md127: raid level 6 active with 16 out of 16 devices, algorithm 2
[ 8269.788349] RAID conf printout:
[ 8269.788351] --- level:6 rd:16 wd:16
[ 8269.788352] disk 0, o:1, dev:sda
[ 8269.788354] disk 1, o:1, dev:sdb
[ 8269.788355] disk 2, o:1, dev:sdc
[ 8269.788356] disk 3, o:1, dev:sdd
[ 8269.788357] disk 4, o:1, dev:sde
[ 8269.788358] disk 5, o:1, dev:sdf
[ 8269.788359] disk 6, o:1, dev:sdg
[ 8269.788360] disk 7, o:1, dev:sdh
[ 8269.788361] disk 8, o:1, dev:sdi
[ 8269.788362] disk 9, o:1, dev:sdj
[ 8269.788364] disk 10, o:1, dev:sdk
[ 8269.788365] disk 11, o:1, dev:sdl
[ 8269.788366] disk 12, o:1, dev:sdm
[ 8269.788367] disk 13, o:1, dev:sdn
[ 8269.788368] disk 14, o:1, dev:sdo
[ 8269.788369] disk 15, o:1, dev:sdp
[ 8269.788589] created bitmap (44 pages) for device md127
[ 8269.791496] md127: bitmap initialized from disk: read 3 pages, set 89423 of 89423 bits
[ 8269.807348] md127: detected capacity change from 0 to 84014571454464
[ 8269.807896] md: md127 switched to read-write mode.
[ 8269.808387] md: resync of RAID array md127
[ 8269.808685] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 8269.808980] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[ 8269.809504] md: using 128k window, over a total of 5860391424k.
There's a bigger difference at:
[ 8334.341904] raid6: using algorithm avx2x4 (24764 MB/s)
vs.
[ 8269.712632] raid6: using algorithm avx2x4 (25242 MB/s)
How are these numbers determined?
Further:
node A:
# ps ax | grep md127
7550 ? S 273:47 [md127_raid6]
7552 ? D 79:43 [md127_resync]
node B:
# ps ax | grep md127
7494 ? R 251:30 [md127_raid6]
7495 ? D 63:48 [md127_resync]
Any ideas where this performance difference could come from?
Thanks,
Chris.
[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5313 bytes --]
next reply other threads:[~2015-05-08 22:47 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-08 22:47 Christoph Anton Mitterer [this message]
2015-05-13 3:34 ` unexpected speed difference at RAID initialisation NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1431125239.6206.11.camel@scientia.net \
--to=calestyo@scientia.net \
--cc=linux-raid@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).