From: Stan Hoeppner <stan@hardwarefreak.com>
To: Pieter De Wit <pieter@insync.za.net>,
linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Is partition alignment needed for RAID partitions ?
Date: Mon, 30 Dec 2013 11:10:08 -0600
Message-ID: <52C1A8F0.2030208@hardwarefreak.com>
In-Reply-To: <52C162A7.1080309@insync.za.net>
On 12/30/2013 6:10 AM, Pieter De Wit wrote:
> Hi Stan,
>> Size is incorrect in what way? If your RAID0 chunk is 512KiB, then
>> 3407028224 sectors is 3327176 chunks, evenly divisible, so this
>> partition is fully aligned. Whether the capacity is correct is
>> something only you can determine. Partition 2 is 1.587 TiB.
> Would you mind showing me the calc you did to get there?
> 3407028224/3327176 = 1024
(3407028224 sectors * 512 bytes per sector) / 524288 (chunk bytes) =
3327176 chunks
> I don't understand how the 512kiB came into play ?
> # mdadm --detail /dev/md1
...
> Chunk Size : 512K
One kilobyte (K, KB) is 2^10, or 1024 bytes, so a 512K chunk is
512 * 1024 = 524288 bytes.
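If you want to verify the arithmetic yourself, shell arithmetic is
enough (a quick sketch using the numbers from your parted output;
524288 / 512 = 1024 sectors per chunk):

$ echo $(( 3407028224 % 1024 ))           # remainder, in sectors
0
$ echo $(( 3407028224 * 512 / 524288 ))   # whole chunks
3327176

A remainder of 0 means the partition size is an exact multiple of the
chunk size, which is what "fully aligned" means here.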
>> I'm not intending to be a jerk, but this is a technical mailing list.
> Understood - here is the complete layout:
>
> /dev/sda - 250 gig disk
> /dev/sdb - 2TB disk
> /dev/sdc - 2TB disk
> /dev/sdd - 256gig iSCSI target on QNAP NAS (block allocated, not thin
> prov'ed)
> /dev/sde - 2TB iSCSI target on QNAP NAS (block allocated, not thin prov'ed)
>> Show your partition table for sdc. Even if the partitions on it are not
>> aligned, reads shouldn't be adversely affected by it. Show
>>
>> $ mdadm --detail
> # parted /dev/sdb unit s print
> Model: ATA WDC WD20EARX-008 (scsi)
> Disk /dev/sdb: 3907029168s
> Sector size (logical/physical): 512B/4096B
> Partition Table: gpt
>
> Number Start End Size File system Name Flags
> 1 2048s 500000767s 499998720s raid
> 2 500000768s 3907028991s 3407028224s raid
>
> # parted /dev/sdc unit s print
> Model: ATA WDC WD20EARX-008 (scsi)
> Disk /dev/sdc: 3907029168s
> Sector size (logical/physical): 512B/4096B
> Partition Table: gpt
>
> Number Start End Size File system Name Flags
> 1 2048s 500000767s 499998720s raid
> 2 500000768s 3907028991s 3407028224s raid
These partitions are all aligned and the same size. No problems here.
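A quick way to double-check start alignment on these 512e drives (512B
logical, 4096B physical sectors) is to confirm each start sector is a
multiple of 8 -- a rough sketch:

$ echo $(( 2048 % 8 )) $(( 500000768 % 8 ))
0 0

Both partitions start on a 4KiB physical sector boundary, and the
start sectors are also whole multiples of the 512KiB chunk.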
>
> # mdadm --detail /dev/md0
> /dev/md0:
> Version : 1.2
> Creation Time : Mon Dec 30 12:33:43 2013
> Raid Level : raid1
> Array Size : 249868096 (238.29 GiB 255.86 GB)
> Used Dev Size : 249868096 (238.29 GiB 255.86 GB)
> Raid Devices : 2
> Total Devices : 2
> Persistence : Superblock is persistent
>
> Update Time : Tue Dec 31 01:01:42 2013
> State : clean
> Active Devices : 2
> Working Devices : 2
> Failed Devices : 0
> Spare Devices : 0
>
> Name : srv01:0 (local to host srv01)
> UUID : 45d71ef8:9a1115cb:8ed0c4d9:95d56df4
> Events : 25
>
> Number Major Minor RaidDevice State
> 0 8 17 0 active sync /dev/sdb1
> 1 8 33 1 active sync /dev/sdc1
>
> # mdadm --detail /dev/md1
> /dev/md1:
> Version : 1.2
> Creation Time : Mon Dec 30 12:33:56 2013
> Raid Level : raid0
> Array Size : 3407027200 (3249.19 GiB 3488.80 GB)
> Raid Devices : 2
> Total Devices : 2
> Persistence : Superblock is persistent
>
> Update Time : Mon Dec 30 12:33:56 2013
> State : clean
> Active Devices : 2
> Working Devices : 2
> Failed Devices : 0
> Spare Devices : 0
>
> Chunk Size : 512K
>
> Name : srv01:1 (local to host srv01)
> UUID : abfdcb5e:804fa119:9c4a8d88:fa2f08a7
> Events : 0
>
> Number Major Minor RaidDevice State
> 0 8 18 0 active sync /dev/sdb2
> 1 8 34 1 active sync /dev/sdc2
>
>>
>> for the RAID0 array. md itself, especially in RAID0 personality, is
>> simply not going to be the -cause- of low performance. The problem lies
>> somewhere else. Given the track record of Western Digital's Green
>> series of drives I'm leaning toward that cause. Post output from
>>
>> $ smartctl -A /dev/sdb
>> $ smartctl -A /dev/sdc
> # smartctl -A /dev/sdb
> smartctl 6.2 2013-04-20 r3812 [i686-linux-3.11.0-14-generic] (local build)
> Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF READ SMART DATA SECTION ===
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0027   217   186   021    Pre-fail  Always       -       4141
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       102
>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
>   9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       8263
>  10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
>  11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       102
> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       88
> 193 Load_Cycle_Count        0x0032   155   155   000    Old_age   Always       -       135985
> 194 Temperature_Celsius     0x0022   121   108   000    Old_age   Always       -       29
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
>
> # smartctl -A /dev/sdc
> smartctl 6.2 2013-04-20 r3812 [i686-linux-3.11.0-14-generic] (local build)
> Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF READ SMART DATA SECTION ===
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0027   217   186   021    Pre-fail  Always       -       4141
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       100
>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
>   9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       8263
>  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
>  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       100
> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       86
> 193 Load_Cycle_Count        0x0032   156   156   000    Old_age   Always       -       134976
> 194 Temperature_Celsius     0x0022   122   109   000    Old_age   Always       -       28
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
smartctl data indicates there are no problems with the drives.
>>>>> I would have expected the RAID0 device to easily get
>>>>> up to the 60meg/sec mark ?
>>>> As the source disk of a bulk file copy over NFS/CIFS? As a point of
>>>> reference, I have a workstation that maxes 50MB/s FTP and only 24MB/s
>>>> CIFS to/from a server. Both hosts have far in excess of 100MB/s disk
>>>> throughput. The 50MB/s limitation is due to the cheap Realtek mobo
>>>> NIC,
>>>> and the 24MB/s is a Samba limit. I've spent dozens of hours attempting
>>>> to tweak Samba to greater throughput but it simply isn't capable on
>>>> that
>>>> machine.
>>>>
>>>> Your throughput issues are with your network, not your RAID. Learn and
>>>> use FIO to see what your RAID/disks can do. For now a really simple
>>>> test is to time cat of a large file and pipe to /dev/null. Divide the
>>>> file size by the elapsed time. Or simply do a large read with dd.
>>>> This
>>>> will be much more informative than "moving data to a NAS", where your
>>>> throughput is network limited, not disk.
>>>>
>>> The system is using a server grade NIC. I will run a dd/network test
>>> shortly after the copy is done. (I am shifting all the data back to the
>>> NAS, in case I mucked up the partitions :) ) I do recall that this
>>> system was able to fill a gig pipe...
>> Now that you've made it clear the first scenario was over iSCSI same as
>> the 2nd scenario, and not NFS/CIFS, I doubt the TCP stack is the
>> problem. Assume the network is fine for now and concentrate on the disk
>> drives in the host. That seems the most likely cause of the problem
>> at this point.
>>
>> BTW, you didn't state the throughput of the RAID1 device on sdb/sdc.
>> The RAID0 device is on the same disks, yes? RAID0 was 15 MB/s. What
>> was the RAID1?
>>
> ATM, the data is still moving back to the NAS (from the RAID1 device).
> According to iostat, this is reading at +30000 kB/s (all of my numbers
> are from iostat -x)
Please show the exact iostat command line you are using and the output.
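Something like this would do -- extended stats for just the two md
member disks at 10 second intervals (assuming the sysstat iostat;
adjust devices and interval to taste):

# iostat -x sdb sdc 10

Run it while the copy is in flight and paste two or three intervals.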
> Also, there is no other disk usage in the system. All the data is
> currently on the NAS (except system "stuff" for a quiet firewall)
>
> I just spotted another thing, the two drives are on the same SATA
> controller, from rescan-scsi-bus:
>
> Scanning for device 3 0 0 0 ...
> OLD: Host: scsi3 Channel: 00 Id: 00 Lun: 00
> Vendor: ATA Model: WDC WD20EARX-008 Rev: 51.0
> Type: Direct-Access ANSI SCSI revision: 05
> Scanning for device 3 0 1 0 ...
> OLD: Host: scsi3 Channel: 00 Id: 01 Lun: 00
> Vendor: ATA Model: WDC WD20EARX-008 Rev: 51.0
> Type: Direct-Access ANSI SCSI revision: 05
>
> Would it be better to move these apart ? I remember IDE used to have
> this issue, but I also recall SATA "fixed" that.
This isn't the problem. Even if both drives were connected via a plain
old 33MHz 132MB/s PCI SATA card you'd still be capable of 120MB/s
throughput, 60MB/s per drive.
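If you want to rule the drives and controller in or out directly, raw
sequential reads are the simplest test -- a read-only sketch, using
O_DIRECT to keep the page cache out of the measurement:

# dd if=/dev/sdb of=/dev/null bs=1M count=2048 iflag=direct
# dd if=/dev/sdc of=/dev/null bs=1M count=2048 iflag=direct

These WD20EARX drives should each manage somewhere around 100MB/s
sequential; if they do, alone and when run in parallel, neither the
drives nor the controller is your 15MB/s bottleneck.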
> Thanks again,
You're welcome. We'll get to the bottom of this eventually.
--
Stan