From: Stan Hoeppner <stan@hardwarefreak.com>
To: Pieter De Wit <pieter@insync.za.net>,
linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Is partition alignment needed for RAID partitions ?
Date: Mon, 30 Dec 2013 11:10:08 -0600
Message-ID: <52C1A8F0.2030208@hardwarefreak.com>
In-Reply-To: <52C162A7.1080309@insync.za.net>
On 12/30/2013 6:10 AM, Pieter De Wit wrote:
> Hi Stan,
>> Size is incorrect in what way? If your RAID0 chunk is 512KiB, then
>> 3407028224 sectors is 3327176 chunks, evenly divisible, so this
>> partition is fully aligned. Whether the capacity is correct is
>> something only you can determine. Partition 2 is 1.587 TiB.
> Would you mind showing me the calc you did to get there,
> 3407028224/3327176=1024,
(3407028224 sectors * 512 bytes per sector) / 524288 (chunk bytes) =
3327176 chunks
> I don't understand how the 512kiB came into play ?
> # mdadm --detail /dev/md1
...
> Chunk Size : 512K
mdadm's "K" is a binary kilobyte (KiB): 2^10, or 1024 bytes. 512 * 1024 = 524288 bytes.
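The arithmetic above can be checked in any POSIX shell. This is only a minimal sketch: the variable names are made up for illustration, the numbers are taken from the parted and mdadm output in this thread, and it assumes that "512K" means 512 * 1024 bytes.

```shell
# Numbers from the parted and mdadm output in this thread.
sectors=3407028224            # size of partition 2 in 512-byte sectors
sector_bytes=512              # logical sector size reported by parted
chunk_bytes=$((512 * 1024))   # mdadm "Chunk Size : 512K" = 524288 bytes

part_bytes=$((sectors * sector_bytes))
chunks=$((part_bytes / chunk_bytes))      # whole chunks in the partition
remainder=$((part_bytes % chunk_bytes))   # 0 means a whole number of chunks

echo "chunk_bytes=$chunk_bytes chunks=$chunks remainder=$remainder"
```

The 3407028224/3327176=1024 result is the same fact seen from the other side: 1024 sectors of 512 bytes make up one 512KiB chunk.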
>> I'm not intending to be a jerk, but this is a technical mailing list.
> Understood - here is the complete layout:
>
> /dev/sda - 250 gig disk
> /dev/sdb - 2TB disk
> /dev/sdc - 2TB disk
> /dev/sdd - 256gig iSCSI target on QNAP NAS (block allocated, not thin prov'ed)
> /dev/sde - 2TB iSCSI target on QNAP NAS (block allocated, not thin prov'ed)
>> Show your partition table for sdc. Even if the partitions on it are not
>> aligned, reads shouldn't be adversely affected by it. Show
>>
>> $ mdadm --detail
> # parted /dev/sdb unit s print
> Model: ATA WDC WD20EARX-008 (scsi)
> Disk /dev/sdb: 3907029168s
> Sector size (logical/physical): 512B/4096B
> Partition Table: gpt
>
> Number  Start       End          Size         File system  Name  Flags
>  1      2048s       500000767s   499998720s                      raid
>  2      500000768s  3907028991s  3407028224s                     raid
>
> # parted /dev/sdc unit s print
> Model: ATA WDC WD20EARX-008 (scsi)
> Disk /dev/sdc: 3907029168s
> Sector size (logical/physical): 512B/4096B
> Partition Table: gpt
>
> Number  Start       End          Size         File system  Name  Flags
>  1      2048s       500000767s   499998720s                      raid
>  2      500000768s  3907028991s  3407028224s                     raid
These partitions are all aligned and are the same size on both drives. No problems here.
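One way to check this by hand, as a sketch only: a start sector is aligned to the physical sector if it is a multiple of 8 (one 4096-byte physical sector is eight 512-byte logical sectors), and aligned to the stripe chunk if it is a multiple of 1024 (one 512KiB chunk). The function name is hypothetical, and the check looks only at the partition start, not at md's own data offset inside the partition.

```shell
# is_aligned START_SECTOR UNIT_SECTORS
# Succeeds when START_SECTOR is an exact multiple of UNIT_SECTORS.
is_aligned() {
    [ $(( $1 % $2 )) -eq 0 ]
}

# Start sectors of partitions 1 and 2 from the parted output above.
for start in 2048 500000768; do
    for unit in 8 1024; do
        if is_aligned "$start" "$unit"; then
            echo "start ${start}s: multiple of $unit sectors"
        else
            echo "start ${start}s: NOT a multiple of $unit sectors"
        fi
    done
done
```

Both start sectors pass both checks, which matches the conclusion above.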
>
> # mdadm --detail /dev/md0
> /dev/md0:
> Version : 1.2
> Creation Time : Mon Dec 30 12:33:43 2013
> Raid Level : raid1
> Array Size : 249868096 (238.29 GiB 255.86 GB)
> Used Dev Size : 249868096 (238.29 GiB 255.86 GB)
> Raid Devices : 2
> Total Devices : 2
> Persistence : Superblock is persistent
>
> Update Time : Tue Dec 31 01:01:42 2013
> State : clean
> Active Devices : 2
> Working Devices : 2
> Failed Devices : 0
> Spare Devices : 0
>
> Name : srv01:0 (local to host srv01)
> UUID : 45d71ef8:9a1115cb:8ed0c4d9:95d56df4
> Events : 25
>
> Number Major Minor RaidDevice State
> 0 8 17 0 active sync /dev/sdb1
> 1 8 33 1 active sync /dev/sdc1
>
> # mdadm --detail /dev/md1
> /dev/md1:
> Version : 1.2
> Creation Time : Mon Dec 30 12:33:56 2013
> Raid Level : raid0
> Array Size : 3407027200 (3249.19 GiB 3488.80 GB)
> Raid Devices : 2
> Total Devices : 2
> Persistence : Superblock is persistent
>
> Update Time : Mon Dec 30 12:33:56 2013
> State : clean
> Active Devices : 2
> Working Devices : 2
> Failed Devices : 0
> Spare Devices : 0
>
> Chunk Size : 512K
>
> Name : srv01:1 (local to host srv01)
> UUID : abfdcb5e:804fa119:9c4a8d88:fa2f08a7
> Events : 0
>
> Number Major Minor RaidDevice State
> 0 8 18 0 active sync /dev/sdb2
> 1 8 34 1 active sync /dev/sdc2
>
>>
>> for the RAID0 array. md itself, especially in RAID0 personality, is
>> simply not going to be the -cause- of low performance. The problem lay
>> somewhere else. Given the track record of Western Digital's Green
>> series of drives I'm leaning toward that cause. Post output from
>>
>> $ smartctl -A /dev/sdb
>> $ smartctl -A /dev/sdc
> # smartctl -A /dev/sdb
> smartctl 6.2 2013-04-20 r3812 [i686-linux-3.11.0-14-generic] (local build)
> Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF READ SMART DATA SECTION ===
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x002f 200   200   051    Pre-fail Always  -           0
>   3 Spin_Up_Time            0x0027 217   186   021    Pre-fail Always  -           4141
>   4 Start_Stop_Count        0x0032 100   100   000    Old_age  Always  -           102
>   5 Reallocated_Sector_Ct   0x0033 200   200   140    Pre-fail Always  -           0
>   7 Seek_Error_Rate         0x002e 200   200   000    Old_age  Always  -           0
>   9 Power_On_Hours          0x0032 089   089   000    Old_age  Always  -           8263
>  10 Spin_Retry_Count        0x0032 100   100   000    Old_age  Always  -           0
>  11 Calibration_Retry_Count 0x0032 100   100   000    Old_age  Always  -           0
>  12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           102
> 192 Power-Off_Retract_Count 0x0032 200   200   000    Old_age  Always  -           88
> 193 Load_Cycle_Count        0x0032 155   155   000    Old_age  Always  -           135985
> 194 Temperature_Celsius     0x0022 121   108   000    Old_age  Always  -           29
> 196 Reallocated_Event_Count 0x0032 200   200   000    Old_age  Always  -           0
> 197 Current_Pending_Sector  0x0032 200   200   000    Old_age  Always  -           0
> 198 Offline_Uncorrectable   0x0030 200   200   000    Old_age  Offline -           0
> 199 UDMA_CRC_Error_Count    0x0032 200   200   000    Old_age  Always  -           0
> 200 Multi_Zone_Error_Rate   0x0008 200   200   000    Old_age  Offline -           0
>
> # smartctl -A /dev/sdc
> smartctl 6.2 2013-04-20 r3812 [i686-linux-3.11.0-14-generic] (local build)
> Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF READ SMART DATA SECTION ===
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x002f 200   200   051    Pre-fail Always  -           0
>   3 Spin_Up_Time            0x0027 217   186   021    Pre-fail Always  -           4141
>   4 Start_Stop_Count        0x0032 100   100   000    Old_age  Always  -           100
>   5 Reallocated_Sector_Ct   0x0033 200   200   140    Pre-fail Always  -           0
>   7 Seek_Error_Rate         0x002e 200   200   000    Old_age  Always  -           0
>   9 Power_On_Hours          0x0032 089   089   000    Old_age  Always  -           8263
>  10 Spin_Retry_Count        0x0032 100   253   000    Old_age  Always  -           0
>  11 Calibration_Retry_Count 0x0032 100   253   000    Old_age  Always  -           0
>  12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           100
> 192 Power-Off_Retract_Count 0x0032 200   200   000    Old_age  Always  -           86
> 193 Load_Cycle_Count        0x0032 156   156   000    Old_age  Always  -           134976
> 194 Temperature_Celsius     0x0022 122   109   000    Old_age  Always  -           28
> 196 Reallocated_Event_Count 0x0032 200   200   000    Old_age  Always  -           0
> 197 Current_Pending_Sector  0x0032 200   200   000    Old_age  Always  -           0
> 198 Offline_Uncorrectable   0x0030 200   200   000    Old_age  Offline -           0
> 199 UDMA_CRC_Error_Count    0x0032 200   200   000    Old_age  Always  -           0
> 200 Multi_Zone_Error_Rate   0x0008 200   200   000    Old_age  Offline -           0
smartctl data indicates there are no problems with the drives.
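One way to read such output quickly, as a rough sketch only: flag any nonzero raw value for the reallocated, pending, uncorrectable and CRC error counters (IDs 5, 197, 198, 199). That shortlist is an assumption made for illustration, not an official rule, and the function name is hypothetical. The filter also assumes each attribute sits on one unwrapped line, as smartctl prints it, so that RAW_VALUE is field 10.

```shell
# flag_smart: read `smartctl -A` output on stdin and print the name and
# raw value of any selected attribute whose RAW_VALUE (field 10) is not 0.
flag_smart() {
    awk '$1 ~ /^(5|197|198|199)$/ && $10 + 0 != 0 { print $2, $10 }'
}

# Sample rows copied from the sdb output in this thread, unwrapped.
flag_smart <<'EOF'
  5 Reallocated_Sector_Ct   0x0033 200 200 140 Pre-fail Always  - 0
197 Current_Pending_Sector  0x0032 200 200 000 Old_age  Always  - 0
198 Offline_Uncorrectable   0x0030 200 200 000 Old_age  Offline - 0
199 UDMA_CRC_Error_Count    0x0032 200 200 000 Old_age  Always  - 0
EOF
```

On a live system the same filter could be fed with "smartctl -A /dev/sdb | flag_smart". For the rows above it prints nothing, since every selected raw value is 0.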
>>>>> I would have expected the RAID0 device to easily get
>>>>> up to the 60meg/sec mark ?
>>>> As the source disk of a bulk file copy over NFS/CIFS? As a point of
>>>> reference, I have a workstation that maxes 50MB/s FTP and only 24MB/s
>>>> CIFS to/from a server. Both hosts have far in excess of 100MB/s disk
>>>> throughput. The 50MB/s limitation is due to the cheap Realtek mobo NIC,
>>>> and the 24MB/s is a Samba limit. I've spent dozens of hours attempting
>>>> to tweak Samba to greater throughput but it simply isn't capable on
>>>> that machine.
>>>>
>>>> Your throughput issues are with your network, not your RAID. Learn and
>>>> use FIO to see what your RAID/disks can do. For now a really simple
>>>> test is to time cat of a large file and pipe to /dev/null. Divide the
>>>> file size by the elapsed time. Or simply do a large read with dd.
>>>> This will be much more informative than "moving data to a NAS", where
>>>> your throughput is network limited, not disk.
>>>>
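The simple read test quoted above can be sketched as a small shell function. This is a hedged example, not a benchmark tool: the function name is hypothetical, it assumes GNU coreutils (stat -c and date +%N), and the file should be larger than RAM (or the page cache dropped first), otherwise the read may be served from memory rather than from the disks.

```shell
# read_mbps FILE: read FILE once with cat and print the throughput in MB/s
# (file size in bytes / elapsed seconds / 1000000).
read_mbps() {
    bytes=$(stat -c %s "$1")
    start=$(date +%s.%N)
    cat "$1" > /dev/null
    end=$(date +%s.%N)
    awk -v b="$bytes" -v s="$start" -v e="$end" \
        'BEGIN { t = e - s; if (t <= 0) t = 0.000001;
                 printf "%.1f\n", b / t / 1000000 }'
}
```

Usage would be "read_mbps /path/to/large_file", where the path is a placeholder. For the dd variant, something like "dd if=/dev/md1 of=/dev/null bs=1M count=4096" reads 4GiB straight from the array, and GNU dd reports the rate itself when it finishes.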
>>> The system is using a server grade NIC, I will run a dd/network test
>>> shortly after the copy is done. (I am shifting all the data back to the
>>> NAS, in case I mucked up the partitions :) ), I do recall that this
>>> system was able to fill a gig pipe...
>> Now that you've made it clear the first scenario was over iSCSI same as
>> the 2nd scenario, and not NFS/CIFS, I doubt the TCP stack is the
>> problem. Assume the network is fine for now and concentrate on the disk
>> drives in the host. That seems the most likely cause of the problem
>> at this point.
>>
>> BTW, you didn't state the throughput of the RAID1 device on sdb/sdc.
>> The RAID0 device is on the same disks, yes? RAID0 was 15 MB/s. What
>> was the RAID1?
>>
> ATM, the data is still moving back to the NAS (from the RAID1 device).
> According to iostat, this is reading at +30000 kB/s (all of my numbers
> are from iostat -x)
Please show the exact iostat command line you are using and the output.
> Also, there is no other disk usage in the system. All the data is
> currently on the NAS (except system "stuff" for a quiet firewall)
>
> I just spotted another thing, the two drives are on the same SATA
> controller, from rescan-scsi-bus:
>
> Scanning for device 3 0 0 0 ...
> OLD: Host: scsi3 Channel: 00 Id: 00 Lun: 00
> Vendor: ATA Model: WDC WD20EARX-008 Rev: 51.0
> Type: Direct-Access ANSI SCSI revision: 05
> Scanning for device 3 0 1 0 ...
> OLD: Host: scsi3 Channel: 00 Id: 01 Lun: 00
> Vendor: ATA Model: WDC WD20EARX-008 Rev: 51.0
> Type: Direct-Access ANSI SCSI revision: 05
>
> Would it be better to move these apart ? I remember IDE used to have
> this issue, but I also recall SATA "fixed" that.
This isn't the problem. Even if both drives were connected via a plain
old 33MHz 132MB/s PCI SATA card you'd still be capable of 120MB/s
throughput, 60MB/s per drive.
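The numbers behind that statement, as a back-of-envelope sketch: classic PCI is 32 bits (4 bytes) wide at 33MHz, which gives the 132MB/s theoretical peak, and the 60MB/s per drive is the figure used in the sentence above, not a measurement. Variable names are made up for illustration.

```shell
pci_mhz=33        # classic PCI clock in MHz
pci_bytes=4       # 32-bit bus = 4 bytes per transfer
drives=2
per_drive=60      # MB/s per drive, the figure used above

bus=$((pci_mhz * pci_bytes))      # theoretical bus peak in MB/s
needed=$((drives * per_drive))    # combined drive throughput in MB/s
echo "bus=${bus}MB/s needed=${needed}MB/s headroom=$((bus - needed))MB/s"
```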
> Thanks again,
You're welcome. Eventually you'll get to the bottom of this.
--
Stan