From: Pieter De Wit <pieter@insync.za.net>
To: stan@hardwarefreak.com, linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Is partition alignment needed for RAID partitions ?
Date: Tue, 31 Dec 2013 01:10:15 +1300
Message-ID: <52C162A7.1080309@insync.za.net>
In-Reply-To: <52C14FB8.8080005@hardwarefreak.com>

Hi Stan,
> Size is incorrect in what way?  If your RAID0 chunk is 512KiB, then
> 3407028224 sectors is 3327176 chunks, evenly divisible, so this
> partition is fully aligned.  Whether the capacity is correct is
> something only you can determine.  Partition 2 is 1.587 TiB.
Would you mind showing me the calculation you did to get there? I can see
that 3407028224/3327176 = 1024, but I don't understand how the 512 KiB
came into play.
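My best guess at the arithmetic is the rough sketch below. It assumes the
sizes parted prints are in 512-byte logical sectors, so a 512 KiB chunk
would be 1024 sectors; the variable names are just mine for illustration.

```python
# Assumption: parted reports sizes in 512-byte logical sectors.
SECTOR_BYTES = 512
CHUNK_BYTES = 512 * 1024          # RAID0 chunk size (512 KiB)

partition_sectors = 3407028224    # size of partition 2 as printed by parted

# How many sectors make up one chunk, and how many whole chunks fit.
sectors_per_chunk = CHUNK_BYTES // SECTOR_BYTES
chunks, leftover = divmod(partition_sectors, sectors_per_chunk)

# Capacity of the partition in TiB (2**40 bytes).
size_tib = partition_sectors * SECTOR_BYTES / 2**40

print(sectors_per_chunk)   # 1024
print(chunks, leftover)    # 3327176 0
print(round(size_tib, 3))  # 1.587
```

If that is right, the 1024 I got is simply the number of sectors per chunk.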
> I'm not intending to be jerk, but this is a technical mailing list.
Understood - here is the complete layout:

/dev/sda - 250 GB disk
/dev/sdb - 2 TB disk
/dev/sdc - 2 TB disk
/dev/sdd - 256 GB iSCSI target on QNAP NAS (block allocated, not thin provisioned)
/dev/sde - 2 TB iSCSI target on QNAP NAS (block allocated, not thin provisioned)
> Show your partition table for sdc.  Even if the partitions on it are not
> aligned, reads shouldn't be adversely affected by it.  Show
>
> $ mdadm --detail
# parted /dev/sdb unit s print
Model: ATA WDC WD20EARX-008 (scsi)
Disk /dev/sdb: 3907029168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start       End          Size         File system  Name Flags
  1      2048s       500000767s   499998720s raid
  2      500000768s  3907028991s  3407028224s raid

# parted /dev/sdc unit s print
Model: ATA WDC WD20EARX-008 (scsi)
Disk /dev/sdc: 3907029168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start       End          Size         File system  Name Flags
  1      2048s       500000767s   499998720s raid
  2      500000768s  3907028991s  3407028224s raid
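
As a quick sanity check on the start sectors above, here is a small sketch.
It assumes 512-byte logical sectors, 4096-byte physical sectors and the
512 KiB chunk, and it only looks at the partition starts; it does not
account for any data offset md adds inside the partition.

```python
SECTOR_BYTES = 512            # logical sector size reported by parted
PHYSICAL_BYTES = 4096         # physical sector size reported by parted
CHUNK_BYTES = 512 * 1024      # RAID0 chunk size (512 KiB)

def is_aligned(start_sector, boundary_bytes):
    """True if the byte offset of start_sector is a multiple of boundary_bytes."""
    return (start_sector * SECTOR_BYTES) % boundary_bytes == 0

# Start sectors of partitions 1 and 2 (identical on sdb and sdc).
starts = {"part1": 2048, "part2": 500000768}

results = {
    name: (is_aligned(start, PHYSICAL_BYTES), is_aligned(start, CHUNK_BYTES))
    for name, start in starts.items()
}

for name, (on_physical, on_chunk) in results.items():
    print(name, on_physical, on_chunk)
# part1 True True
# part2 True True
```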


# mdadm --detail /dev/md0
/dev/md0:
         Version : 1.2
   Creation Time : Mon Dec 30 12:33:43 2013
      Raid Level : raid1
      Array Size : 249868096 (238.29 GiB 255.86 GB)
   Used Dev Size : 249868096 (238.29 GiB 255.86 GB)
    Raid Devices : 2
   Total Devices : 2
     Persistence : Superblock is persistent

     Update Time : Tue Dec 31 01:01:42 2013
           State : clean
  Active Devices : 2
Working Devices : 2
  Failed Devices : 0
   Spare Devices : 0

            Name : srv01:0  (local to host srv01)
            UUID : 45d71ef8:9a1115cb:8ed0c4d9:95d56df4
          Events : 25

     Number   Major   Minor   RaidDevice State
        0       8       17        0      active sync   /dev/sdb1
        1       8       33        1      active sync   /dev/sdc1

# mdadm --detail /dev/md1
/dev/md1:
         Version : 1.2
   Creation Time : Mon Dec 30 12:33:56 2013
      Raid Level : raid0
      Array Size : 3407027200 (3249.19 GiB 3488.80 GB)
    Raid Devices : 2
   Total Devices : 2
     Persistence : Superblock is persistent

     Update Time : Mon Dec 30 12:33:56 2013
           State : clean
  Active Devices : 2
Working Devices : 2
  Failed Devices : 0
   Spare Devices : 0

      Chunk Size : 512K

            Name : srv01:1  (local to host srv01)
            UUID : abfdcb5e:804fa119:9c4a8d88:fa2f08a7
          Events : 0

     Number   Major   Minor   RaidDevice State
        0       8       18        0      active sync   /dev/sdb2
        1       8       34        1      active sync   /dev/sdc2

>
> for the RAID0 array.  md itself, especially in RAID0 personality, is
> simply not going to be the -cause- of low performance.  The problem lies
> somewhere else.  Given the track record of Western Digital's Green
> series of drives I'm leaning toward that cause.  Post output from
>
> $ smartctl -A /dev/sdb
> $ smartctl -A /dev/sdc
# smartctl -A /dev/sdb
smartctl 6.2 2013-04-20 r3812 [i686-linux-3.11.0-14-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   217   186   021    Pre-fail  Always       -       4141
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       102
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       8263
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       102
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       88
193 Load_Cycle_Count        0x0032   155   155   000    Old_age   Always       -       135985
194 Temperature_Celsius     0x0022   121   108   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

# smartctl -A /dev/sdc
smartctl 6.2 2013-04-20 r3812 [i686-linux-3.11.0-14-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   217   186   021    Pre-fail  Always       -       4141
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       100
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       8263
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       100
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       86
193 Load_Cycle_Count        0x0032   156   156   000    Old_age   Always       -       134976
194 Temperature_Celsius     0x0022   122   109   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
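
To summarise the two tables, here is a small sketch that sums the error
counters and works out head load cycles per power-on hour from the raw
values above. Which attributes count as "error counters" is my own
selection, and I am not claiming the load cycle rate explains the
throughput; it is just the arithmetic.

```python
# Raw values copied by hand from the smartctl -A output above.
raw = {
    "sdb": {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 0,
            "Offline_Uncorrectable": 0, "UDMA_CRC_Error_Count": 0,
            "Power_On_Hours": 8263, "Load_Cycle_Count": 135985},
    "sdc": {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 0,
            "Offline_Uncorrectable": 0, "UDMA_CRC_Error_Count": 0,
            "Power_On_Hours": 8263, "Load_Cycle_Count": 134976},
}

# Assumption: these are the attributes worth treating as error counters.
ERROR_COUNTERS = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
                  "Offline_Uncorrectable", "UDMA_CRC_Error_Count")

summary = {}
for disk, attrs in raw.items():
    errors = sum(attrs[name] for name in ERROR_COUNTERS)
    cycles_per_hour = attrs["Load_Cycle_Count"] / attrs["Power_On_Hours"]
    summary[disk] = (errors, round(cycles_per_hour, 1))
    print(disk, errors, round(cycles_per_hour, 1))
# sdb 0 16.5
# sdc 0 16.3
```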

>>>> I would have expected the RAID0 device to easily get
>>>> up to the 60meg/sec mark ?
>>> As the source disk of a bulk file copy over NFS/CIFS?  As a point of
>>> reference, I have a workstation that maxes 50MB/s FTP and only 24MB/s
>>> CIFS to/from a server.  Both hosts have far in excess of 100MB/s disk
>>> throughput.  The 50MB/s limitation is due to the cheap Realtek mobo NIC,
>>> and the 24MB/s is a Samba limit.  I've spent dozens of hours attempting
>>> to tweak Samba to greater throughput but it simply isn't capable on that
>>> machine.
>>>
>>> Your throughput issues are with your network, not your RAID.  Learn and
>>> use FIO to see what your RAID/disks can do.  For now a really simple
>>> test is to time cat of a large file and pipe to /dev/null.  Divide the
>>> file size by the elapsed time.  Or simply do a large read with dd.  This
>>> will be much more informative than "moving data to a NAS", where your
>>> throughput is network limited, not disk.
>>>
>> The system is using a server grade NIC, I will run a dd/network test
>> shortly after the copy is done. (I am shifting all the data back to the
>> NAS, in case I mucked up the partitions :) ), I do recall that this
>> system was able to fill a gig pipe...
> Now that you've made it clear the first scenario was over iSCSI same as
> the 2nd scenario, and not NFS/CIFS, I doubt the TCP stack is the
> problem.  Assume the network is fine for now and concentrate on the disk
> drives in the host.  That seems the most likely cause of the problem
> at this point.
>
> BTW, you didn't state the throughput of the RAID1 device on sdb/sdc.
> The RAID0 device is on the same disks, yes?  RAID0 was 15 MB/s.  What
> was the RAID1?
>
At the moment the data is still moving back to the NAS (from the RAID1
device). According to iostat, it is reading at over 30000 kB/s (all of my
numbers are from iostat -x).
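
For the simple "read a large file and divide size by elapsed time" test
suggested earlier, something like this sketch should give a number
comparable to the iostat figure. The function name and block size are my
own choices, and the page cache will inflate the result unless the file
is larger than RAM or has not been read recently.

```python
import time

def read_throughput_mb_s(path, block_size=1024 * 1024):
    """Read `path` sequentially; return (bytes_read, throughput in MB/s)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids an extra Python-level buffer on top of the OS cache.
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            total += len(block)
    # Guard against a zero interval on very small files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total, total / elapsed / 1e6
```

Calling read_throughput_mb_s("/path/to/large/file") (a placeholder path)
on a file that lives on the array returns the byte count and the MB/s
figure.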

Also, there is no other disk usage on the system. All the data is
currently on the NAS (except system "stuff" for a quiet firewall).

I just spotted another thing: the two drives are on the same SATA
controller. From rescan-scsi-bus:

Scanning for device 3 0 0 0 ...
OLD: Host: scsi3 Channel: 00 Id: 00 Lun: 00
       Vendor: ATA      Model: WDC WD20EARX-008 Rev: 51.0
       Type:   Direct-Access                    ANSI SCSI revision: 05
Scanning for device 3 0 1 0 ...
OLD: Host: scsi3 Channel: 00 Id: 01 Lun: 00
       Vendor: ATA      Model: WDC WD20EARX-008 Rev: 51.0
       Type:   Direct-Access                    ANSI SCSI revision: 05

Would it be better to move these apart? I remember IDE used to have
this issue, but I also recall that SATA "fixed" it.

Thanks again,

Pieter
