From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stan Hoeppner
Subject: Re: The chunk size paradox
Date: Thu, 02 Jan 2014 09:41:27 -0600
Message-ID: <52C588A7.6010207@hardwarefreak.com>
References: <52C1C01A.7010407@ubuntu.com> <52C57C7B.80400@shiftmail.org>
Reply-To: stan@hardwarefreak.com
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <52C57C7B.80400@shiftmail.org>
Sender: linux-raid-owner@vger.kernel.org
To: joystick , Phillip Susi
Cc: linux-raid
List-Id: linux-raid.ids

On 1/2/2014 8:49 AM, joystick wrote:

> For a 4k write in raid5, two 4k sectors are read, then
> two 4k sectors are written, and this is completely independent from
> chunk size.

First, there is no such thing as a 4K sector in Linux. Sectors are 512
bytes. Filesystem blocks and memory pages are 4K.

I'm no expert WRT raid5.c/raid6.c, but I'm pretty sure it doesn't work
as you state. I'm pretty sure it works like this:

Redundancy is maintained at the chunk level, not the filesystem block
level or page level. When modifying a single filesystem block, md will
read the data chunk of the stripe in which the 8 sectors of the 4KB
block reside, write back the chunk incorporating the changes to those
8 sectors, read the parity chunk, recalculate the parity chunk based on
the new data chunk, and then write back the parity chunk.

This is precisely why many folks, including myself, consider the current
512KB chunk default to be way too high. Modifying a single 4KB
filesystem block requires reading 1MB from disk (the data chunk plus the
parity chunk) and writing 1MB, a total of 2MB of IO just to modify a
single 4KB page. And AFAIK this is the best case scenario. According to
past posts by Neil, IIRC, the current RAID5/6 code may read more than
just two chunks during RMW depending on certain factors.

With RAID6 you have at least one extra chunk write, if not an extra
chunk read, so your IO is at least 2.5MB for a single 4K write with
RAID6.

--
Stan
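For anyone who wants to check the arithmetic in the message above, here
is a rough Python sketch. It only encodes the chunk-level RMW model as
described in the message, which is itself stated as a belief and has not
been verified against raid5.c. The function names and the
read_all_parity switch are made up for illustration.

```python
SECTOR_BYTES = 512   # Linux sector size, per the message
KIB = 1024


def sectors_per_block(block_bytes=4 * KIB):
    """Number of 512-byte sectors covered by one filesystem block."""
    return block_bytes // SECTOR_BYTES


def rmw_chunk_io(chunk_bytes=512 * KIB, parity_chunks=1, read_all_parity=False):
    """Return (bytes_read, bytes_written) for one small write.

    ASSUMPTION: read-modify-write happens on whole chunks, as the
    message describes. This is a model of the claim, not of md itself.
    parity_chunks is 1 for RAID5 and 2 for RAID6.
    """
    # One data chunk is read, plus either one parity chunk (the stated
    # minimum) or every parity chunk (the possible extra read for RAID6).
    reads = 1 + (parity_chunks if read_all_parity else 1)
    # The data chunk and every parity chunk are written back.
    writes = 1 + parity_chunks
    return reads * chunk_bytes, writes * chunk_bytes


if __name__ == "__main__":
    for name, parity in (("RAID5", 1), ("RAID6", 2)):
        rd, wr = rmw_chunk_io(parity_chunks=parity)
        total_mib = (rd + wr) / (KIB * KIB)
        print(f"{name}: read {rd // KIB}KB, write {wr // KIB}KB, "
              f"total {total_mib}MB")
```

Under that model a 512KB chunk gives 1MB read plus 1MB written for
RAID5, and at least 2.5MB total for RAID6. Passing read_all_parity=True
covers the case where RAID6 also reads the second parity chunk.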