* O_DIRECT alignment requirements ?
From: Rob van Nieuwkerk @ 2003-04-09 12:16 UTC
To: linux-kernel

Hi all,

I plan to use O_DIRECT in my application (on a partition, no fs).
It is hard to find info on the exact requirements on the mandatory
alignments of buffer, offset, transfer size: it's easy to find many
contradicting documents.  And checking the kernel source itself isn't
trivial.

	lseek(int fildes, off_t offset, int whence);
	read(int fd, void *buf, size_t count);

My current assumption is this:

- offset must be block_aligned (multiple of 512)
- buf must be page_aligned (4096 on IA32)
- count must be "block_aligned" (multiple of 512)

Is this correct ?

	r.

sysinfo:
--------
- Linux 2.4 (recent versions)
- using raw partition, no files, no fs
- IA32
* Re: O_DIRECT alignment requirements ?
From: Joel Becker @ 2003-04-09 15:48 UTC
To: Rob van Nieuwkerk; +Cc: linux-kernel

On Wed, Apr 09, 2003 at 02:16:08PM +0200, Rob van Nieuwkerk wrote:
> I plan to use O_DIRECT in my application (on a partition, no fs).
> It is hard to find info on the exact requirements on the mandatory
> alignments of buffer, offset, transfer size: it's easy to find many
> contradicting documents.  And checking the kernel source itself isn't
> trivial.

	In 2.4, your buffer, offset, and transfer size must be soft
blocksize aligned.  That's the output of BLKBSZGET against the block
device.  For unmounted partitions that is 512b, for most people's ext3
filesystems that is 4K.  It is, FYI, the number set by set_blocksize().

	In 2.5, the alignment restrictions have been relaxed.  Your
offset, buffer, and transfer size must all be aligned on the hardware
sector size.  That is the output of BLKSSZGET against the block device,
and is also what get_hardsect_size() returns in the kernel.  For almost
all disks this number is 512b, so you can do O_DIRECT on 512b alignment
for a raw disk or for an ext3 filesystem.  About the only thing that
may not have a 512b hardware sector size is a CD-ROM.

Joel

-- 
"Hey mister if you're gonna walk on water,
 Could you drop a line my way?"

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
* Re: O_DIRECT alignment requirements ?
From: Rob van Nieuwkerk @ 2003-04-09 16:53 UTC
To: Joel Becker; +Cc: Rob van Nieuwkerk, linux-kernel

Joel Becker wrote:
> On Wed, Apr 09, 2003 at 02:16:08PM +0200, Rob van Nieuwkerk wrote:
> > I plan to use O_DIRECT in my application (on a partition, no fs).
> > It is hard to find info on the exact requirements on the mandatory
> > alignments of buffer, offset, transfer size: it's easy to find many
> > contradicting documents.  And checking the kernel source itself isn't
> > trivial.
>
> 	In 2.4, your buffer, offset, and transfer size must be soft
> blocksize aligned.  That's the output of BLKBSZGET against the block
> device.  For unmounted partitions that is 512b, for most people's ext3
> filesystems that is 4K.  It is, FYI, the number set by set_blocksize().

Hi Joel,

Thank you for your reaction.

I get 4096 with BLKBSZGET on several unmounted partitions on my system
(RH 2.4.18-27.7.x kernel).  Some give 1024 ..  Maybe it is because I
had them mounted first and unmounted them for the test ?

	greetings,
	Rob van Nieuwkerk
* Re: O_DIRECT alignment requirements ?
From: Joel Becker @ 2003-04-09 17:59 UTC
To: Rob van Nieuwkerk; +Cc: linux-kernel

On Wed, Apr 09, 2003 at 06:53:17PM +0200, Rob van Nieuwkerk wrote:
> I get 4096 with BLKBSZGET on several unmounted partitions on my system
> (RH 2.4.18-27.7.x kernel).  Some give 1024 ..  Maybe it is because I
> had them mounted first and unmounted them for the test ?

	That would be the most likely answer.  When you unmount, I don't
believe the filesystem bothers to
set_blocksize(get_hardsect_size(dev)).

Joel

-- 
Life's Little Instruction Book #94

	"Make it a habit to do nice things for people who
	 will never find out."

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
* Re: O_DIRECT alignment requirements ?
From: Andrew Morton @ 2003-04-09 19:15 UTC
To: Rob van Nieuwkerk; +Cc: Joel.Becker, robn, linux-kernel

Rob van Nieuwkerk <robn@verdi.et.tudelft.nl> wrote:
>
>
> Joel Becker wrote:
> > On Wed, Apr 09, 2003 at 02:16:08PM +0200, Rob van Nieuwkerk wrote:
> > > I plan to use O_DIRECT in my application (on a partition, no fs).
> > > It is hard to find info on the exact requirements on the mandatory
> > > alignments of buffer, offset, transfer size: it's easy to find many
> > > contradicting documents.  And checking the kernel source itself isn't
> > > trivial.
> >
> > 	In 2.4, your buffer, offset, and transfer size must be soft
> > blocksize aligned.  That's the output of BLKBSZGET against the block
> > device.  For unmounted partitions that is 512b, for most people's ext3
> > filesystems that is 4K.  It is, FYI, the number set by set_blocksize().
>
> Hi Joel,
>
> Thank you for your reaction.
>
> I get 4096 with BLKBSZGET on several unmounted partitions on my system
> (RH 2.4.18-27.7.x kernel).  Some give 1024 ..  Maybe it is because I
> had them mounted first and unmounted them for the test ?
>

Yes, the blockdev initially comes up with a 1024 softblocksize.  When you
mount a filesystem on the device, the soft blocksize gets rewritten to
(typically) 4096.

Unfortunately it remains at 4096 after the fs is unmounted, which is rather
silly.

In 2.5 you should use BLKSSZGET (sector-size, not block-size) to work out the
supported alignment.

I'm not sure what a good and general solution is really.  You might have to
resort to probing the size at runtime, by trying increasing blocksizes until
it works.
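[Editorial sketch, not part of the original message.] The runtime probing
Andrew suggests could look roughly like the following. It is only a sketch
under stated assumptions: the function name probe_odirect_alignment(), the
PROBE_MAX limit of 4096 and the choice of powers of two starting at 512 are
all made up for this example, and the caller is assumed to have opened the
device with O_DIRECT beforehand (which needs _GNU_SOURCE with glibc).

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define PROBE_MAX 4096   /* largest alignment tried; an assumption */

/*
 * Try increasing transfer sizes on fd until a read succeeds.
 * fd is assumed to be open with O_DIRECT on a device that holds at
 * least 2 * PROBE_MAX bytes.
 * Returns the smallest size that worked, or -1 if none did.
 */
long probe_odirect_alignment(int fd)
{
    void *mem = NULL;
    char *base;
    long found = -1;
    size_t size;

    /*
     * Align the allocation to 2 * PROBE_MAX, so that base + size is
     * aligned to size but NOT to 2 * size.  That way each attempt
     * really tests one alignment for the buffer as well as for the
     * offset and the count.
     */
    if (posix_memalign(&mem, 2 * PROBE_MAX, 2 * PROBE_MAX) != 0)
        return -1;
    base = mem;

    for (size = 512; size <= PROBE_MAX; size *= 2) {
        ssize_t n = pread(fd, base + size, size, (off_t)size);
        if (n == (ssize_t)size) {
            found = (long)size;
            break;
        }
    }

    free(mem);
    return found;
}
```

Reading at offset size (rather than offset 0) matters here: offset 0 is
aligned to every block size, so it would not exercise the offset check.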
* Re: O_DIRECT alignment requirements ?
From: Rob van Nieuwkerk @ 2003-04-09 21:09 UTC
To: Andrew Morton; +Cc: Rob van Nieuwkerk, Joel.Becker, linux-kernel

> Rob van Nieuwkerk <robn@verdi.et.tudelft.nl> wrote:
> >
> >
> > Joel Becker wrote:
> > > On Wed, Apr 09, 2003 at 02:16:08PM +0200, Rob van Nieuwkerk wrote:
> > > > I plan to use O_DIRECT in my application (on a partition, no fs).
> > > > It is hard to find info on the exact requirements on the mandatory
> > > > alignments of buffer, offset, transfer size: it's easy to find many
> > > > contradicting documents.  And checking the kernel source itself isn't
> > > > trivial.
> > >
> > > 	In 2.4, your buffer, offset, and transfer size must be soft
> > > blocksize aligned.  That's the output of BLKBSZGET against the block
> > > device.  For unmounted partitions that is 512b, for most people's ext3
> > > filesystems that is 4K.  It is, FYI, the number set by set_blocksize().
> >
> > Hi Joel,
> >
> > Thank you for your reaction.
> >
> > I get 4096 with BLKBSZGET on several unmounted partitions on my system
> > (RH 2.4.18-27.7.x kernel).  Some give 1024 ..  Maybe it is because I
> > had them mounted first and unmounted them for the test ?
> >
>
> Yes, the blockdev initially comes up with a 1024 softblocksize.  When you
> mount a filesystem on the device, the soft blocksize gets rewritten to
> (typically) 4096.

Hi all,

OK, my experiments confirm that BLKBSZGET returns 1024 on partitions
that have not been mounted and on complete disk devices (eg. /dev/hda).

But there remain some mysteries .. :-)

My original theory was that:

- read offset must be aligned (multiple of 512)
- read buffer must be page_aligned (4096 on IA32)
- count must be "aligned" (multiple of 512)

Then based on the postings on the list it became (for 2.4 kernels):

- read offset must be BLKBSZGET aligned
- read buffer must be BLKBSZGET aligned
- count must be BLKBSZGET aligned
- on an unused/unmounted partition BLKBSZGET returns 1024
  (on an IDE disk partition), so everything must be 1024 byte aligned

But a friend of mine uses O_DIRECT with 2.4 kernels to read *individual*
single harddisk sectors of 512 bytes !  He claims that my original
theory is the right one and that you can read 512 byte chunks on 512
byte boundaries (he uses the complete device eg. /dev/hda).

So, I'm confused.
What are *exactly* the 2.4 O_DIRECT alignment requirements ?

	greetings,
	Rob van Nieuwkerk

(yes, I should write a small test program, but my real app will be
ready soon)
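[Editorial sketch, not part of the original message.] The second rule set
above (offset, buffer and count all aligned to the same value) can be
written down as a small check. The function name odirect_request_aligned()
is hypothetical; the align argument stands for whatever value applies on
the running kernel, for example the BLKBSZGET result on 2.4.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Return 1 if buffer address, file offset and transfer size are all
 * multiples of align, 0 otherwise.  This only checks the arithmetic;
 * it does not ask the kernel anything.
 */
int odirect_request_aligned(const void *buf, off_t offset,
                            size_t count, size_t align)
{
    if (align == 0)
        return 0;

    return ((uintptr_t)buf % align == 0) &&
           (offset % (off_t)align == 0) &&
           (count % align == 0);
}
```

With align = 1024, a 512-byte read at offset 512 is rejected by this
check, while the same request passes with align = 512. That difference is
exactly the disagreement between the two theories in the message above.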
* Re: O_DIRECT alignment requirements ?
From: Joel Becker @ 2003-04-09 23:27 UTC
To: Rob van Nieuwkerk; +Cc: Andrew Morton, linux-kernel

On Wed, Apr 09, 2003 at 11:09:23PM +0200, Rob van Nieuwkerk wrote:
> But a friend of mine uses O_DIRECT with 2.4 kernels to read *individual*
> single harddisk sectors of 512 bytes !  He claims that my original
> theory is the right one and that you can read 512 byte chunks on 512
> byte boundaries (he uses the complete device eg. /dev/hda).

	Well, how does your friend access /dev/hda?  Is he using raw
devices?

Joel

-- 
"What do you take me for, an idiot?"
	- General Charles de Gaulle, when a journalist asked him
	  if he was happy.

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
* Re: O_DIRECT alignment requirements ?
From: Rob van Nieuwkerk @ 2003-04-10 16:33 UTC
To: Joel Becker; +Cc: Rob van Nieuwkerk, Andrew Morton, linux-kernel

> On Wed, Apr 09, 2003 at 11:09:23PM +0200, Rob van Nieuwkerk wrote:
> > But a friend of mine uses O_DIRECT with 2.4 kernels to read *individual*
> > single harddisk sectors of 512 bytes !  He claims that my original
> > theory is the right one and that you can read 512 byte chunks on 512
> > byte boundaries (he uses the complete device eg. /dev/hda).
>
> 	Well, how does your friend access /dev/hda?  Is he using raw
> devices?

Hi all,

OK, I checked again with him.  It turns out there was some confusion.
He does read in *1024* byte chunks after all (accessing /dev/hdX
directly) ..

But I still need to do 512 byte chunks myself (*).  I understand that
this can be done with the raw device construction ?

Would it also be possible to change the device blocksize with a
BLKBSZSET ioctl() to 512 and do 512 byte transfers after that ?

	greetings,
	Rob van Nieuwkerk

*: you might wonder why I'm so obsessed with 512 byte transfers ..
The reason is that I'm working with a USB device (driver) that dies when
there is too much CompactFlash IDE access (because of data-records
logged to the CF).  I want to have the absolute minimum amount of
disk-access.
* Re: O_DIRECT alignment requirements ?
From: Benjamin LaHaise @ 2003-04-09 18:08 UTC
To: Joel Becker; +Cc: Rob van Nieuwkerk, linux-kernel

On Wed, Apr 09, 2003 at 08:48:36AM -0700, Joel Becker wrote:
> 	In 2.5, the alignment restrictions have been relaxed.  Your
> offset, buffer, and transfer size must all be aligned on the hardware
> sector size.  That is the output of BLKSSZGET against the block device,
> and is also what get_hardsect_size() returns in the kernel.  For almost
> all disks this number is 512b, so you can do O_DIRECT on 512b alignment
> for a raw disk or for an ext3 filesystem.  About the only thing that
> may not have a 512b hardware sector size is a CD-ROM.

Well, and SCSI devices configured to use 528 byte sectors and such...

		-ben
-- 
Junk email?  <a href="mailto:aart@kvack.org">aart@kvack.org</a>