public inbox for linux-kernel@vger.kernel.org
* [rfc patch 2/2] direct-io: remove address alignment check
@ 2005-07-13 23:43 Daniel McNeil
  2005-07-14 23:16 ` Badari Pulavarty
  2005-07-15  0:28 ` Tejun Heo
  0 siblings, 2 replies; 19+ messages in thread
From: Daniel McNeil @ 2005-07-13 23:43 UTC (permalink / raw)
  To: linux-aio@kvack.org, Linux Kernel Mailing List

This patch relaxes the direct i/o alignment check so that user addresses
do not have to be a multiple of the device block size.

I've done some preliminary testing and it mostly works on an ext3
file system on an IDE disk.  I have seen trouble when the user address
is on an odd byte boundary.  Sometimes the data is read back incorrectly
and sometimes I get these kernel error messages:
	hda: dma_timer_expiry: dma status == 0x60
	hda: DMA timeout retry
	hda: timeout waiting for DMA
	hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
	ide: failed opcode was: unknown
	hda: drive not ready for command

Doing direct-io with user addresses on even, non-512 boundaries appears
to be working correctly.

Any additional testing and/or comments welcome.

Signed-off-by: Daniel McNeil <daniel@osdl.org>

--- linux-2.6.12.orig/fs/direct-io.c	2005-06-28 16:39:39.000000000 -0700
+++ linux-2.6.12/fs/direct-io.c	2005-06-28 16:39:59.000000000 -0700
@@ -1147,7 +1147,9 @@ __blockdev_direct_IO(int rw, struct kioc
 			goto out;
 	}
 
-	/* Check the memory alignment.  Blocks cannot straddle pages */
+	/*
+	 * Check the i/o.  It must be a multiple of device block size.
+	 */
 	for (seg = 0; seg < nr_segs; seg++) {
 		addr = (unsigned long)iov[seg].iov_base;
 		size = iov[seg].iov_len;
@@ -1156,7 +1158,7 @@ __blockdev_direct_IO(int rw, struct kioc
 			if (bdev)
 				 blkbits = bdev_blkbits;
 			blocksize_mask = (1 << blkbits) - 1;
-			if ((addr & blocksize_mask) || (size & blocksize_mask))
+			if (size & blocksize_mask)
 				goto out;
 		}
 	}
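
To make the effect of the one-line change concrete, here is a minimal
userspace sketch of the check before and after the patch.  The function
names old_check_ok() and new_check_ok() are made up for illustration and
are not part of fs/direct-io.c; only the mask arithmetic mirrors the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old rule: both the user address and the length must be block-aligned. */
static int old_check_ok(uintptr_t addr, size_t size, unsigned blkbits)
{
	uintptr_t blocksize_mask;

	assert(blkbits < 8 * sizeof(uintptr_t));
	blocksize_mask = ((uintptr_t)1 << blkbits) - 1;
	return !((addr & blocksize_mask) || (size & blocksize_mask));
}

/* Patched rule: only the length must be a multiple of the block size. */
static int new_check_ok(uintptr_t addr, size_t size, unsigned blkbits)
{
	uintptr_t blocksize_mask;

	assert(blkbits < 8 * sizeof(uintptr_t));
	blocksize_mask = ((uintptr_t)1 << blkbits) - 1;
	(void)addr;	/* the address is no longer examined */
	return !(size & blocksize_mask);
}
```

With blkbits = 9 (512-byte blocks), a 512-byte transfer at an address
ending in 0x002 fails the old rule and passes the new one, while a
100-byte transfer fails both.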



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [rfc patch 2/2] direct-io: remove address alignment check
       [not found] <1121298112.6025.21.camel@ibm-c.pdx.osdl.net.suse.lists.linux.kernel>
@ 2005-07-14 13:18 ` Andi Kleen
  2005-07-14 16:02   ` Daniel McNeil
  0 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2005-07-14 13:18 UTC (permalink / raw)
  To: Daniel McNeil; +Cc: linux-kernel

Daniel McNeil <daniel@osdl.org> writes:

> This patch relaxes the direct i/o alignment check so that user addresses
> do not have to be a multiple of the device block size.

The original reason for this limit was that lots of drivers
(not only IDE) explode when you give them odd sizes. Sometimes
it is even worse.

I doubt all of them have been fixed.

Very risky change.

-Andi


* Re: [rfc patch 2/2] direct-io: remove address alignment check
  2005-07-14 13:18 ` Andi Kleen
@ 2005-07-14 16:02   ` Daniel McNeil
  2005-07-14 18:23     ` Andi Kleen
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel McNeil @ 2005-07-14 16:02 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Linux Kernel Mailing List

On Thu, 2005-07-14 at 06:18, Andi Kleen wrote:
> Daniel McNeil <daniel@osdl.org> writes:
> 
> > This patch relaxes the direct i/o alignment check so that user addresses
> > do not have to be a multiple of the device block size.
> 
> The original reason for this limit was that lots of drivers
> (not only IDE) explode when you give them odd sizes. Sometimes
> it is even worse.
> 
> I doubt all of them have been fixed.
> 
> Very risky change.
> 

That is exactly why I made this a separate patch, so that we
can test and find out where the problems are and work to fix
them.

Are there problems only with odd sizes, or do drivers have problems
with non-512 sizes?

Allowing 4-byte aligned user addresses would be a good step
forward, since it looks like malloc() returns 4-byte aligned 
addresses.

Daniel



* Re: [rfc patch 2/2] direct-io: remove address alignment check
  2005-07-14 16:02   ` Daniel McNeil
@ 2005-07-14 18:23     ` Andi Kleen
  2005-07-14 20:40       ` Daniel McNeil
  0 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2005-07-14 18:23 UTC (permalink / raw)
  To: Daniel McNeil; +Cc: Andi Kleen, Linux Kernel Mailing List

> That is exactly why I made this a separate patch, so that we
> can test and find out where the problems are and work to fix
> them.

That's pretty hard because there are a lot of block drivers.

And it might not be very nice for people's data.

> 
> Are there problems only with odd sizes, or do drivers have problems
> with non-512 sizes?

I believe they have problems with non 512 sizes (and probably alignments) 
too.

-Andi



* Re: [rfc patch 2/2] direct-io: remove address alignment check
  2005-07-14 18:23     ` Andi Kleen
@ 2005-07-14 20:40       ` Daniel McNeil
  2005-07-14 23:39         ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel McNeil @ 2005-07-14 20:40 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Linux Kernel Mailing List

On Thu, 2005-07-14 at 11:23, Andi Kleen wrote:
> > That is exactly why I made this a separate patch, so that we
> > can test and find out where the problems are and work to fix
> > them.
> 
> That's pretty hard because there are a lot of block drivers.
> 
> And might not very nice for people's data.
> 
> > 
> > Are there problems only with odd sizes, or do drivers have problems
> > with non-512 sizes?
> 
> I believe they have problems with non 512 sizes (and probably alignments) 
> too.

The check still only allows i/o that is multiple of the device block
size.  That will always be a requirement.

I was trying to ask:
Do drivers have problems with odd addresses or with
non-512 addresses?

In my limited testing, I saw problems with odd user space
addresses on IDE (using DMA).  When testing 2-byte aligned
addresses, I did not see any problems, and so far, the data
looks correct.
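
For anyone who wants to repeat this kind of test, one way to get a user
buffer at a chosen offset from a 512-byte boundary is sketched below.
This is only an illustration: alloc_offset_buffer() and SECTOR are
hypothetical names, not taken from the test code actually used.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SECTOR 512

/*
 * Hypothetical helper: returns a pointer that sits `offset` bytes past a
 * 512-byte boundary, with room for io_size bytes behind it.  *base_out
 * receives the pointer to pass to free().  Returns NULL on failure.
 */
static char *alloc_offset_buffer(size_t io_size, size_t offset, void **base_out)
{
	void *base;

	assert(offset < SECTOR);
	/* one extra sector of slack so base + offset + io_size stays in bounds */
	if (posix_memalign(&base, SECTOR, io_size + SECTOR) != 0)
		return NULL;
	*base_out = base;
	return (char *)base + offset;
}
```

An offset of 1 gives an odd address and an offset of 2 gives an even,
non-512-aligned address; the returned pointer would then be handed to
read() or write() on a file descriptor opened with O_DIRECT.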

I am continuing to test and this patch allows others to try
it out as well.  For the most part, it should be safe because
nobody has application code that uses O_DIRECT with non-aligned
addresses.  Obviously, it will only be ready for mainline
if/when we fix all the drivers.

Thanks,

Daniel



* Re: [rfc patch 2/2] direct-io: remove address alignment check
  2005-07-13 23:43 [rfc patch 2/2] direct-io: remove address alignment check Daniel McNeil
@ 2005-07-14 23:16 ` Badari Pulavarty
  2005-07-14 23:44   ` Daniel McNeil
  2005-07-15  0:28 ` Tejun Heo
  1 sibling, 1 reply; 19+ messages in thread
From: Badari Pulavarty @ 2005-07-14 23:16 UTC (permalink / raw)
  To: Daniel McNeil; +Cc: linux-aio@kvack.org, Linux Kernel Mailing List

How does your patch ensure that we meet the driver alignment
restrictions? Like you said, you need at least "even" byte alignment
for IDE etc.

And also, are there any restrictions on how much the "minimum" IO
size has to be ? I mean, can I read "1" byte ? I guess you are
not relaxing it (yet)..

Thanks,
Badari

On Wed, 2005-07-13 at 16:43 -0700, Daniel McNeil wrote:
> This patch relaxes the direct i/o alignment check so that user addresses
> do not have to be a multiple of the device block size.
> 
> I've done some preliminary testing and it mostly works on an ext3
> file system on a ide disk.  I have seen trouble when the user address
> is on an odd byte boundary.  Sometimes the data is read back incorrectly
> on read and sometimes I get these kernel error messages:
> 	hda: dma_timer_expiry: dma status == 0x60
> 	hda: DMA timeout retry
> 	hda: timeout waiting for DMA
> 	hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> 	ide: failed opcode was: unknown
> 	hda: drive not ready for command
> 
> Doing direct-io with user addresses on even, non-512 boundaries appears
> to be working correctly.
> 
> Any additional testing and/or comments welcome.
> 
> Signed-off-by: Daniel McNeil <daniel@osdl.org>
> 
> --- linux-2.6.12.orig/fs/direct-io.c	2005-06-28 16:39:39.000000000 -0700
> +++ linux-2.6.12/fs/direct-io.c	2005-06-28 16:39:59.000000000 -0700
> @@ -1147,7 +1147,9 @@ __blockdev_direct_IO(int rw, struct kioc
>  			goto out;
>  	}
>  
> -	/* Check the memory alignment.  Blocks cannot straddle pages */
> +	/*
> +	 * Check the i/o.  It must be a multiple of device block size.
> +	 */
>  	for (seg = 0; seg < nr_segs; seg++) {
>  		addr = (unsigned long)iov[seg].iov_base;
>  		size = iov[seg].iov_len;
> @@ -1156,7 +1158,7 @@ __blockdev_direct_IO(int rw, struct kioc
>  			if (bdev)
>  				 blkbits = bdev_blkbits;
>  			blocksize_mask = (1 << blkbits) - 1;
> -			if ((addr & blocksize_mask) || (size & blocksize_mask))
> +			if (size & blocksize_mask)
>  				goto out;
>  		}
>  	}
> 
> 



* Re: [rfc patch 2/2] direct-io: remove address alignment check
  2005-07-14 20:40       ` Daniel McNeil
@ 2005-07-14 23:39         ` Andrew Morton
  2005-07-15  0:03           ` Daniel McNeil
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2005-07-14 23:39 UTC (permalink / raw)
  To: Daniel McNeil; +Cc: ak, linux-kernel

Daniel McNeil <daniel@osdl.org> wrote:
>
> Do drivers have problems with odd addresses or with
>  non-512 addresses?

I do recall hearing rumours that some bus-masters have fairly strict memory
alignment requirements.  A cacheline size, perhaps - that would be 32 bytes
given the age of the hardware.

But yeah, it's v.  risky to assume that all bus masters can cope with
memory alignments down to two bytes.

It would be sane to put the minimum alignment into ->backing_dev_info,
default to 512, get the device drivers to override that as they are tested.

But this introduces a very very bad problem: people will write applications
which work on their hardware, ship the things and then find that the apps
break on other people's hardware.  So we can't do that.

Instead, we need to work out the minimum alignment requirement for all disk
controllers and DMA controllers and motherboards in the world.  And that
includes catering for weird ones which appear to work but which
occasionally fail in mysterious ways with finer alignments.  That's hard. 
It's easier to continue to make application developers jump through hoops.
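
For concreteness, the per-device minimum discussed above (default to 512,
drivers override once tested) might look roughly like the sketch below.
struct dev_info_sketch and both function names are hypothetical stand-ins,
not the kernel's backing_dev_info or any existing interface.

```c
#include <assert.h>
#include <stdint.h>

#define DEFAULT_DIO_ALIGN 512u

/* Hypothetical stand-in for a per-device record; not the kernel's struct. */
struct dev_info_sketch {
	unsigned dio_align;	/* 0 means "driver has not overridden it" */
};

/* Effective minimum address alignment: the override, else 512. */
static unsigned dio_min_align(const struct dev_info_sketch *di)
{
	unsigned align = di->dio_align ? di->dio_align : DEFAULT_DIO_ALIGN;

	assert((align & (align - 1)) == 0);	/* must be a power of two */
	return align;
}

/* Nonzero if addr meets the device's minimum alignment. */
static int dio_addr_ok(const struct dev_info_sketch *di, uintptr_t addr)
{
	return (addr & (dio_min_align(di) - 1)) == 0;
}
```

The portability problem is visible even here: the same address passes on
a device whose driver set dio_align to 2 and fails on one left at the
default.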


* Re: [rfc patch 2/2] direct-io: remove address alignment check
  2005-07-14 23:16 ` Badari Pulavarty
@ 2005-07-14 23:44   ` Daniel McNeil
  2005-07-15  5:27     ` Badari Pulavarty
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel McNeil @ 2005-07-14 23:44 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: linux-aio@kvack.org, Linux Kernel Mailing List

On Thu, 2005-07-14 at 16:16, Badari Pulavarty wrote:
> How does your patch ensures that we meet the driver alignment
> restrictions ? Like you said, you need atleast "even" byte alignment
> for IDE etc..
> 
> And also, are there any restrictions on how much the "minimum" IO
> size has to be ? I mean, can I read "1" byte ? I guess you are
> not relaxing it (yet)..
> 

This patch does not change the i/o size requirements -- they
must be a multiple of device block size (usually 512).

It only relaxes the address alignment restriction.  I do not
know what the driver alignment restrictions are.  Without the
1st patch, it was impossible to relax the address space
check and have direct-io generate the correct i/o's to submit.

This 2nd patch is just for testing and generating feedback
to find out what the address alignment issues are.  Then
we can decide how to proceed.

Did you look over the 1st patch?  Comments?

Thanks,

Daniel
 



* Re: [rfc patch 2/2] direct-io: remove address alignment check
  2005-07-14 23:39         ` Andrew Morton
@ 2005-07-15  0:03           ` Daniel McNeil
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel McNeil @ 2005-07-15  0:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: ak, Linux Kernel Mailing List

On Thu, 2005-07-14 at 16:39, Andrew Morton wrote:
> Daniel McNeil <daniel@osdl.org> wrote:
> >
> > Do drivers have problems with odd addresses or with
> >  non-512 addresses?
> 
> I do recall hearing rumours that some bus-masters have fairly strict memory
> alignment requirements.  A cacheline size, perhaps - that would be 32 bytes
> given the age of the hardware.
> 
> But yeah, it's v.  risky to assume that all bus masters can cope with
> memory alignments down to two bytes.
> 
> It would be sane to put the minimum alignment into ->backing_dev_info,
> default to 512, get the device drivers to override that as they are tested.
> 
> But this introduces a very very bad problem: people will write applications
> which work on their hardware, ship the things and then find that the apps
> break on other people's hardware.  So we can't do that.
> 
> Instead, we need to work out the minimum alignment requirement for all disk
> controllers and DMA controllers and motherboards in the world.  And that
> includes catering for weird ones which appear to work but which
> occasionally fail in mysterious ways with finer alignments.  That's hard. 
> It's easier to continue to make application developers jump through hoops.

I was hoping this patch would help turn rumors into real data :)

If we did put min alignment into backing_dev_info, we could implement
the equivalent of bounce buffers for direct-io -- or just fall back
to buffered i/o like it does sometimes anyway.  That way applications
would not break, just get worse performance on some hardware.

Right now I just wanted to get the issues on the table, get some test
results, and see how to proceed from there.  Since this patch only
affects direct i/o, getting test results shouldn't cause too many
problems.

Thanks,

Daniel



* Re: [rfc patch 2/2] direct-io: remove address alignment check
  2005-07-13 23:43 [rfc patch 2/2] direct-io: remove address alignment check Daniel McNeil
  2005-07-14 23:16 ` Badari Pulavarty
@ 2005-07-15  0:28 ` Tejun Heo
  2005-07-15  5:18   ` Badari Pulavarty
  1 sibling, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2005-07-15  0:28 UTC (permalink / raw)
  To: Daniel McNeil; +Cc: linux-aio@kvack.org, Linux Kernel Mailing List

Daniel McNeil wrote:
> This patch relaxes the direct i/o alignment check so that user addresses
> do not have to be a multiple of the device block size.
> 
> I've done some preliminary testing and it mostly works on an ext3
> file system on a ide disk.  I have seen trouble when the user address
> is on an odd byte boundary.  Sometimes the data is read back incorrectly
> on read and sometimes I get these kernel error messages:
> 	hda: dma_timer_expiry: dma status == 0x60
> 	hda: DMA timeout retry
> 	hda: timeout waiting for DMA
> 	hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> 	ide: failed opcode was: unknown
> 	hda: drive not ready for command
> 
> Doing direct-io with user addresses on even, non-512 boundaries appears
> to be working correctly.
> 
> Any additional testing and/or comments welcome.
> 

  Hi, Daniel.

  I don't think the change is a good idea.  We may be able to relax 
alignment constraints on some hardware to certain levels, but IMHO it 
will be very difficult to verify.  All internal block IO code follows 
strict block boundary alignment.  And as raw IOs (especially unaligned 
ones) aren't very common operations, they won't get tested much.  Then 
when some rare (probably not an open source one) application uses it on 
some rare buggy hardware, it may cause *very* strange things.

  Also, I don't think it will improve application programmer's 
convenience.  As each hardware employs different DMA alignment, we need 
to implement a way to export the alignment to user space and enforce it. 
So, in the end, user application must do aligned allocation 
accordingly.  Just following block boundary will be easier.

  I don't know why you wanna relax the alignment requirement, but 
wouldn't it be easier to just write/use a block-aligned allocator for such 
buffers?  It will even make the program more portable.
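
  A block-aligned allocator of this kind can be very small.  The sketch 
below assumes POSIX posix_memalign() is available; the wrapper name 
alloc_block_aligned() is made up for illustration.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate nblocks * blocksize bytes starting on a blocksize boundary.
 * blocksize must be a power of two and at least sizeof(void *), which is
 * what posix_memalign() requires.  Returns NULL on failure; release the
 * buffer with free().
 */
static void *alloc_block_aligned(size_t nblocks, size_t blocksize)
{
	void *buf;

	assert(blocksize >= sizeof(void *));
	assert((blocksize & (blocksize - 1)) == 0);
	if (posix_memalign(&buf, blocksize, nblocks * blocksize) != 0)
		return NULL;
	return buf;
}
```

  A buffer from alloc_block_aligned(8, 512) satisfies the existing 
O_DIRECT address check on a device with 512-byte blocks.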

-- 
tejun


* Re: [rfc patch 2/2] direct-io: remove address alignment check
  2005-07-15  0:28 ` Tejun Heo
@ 2005-07-15  5:18   ` Badari Pulavarty
  2005-07-15  8:23     ` Tejun Heo
  2005-07-15 16:56     ` Joel Becker
  0 siblings, 2 replies; 19+ messages in thread
From: Badari Pulavarty @ 2005-07-15  5:18 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Daniel McNeil, linux-aio@kvack.org, Linux Kernel Mailing List

Tejun Heo wrote:

> Daniel McNeil wrote:
> 
>> This patch relaxes the direct i/o alignment check so that user addresses
>> do not have to be a multiple of the device block size.
>>
>> I've done some preliminary testing and it mostly works on an ext3
>> file system on a ide disk.  I have seen trouble when the user address
>> is on an odd byte boundary.  Sometimes the data is read back incorrectly
>> on read and sometimes I get these kernel error messages:
>>     hda: dma_timer_expiry: dma status == 0x60
>>     hda: DMA timeout retry
>>     hda: timeout waiting for DMA
>>     hda: status error: status=0x58 { DriveReady SeekComplete 
>> DataRequest }
>>     ide: failed opcode was: unknown
>>     hda: drive not ready for command
>>
>> Doing direct-io with user addresses on even, non-512 boundaries appears
>> to be working correctly.
>>
>> Any additional testing and/or comments welcome.
>>
> 
>  Hi, Daniel.
> 
>  I don't think the change is a good idea.  We may be able to relax 
> alignment contraints on some hardware to certain levels, but IMHO it 
> will be very difficult to verify.  All internal block IO code follows 
> strict block boundary alignment.  And as raw IOs (especially unaligned 
> ones) aren't very common operations, they won't get tested much.  Then 
> when some rare (probably not an open source one) application uses it on 
> some rare buggy hardware, it may cause *very* strange things.
> 
>  Also, I don't think it will improve application programmer's 
> convenience.  As each hardware employs different DMA alignemnt, we need 
> to implement a way to export the alignment to user space and enforce it. 
>   So, in the end, user application must do aligned allocation 
> accordingly.  Just following block boundary will be easier.
> 
>  I don't know why you wanna relax the alignment requirement, but 
> wouldn't it be easier to just write/use block-aligned allocator for such 
> buffers?  It will even make the program more portable.
> 

I can imagine a reason for relaxing the alignment. I keep getting asked
whether we can do an "O_DIRECT mount option".  Database folks want to
make sure all accesses to files in a given filesystem are O_DIRECT
(whether the database is accessing them or some random program like ftp,
scp, or cp is). This is mainly to ensure that buffered accesses to
the files don't pollute the pagecache (while the database is using O_DIRECT
access). Seems like a logical request, but not easy to do :(

Thanks,
Badari



* Re: [rfc patch 2/2] direct-io: remove address alignment check
  2005-07-14 23:44   ` Daniel McNeil
@ 2005-07-15  5:27     ` Badari Pulavarty
  2005-07-15 20:06       ` Daniel McNeil
  0 siblings, 1 reply; 19+ messages in thread
From: Badari Pulavarty @ 2005-07-15  5:27 UTC (permalink / raw)
  To: Daniel McNeil; +Cc: linux-aio@kvack.org, Linux Kernel Mailing List

Daniel McNeil wrote:

> On Thu, 2005-07-14 at 16:16, Badari Pulavarty wrote:
> 
>>How does your patch ensures that we meet the driver alignment
>>restrictions ? Like you said, you need atleast "even" byte alignment
>>for IDE etc..
>>
>>And also, are there any restrictions on how much the "minimum" IO
>>size has to be ? I mean, can I read "1" byte ? I guess you are
>>not relaxing it (yet)..
>>
> 
> 
> This patch does not change the i/o size requirements -- they
> must be a multiple of device block size (usually 512).
> 
> It only relaxes the address alignment restriction.  I do not
> know what the driver alignment restrictions are.  Without the
> 1st patch, it was impossible to relax the address space
> check and have direct-io generate the correct i/o's to submit.
> 
> This 2nd patch, is just for testing and generating feedback
> to find out what the address alignment issues are.  Then
> we can decide how to proceed.
> 
> Did you look over the 1st patch?  Comments?

Yes. I did look at the first patch and my questions were basically
towards the first patch. I don't see any enforcement of alignment
with your patch at all. So, we let the driver fail if it can't
handle it ?

BTW, I don't think the first patch is really doing the right thing.
You got a little carried away while cleaning up.
You are trying to relax "user buffer" alignment only. If your
"offset" is in the middle of a filesystem block (say 4k), you still
need to zero out the first portion to be able to write into the
middle. That "evil" code is still needed. :(
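
To spell out the zeroing requirement: for a write that lands inside a
freshly allocated filesystem block, the parts of the block before and
after the written range have to be zeroed.  The sketch below only
computes those two lengths; struct zero_ranges and
partial_block_zeroing() are hypothetical names, and this is not a claim
about how dio_zero_block() is implemented.

```c
#include <assert.h>
#include <stddef.h>

struct zero_ranges {
	size_t head;	/* bytes to zero from block start up to the write */
	size_t tail;	/* bytes to zero from end of write to block end */
};

/* offset and len describe the write; fs_blocksize must be a power of two. */
static struct zero_ranges partial_block_zeroing(size_t offset, size_t len,
						size_t fs_blocksize)
{
	struct zero_ranges z;
	size_t mask = fs_blocksize - 1;
	size_t end = offset + len;

	assert(fs_blocksize != 0 && (fs_blocksize & mask) == 0);
	z.head = offset & mask;
	/* the outer "& mask" turns a full block of padding into zero */
	z.tail = (fs_blocksize - (end & mask)) & mask;
	return z;
}
```

For a 512-byte write at offset 512 into a 4k block, this gives 512 bytes
to zero in front and 3072 bytes behind.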

Thanks,
Badari



* Re: [rfc patch 2/2] direct-io: remove address alignment check
  2005-07-15  5:18   ` Badari Pulavarty
@ 2005-07-15  8:23     ` Tejun Heo
  2005-07-15 17:54       ` Badari Pulavarty
  2005-07-15 16:56     ` Joel Becker
  1 sibling, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2005-07-15  8:23 UTC (permalink / raw)
  To: Badari Pulavarty
  Cc: Daniel McNeil, linux-aio@kvack.org, Linux Kernel Mailing List

Badari Pulavarty wrote:
> Tejun Heo wrote:
> 
>> Daniel McNeil wrote:
>>
>>> This patch relaxes the direct i/o alignment check so that user addresses
>>> do not have to be a multiple of the device block size.
>>>
>>> I've done some preliminary testing and it mostly works on an ext3
>>> file system on a ide disk.  I have seen trouble when the user address
>>> is on an odd byte boundary.  Sometimes the data is read back incorrectly
>>> on read and sometimes I get these kernel error messages:
>>>     hda: dma_timer_expiry: dma status == 0x60
>>>     hda: DMA timeout retry
>>>     hda: timeout waiting for DMA
>>>     hda: status error: status=0x58 { DriveReady SeekComplete 
>>> DataRequest }
>>>     ide: failed opcode was: unknown
>>>     hda: drive not ready for command
>>>
>>> Doing direct-io with user addresses on even, non-512 boundaries appears
>>> to be working correctly.
>>>
>>> Any additional testing and/or comments welcome.
>>>
>>
>>  Hi, Daniel.
>>
>>  I don't think the change is a good idea.  We may be able to relax 
>> alignment contraints on some hardware to certain levels, but IMHO it 
>> will be very difficult to verify.  All internal block IO code follows 
>> strict block boundary alignment.  And as raw IOs (especially unaligned 
>> ones) aren't very common operations, they won't get tested much.  Then 
>> when some rare (probably not an open source one) application uses it 
>> on some rare buggy hardware, it may cause *very* strange things.
>>
>>  Also, I don't think it will improve application programmer's 
>> convenience.  As each hardware employs different DMA alignemnt, we 
>> need to implement a way to export the alignment to user space and 
>> enforce it.   So, in the end, user application must do aligned 
>> allocation accordingly.  Just following block boundary will be easier.
>>
>>  I don't know why you wanna relax the alignment requirement, but 
>> wouldn't it be easier to just write/use block-aligned allocator for 
>> such buffers?  It will even make the program more portable.
>>
> 
> I can imagine a reason for relaxing the alignment. I keep getting asked
> whether we can do "O_DIRECT mount option".  Database folks wants to
> make sure all the access to files in a given filesystem are O_DIRECT
> (whether they are accessing or some random program like ftp, scp, cp
> are acessing them). This was mainly to ensure that buffered accesses to
> the file doesn't polute the pagecache (while database is using O_DIRECT
> access). Seems like a logical request, but not easy to do :(
> 
> Thanks,
> Badari

  I don't know much about VM, but, if that's necessary, I think that 
limiting pagecache size per mounted fs (or by some other applicable 
category) is an easier/more complete approach.  After all, you cannot mmap 
w/ O_DIRECT and many programs (gcc, ld come to mind) mmap a large part of 
their memory usage.

  Thanks.

-- 
tejun


* Re: [rfc patch 2/2] direct-io: remove address alignment check
  2005-07-15  5:18   ` Badari Pulavarty
  2005-07-15  8:23     ` Tejun Heo
@ 2005-07-15 16:56     ` Joel Becker
  2005-07-15 17:50       ` Badari Pulavarty
  1 sibling, 1 reply; 19+ messages in thread
From: Joel Becker @ 2005-07-15 16:56 UTC (permalink / raw)
  To: Badari Pulavarty
  Cc: Tejun Heo, Daniel McNeil, linux-aio@kvack.org,
	Linux Kernel Mailing List

On Thu, Jul 14, 2005 at 10:18:28PM -0700, Badari Pulavarty wrote:
> I can imagine a reason for relaxing the alignment. I keep getting asked
> whether we can do "O_DIRECT mount option".  Database folks wants to
> make sure all the access to files in a given filesystem are O_DIRECT

	All currently existing "O_DIRECT mount option" implementations
that I know of do:

	if (not-512-aligned)
		bounce_buffer()

That is, no one attempts to support the wacky variations in DMA engines.
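
	As a rough userspace illustration of that pseudocode for the
write path (the names prepare_write_buffer() and DIO_ALIGN are made up
here, and real implementations do this inside the kernel):

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DIO_ALIGN 512

/*
 * Returns a buffer that is safe to hand to the device for a write: the
 * caller's own buffer if it is 512-aligned, otherwise a freshly allocated
 * aligned copy (the bounce buffer).  *bounced tells the caller whether
 * the result must be free()d.  Returns NULL if allocation fails.
 */
static void *prepare_write_buffer(const void *user, size_t len, int *bounced)
{
	void *bounce;

	assert(len % DIO_ALIGN == 0);	/* the size rule is not relaxed */
	if (((uintptr_t)user & (DIO_ALIGN - 1)) == 0) {
		*bounced = 0;
		return (void *)user;
	}
	if (posix_memalign(&bounce, DIO_ALIGN, len) != 0)
		return NULL;
	memcpy(bounce, user, len);
	*bounced = 1;
	return bounce;
}
```

The device only ever sees 512-aligned memory; the misaligned caller pays
for one extra copy.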

Joel

-- 

 Brain: I shall pollute the water supply with this DNAdefibuliser,
        turning everyone into mindless slaves.
 Pinky: What about the people who drink bottled water?
 Brain: Pinky, people who pay 5 dollars for a bottle of water are
        already mindless slaves.

			http://www.jlbec.org/
			jlbec@evilplan.org


* Re: [rfc patch 2/2] direct-io: remove address alignment check
  2005-07-15 16:56     ` Joel Becker
@ 2005-07-15 17:50       ` Badari Pulavarty
  2005-07-15 19:16         ` Joel Becker
  0 siblings, 1 reply; 19+ messages in thread
From: Badari Pulavarty @ 2005-07-15 17:50 UTC (permalink / raw)
  To: Joel Becker
  Cc: Tejun Heo, Daniel McNeil, linux-aio@kvack.org,
	Linux Kernel Mailing List

On Fri, 2005-07-15 at 17:56 +0100, Joel Becker wrote:
> On Thu, Jul 14, 2005 at 10:18:28PM -0700, Badari Pulavarty wrote:
> > I can imagine a reason for relaxing the alignment. I keep getting asked
> > whether we can do "O_DIRECT mount option".  Database folks wants to
> > make sure all the access to files in a given filesystem are O_DIRECT
> 
> 	All currently existing "O_DIRECT mount option" implementations
> that I know of do:
> 
> 	if (not-512-aligned)
> 		bounce_buffer()
> 
> That is, no one attempts to support the wacky variations in DMA engines.


I believe some OSs do buffered IO, if there is a problem with alignment.

Thanks,
Badari



* Re: [rfc patch 2/2] direct-io: remove address alignment check
  2005-07-15  8:23     ` Tejun Heo
@ 2005-07-15 17:54       ` Badari Pulavarty
  2005-07-16  3:50         ` Tejun Heo
  0 siblings, 1 reply; 19+ messages in thread
From: Badari Pulavarty @ 2005-07-15 17:54 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Daniel McNeil, linux-aio@kvack.org, Linux Kernel Mailing List

On Fri, 2005-07-15 at 17:23 +0900, Tejun Heo wrote:
> Badari Pulavarty wrote:
...
> >>  I don't know why you wanna relax the alignment requirement, but 
> >> wouldn't it be easier to just write/use block-aligned allocator for 
> >> such buffers?  It will even make the program more portable.
> >>
> > 
> > I can imagine a reason for relaxing the alignment. I keep getting asked
> > whether we can do "O_DIRECT mount option".  Database folks wants to
> > make sure all the access to files in a given filesystem are O_DIRECT
> > (whether they are accessing or some random program like ftp, scp, cp
> > are acessing them). This was mainly to ensure that buffered accesses to
> > the file doesn't polute the pagecache (while database is using O_DIRECT
> > access). Seems like a logical request, but not easy to do :(
> > 
> > Thanks,
> > Badari
> 
>   I don't know much about VM, but, if that's necessary, I think that 
> limiting pagecache size per mounted fs (or by some other applicable 
> category) is easier/more complete approach.  After all, you cannot mmap 
> w/ O_DIRECT and many programs (gcc, ld come to mind) mmap large part of 
> their memory usage.

I agree. I guess for mmap()ed access we can kick it back to buffered
mode.

I don't think limiting pagecache use per filesystem is an acceptable
option. In fact, database folks want exactly this -  to limit the
pagecache use by filesystems - but I don't think it's the right thing to do,
so I am trying to propose mount O_DIRECT as an alternative (if it's
feasible).

Thanks,
Badari



* Re: [rfc patch 2/2] direct-io: remove address alignment check
  2005-07-15 17:50       ` Badari Pulavarty
@ 2005-07-15 19:16         ` Joel Becker
  0 siblings, 0 replies; 19+ messages in thread
From: Joel Becker @ 2005-07-15 19:16 UTC (permalink / raw)
  To: Badari Pulavarty
  Cc: Tejun Heo, Daniel McNeil, linux-aio@kvack.org,
	Linux Kernel Mailing List

On Fri, Jul 15, 2005 at 10:50:46AM -0700, Badari Pulavarty wrote:
> I believe some OSs do buffered IO, if there is a problem with alignment.

	That's what I said.  They all do buffered I/O if the alignment
is not 512B.  They do _not_ try to accept alignments that are smaller.
There's no good reason to.  It just adds needless complexity.

Joel

-- 

"I think it would be a good idea."  
        - Mahatma Gandhi, when asked what he thought of Western
          civilization

			http://www.jlbec.org/
			jlbec@evilplan.org


* Re: [rfc patch 2/2] direct-io: remove address alignment check
  2005-07-15  5:27     ` Badari Pulavarty
@ 2005-07-15 20:06       ` Daniel McNeil
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel McNeil @ 2005-07-15 20:06 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: linux-aio@kvack.org, Linux Kernel Mailing List

On Thu, 2005-07-14 at 22:27, Badari Pulavarty wrote:
> Daniel McNeil wrote:
> 
> > On Thu, 2005-07-14 at 16:16, Badari Pulavarty wrote:
> > 
> >>How does your patch ensures that we meet the driver alignment
> >>restrictions ? Like you said, you need atleast "even" byte alignment
> >>for IDE etc..
> >>
> >>And also, are there any restrictions on how much the "minimum" IO
> >>size has to be ? I mean, can I read "1" byte ? I guess you are
> >>not relaxing it (yet)..
> >>
> > 
> > 
> > This patch does not change the i/o size requirements -- they
> > must be a multiple of device block size (usually 512).
> > 
> > It only relaxes the address alignment restriction.  I do not
> > know what the driver alignment restrictions are.  Without the
> > 1st patch, it was impossible to relax the address space
> > check and have direct-io generate the correct i/o's to submit.
> > 
> > This 2nd patch, is just for testing and generating feedback
> > to find out what the address alignment issues are.  Then
> > we can decide how to proceed.
> > 
> > Did you look over the 1st patch?  Comments?
> 
> Yes. I did look at the first patch and my questions were basically
> towards the first patch. I don't see any enforcement of alignment
> with your patch at all. So, we let the driver fail if it can't
> handle it ?
> 

The 1st patch rewrites direct-io to handle non-512-aligned
addresses.  Without the 2nd patch, it will never see a non-512-aligned
user address and should work the same as before, only
with slightly smaller code :).  The drivers will get the
same 512-byte aligned addresses.  Am I missing something?
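
For reference, the difference between the check before and after the
2nd patch can be sketched in user space as below.  This is only an
illustration under assumptions: dio_check_strict() and
dio_check_relaxed() are made-up names, and the sketch takes a single
blkbits value and leaves out the bdev_blkbits fallback visible in the
patch context.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Check as it was before patch 2/2: both the user address and the
 * length must be multiples of the block size (1 << blkbits).
 * Returns 1 if the segment would be accepted, 0 otherwise.
 * (Hypothetical helper, not a kernel function.)
 */
static int dio_check_strict(uintptr_t addr, size_t size, unsigned blkbits)
{
	size_t blocksize_mask = ((size_t)1 << blkbits) - 1;

	return !((addr & blocksize_mask) || (size & blocksize_mask));
}

/*
 * Check as it is after patch 2/2: only the length is tested.
 * (Hypothetical helper, not a kernel function.)
 */
static int dio_check_relaxed(uintptr_t addr, size_t size, unsigned blkbits)
{
	size_t blocksize_mask = ((size_t)1 << blkbits) - 1;

	(void)addr;	/* address alignment is no longer checked */
	return !(size & blocksize_mask);
}
```

With blkbits = 9 (512-byte blocks), a 512-byte segment at an address
ending in 0x002 is rejected by the strict check and accepted by the
relaxed one; a 100-byte segment is rejected by both.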

> BTW, I don't think the first patch is really doing the right thing.
> You got a little carried away while cleaning up.
> You are trying to relax "user buffer" alignment only. If your
> "offset" is in the middle of a filesystem block (say 4k), you still
> need to zero out the first portion to be able to write into the
> middle. That "evil" code is still needed. :(
> 

The code still does zero out the 1st portion.  dio_zero_block()
is being called twice still.  Sure looks like it is working to
me:

Test program d.c:
------------------------
#define _GNU_SOURCE     1

#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <string.h>

int main(void)
{
	int fd;
	void *mem = NULL;
	char *buf;
	int io_size = 512;
	off_t skip = 512;
	int i;

	/* O_DIRECT needs an aligned buffer.  posix_memalign() returns the
	 * error code instead of setting errno. */
	if ((i = posix_memalign(&mem, getpagesize(), io_size)) != 0) {
		fprintf(stderr, "cannot alloc mem: %s\n", strerror(i));
		exit(1);
	}
	buf = mem;

	memset(buf, 'a', io_size);

	fd = open("direct_test_file", O_RDWR|O_DIRECT|O_TRUNC|O_CREAT, 0666);
	if (fd < 0) {
		perror("cannot open direct_test_file");
		exit(1);
	}

	/* 1st write: 'a' at offset 512, leaving the first 512 bytes unwritten */
	lseek(fd, skip, SEEK_SET);
	if ((i = write(fd, buf, io_size)) != io_size) {
		perror("bad write");
		exit(2);
	}

	printf("write to direct_test_file %d bytes of 'a' at %ld\n", i, (long)skip);

	/* 2nd write: 'b' at the start of the second page */
	memset(buf, 'b', io_size);
	lseek(fd, getpagesize(), SEEK_SET);
	if ((i = write(fd, buf, io_size)) != io_size) {
		perror("bad write");
		exit(2);
	}
	printf("write to direct_test_file %d bytes of 'b' at %d\n", i, getpagesize());

	close(fd);
	free(buf);
	return 0;
}
--------------------------

$ ./d
write to direct_test_file 512 bytes of 'a' at 512
write to direct_test_file 512 bytes of 'b' at 4096

$ hexdump direct_test_file
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0000200 6161 6161 6161 6161 6161 6161 6161 6161
*
0000400 0000 0000 0000 0000 0000 0000 0000 0000
*
0001000 6262 6262 6262 6262 6262 6262 6262 6262
*
0001200

The 1st 512 bytes are zeroed as well as the bytes between
1k and 4k.
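
The same check can be done without reading the hexdump by eye.  The
following is a rough sketch, assuming the file was produced by d.c
above with 512-byte writes; region_is() and check_d_layout() are
made-up helper names, and the file is read back with ordinary
buffered pread() rather than O_DIRECT.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <unistd.h>

/*
 * Return 1 if every byte in [start, start + len) of fd equals expect,
 * 0 if some byte differs, -1 on a read error or a short file.
 * (Hypothetical helper for this sketch.)
 */
static int region_is(int fd, off_t start, size_t len, char expect)
{
	char chunk[512];

	while (len > 0) {
		size_t want = len < sizeof(chunk) ? len : sizeof(chunk);
		ssize_t got = pread(fd, chunk, want, start);

		if (got <= 0)
			return -1;
		for (ssize_t i = 0; i < got; i++)
			if (chunk[i] != expect)
				return 0;
		start += got;
		len -= (size_t)got;
	}
	return 1;
}

/*
 * Layout expected from d.c: zeros, 512 bytes of 'a' at offset 512,
 * zeros up to page_size, then 512 bytes of 'b'.
 * Returns 1 if the file matches, 0 otherwise.
 */
static int check_d_layout(int fd, off_t page_size)
{
	return region_is(fd, 0, 512, 0) == 1 &&
	       region_is(fd, 512, 512, 'a') == 1 &&
	       region_is(fd, 1024, (size_t)(page_size - 1024), 0) == 1 &&
	       region_is(fd, page_size, 512, 'b') == 1;
}
```

Calling check_d_layout(fd, getpagesize()) on a read-only descriptor
for direct_test_file should return 1 if the zeroing worked as the
hexdump above suggests.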

Thanks,
Daniel



* Re: [rfc patch 2/2] direct-io: remove address alignment check
  2005-07-15 17:54       ` Badari Pulavarty
@ 2005-07-16  3:50         ` Tejun Heo
  0 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2005-07-16  3:50 UTC (permalink / raw)
  To: Badari Pulavarty
  Cc: Daniel McNeil, linux-aio@kvack.org, Linux Kernel Mailing List

Badari Pulavarty wrote:
> On Fri, 2005-07-15 at 17:23 +0900, Tejun Heo wrote:
> 
>>Badari Pulavarty wrote:
> 
> ...
> 
>>>> I don't know why you wanna relax the alignment requirement, but 
>>>>wouldn't it be easier to just write/use block-aligned allocator for 
>>>>such buffers?  It will even make the program more portable.
>>>>
>>>
>>>I can imagine a reason for relaxing the alignment. I keep getting asked
>>>whether we can do "O_DIRECT mount option".  Database folks want to
>>>make sure all the access to files in a given filesystem are O_DIRECT
>>>(whether they are accessing or some random program like ftp, scp, cp
>>>are accessing them). This was mainly to ensure that buffered accesses to
>>>the file don't pollute the pagecache (while database is using O_DIRECT
>>>access). Seems like a logical request, but not easy to do :(
>>>
>>>Thanks,
>>>Badari
>>
>>  I don't know much about VM, but, if that's necessary, I think that
>>limiting pagecache size per mounted fs (or by some other applicable
>>category) is an easier/more complete approach.  After all, you cannot mmap
>>w/ O_DIRECT and many programs (gcc, ld come to mind) mmap a large part of
>>their memory usage.
> 
> 
> I agree. I guess for mmap()ed access we can kick it back to buffered
> mode.
> 
> I don't think limiting pagecache use per filesystem is an acceptable
> option. In fact, database folks exactly want this -  to limit the
> pagecache use by filesystems - but I don't think it's the right thing to do,
> so I am trying to propose mount O_DIRECT as an alternative (if it's
> feasible).

  Just out of curiosity, can you tell me why you think limiting 
pagecache isn't the right thing to do (tm)?  O_DIRECT mount seems to me
an incomplete/complex solution (DMA alignment etc...).  Forgive me if this
issue has been discussed to death already.
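
For what it's worth, the block-aligned allocator suggested earlier in
this thread could look roughly like the sketch below.  It is only an
illustration: alloc_block_aligned() is a made-up name, and it assumes
block_size is a power of two (e.g. 512) as posix_memalign() requires.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>

/*
 * Allocate len bytes whose address is a multiple of block_size.
 * block_size must be a power of two.  Returns NULL on failure;
 * release the buffer with free().
 * (Hypothetical helper for this sketch.)
 */
static void *alloc_block_aligned(size_t block_size, size_t len)
{
	void *p = NULL;

	/* posix_memalign() also wants a multiple of sizeof(void *) */
	if (block_size < sizeof(void *))
		block_size = sizeof(void *);
	if (posix_memalign(&p, block_size, len) != 0)
		return NULL;
	return p;
}
```

A program would call alloc_block_aligned(512, io_size) and pass the
result to read() or write() on an O_DIRECT descriptor, so the existing
address check is satisfied without any kernel change.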

-- 
tejun


Thread overview: 19+ messages
2005-07-13 23:43 [rfc patch 2/2] direct-io: remove address alignment check Daniel McNeil
2005-07-14 23:16 ` Badari Pulavarty
2005-07-14 23:44   ` Daniel McNeil
2005-07-15  5:27     ` Badari Pulavarty
2005-07-15 20:06       ` Daniel McNeil
2005-07-15  0:28 ` Tejun Heo
2005-07-15  5:18   ` Badari Pulavarty
2005-07-15  8:23     ` Tejun Heo
2005-07-15 17:54       ` Badari Pulavarty
2005-07-16  3:50         ` Tejun Heo
2005-07-15 16:56     ` Joel Becker
2005-07-15 17:50       ` Badari Pulavarty
2005-07-15 19:16         ` Joel Becker
     [not found] <1121298112.6025.21.camel@ibm-c.pdx.osdl.net.suse.lists.linux.kernel>
2005-07-14 13:18 ` Andi Kleen
2005-07-14 16:02   ` Daniel McNeil
2005-07-14 18:23     ` Andi Kleen
2005-07-14 20:40       ` Daniel McNeil
2005-07-14 23:39         ` Andrew Morton
2005-07-15  0:03           ` Daniel McNeil
