kernelnewbies.kernelnewbies.org archive mirror
* Direct IO and Page cache
@ 2013-07-26  3:15 Kumar amit mehta
  2013-07-26  9:14 ` Chinmay V S
  2013-07-26 14:56 ` Valdis.Kletnieks at vt.edu
  0 siblings, 2 replies; 9+ messages in thread
From: Kumar amit mehta @ 2013-07-26  3:15 UTC (permalink / raw)
  To: kernelnewbies

Hi,

We have direct I/O (O_DIRECT), for example raw devices (/dev/rawctl) that
map to block devices, and we also have the page cache. If I've understood
this correctly, direct I/O will bypass the page cache, which is fine; I'll
not get into the performance debate, but what about data consistency? The
kernel cannot and __shouldn't__ try to control how applications are
written. So one bad day somebody comes up with an application that does
both types of I/O (one that goes through the page cache and one that
doesn't), and in that application one instance writes directly to the
backend device while the other instance, unaware of that write, goes ahead
and writes to the page cache, and that write is flushed to the backend
device later. So wouldn't we end up corrupting the on-disk data?

I can think of several other scenarios that could corrupt the on-disk
data if the kernel does not employ any safeguarding policy. But I'm fairly
sure the kernel is aware of such nasty attempts, and I'd like to know how
it takes care of them.

!!amit

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Direct IO and Page cache
  2013-07-26  9:14 ` Chinmay V S
@ 2013-07-26  4:02   ` Kumar amit mehta
  2013-07-26 10:21     ` Chinmay V S
  2013-07-26 14:59     ` Valdis.Kletnieks at vt.edu
  0 siblings, 2 replies; 9+ messages in thread
From: Kumar amit mehta @ 2013-07-26  4:02 UTC (permalink / raw)
  To: kernelnewbies

On Fri, Jul 26, 2013 at 05:14:21PM +0800, Chinmay V S wrote:
> > We have direct I/O (O_DIRECT), for example raw devices (/dev/rawctl) that
> > map to block devices, and we also have the page cache. If I've understood
> > this correctly, direct I/O will bypass the page cache, which is fine; I'll
> > not get into the performance debate, but what about data consistency? The
> > kernel cannot and __shouldn't__ try to control how applications are
> > written. So one bad day somebody comes up with an application that does
> > both types of I/O (one that goes through the page cache and one that
> > doesn't), and in that application one instance writes directly to the
> > backend device while the other instance, unaware of that write, goes ahead
> > and writes to the page cache, and that write is flushed to the backend
> > device later. So wouldn't we end up corrupting the on-disk data?
> 
> Yes. And that is the responsibility of the application. While the
> existence of O_DIRECT may not be common knowledge, anyone who knows about
> it *must* know that it bypasses the kernel page-cache and hence *must*
> know the consequences of doing cached and direct I/O on the same file
> simultaneously.
> 
> > I can think of several other scenarios that could corrupt the on-disk
> > data if the kernel does not employ any safeguarding policy. But I'm fairly
> > sure the kernel is aware of such nasty attempts, and I'd like to know how
> > it takes care of them.
> 
> O_DIRECT is an explicit flag, not enabled by default.
> 
> It is the app's responsibility to ensure that it does NOT misuse the
> feature. Essentially, specifying the O_DIRECT flag is the app's way of
> saying: "Hey kernel, I know what I am doing. Please step aside and
> let me talk to the hardware directly. Please do NOT interfere."
> 
> The kernel happily obliges.
> 
> Later, the app should NOT go crying back to the kernel (and blaming it)
> if the app manages to screw up its direct "relationship" with the
> hardware.

So leaving the hardware at the mercy of the application doesn't sound
like good practice. This __may__ compromise kernel stability too. Also
think of this:

In app1:
fdx = open("blah", O_RDWR | O_DIRECT);
write(fdx, buf, sizeof(buf));

In app2 (unaware of app1):
fdy = open("blah", O_RDWR);
write(fdy, buf, sizeof(buf));

I don't think this is highly unlikely to happen, and if you agree with me
then we may end up with the same could-be/would-be data corruption. Now who
should be blamed here: app1, app2 or the kernel? Or will it be handled
differently here?
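
For what it's worth, a compilable version of app1 needs a little more than
the snippet above: O_DIRECT generally requires the user buffer (and the file
offset and transfer size) to be aligned to the device's logical block size,
otherwise write() fails with EINVAL. A rough sketch under those assumptions
(the file name and the 512-byte alignment are made up; the real alignment
depends on the device):

#define _GNU_SOURCE                     /* needed for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        /* O_DIRECT wants the buffer (and the offset/length) aligned to the
         * device's logical block size; 512 bytes is only an assumption. */
        const size_t blksz = 512;
        void *buf;

        if (posix_memalign(&buf, blksz, blksz) != 0) {
                perror("posix_memalign");
                return 1;
        }
        memset(buf, 'A', blksz);

        int fdx = open("blah", O_RDWR | O_CREAT | O_DIRECT, 0644);
        if (fdx < 0) {
                perror("open");
                return 1;
        }

        if (write(fdx, buf, blksz) != (ssize_t)blksz)
                perror("write");        /* EINVAL usually means bad alignment */

        close(fdx);
        free(buf);
        return 0;
}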

!!amit

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Direct IO and Page cache
  2013-07-26 10:31       ` Chinmay V S
@ 2013-07-26  6:04         ` Kumar amit mehta
  0 siblings, 0 replies; 9+ messages in thread
From: Kumar amit mehta @ 2013-07-26  6:04 UTC (permalink / raw)
  To: kernelnewbies

On Fri, Jul 26, 2013 at 06:31:34PM +0800, Chinmay V S wrote:
> 1. Do not worry about coherency between the page-cache and the data
> transferred using O_DIRECT. The kernel will invalidate the cache after
> an O_DIRECT write and flush the cache before an O_DIRECT read.

Thank you. Grey skies are clearing up now.

!!amit

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Direct IO and Page cache
  2013-07-26  3:15 Direct IO and Page cache Kumar amit mehta
@ 2013-07-26  9:14 ` Chinmay V S
  2013-07-26  4:02   ` Kumar amit mehta
  2013-07-26 14:56 ` Valdis.Kletnieks at vt.edu
  1 sibling, 1 reply; 9+ messages in thread
From: Chinmay V S @ 2013-07-26  9:14 UTC (permalink / raw)
  To: kernelnewbies

> We have direct I/O (O_DIRECT), for example raw devices (/dev/rawctl) that
> map to block devices, and we also have the page cache. If I've understood
> this correctly, direct I/O will bypass the page cache, which is fine; I'll
> not get into the performance debate, but what about data consistency? The
> kernel cannot and __shouldn't__ try to control how applications are
> written. So one bad day somebody comes up with an application that does
> both types of I/O (one that goes through the page cache and one that
> doesn't), and in that application one instance writes directly to the
> backend device while the other instance, unaware of that write, goes ahead
> and writes to the page cache, and that write is flushed to the backend
> device later. So wouldn't we end up corrupting the on-disk data?

Yes. And that is the responsibility of the application. While the
existence of O_DIRECT may not be common knowledge, anyone who knows about
it *must* know that it bypasses the kernel page-cache and hence *must*
know the consequences of doing cached and direct I/O on the same file
simultaneously.

> I can think of several other scenarios that could corrupt the on-disk
> data if the kernel does not employ any safeguarding policy. But I'm fairly
> sure the kernel is aware of such nasty attempts, and I'd like to know how
> it takes care of them.

O_DIRECT is an explicit flag, not enabled by default.

It is the app's responsibility to ensure that it does NOT misuse the
feature. Essentially, specifying the O_DIRECT flag is the app's way of
saying: "Hey kernel, I know what I am doing. Please step aside and
let me talk to the hardware directly. Please do NOT interfere."

The kernel happily obliges.

Later, the app should NOT go crying back to the kernel (and blaming it)
if the app manages to screw up its direct "relationship" with the
hardware.

regards
ChinmayVS

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Direct IO and Page cache
  2013-07-26  4:02   ` Kumar amit mehta
@ 2013-07-26 10:21     ` Chinmay V S
  2013-07-26 10:31       ` Chinmay V S
  2013-07-26 14:59     ` Valdis.Kletnieks at vt.edu
  1 sibling, 1 reply; 9+ messages in thread
From: Chinmay V S @ 2013-07-26 10:21 UTC (permalink / raw)
  To: kernelnewbies

On Fri, Jul 26, 2013 at 12:02 PM, Kumar amit mehta <gmate.amit@gmail.com> wrote:
> On Fri, Jul 26, 2013 at 05:14:21PM +0800, Chinmay V S wrote:
>> > We have direct I/O (O_DIRECT), for example raw devices (/dev/rawctl) that
>> > map to block devices, and we also have the page cache. If I've understood
>> > this correctly, direct I/O will bypass the page cache, which is fine; I'll
>> > not get into the performance debate, but what about data consistency? The
>> > kernel cannot and __shouldn't__ try to control how applications are
>> > written. So one bad day somebody comes up with an application that does
>> > both types of I/O (one that goes through the page cache and one that
>> > doesn't), and in that application one instance writes directly to the
>> > backend device while the other instance, unaware of that write, goes ahead
>> > and writes to the page cache, and that write is flushed to the backend
>> > device later. So wouldn't we end up corrupting the on-disk data?
>>
>> Yes. And that is the responsibility of the application. While the
>> existence of O_DIRECT may not be common knowledge, anyone who knows about
>> it *must* know that it bypasses the kernel page-cache and hence *must*
>> know the consequences of doing cached and direct I/O on the same file
>> simultaneously.
>>
>> > I can think of several other scenarios that could corrupt the on-disk
>> > data if the kernel does not employ any safeguarding policy. But I'm fairly
>> > sure the kernel is aware of such nasty attempts, and I'd like to know how
>> > it takes care of them.
>>
>> O_DIRECT is an explicit flag, not enabled by default.
>>
>> It is the app's responsibility to ensure that it does NOT misuse the
>> feature. Essentially, specifying the O_DIRECT flag is the app's way of
>> saying: "Hey kernel, I know what I am doing. Please step aside and
>> let me talk to the hardware directly. Please do NOT interfere."
>>
>> The kernel happily obliges.
>>
>> Later, the app should NOT go crying back to the kernel (and blaming it)
>> if the app manages to screw up its direct "relationship" with the
>> hardware.
>
> So leaving the hardware at the mercy of the application doesn't sound
> like good practice. This __may__ compromise kernel stability too. Also
> think of this:
>
> In app1:
> fdx = open("blah", O_RDWR | O_DIRECT);
> write(fdx, buf, sizeof(buf));
>
> In app2 (unaware of app1):
> fdy = open("blah", O_RDWR);
> write(fdy, buf, sizeof(buf));
>
> I don't think this is highly unlikely to happen, and if you agree with me
> then we may end up with the same could-be/would-be data corruption. Now who
> should be blamed here: app1, app2 or the kernel? Or will it be handled
> differently here?

As long as both app1 and app2 are managing separate files (even on the
same underlying storage media), the situation looks good.

From an app developer's perspective:
If both apps do I/O on the same file, that implies knowledge of the
other app. (Otherwise, how would the second app know that the file exists
at such-and-such a location?) Hence the second app really ought to think
about what it is going to do.

case1: app1 uses regular I/O;
==> app2 should NOT use direct I/O.

case2: app1 uses direct I/O;
==> app2 should NOT use regular I/O.

From a kernel developer's perspective:
The kernel driver guarantees coherency between the page-cache and the
data transferred using O_DIRECT. Refer to page 15 of this deck [1],
which talks about the design of O_DIRECT.

In either case, the bigger problem is that both apps need to work out a
mutual-exclusion mechanism to avoid the usual readers-writers problems [2]
when both try to read/write the same file simultaneously.

So it is more important (in fact, downright necessary) to ensure mutual
exclusion between the two apps during I/O. Otherwise one of them will
end up overwriting the changes made by the other, unless both apps are
doing ONLY read()s.

[1] http://www.ukuug.org/events/linux2001/papers/html/AArcangeli-o_direct.html
[2] http://en.wikipedia.org/wiki/Readers-writers_problem
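
To make the mutual-exclusion point concrete, here is a rough sketch using
flock() advisory locking; it is only a sketch, it assumes both apps agree
to take the lock, and it reuses the made-up file name from the earlier
example:

#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

int main(void)
{
        int fd = open("blah", O_RDWR | O_CREAT, 0644);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* Exclusive advisory lock: blocks until the other app drops it.
         * Only effective if *both* apps take the lock before writing. */
        if (flock(fd, LOCK_EX) < 0) {
                perror("flock");
                return 1;
        }

        const char buf[] = "some data\n";
        if (write(fd, buf, sizeof(buf) - 1) < 0)
                perror("write");

        flock(fd, LOCK_UN);     /* let the other app in */
        close(fd);
        return 0;
}

fcntl() record locks or a named semaphore would work just as well; what
matters is that both apps use the same mechanism.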


regards
ChinmayVS

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Direct IO and Page cache
  2013-07-26 10:21     ` Chinmay V S
@ 2013-07-26 10:31       ` Chinmay V S
  2013-07-26  6:04         ` Kumar amit mehta
  0 siblings, 1 reply; 9+ messages in thread
From: Chinmay V S @ 2013-07-26 10:31 UTC (permalink / raw)
  To: kernelnewbies

On Fri, Jul 26, 2013 at 6:21 PM, Chinmay V S <cvs268@gmail.com> wrote:
> On Fri, Jul 26, 2013 at 12:02 PM, Kumar amit mehta <gmate.amit@gmail.com> wrote:
>> On Fri, Jul 26, 2013 at 05:14:21PM +0800, Chinmay V S wrote:
>>> > We have direct I/O (O_DIRECT), for example raw devices (/dev/rawctl) that
>>> > map to block devices, and we also have the page cache. If I've understood
>>> > this correctly, direct I/O will bypass the page cache, which is fine; I'll
>>> > not get into the performance debate, but what about data consistency? The
>>> > kernel cannot and __shouldn't__ try to control how applications are
>>> > written. So one bad day somebody comes up with an application that does
>>> > both types of I/O (one that goes through the page cache and one that
>>> > doesn't), and in that application one instance writes directly to the
>>> > backend device while the other instance, unaware of that write, goes ahead
>>> > and writes to the page cache, and that write is flushed to the backend
>>> > device later. So wouldn't we end up corrupting the on-disk data?
>>>
>>> Yes. And that is the responsibility of the application. While the
>>> existence of O_DIRECT may not be common knowledge, anyone who knows about
>>> it *must* know that it bypasses the kernel page-cache and hence *must*
>>> know the consequences of doing cached and direct I/O on the same file
>>> simultaneously.
>>>
>>> > I can think of several other scenarios that could corrupt the on-disk
>>> > data if the kernel does not employ any safeguarding policy. But I'm fairly
>>> > sure the kernel is aware of such nasty attempts, and I'd like to know how
>>> > it takes care of them.
>>>
>>> O_DIRECT is an explicit flag, not enabled by default.
>>>
>>> It is the app's responsibility to ensure that it does NOT misuse the
>>> feature. Essentially, specifying the O_DIRECT flag is the app's way of
>>> saying: "Hey kernel, I know what I am doing. Please step aside and
>>> let me talk to the hardware directly. Please do NOT interfere."
>>>
>>> The kernel happily obliges.
>>>
>>> Later, the app should NOT go crying back to the kernel (and blaming it)
>>> if the app manages to screw up its direct "relationship" with the
>>> hardware.
>>
>> So leaving the hardware at the mercy of the application doesn't sound
>> like good practice. This __may__ compromise kernel stability too. Also
>> think of this:
>>
>> In app1:
>> fdx = open("blah", O_RDWR | O_DIRECT);
>> write(fdx, buf, sizeof(buf));
>>
>> In app2 (unaware of app1):
>> fdy = open("blah", O_RDWR);
>> write(fdy, buf, sizeof(buf));
>>
>> I don't think this is highly unlikely to happen, and if you agree with me
>> then we may end up with the same could-be/would-be data corruption. Now who
>> should be blamed here: app1, app2 or the kernel? Or will it be handled
>> differently here?
>
> As long as both app1 and app2 are managing separate files (even on the
> same underlying storage media), the situation looks good.
>
> From an app developer's perspective:
> If both apps do I/O on the same file, that implies knowledge of the
> other app. (Otherwise, how would the second app know that the file exists
> at such-and-such a location?) Hence the second app really ought to think
> about what it is going to do.
>
> case1: app1 uses regular I/O;
> ==> app2 should NOT use direct I/O.
>
> case2: app1 uses direct I/O;
> ==> app2 should NOT use regular I/O.
>
> From a kernel developer's perspective:
> The kernel driver guarantees coherency between the page-cache and the
> data transferred using O_DIRECT. Refer to page 15 of this deck [1],
> which talks about the design of O_DIRECT.
>
> In either case, the bigger problem is that both apps need to work out a
> mutual-exclusion mechanism to avoid the usual readers-writers problems [2]
> when both try to read/write the same file simultaneously.
>
> So it is more important (in fact, downright necessary) to ensure mutual
> exclusion between the two apps during I/O. Otherwise one of them will
> end up overwriting the changes made by the other, unless both apps are
> doing ONLY read()s.
>
> [1] http://www.ukuug.org/events/linux2001/papers/html/AArcangeli-o_direct.html
> [2] http://en.wikipedia.org/wiki/Readers-writers_problem
>
>
> regards
> ChinmayVS

TL;DR

1. Do not worry about coherency between the page-cache and the data
transferred using O_DIRECT. The kernel will invalidate the cache after
an O_DIRECT write and flush the cache before an O_DIRECT read.

2. Use mutexes or semaphores (or any of the numerous options [1]) to
prevent the usual synchronisation problems during IPC using a shared
file.

[1] http://beej.us/guide/bgipc/output/html/singlepage/bgipc.html
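
As an illustration of point 2, a minimal sketch using a POSIX named
semaphore; the semaphore name and file name are made up, and both apps
would have to open the same name before touching the shared file
(link with -pthread, or -lrt on older glibc):

#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        /* Both apps open the same named semaphore; the name is made up. */
        sem_t *sem = sem_open("/blah_lock", O_CREAT, 0644, 1);
        if (sem == SEM_FAILED) {
                perror("sem_open");
                return 1;
        }

        sem_wait(sem);                  /* enter the critical section */

        int fd = open("blah", O_RDWR | O_CREAT, 0644);
        if (fd >= 0) {
                const char buf[] = "some data\n";
                write(fd, buf, sizeof(buf) - 1);
                close(fd);
        }

        sem_post(sem);                  /* leave the critical section */
        sem_close(sem);
        return 0;
}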

regards
ChinmayVS

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Direct IO and Page cache
  2013-07-26  3:15 Direct IO and Page cache Kumar amit mehta
  2013-07-26  9:14 ` Chinmay V S
@ 2013-07-26 14:56 ` Valdis.Kletnieks at vt.edu
  1 sibling, 0 replies; 9+ messages in thread
From: Valdis.Kletnieks at vt.edu @ 2013-07-26 14:56 UTC (permalink / raw)
  To: kernelnewbies

On Thu, 25 Jul 2013 23:15:49 -0400, Kumar amit mehta said:

> applications are written. So one bad day somebody comes up with an
> application that does both types of I/O (one that goes through the page
> cache and one that doesn't), and in that application one instance writes
> directly to the backend device while the other instance, unaware of that
> write, goes ahead and writes to the page cache, and that write is flushed
> to the backend device later. So wouldn't we end up corrupting the on-disk
> data?

Applications that intermix O_DIRECT and non-O_DIRECT I/O to the same file
get what they deserve.  Consider it evolution in action; you're not going
to get any sympathy from the kernel community for this.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Direct IO and Page cache
  2013-07-26  4:02   ` Kumar amit mehta
  2013-07-26 10:21     ` Chinmay V S
@ 2013-07-26 14:59     ` Valdis.Kletnieks at vt.edu
  2013-07-26 15:52       ` Kumar amit mehta
  1 sibling, 1 reply; 9+ messages in thread
From: Valdis.Kletnieks at vt.edu @ 2013-07-26 14:59 UTC (permalink / raw)
  To: kernelnewbies

On Fri, 26 Jul 2013 00:02:31 -0400, Kumar amit mehta said:

> So leaving the hardware at the mercy of the application doesn't sound
> like good practice. This __may__ compromise kernel stability too. Also
> think of this:

In what possible way does it compromise kernel stability?
>
> In app1:
> fdx = open("blah", O_RDWR | O_DIRECT);
> write(fdx, buf, sizeof(buf));
>
> In app2 (unaware of app1):
> fdy = open("blah", O_RDWR);
> write(fdy, buf, sizeof(buf));
>
> I don't think this is highly unlikely to happen, and if you agree with me
> then we may end up with the same could-be/would-be data corruption. Now who
> should be blamed here: app1, app2

You blame the idiot programmer who didn't use file locking, and/or the idiot
user who ran the two programs.

This isn't even about O_DIRECT - try writing two programs that both
basically do a 'write 1M of data, sleep 10 seconds' using stdio to the
same file, and see what happens....
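
A rough sketch of that experiment (file name, chunk size and iteration
count are arbitrary); run two copies of it at the same time and then
inspect the resulting file:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        /* Both copies of the program open the same (made-up) file. */
        FILE *fp = fopen("shared.dat", "w");
        if (!fp) {
                perror("fopen");
                return 1;
        }

        size_t sz = 1024 * 1024;
        char *chunk = malloc(sz);
        if (!chunk)
                return 1;
        memset(chunk, 'x', sz);

        for (int i = 0; i < 5; i++) {
                fwrite(chunk, 1, sz, fp);       /* write 1M of data... */
                fflush(fp);
                sleep(10);                      /* ...then sleep 10 seconds */
        }

        free(chunk);
        fclose(fp);
        return 0;
}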

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Direct IO and Page cache
  2013-07-26 14:59     ` Valdis.Kletnieks at vt.edu
@ 2013-07-26 15:52       ` Kumar amit mehta
  0 siblings, 0 replies; 9+ messages in thread
From: Kumar amit mehta @ 2013-07-26 15:52 UTC (permalink / raw)
  To: kernelnewbies

On Fri, Jul 26, 2013 at 10:59:25AM -0400, Valdis.Kletnieks at vt.edu wrote:
> On Fri, 26 Jul 2013 00:02:31 -0400, Kumar amit mehta said:
> 
> > So leaving the hardware at the mercy of the application doesn't sound
> > like good practice. This __may__ compromise kernel stability too. Also
> > think of this:
> 
> In what possible way does it compromise kernel stability?

Sorry for forgetting that we are talking about userspace programs.
Recently I came across an old HBA driver which, for some reason, failed
to properly initialize the firmware (maybe someone didn't copy the
firmware into the /lib/firmware directory properly, I don't know).
Because of that, the firmware initialization code retried the same thing
for a couple of iterations and, failing that, tried to reset the board,
but somewhere during this recovery process it disabled interrupts on the
CPU. Later, the watchdog timer kicked in, treated this as a hard lockup
and issued a system reset.

When I first saw the firmware initialization error, I wondered what harm
it could do to my overall system beyond the particular HBA becoming
unusable, but it turned out to be much more disruptive. When I said
"This __may__ compromise kernel stability too", I was thinking of
something along those lines and completely forgot that we are currently
in userspace.

!!amit

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-07-26 15:52 UTC | newest]

Thread overview: 9+ messages
2013-07-26  3:15 Direct IO and Page cache Kumar amit mehta
2013-07-26  9:14 ` Chinmay V S
2013-07-26  4:02   ` Kumar amit mehta
2013-07-26 10:21     ` Chinmay V S
2013-07-26 10:31       ` Chinmay V S
2013-07-26  6:04         ` Kumar amit mehta
2013-07-26 14:59     ` Valdis.Kletnieks at vt.edu
2013-07-26 15:52       ` Kumar amit mehta
2013-07-26 14:56 ` Valdis.Kletnieks at vt.edu
