linux-ext4.vger.kernel.org archive mirror
* Sync does not flush to disk!?
@ 2012-06-08  9:53 Asdo
  2012-06-08 11:39 ` Asdo
  2012-06-08 12:33 ` NeilBrown
  0 siblings, 2 replies; 8+ messages in thread
From: Asdo @ 2012-06-08  9:53 UTC
  To: linux-raid, linux-ext4

Hello all
I don't exactly know where to ask this question...

My setup is:

sda1 + sdb1 --> MD raid1
Above that is an ext4 filesystem. No LVM.

I make changes to that filesystem (editing a file with vi) and then run
sync
sync
(twice)

Then I start KVM in snapshot mode on the sda and sdb disks, so as to 
virtualize the same system I am running on.

kvm -m 1024 -hda /dev/sda -hdb /dev/sdb -snapshot

The strange thing is that the virtual machine does NOT see the latest 
changes to that file!

Then I tried:

for i in /dev/md? /dev/sda /dev/sdb ; do blockdev --flushbufs $i ; done

and restarted KVM,
and NOW it sees the changes.

I have had similar problems in the past and, not knowing about blockdev 
--flushbufs, I ended up unmounting the filesystems and stopping the 
RAID arrays. That also appeared to actually commit the data to disk.

So is sync not enough? Could somebody explain this to me?

Thank you


* Re: Sync does not flush to disk!?
  2012-06-08  9:53 Sync does not flush to disk!? Asdo
@ 2012-06-08 11:39 ` Asdo
  2012-06-08 12:33 ` NeilBrown
  1 sibling, 0 replies; 8+ messages in thread
From: Asdo @ 2012-06-08 11:39 UTC
  To: Asdo; +Cc: linux-raid, linux-ext4

On 06/08/12 11:53, Asdo wrote:
> .....
> Then I tried:
>
> for i in /dev/md? /dev/sda /dev/sdb ; do blockdev --flushbufs $i ; done
> ....
> and NOW it sees the changes.

After some further tests:
Flushing the buffers of just the MD devices produced an odd in-between
state, in which the file being changed contained garbage from another,
older file.
Flushing the buffers of just /dev/sda and /dev/sdb worked the few times
I tried it, but I'm not sure that is enough in general.
Flushing everything appears to work reliably.

Still, I am puzzled. Isn't "sync" from bash enough to commit data to disk,
even in case of power failure?

Or is there any chance that KVM "sees" a version of sda and sdb that is
actually *older* than the content on the platters?

Thank you


* Re: Sync does not flush to disk!?
  2012-06-08  9:53 Sync does not flush to disk!? Asdo
  2012-06-08 11:39 ` Asdo
@ 2012-06-08 12:33 ` NeilBrown
  2012-06-08 13:49   ` Phil Turmel
  1 sibling, 1 reply; 8+ messages in thread
From: NeilBrown @ 2012-06-08 12:33 UTC
  To: Asdo; +Cc: linux-raid, linux-ext4


On Fri, 08 Jun 2012 11:53:14 +0200 Asdo <asdo@shiftmail.org> wrote:

> Hello all
> I don't exactly know where to ask this question...
> 
> My setup is:
> 
> sda1 + sdb1 --> MD raid1
> Above that is an ext4 filesystem. No LVM.
> 
> I make changes to that filesystem (editing a file with vi) and then run
> sync
> sync
> (twice)
> 
> Then I start KVM in snapshot mode on the sda and sdb disks, so as to 
> virtualize the same system I am running on.
> 
> kvm -m 1024 -hda /dev/sda -hdb /dev/sdb -snapshot
> 
> The strange thing is that the virtual machine does NOT see the latest 
> changes to that file!
> 
> Then I tried:
> 
> for i in /dev/md? /dev/sda /dev/sdb ; do blockdev --flushbufs $i ; done
> 
> and restarted KVM,
> and NOW it sees the changes.
> 
> I have had similar problems in the past and, not knowing about blockdev 
> --flushbufs, I ended up unmounting the filesystems and stopping the 
> RAID arrays. That also appeared to actually commit the data to disk.
> 
> So is sync not enough? Could somebody explain this to me?

There is a cache associated with /dev/sda and /dev/sdb which md does not make
any use of.  The filesystem doesn't use it either.  It is only used for
user-space reads from /dev/sda or /dev/sdb.
When you "sync" the filesystem, the new data is written out, but that cache is
not updated.  When you then read from /dev/sda, you might get cached data,
which is stale.

blockdev --flushbufs
clears that cache so that subsequent reads come from the device, not from the
cache.

i.e. it is read caching that is causing the confusion you see, not write
caching.

NeilBrown
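To make the read-cache explanation concrete, the sketch below issues the same request as `blockdev --flushbufs` (the `BLKFLSBUF` ioctl) directly from Python. The helper name `flush_block_device_cache` is hypothetical. The hard-coded ioctl encoding is an assumption that holds for the asm-generic layout used on x86 and arm, not for every architecture.

```python
import fcntl
import os


def _io(ioc_type: int, nr: int) -> int:
    """Encode an ioctl number like the kernel's _IO() macro.

    Assumes the asm-generic layout (x86, arm), where _IO() carries no
    direction or size bits and reduces to (type << 8) | nr. Architectures
    such as powerpc and mips encode the direction differently.
    """
    return (ioc_type << 8) | nr


# <linux/fs.h>: #define BLKFLSBUF _IO(0x12,97)  /* flush buffer cache */
BLKFLSBUF = _io(0x12, 97)


def flush_block_device_cache(path: str) -> None:
    """Write back and drop the cached pages of a block device node.

    Issues BLKFLSBUF, the request behind `blockdev --flushbufs`. The kernel
    requires CAP_SYS_ADMIN for it. Afterwards, reads through this device
    node go to the device instead of returning stale cached pages. On
    anything that is not a block device, the ioctl fails with OSError
    (typically ENOTTY).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, BLKFLSBUF)
    finally:
        os.close(fd)
```

Run as root for /dev/md0, /dev/sda and /dev/sdb, this would mirror the `blockdev` loop from the first message. It only drops the stale read cache. It does nothing about a filesystem that is still mounted on the host.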



* Re: Sync does not flush to disk!?
  2012-06-08 12:33 ` NeilBrown
@ 2012-06-08 13:49   ` Phil Turmel
  2012-06-08 13:57     ` Asdo
  0 siblings, 1 reply; 8+ messages in thread
From: Phil Turmel @ 2012-06-08 13:49 UTC
  To: Asdo; +Cc: NeilBrown, linux-raid, linux-ext4

On 06/08/2012 08:33 AM, NeilBrown wrote:
> On Fri, 08 Jun 2012 11:53:14 +0200 Asdo <asdo@shiftmail.org> wrote:
> 
>> Hello all
>> I don't exactly know where to ask this question...
>>
>> My setup is:
>>
>> sda1 + sdb1 --> MD raid1
>> Above that is an ext4 filesystem. No LVM.
>>
>> I make changes to that filesystem (editing a file with vi) and then run
>> sync
>> sync
>> (twice)
>>
>> Then I start KVM in snapshot mode on the sda and sdb disks, so as to 
>> virtualize the same system I am running on.
>>
>> kvm -m 1024 -hda /dev/sda -hdb /dev/sdb -snapshot
>>
>> The strange thing is that the virtual machine does NOT see the latest 
>> changes to that file!
>>
>> Then I tried:
>>
>> for i in /dev/md? /dev/sda /dev/sdb ; do blockdev --flushbufs $i ; done
>>
>> and restarted KVM,
>> and NOW it sees the changes.
>>
>> I have had similar problems in the past and, not knowing about blockdev 
>> --flushbufs, I ended up unmounting the filesystems and stopping the 
>> RAID arrays. That also appeared to actually commit the data to disk.

*Exactly*

>> So is sync not enough? Could somebody explain this to me?
> 
> There is a cache associated with /dev/sda and /dev/sdb which md does not make
> any use of.  The filesystem doesn't use it either.  It is only used for
> user-space reads from /dev/sda or /dev/sdb.
> When you "sync" the filesystem, the new data is written out, but that cache is
> not updated.  When you then read from /dev/sda, you might get cached data,
> which is stale.
> 
> blockdev --flushbufs
> clears that cache so that subsequent reads come from the device, not from the
> cache.
> 
> i.e. it is read caching that is causing the confusion you see, not write
> caching.

To put it another way:  You can't safely access ext filesystems via
raw devices in two systems.  The kernel cache won't be synchronized,
and you almost certainly *will* corrupt the contents.

You can unmount the FS and then pass the RAID to the VM, or stop the
RAID as well and let the VM assemble it.

There are cluster filesystems that allow multiple mounts of shared
devices, though.  I haven't played with them, so you might want to
do some googling.

Phil
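Following the advice to unmount before handing the disks to the VM, a pre-flight check can refuse to start the guest while any of the devices is still mounted on the host. The sketch below is a hypothetical helper, not an existing tool. It parses `/proc/mounts` text and matches partitions by name only. It therefore does not see through md, so the array (e.g. `/dev/md0`) has to be passed explicitly, and it does not resolve symlinked names under `/dev/disk/`.

```python
from typing import Iterable, List


def mount_sources(mounts_text: str) -> List[str]:
    """Return the source (first field) of every line in /proc/mounts text."""
    return [line.split()[0] for line in mounts_text.splitlines() if line.strip()]


def is_same_or_partition(source: str, device: str) -> bool:
    """True if `source` is `device` itself or, by name, one of its partitions.

    Handles only the "sda" -> "sda1" and "md0" -> "md0p1" naming styles.
    """
    if not source.startswith(device):
        return False
    rest = source[len(device):]
    return rest == "" or rest.isdigit() or (rest.startswith("p") and rest[1:].isdigit())


def busy_devices(devices: Iterable[str], mounts_text: str) -> List[str]:
    """List the mount sources that belong to any of `devices`."""
    devices = list(devices)
    return [src for src in mount_sources(mounts_text)
            if any(is_same_or_partition(src, dev) for dev in devices)]


def check_before_vm(devices: Iterable[str], mounts_path: str = "/proc/mounts") -> None:
    """Raise if any of `devices` is still mounted on the host."""
    with open(mounts_path) as f:
        busy = busy_devices(devices, f.read())
    if busy:
        raise RuntimeError(f"still mounted on the host, unmount first: {busy}")
```

On the setup described in this thread, `check_before_vm(["/dev/md0", "/dev/sda", "/dev/sdb"])` would raise for as long as the ext4 filesystem on the array is mounted.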


* Re: Sync does not flush to disk!?
  2012-06-08 13:49   ` Phil Turmel
@ 2012-06-08 13:57     ` Asdo
  2012-06-08 14:11       ` Jan Kara
  2012-06-08 18:28       ` Ted Ts'o
  0 siblings, 2 replies; 8+ messages in thread
From: Asdo @ 2012-06-08 13:57 UTC
  To: Phil Turmel; +Cc: NeilBrown, linux-raid, linux-ext4

On 06/08/12 15:49, Phil Turmel wrote:
>
> To put it another way:  You can't safely access ext filesystems via
> raw devices in two systems.  The kernel cache won't be synchronized,
> and you almost certainly *will* corrupt the contents.

Thanks to both of you for your explanations.

I have to say this seems like bad design to me: I have never before seen
a cache that is not updated by writes.
Here the cache content is *older* than the data on the real devices!?
If it were *newer*, that would be a known case (a writeback cache not
yet flushed), but *older*... I've never seen that.

Thanks


* Re: Sync does not flush to disk!?
  2012-06-08 13:57     ` Asdo
@ 2012-06-08 14:11       ` Jan Kara
  2012-06-08 14:16         ` Jan Kara
  2012-06-08 18:28       ` Ted Ts'o
  1 sibling, 1 reply; 8+ messages in thread
From: Jan Kara @ 2012-06-08 14:11 UTC
  To: Asdo; +Cc: Phil Turmel, NeilBrown, linux-raid, linux-ext4

On Fri 08-06-12 15:57:04, Asdo wrote:
> On 06/08/12 15:49, Phil Turmel wrote:
> >
> >To put it another way:  You can't safely access ext filesystems via
> >raw devices in two systems.  The kernel cache won't be synchronized,
> >and you almost certainly *will* corrupt the contents.
> 
> Thanks to both of you for your explanations.
> 
> I have to say this seems like bad design to me: I have never before
> seen a cache that is not updated by writes.
> Here the cache content is *older* than the data on the real devices!?
> If it were *newer*, that would be a known case (a writeback cache not
> yet flushed), but *older*... I've never seen that.
  Well, the problem is the inconsistency of caches. There is one cache, the
page cache, used by filesystems to read & write file data, which is
addressed by (inode, offset). And there is another cache, caching the whole
device, which is addressed by (device, offset). It would be too costly to
keep these two caches consistent, and most people don't care, so we don't.

  BTW, if you configured KVM to use direct IO or virt IO when accessing the
devices (a good idea anyway), you wouldn't have these problems either.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
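Direct IO avoids the stale reads because an `O_DIRECT` read skips the page cache and goes to the device. For QEMU this corresponds to the `cache=none` drive option. The hypothetical helper below shows what such a read looks like at the syscall level. It assumes a 4096-byte alignment, which covers the common 512- and 4096-byte logical block sizes. Devices or filesystems that need something else, or that do not support `O_DIRECT` at all, will make it fail with `OSError`.

```python
import mmap
import os


def read_direct(path: str, offset: int = 0, length: int = 4096, align: int = 4096) -> bytes:
    """Read `length` bytes at `offset` with O_DIRECT, bypassing the page cache.

    O_DIRECT requires the file offset, the length and the buffer address to
    be aligned to the logical block size. `align` is an assumed value.
    """
    if offset % align or length % align:
        raise ValueError("offset and length must be multiples of align")
    buf = mmap.mmap(-1, length)  # anonymous mapping, so the buffer is page-aligned
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
        try:
            n = os.preadv(fd, [buf], offset)  # reads straight into the aligned buffer
        finally:
            os.close(fd)
        return bytes(buf[:n])
    finally:
        buf.close()
```

Reading the first block of /dev/sda this way on the host would return what is on the disk, even if the device's page cache still holds an older copy. As the follow-up below points out, this does not make mounting the same filesystem on host and guest safe.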


* Re: Sync does not flush to disk!?
  2012-06-08 14:11       ` Jan Kara
@ 2012-06-08 14:16         ` Jan Kara
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Kara @ 2012-06-08 14:16 UTC
  To: Asdo; +Cc: Phil Turmel, NeilBrown, linux-raid, linux-ext4

On Fri 08-06-12 16:11:39, Jan Kara wrote:
> On Fri 08-06-12 15:57:04, Asdo wrote:
> > On 06/08/12 15:49, Phil Turmel wrote:
> > >
> > >To put it another way:  You can't safely access ext filesystems via
> > >raw devices in two systems.  The kernel cache won't be synchronized,
> > >and you almost certainly *will* corrupt the contents.
> > 
> > Thanks to both of you for your explanations.
> > 
> > I have to say this seems like bad design to me: I have never before
> > seen a cache that is not updated by writes.
> > Here the cache content is *older* than the data on the real devices!?
> > If it were *newer*, that would be a known case (a writeback cache not
> > yet flushed), but *older*... I've never seen that.
>   Well, the problem is the inconsistency of caches. There is one cache, the
> page cache, used by filesystems to read & write file data, which is
> addressed by (inode, offset). And there is another cache, caching the whole
> device, which is addressed by (device, offset). It would be too costly to
> keep these two caches consistent, and most people don't care, so we don't.
> 
>   BTW, if you configured KVM to use direct IO or virt IO when accessing the
> devices (a good idea anyway), you wouldn't have these problems either.
  Hmm, I didn't notice that you actually keep the fs mounted on the host when
starting the guest. That is really asking for trouble: the host's data that is
cached in memory (and I'm not speaking just about file data but, more
importantly, also about allocation information etc.) will not be updated when
the guest changes the filesystem, so the filesystem will almost certainly get
corrupted.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: Sync does not flush to disk!?
  2012-06-08 13:57     ` Asdo
  2012-06-08 14:11       ` Jan Kara
@ 2012-06-08 18:28       ` Ted Ts'o
  1 sibling, 0 replies; 8+ messages in thread
From: Ted Ts'o @ 2012-06-08 18:28 UTC
  To: Asdo; +Cc: Phil Turmel, NeilBrown, linux-raid, linux-ext4

On Fri, Jun 08, 2012 at 03:57:04PM +0200, Asdo wrote:
> 
> I have to say this seems like bad design to me: I have never before
> seen a cache that is not updated by writes.
> Here the cache content is *older* than the data on the real devices!?
> If it were *newer*, that would be a known case (a writeback cache not
> yet flushed), but *older*... I've never seen that.

It's not just a matter of keeping the caches in sync --- it's also a
simple matter of locking.  If a file system is mounted on two systems
at the same time, there's no way (without using a cluster lock
manager, which is what a cluster file system like ocfs2 uses) to prevent
both systems from trying to modify a particular part of the file system
(an inode or a directory, for example) at the same time.

As a result, there's no way for a local disk file system to know when
a block has been modified out from under it, so that it can update its
inode cache (where the in-memory inode data structure looks quite
different from the on-disk inode table).

There is overhead in using a cluster file system, since it has to do
all of these extra checks to see if the block device has gotten
magically modified out from under it.  That's why most people won't
use a cluster file system if it is only going to be mounted on one
system at a time.

But if you are going to have a file system mounted in both the guest
and the host at the same time, you *have* to use a cluster file
system.  Alternatively, you could have the guest access the file
system as mounted on the host OS via NFS.

Regards,

						- Ted
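The locking argument can be illustrated with a deliberately simplified model. In the hypothetical sketch below, two machines cache the same directory block and each writes its change straight to disk, as if it had run sync. One update is still lost, because the second writer never re-reads the block. The `drop_cache` step stands in, very roughly, for what a cluster lock manager forces a node to do before it may modify shared metadata. It is not how ocfs2 is actually implemented.

```python
from typing import Dict, Set


class Disk:
    """Toy shared block device: block number -> set of directory entries."""

    def __init__(self) -> None:
        self.blocks: Dict[int, Set[str]] = {}


class Host:
    """Toy local-filesystem mount that caches blocks and trusts its cache.

    Like a non-cluster filesystem, it never re-reads a cached block, so it
    cannot notice that another machine rewrote that block on the shared disk.
    """

    def __init__(self, disk: Disk) -> None:
        self.disk = disk
        self.cache: Dict[int, Set[str]] = {}

    def read(self, blk: int) -> Set[str]:
        if blk not in self.cache:
            self.cache[blk] = set(self.disk.blocks.get(blk, set()))
        return self.cache[blk]

    def add_entry(self, blk: int, name: str) -> None:
        entries = self.read(blk)
        entries.add(name)
        # Write the whole block back immediately, as if followed by sync.
        self.disk.blocks[blk] = set(entries)

    def drop_cache(self) -> None:
        """Forget all cached blocks (a stand-in for lock-driven invalidation)."""
        self.cache.clear()


def demo(coordinated: bool) -> Set[str]:
    disk = Disk()
    disk.blocks[0] = {"a"}
    host, guest = Host(disk), Host(disk)
    host.read(0)
    guest.read(0)            # both machines now cache directory block 0
    host.add_entry(0, "b")   # host writes {"a", "b"} to disk
    if coordinated:
        guest.drop_cache()   # guest must re-read the block before modifying it
    guest.add_entry(0, "c")  # guest writes back its own view of block 0
    return disk.blocks[0]
```

`demo(False)` returns `{"a", "c"}`: the host's entry `"b"` is gone even though it was written to disk. `demo(True)` returns all three entries.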
