* somewhat OT query on journalling
From: Payal Rathod @ 2006-07-19 15:27 UTC
  To: reiserfs-list

Hi,
I was just reading about filesystems and my ideas are a bit confused.
I have read quite a few articles on the net, but my basic doubts are still
not completely cleared up. I thought this would be the right place to ask,
since many journalling gurus might be here.
Can someone tell me whether journalling filesystems keep a journal of just
the metadata or of all the data?
Also, is it true that nowadays there is no such thing as an inode "block",
since for faster access the inodes are kept near the data itself?
How is the journal maintained? How is it prevented from growing too big,
and why are these filesystems not slower than traditional ones, given the
overhead of writing to a journal?
And lastly, don't journalling filesystems give the user a false sense of
security, saying that the data is written to disk when in reality only an
entry has been made in the journal and the data is still not committed to
disk?

Thanks a lot for your patience; I'm eagerly waiting for any replies.

With warm regards,
-Payal



* Re: somewhat OT query on journalling
From: Toby Thain @ 2006-07-19 17:30 UTC
  To: reiserfs-list


On 19-Jul-06, at 11:27 AM, Payal Rathod wrote:

> Hi, ...
> And lastly, don't journalling filesystems give the user a false sense of
> security, saying that the data is written to disk when in reality only an
> entry has been made in the journal and the data is still not committed to
> disk?

This last one is easy to answer: No. Regardless of the filesystem  
you're using, there is no guarantee your data hits the disk until you  
fsync(). Journalling filesystems don't change this. (And even after  
that, it depends on the device doing the right thing :)
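To make that concrete, here is a minimal Python sketch, assuming a POSIX system, of what an application has to do if it wants a write to reach the disk rather than sit in the page cache. The helper name `durable_write` is hypothetical; `os.fsync` is the standard-library wrapper around fsync(2). Because a newly created file's name lives in its parent directory, the sketch fsyncs the directory as well.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Hypothetical helper: write data to path and ask the kernel to push it
    to stable storage. Even after fsync() returns, the data is only as safe
    as the drive's handling of its own write cache."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)   # write() may be partial, so loop
            view = view[written:]
        os.fsync(fd)                       # flush file data and metadata
    finally:
        os.close(fd)

    # The directory entry for a new file lives in the parent directory,
    # so fsync the directory too (works on POSIX systems such as Linux).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Without the fsync() calls the write still "succeeds" from the application's point of view, on any filesystem, journalled or not.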

>
> Thanks a lot for your patience; I'm eagerly waiting for any replies.
>
> With warm regards,
> -Payal
>



* Re: somewhat OT query on journalling
From: David Masover @ 2006-07-19 19:09 UTC
  To: Toby Thain; +Cc: reiserfs-list

Toby Thain wrote:
> 
> On 19-Jul-06, at 11:27 AM, Payal Rathod wrote:
> 
>> Hi, ...
>> And lastly, don't journalling filesystems give the user a false sense of
>> security, saying that the data is written to disk when in reality only an
>> entry has been made in the journal and the data is still not committed to
>> disk?
> 
> This last one is easy to answer: No. Regardless of the filesystem you're 
> using, there is no guarantee your data hits the disk until you fsync(). 
> Journalling filesystems don't change this. (And even after that, it 
> depends on the device doing the right thing :)

Interestingly, most modern IDE hard drives cannot turn off write
caching.  But I'm guessing they have a big enough capacitor to flush
the cache if you lose power.

I often turn off fsync, because it gets too abused.  There was a bug in 
Evolution where, when dragging columns, every time the display refreshed 
(as you were dragging), it would flush and fsync.  Now tell me, do I 
really need to be absolutely sure that, when recovering from a crash, my 
Evolution column widths are EXACTLY where they were while I was dragging 
the columns?

My philosophy is, unless you have a UPS device, loss of power will 
always result in lost data.  Crashes will also more often than not 
result in lost data.  So do frequent backups and have a 
managed/monitored UPS, so that when you lose power, your system flushes 
everything to disk and shuts down.


* Re: somewhat OT query on journalling
From: David Masover @ 2006-07-19 19:19 UTC
  To: Payal Rathod; +Cc: reiserfs-list

Payal Rathod wrote:
> Hi,
> I was just reading about filesystems and my ideas are a bit confused.

> And lastly, don't journalling filesystems give the user a false sense of
> security, saying that the data is written to disk when in reality only an
> entry has been made in the journal and the data is still not committed to
> disk?

A journaling fs does guarantee one thing:  The filesystem itself is 
consistent.  Any changes you make to the directory structure either 
succeed completely or fail completely.

This means that, while you may lose data, at least you won't lose your 
whole partition.  A non-journaling fs can lose a whole partition this 
way, which is one of the main reasons some people like to split their 
disk up into lots of tiny, specialized partitions.
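The all-or-nothing property comes from the journal's commit record. The toy model below is purely illustrative: the `ToyJournal` class and its record format are made up, and real journals work on disk blocks rather than a Python dict. It logs every write of a transaction, then a commit record, and when replaying (at checkpoint time or after a crash) it applies only transactions whose commit record made it into the log.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJournal:
    """Toy write-ahead journal (hypothetical model, not any real fs layout)."""
    disk: dict = field(default_factory=dict)   # block name -> contents
    log: list = field(default_factory=list)    # journal records, in order
    next_txid: int = 1

    def transaction(self, writes: dict) -> int:
        txid = self.next_txid
        self.next_txid += 1
        for block, value in writes.items():
            self.log.append(("write", txid, block, value))
        # Only once this record is in the log does the transaction "exist".
        self.log.append(("commit", txid))
        return txid

    def replay(self) -> None:
        """Apply committed transactions to `disk`; drop any without a commit record."""
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in self.log:
            if rec[0] == "write" and rec[1] in committed:
                _, _, block, value = rec
                self.disk[block] = value
        self.log.clear()
```

Usage, simulating a crash before the second commit record was written:

```python
j = ToyJournal()
j.transaction({"dir": ["a"], "inode_a": "allocated"})
j.transaction({"dir": ["a", "b"], "inode_b": "allocated"})
j.log.pop()     # the second commit record never reached the disk
j.replay()
print(j.disk)   # {'dir': ['a'], 'inode_a': 'allocated'}
```

The directory never points at a half-created entry: the second change is lost entirely instead of being applied halfway.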

Last I checked (a year ago, at least) there was an API planned for 
Reiser4 which would make it possible for applications to define their 
own transaction.  Any string of operations you could perform on the fs 
(probably with some limitations) or on a single file could be combined 
into one transaction, which either succeeds entirely, or fails entirely.

So, for instance, if you save a file, there may not be a guarantee that
the file is written to disk, but there's a guarantee that if the file is
written to disk, either the whole file was saved, or none of it at all.
This is a good thing, because the alternative is to allow some of the
file to be saved, which almost always means a corrupt file.
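Without a filesystem transaction API, applications usually get the same whole-file-or-nothing behaviour from the write-to-a-temporary-file-then-rename idiom. A minimal Python sketch, assuming a POSIX filesystem where rename(2) atomically replaces the target; `atomic_save` is a hypothetical helper name.

```python
import os
import tempfile


def atomic_save(path: str, data: bytes) -> None:
    """Replace `path` so that readers see either the old file or the new one, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be in the same directory (same filesystem),
    # because rename is only atomic within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # new contents on disk before the rename
        os.replace(tmp_path, path)   # atomically swap the new file into place
    except BaseException:
        os.unlink(tmp_path)          # don't leave a stray temporary behind
        raise
```

After a crash, `path` holds either the complete old contents or the complete new ones. Making the rename itself durable would additionally need an fsync of the directory.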

Journaling could give a false sense of security if you think it means 
you'll never lose data.  Power fails, disks fail, and the question is 
when you'll lose data, not if.  Make backups, no matter what your FS.

But given the choice, journaling is much better than no journaling.  The 
disadvantage is it slows the system down when implemented poorly.  So 
the solution is to use a journaled filesystem that does it right.


* Re: somewhat OT query on journalling
From: Hans Reiser @ 2006-07-19 20:28 UTC
  To: Payal Rathod; +Cc: reiserfs-list

Payal Rathod wrote:

>Hi,
>I was just reading about filesystems and my ideas are a bit confused.
>I have read quite a few articles on the net, but my basic doubts are still
>not completely cleared up. I thought this would be the right place to ask,
>since many journalling gurus might be here.
>Can someone tell me whether journalling filesystems keep a journal of just
>the metadata or of all the data?
>  
>
V3 defaults to metadata only, V4 does data also because we can do it
without performance loss.

>Also, is it true that nowadays there is no such thing as an inode "block",
>since for faster access the inodes are kept near the data itself?
>  
>
reiserfs does not use inodes at all.  see our website for more.

>How is the journal maintained? How is it prevented from growing too big,
>and why are these filesystems not slower than traditional ones, given the
>overhead of writing to a journal?
>  
>
see website.  there is overhead.  for v4 it is not a lot though.
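A common general answer to "how is it prevented from growing too big" (not necessarily exactly what ReiserFS does) is that the journal is a fixed-size circular area: when it fills up, the filesystem checkpoints, copying committed blocks to their home locations, after which that log space can be reused. A toy Python model, where the `CircularJournal` class is hypothetical and a deque stands in for the on-disk ring:

```python
from collections import deque


class CircularJournal:
    """Fixed-capacity log; a full journal forces a checkpoint that frees space."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = deque()   # committed (block, value) pairs not yet checkpointed
        self.home = {}           # blocks at their final on-disk location
        self.checkpoints = 0

    def checkpoint(self) -> None:
        # Copy every committed entry to its home location; its log space is then free.
        while self.entries:
            block, value = self.entries.popleft()
            self.home[block] = value
        self.checkpoints += 1

    def commit(self, block, value) -> None:
        if len(self.entries) >= self.capacity:
            self.checkpoint()    # reclaim space before logging more
        self.entries.append((block, value))
```

In this model every journalled block is written twice, once to the log and once at checkpoint, which is the kind of overhead the original question asks about.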

>And lastly, don't journalling filesystems give the user a false sense of
>security, saying that the data is written to disk when in reality only an
>entry has been made in the journal and the data is still not committed to
>disk?
>  
>
someone else answered this....

>Thanks a lot for your patience; I'm eagerly waiting for any replies.
>
>With warm regards,
>-Payal



* Re: somewhat OT query on journalling
From: Payal Rathod @ 2006-07-21 12:45 UTC
  To: Hans Reiser; +Cc: reiserfs-list

On Wed, Jul 19, 2006 at 01:28:38PM -0700, Hans Reiser wrote:
> V3 defaults to metadata only, V4 does data also because we can do it
> without performance loss.

wwwwwwowwwww!!! 

> 
> >Also, is it true that nowadays there is no such thing as an inode "block",
> >since for faster access the inodes are kept near the data itself?
> >  
> >
> reiserfs does not use inodes at all.  see our website for more.

Any particular page you are referring to? I didn't see a page for that 
there.

With warm regards,
-Payal


* Re: somewhat OT query on journalling
From: David Masover @ 2006-07-21 18:54 UTC
  To: Payal Rathod; +Cc: Hans Reiser, reiserfs-list

Payal Rathod wrote:
> On Wed, Jul 19, 2006 at 01:28:38PM -0700, Hans Reiser wrote:
>> V3 defaults to metadata only, V4 does data also because we can do it
>> without performance loss.
> 
> wwwwwwowwwww!!! 

Don't get too excited -- the transactions probably aren't done yet. 
Without those, no filesystem that claims to journal data is really any 
better than a filesystem which only journals metadata.  Even once they 
are implemented (or even if they are already), applications have to 
support them directly.

Regarding transactions in general, you should probably look for some 
papers or tutorials for how they are implemented in databases.  You 
might also read the Reiser4 whitepaper for an idea of how they could be 
implemented in a filesystem.

But no transactional system will work unless applications at least know 
about it.  This is why apps currently rely on features that are known to 
be atomic in a filesystem.  For instance, look at maildirs -- they are 
effectively 100% data journaled on any filesystem that journals metadata 
properly.
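The maildir trick works because a message is first written completely under tmp/ and only then moved into new/ with a single metadata operation, so a mail reader scanning new/ never sees a partial message. A minimal Python sketch, assuming a POSIX filesystem; `deliver_to_maildir` is a hypothetical helper, it uses rename() for the move, and its unique-name scheme is a simplification of the real maildir naming conventions.

```python
import os
import socket
import time


def deliver_to_maildir(maildir: str, message: bytes) -> str:
    """Write the message into tmp/, fsync it, then move it into new/."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)

    # Simplified unique name: time, pid plus a nanosecond counter, hostname.
    name = f"{int(time.time())}.{os.getpid()}_{time.monotonic_ns()}.{socket.gethostname()}"
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)

    with open(tmp_path, "xb") as f:   # "x": fail rather than overwrite an existing file
        f.write(message)
        f.flush()
        os.fsync(f.fileno())          # the full message is on disk first
    # One metadata operation: readers of new/ see the whole message or nothing.
    os.rename(tmp_path, new_path)
    return new_path
```

The data itself is never journalled here; only the rename, a metadata change, has to be atomic, which any properly metadata-journalling filesystem provides.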

>> reiserfs does not use inodes at all.  see our website for more.
> 
> Any particular page you are referring to? I didn't see a page for that 
> there.

Front page of Namesys.com has the Reiser4 whitepaper.  It's a bit out of 
date, as some features still listed for 4.0 have been pushed to 4.1 or 
later, but I don't think the overall plan has changed.


* Re: somewhat OT query on journalling
From: Andreas Schäfer @ 2006-07-21 21:26 UTC
  To: reiserfs-list

> Don't get too excited -- the transactions probably aren't done yet.
> Without those, no filesystem that claims to journal data is really any
> better than a filesystem which only journals metadata.  Even once they
> are implemented (or even if they are already), applications have to
> support them directly.

Actually, I think transactions in a filesystem context are a bit
different from the transactions you know from databases. Generally
speaking, a transaction denotes a transition from one valid state to
another. This transition should either be performed completely or --
in case of errors -- performed not at all (a.k.a. "roll back").

Databases allow application defined transactions (i.e. the
application specifies when a valid state is being left and when one is
reached again).
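For comparison, this is what an application-defined transaction looks like with Python's standard `sqlite3` module: using the connection as a context manager commits when the block finishes and rolls back if an exception escapes. The accounts table and the `open_demo_db` and `transfer` functions are made-up examples.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """In-memory database with two accounts (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    with conn:
        conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                         [("alice", 100), ("bob", 0)])
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    # The application marks where a valid state is left and re-entered:
    # everything inside the `with` block is one transaction.
    with conn:  # commits on normal exit, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = ?",
                                  (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # rollback undoes the UPDATE
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
```

A failed transfer leaves both balances exactly as they were; a filesystem that only journals its own metadata cannot know that two separate writes by an application belong together in this way.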

IMHO for filesystems a transaction denotes the flushing of write
buffers. "metadata only" transactions/journaling mean that even after
a crash the file itself will be readable (and not pointing to
e.g. sector -4711 ...). "data _and_ metadata" now means that the
filesystem does also guarantee that the data itself is written
completely.

Please correct me if I got this totally wrong.

-Andreas



* Re: somewhat OT query on journalling
From: David Masover @ 2006-07-21 21:53 UTC
  To: Andreas Schäfer; +Cc: reiserfs-list

Andreas Schäfer wrote:
>> Don't get too excited -- the transactions probably aren't done yet.
>> Without those, no filesystem that claims to journal data is really any
>> better than a filesystem which only journals metadata.  Even once they
>> are implemented (or even if they are already), applications have to
>> support them directly.
> 
> Actually, I think transactions in a filesystem context are a bit
> different from the transactions you know from databases. Generally

Yes, generally speaking, you're entirely right.  But in the case of 
Reiser4, at least for a single file, you can perform a number of writes 
and declare them a single transaction.

I believe there are some other oddities, such as:  The FS doesn't always
do the rollback; sometimes it delegates that to the app.  And it may be
possible to perform transactions on a number of files.  I don't
believe transactions imply locking, only serialization -- that is, the
last transaction to complete is the one that counts, and it overwrites
any transactions that completed before it, even if they were started
after it.  So:

Alice opens foo
Alice starts writing
Bob opens foo
Bob starts writing
Alice writes some more
Bob writes more
Bob closes foo
Alice closes foo

Even though Bob opened foo last and did the last write, Alice was the
last to close a transaction, so her version of foo is the one that
sticks.  If you want to avoid this situation, you use locking.  Locking
isn't mandatory, but neither are transactions.
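The Alice/Bob schedule can be replayed against a toy model of the semantics described above. This is a hypothetical sketch, not Reiser4 code: `ToyFile` and `ToyTransaction` are made-up names, each open() starts a transaction that buffers its writes privately, and close() commits the whole buffer atomically as the file's new contents (a simplification that ignores write offsets). Commits are serialized, so the last one to close wins.

```python
class ToyFile:
    """Hypothetical file whose contents change only when a transaction commits."""

    def __init__(self, contents: str = ""):
        self.contents = contents

    def open(self) -> "ToyTransaction":
        return ToyTransaction(self)


class ToyTransaction:
    def __init__(self, target: ToyFile):
        self.target = target
        self.buffer = []   # writes stay invisible to others until close()

    def write(self, text: str) -> None:
        self.buffer.append(text)

    def close(self) -> None:
        # Atomic commit: the whole buffer replaces the file's contents at once.
        self.target.contents = "".join(self.buffer)


foo = ToyFile()
alice = foo.open()
alice.write("A1")
bob = foo.open()
bob.write("B1")
alice.write("A2")
bob.write("B2")
bob.close()          # foo.contents is now "B1B2"
alice.close()        # Alice commits last, so foo.contents is "A1A2"
print(foo.contents)  # A1A2
```

Neither transaction sees a mix of the other's writes, but nothing stops Alice's commit from silently replacing Bob's; preventing that is what a lock would be for.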

But I could be entirely wrong also.


* Re: somewhat OT query on journalling
From: Hans Reiser @ 2006-07-22  0:54 UTC
  To: David Masover; +Cc: Andreas Schäfer, reiserfs-list

David Masover wrote:

> Andreas Schäfer wrote:
>
>>> Don't get too excited -- the transactions probably aren't done yet.
>>> Without those, no filesystem that claims to journal data is really
>>> any better than a filesystem which only journals metadata.  Even
>>> once they are implemented (or even if they are already),
>>> applications have to support them directly.
>>
>>
>> Actually, I think transactions in a filesystem context are a bit
>> different from the transactions you know from databases. Generally
>
>
> Yes, generally speaking, you're entirely right.  But in the case of
> Reiser4, at least for a single file, you can perform a number of
> writes and declare them a single transaction.

If we finish that code, you can. ;-)  One of the problems that we need to
deal with is that we are shipping a product pared of all non-essential
functionality so that we can get it out the door, while the website still
describes the whole vision.  We will do the whole vision, but first we
need to get some income flowing.

