* somewhat OT query on journalling
@ 2006-07-19 15:27 Payal Rathod
From: Payal Rathod @ 2006-07-19 15:27 UTC
To: reiserfs-list
Hi,
I was just reading about filesystems and my ideas are a bit confused.
I read quite a few articles on the net, but my basic doubts are still not
completely cleared up. I thought this would be the right place to ask, since many
journalling gurus might be here.
Can someone tell me whether journalling filesystems keep a journal of only the
metadata, or of all the data?
Also, is it true that nowadays there is no such thing as an inode "block",
since for faster access the inodes are kept near the data itself?
How is the journal maintained? How is it kept from growing too big, and why
are these filesystems not slower than traditional ones, given the overhead
of writing to a journal?
And lastly, don't journalling filesystems give the user a false sense of
security, saying that the data is written to disk when in reality only an
entry has been made in the journal and the data is still not committed to disk?
Thanks a lot for your patience; eagerly waiting for any replies.
With warm regards,
-Payal

* Re: somewhat OT query on journalling
From: Toby Thain @ 2006-07-19 17:30 UTC
To: reiserfs-list

On 19-Jul-06, at 11:27 AM, Payal Rathod wrote:
> Hi,
...
> And lastly don't the journalling fs give a false sense of security to
> the user, saying that the data is written to disk when in reality only an
> entry is made in journal and data is still not committed to disk.

This last one is easy to answer: No. Regardless of the filesystem you're
using, there is no guarantee your data hits the disk until you fsync().
Journalling filesystems don't change this. (And even after that, it
depends on the device doing the right thing :)

> Thanks a lot for the patience and eagerly waiting for any replies.
>
> With warm regards,
> -Payal
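
A minimal sketch of Toby's point in Python, assuming a POSIX system (the helper name `write_durably` is hypothetical, not something from the thread): a plain write normally just hands bytes to the kernel's cache, and `os.fsync()` is the call that asks for them to be flushed to the device. For a newly created file the directory entry is a separate piece of metadata, so the sketch also fsyncs the parent directory.

```python
import os
import tempfile

def write_durably(path: str, data: bytes) -> None:
    """Hypothetical helper: write `data` to `path` and fsync it (POSIX only)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                       # os.write() may accept fewer bytes than offered
            written = os.write(fd, view)
            view = view[written:]
        # Up to this point the bytes may exist only in the kernel's page cache.
        os.fsync(fd)                      # ask the kernel to flush file data and inode to the device
    finally:
        os.close(fd)
    # A new file's name lives in its directory, so sync the directory as well.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    # Even now, a drive with a volatile write cache can still lose the data on
    # power failure: "it depends on the device doing the right thing".

if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "note.txt")
    write_durably(target, b"hello\n")
    with open(target, "rb") as f:
        print(f.read())                   # b'hello\n'
```

Journalling does not change this picture: it protects the filesystem's own structures, while the durability of a particular write still depends on a call like fsync().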

* Re: somewhat OT query on journalling
From: David Masover @ 2006-07-19 19:09 UTC
To: Toby Thain
Cc: reiserfs-list

Toby Thain wrote:
>
> On 19-Jul-06, at 11:27 AM, Payal Rathod wrote:
>
>> Hi,
...
>> And lastly don't the journalling fs give a false sense of security to
>> the user, saying that the data is written to disk when in reality only an
>> entry is made in journal and data is still not committed to disk.
>
> This last one is easy to answer: No. Regardless of the filesystem you're
> using, there is no guarantee your data hits the disk until you fsync().
> Journalling filesystems don't change this. (And even after that, it
> depends on the device doing the right thing :)

Interestingly, most modern IDE hard drives cannot turn off write caching.
But I'm guessing they have a big enough capacitor to flush that if you
lose power.

I often turn off fsync, because it gets too abused. There was a bug in
Evolution where, when dragging columns, every time the display refreshed
(as you were dragging), it would flush and fsync. Now tell me, do I really
need to be absolutely sure that, when recovering from a crash, my
Evolution column widths are EXACTLY where they were while I was dragging
the columns?

My philosophy is, unless you have a UPS device, loss of power will always
result in lost data. Crashes will also more often than not result in lost
data. So do frequent backups and have a managed/monitored UPS, so that
when you lose power, your system flushes everything to disk and shuts down.
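
David's Evolution example is about an application calling fsync() far more often than the data is worth. A sketch of the alternative, assuming a hypothetical `ColumnLayout` class (this is not Evolution's real code): keep the widths in memory on every redraw and make a single durable write when the drag ends.

```python
import json
import os
import tempfile

class ColumnLayout:
    """Hypothetical settings holder: cheap in-memory updates, one durable save."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.widths: dict[str, int] = {}
        self.fsync_calls = 0            # counted only to show how rarely we flush

    def on_drag(self, column: str, width: int) -> None:
        # Called on every redraw while the user drags: memory only, no I/O.
        self.widths[column] = width

    def on_drag_end(self) -> None:
        # Called once when the drag finishes: write and fsync a single time.
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.widths, f)
            f.flush()                   # move Python's buffer into the kernel
            os.fsync(f.fileno())        # then ask the kernel to flush it to the device
        self.fsync_calls += 1

if __name__ == "__main__":
    layout = ColumnLayout(os.path.join(tempfile.mkdtemp(), "columns.json"))
    for w in range(100, 400, 10):       # 30 redraws during one drag
        layout.on_drag("subject", w)
    layout.on_drag_end()
    print(layout.fsync_calls)           # 1
```

A crash in the middle of the drag loses only the intermediate widths, which is exactly the data David argues nobody needs.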

* Re: somewhat OT query on journalling
From: David Masover @ 2006-07-19 19:19 UTC
To: Payal Rathod
Cc: reiserfs-list

Payal Rathod wrote:
> Hi,
> I was just reading about filesystems and my ideas are a bit confused.
> And lastly don't the journalling fs give a false sense of security to
> the user, saying that the data is written to disk when in reality only an
> entry is made in journal and data is still not committed to disk.

A journaling fs does guarantee one thing: the filesystem itself is
consistent. Any changes you make to the directory structure either succeed
completely or fail completely. This means that, while you may lose data, at
least you won't lose your whole partition. A non-journaling fs can lose a
whole partition this way, which is one of the main reasons some people like
to split their disk up into lots of tiny, specialized partitions.

Last I checked (a year ago, at least) there was an API planned for Reiser4
which would make it possible for applications to define their own
transactions. Any string of operations you could perform on the fs
(probably with some limitations) or on a single file could be combined into
one transaction, which either succeeds entirely or fails entirely.

So, for instance, if you save a file, there may not be a guarantee that the
file is written to disk, but there's a guarantee that if the file is
written to disk, either the whole file was saved, or none of it at all.
This is a good thing, because the alternative is to allow some of the file
to be saved, which almost always means a corrupt file.

Journaling could give a false sense of security if you think it means
you'll never lose data. Power fails, disks fail, and the question is when
you'll lose data, not if. Make backups, no matter what your FS.

But given the choice, journaling is much better than no journaling. The
disadvantage is that it slows the system down when implemented poorly. So
the solution is to use a journaled filesystem that does it right.
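
Without filesystem-level transactions like the planned Reiser4 API, applications commonly get the "whole file or none of it" property from a different, well-known technique: write a temporary file, fsync it, and rename it over the original. This is a sketch of that application-level pattern in Python on POSIX, not the Reiser4 API; `atomic_save` is a hypothetical name.

```python
import os
import tempfile

def atomic_save(path: str, data: bytes) -> None:
    """Hypothetical helper: readers see either the old file or the new one, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem so the rename can be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())      # new contents are durable before they become visible
        os.replace(tmp_path, path)      # the all-or-nothing step: rename() replaces the target atomically
    except BaseException:
        os.unlink(tmp_path)             # leave no half-written temp file behind
        raise
    # Make the rename itself durable by syncing the directory (POSIX).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)

if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "draft.txt")
    atomic_save(target, b"first version\n")
    atomic_save(target, b"second version\n")
    with open(target, "rb") as f:
        print(f.read())                 # b'second version\n'
```

If a crash hits before the rename, the old file is still intact; after it, the new one is complete. Only the rename has to be atomic, and that is the kind of directory-structure guarantee David says a journaling filesystem already provides.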

* Re: somewhat OT query on journalling
From: Hans Reiser @ 2006-07-19 20:28 UTC
To: Payal Rathod
Cc: reiserfs-list

Payal Rathod wrote:

>Hi,
>I was just reading about filesystems and my ideas are a bit confused.
>I read quite a few articles on net but still my basic doubts are not
>completely clarified. I thought this would be the right place to ask, since many
>journalling gurus might be here.
>Can someone tell me do journalling fs maintain journal about the
>metadata or the all the data?
>
V3 defaults to metadata only, V4 does data also because we can do it
without performance loss.

>Also, is it true that now-a-days there is no such thing as inode "block"
>since for faster access the inodes are kept near the data itself?
>
reiserfs does not use inodes at all. See our website for more.

>How is the journal maintained? How is it prevented from being
>too big and why are these fs not slower than traditional fs since it
>involves an overhead of writing to a journal?
>
See website. There is overhead. For V4 it is not a lot, though.

>And lastly don't the journalling fs give a false sense of security to
>the user, saying that the data is written to disk when in reality only an
>entry is made in journal and data is still not committed to disk.
>
Someone else answered this....

>Thanks a lot for the patience and eagerly waiting for any replies.
>
>With warm regards,
>-Payal
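
To make the metadata-only versus data journaling distinction concrete, here is a toy write-ahead journal in Python. It is an illustrative model under simplifying assumptions (blocks are dictionary entries, and a crash is simulated by never writing the commit record) and is not how reiserfs V3 or V4 actually lays out its journal; `ToyJournal` is a hypothetical name. The `journal_data` flag switches between logging only metadata blocks and logging data blocks as well, and `checkpoint()` shows the general reason a journal need not grow without bound: once committed blocks are at their home locations, the log space can be reused.

```python
class ToyJournal:
    """Toy write-ahead journal; blocks are dict entries (illustration only)."""

    def __init__(self, journal_data: bool) -> None:
        self.journal_data = journal_data
        self.home: dict[str, str] = {}      # blocks at their final on-disk locations
        self.log: list[tuple] = []          # append-only journal records

    def write_transaction(self, txid: int, metadata: dict, data: dict,
                          crash_before_commit: bool = False) -> None:
        logged = dict(metadata)             # metadata is always journaled
        if self.journal_data:
            logged.update(data)             # data journaling: data goes through the log too
        else:
            self.home.update(data)          # metadata-only: data written in place, unprotected
        self.log.append(("blocks", txid, logged))
        if crash_before_commit:
            return                          # simulated power loss: commit record never written
        self.log.append(("commit", txid))

    def recover(self) -> None:
        """Replay only transactions whose commit record reached the log."""
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in self.log:
            if rec[0] == "blocks" and rec[1] in committed:
                self.home.update(rec[2])

    def checkpoint(self) -> None:
        """Copy committed blocks home, then reuse the journal space."""
        self.recover()
        self.log.clear()                    # keeps the journal from growing without bound


if __name__ == "__main__":
    for mode in (False, True):
        disk = ToyJournal(journal_data=mode)
        disk.write_transaction(1, {"inode7.size": "5"}, {"block42": "hello"},
                               crash_before_commit=True)
        disk.recover()
        # prints: False {'block42': 'hello'}  then  True {}
        print(mode, disk.home)
```

After the simulated crash, the metadata-only instance has the new data block but not the matching metadata update, a mix of old and new state; the data-journaling instance has neither. In this classic scheme the price is that data is written twice (once to the log, once home); Hans's claim that V4 journals data without a performance loss rests on a different design that this toy does not model.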

* Re: somewhat OT query on journalling
From: Payal Rathod @ 2006-07-21 12:45 UTC
To: Hans Reiser
Cc: reiserfs-list

On Wed, Jul 19, 2006 at 01:28:38PM -0700, Hans Reiser wrote:
> V3 defaults to metadata only, V4 does data also because we can do it
> without performance loss.

wwwwwwowwwww!!!

> >Also, is it true that now-a-days there is no such thing as inode "block"
> >since for faster access the inodes are kept near the data itself?
> >
> reiserfs does not use inodes at all. see our website for more.

Any particular page you are referring to? I didn't see a page for that
there.

With warm regards,
-Payal

* Re: somewhat OT query on journalling
From: David Masover @ 2006-07-21 18:54 UTC
To: Payal Rathod
Cc: Hans Reiser, reiserfs-list

Payal Rathod wrote:
> On Wed, Jul 19, 2006 at 01:28:38PM -0700, Hans Reiser wrote:
>> V3 defaults to metadata only, V4 does data also because we can do it
>> without performance loss.
>
> wwwwwwowwwww!!!

Don't get too excited -- the transactions probably aren't done yet.
Without those, no filesystem that claims to journal data is really any
better than a filesystem which only journals metadata. Even once they are
implemented (or even if they are already), applications have to support
them directly.

Regarding transactions in general, you should probably look for some
papers or tutorials on how they are implemented in databases. You might
also read the Reiser4 whitepaper for an idea of how they could be
implemented in a filesystem.

But no transactional system will work unless applications at least know
about it. This is why apps currently rely on features that are known to be
atomic in a filesystem. For instance, look at maildirs -- they are
effectively 100% data journaled on any filesystem that journals metadata
properly.

>> reiserfs does not use inodes at all. see our website for more.
>
> Any particular page you are referring to? I didn't see a page for that
> there.

The front page of Namesys.com has the Reiser4 whitepaper. It's a bit out
of date, as some features still listed for 4.0 have been pushed to 4.1 or
later, but I don't think the overall plan has changed.
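
David's maildir remark can be made concrete. A Maildir delivery writes the whole message under tmp/ and only then moves it into new/, so the one step readers depend on is a rename, which is a metadata operation. The sketch below is a simplified Python version under stated assumptions: a POSIX system, a simplified unique-name scheme, and rename() in place of the link()-then-unlink() sequence the original Maildir description uses. `deliver` is a hypothetical name.

```python
import os
import socket
import tempfile
import time
from itertools import count

_sequence = count()  # makes names unique within this process

def deliver(maildir: str, message: bytes) -> str:
    """Hypothetical helper: deliver one message to `maildir`, return its path in new/."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    # Simplified unique name (time.pid_sequence.host); real delivery agents use richer schemes.
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    name = f"{int(time.time())}.{os.getpid()}_{next(_sequence)}.{host}"
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)
    with open(tmp_path, "wb") as f:
        f.write(message)
        f.flush()
        os.fsync(f.fileno())            # the full message is on disk before it becomes visible
    os.rename(tmp_path, new_path)       # the single atomic (metadata) step readers rely on
    return new_path

if __name__ == "__main__":
    box = tempfile.mkdtemp()
    path = deliver(box, b"Subject: test\n\nhello\n")
    print(os.listdir(os.path.join(box, "new")))
```

A mail reader that only looks in new/ and cur/ sees either no message or a complete one; a crash can at worst leave a stray file in tmp/. That is why consistent metadata alone is enough to make the message data effectively all-or-nothing.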

* Re: somewhat OT query on journalling
From: Andreas Schäfer @ 2006-07-21 21:26 UTC
To: reiserfs-list

> Don't get too excited -- the transactions probably aren't done yet.
> Without those, no filesystem that claims to journal data is really any
> better than a filesystem which only journals metadata. Even once
> they are implemented (or even if they are already), applications have
> to support them directly.

Actually, I think transactions in a filesystem context are a bit
different from the transactions you know from databases. Generally
speaking, a transaction denotes a transition from one valid state to
another. This transition should either be performed completely or -- in
case of errors -- not performed at all (a.k.a. "roll back"). Databases
allow application-defined transactions (i.e. the application specifies
when a valid state is being left and when one is reached again).

IMHO, for filesystems a transaction denotes the flushing of write buffers.
"Metadata only" transactions/journaling mean that even after a crash the
file itself will be readable (and not pointing to e.g. sector -4711 ...).
"Data _and_ metadata" means that the filesystem also guarantees that the
data itself is written completely.

Please correct me if I got this totally wrong.

-Andreas
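
Andreas's definition (a transition between valid states that is either performed completely or rolled back) is easy to show for application-defined, database-style transactions. The sketch below is an in-memory illustration with a hypothetical `transaction` helper; it provides atomicity by snapshot and restore only, and says nothing about durability or about how a filesystem would implement this.

```python
import copy
from contextlib import contextmanager

@contextmanager
def transaction(state: dict):
    """Hypothetical helper: apply changes to `state` all-or-nothing (in memory only)."""
    snapshot = copy.deepcopy(state)   # the last valid state
    try:
        yield state                   # the caller's changes happen here
    except BaseException:
        state.clear()                 # roll back: restore the snapshot in place
        state.update(snapshot)
        raise                         # let the caller see why it failed
    # Leaving the block normally means "commit": the changes simply stay.

if __name__ == "__main__":
    accounts = {"checking": 100, "savings": 0}
    try:
        with transaction(accounts) as s:
            s["checking"] -= 30       # leave one valid state...
            raise RuntimeError("failure before savings is credited")
    except RuntimeError:
        pass
    print(accounts)                   # {'checking': 100, 'savings': 0}
```

Here the application decides where the valid states begin and end, which is the database-style model Andreas contrasts with a filesystem that only guarantees its buffers are flushed consistently.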

* Re: somewhat OT query on journalling
From: David Masover @ 2006-07-21 21:53 UTC
To: Andreas Schäfer
Cc: reiserfs-list

Andreas Schäfer wrote:
>> Don't get too excited -- the transactions probably aren't done yet. Without those, no filesystem that claims to journal data is really any better than a filesystem which only journals metadata. Even once
>> they are implemented (or even if they are already), applications have to support them directly.
>
> Actually, I think transactions in a filesystem context are a bit
> different from the transactions you know form databases. Generally

Yes, generally speaking, you're entirely right. But in the case of
Reiser4, at least for a single file, you can perform a number of writes
and declare them a single transaction.

I believe there are some other oddities: the FS doesn't always do the
rollback (sometimes it delegates that to the app), and it may be possible
to perform transactions on a number of files.

I don't believe transactions imply locking, only serialization -- that is,
the last transaction that goes through is the one that counts, and
overwrites any transactions that completed before, even if they were
started after. So:

Alice opens foo
Alice starts writing
Bob opens foo
Bob starts writing
Alice writes some more
Bob writes more
Bob closes foo
Alice closes foo

Even though Bob opened foo last, and did the last write, Alice was the
last to close a transaction. If you want to avoid this situation, you use
locking. Locking isn't mandatory, but neither are transactions.

But I could be entirely wrong also.
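
David's Alice and Bob sequence can be modelled directly, assuming (as he does, with his own caveat) that each open/close pair is one transaction whose writes become visible together at close, and that the last transaction to commit wins. `ToyFile` and `Handle` are hypothetical names, not a Reiser4 interface.

```python
class ToyFile:
    """Hypothetical file whose writes become visible only when a handle is closed."""

    def __init__(self) -> None:
        self.contents = ""

    def open(self) -> "Handle":
        return Handle(self)


class Handle:
    """One open/close pair, treated as a single transaction."""

    def __init__(self, target: ToyFile) -> None:
        self.target = target
        self.pending: list[str] = []    # writes buffered inside this transaction

    def write(self, text: str) -> None:
        self.pending.append(text)       # not visible to anyone else yet

    def close(self) -> None:
        # Commit: this transaction's writes replace the contents in one step,
        # overwriting whatever an earlier-committed transaction left there.
        self.target.contents = "".join(self.pending)


if __name__ == "__main__":
    foo = ToyFile()
    alice = foo.open()          # Alice opens foo
    alice.write("A1 ")          # Alice starts writing
    bob = foo.open()            # Bob opens foo
    bob.write("B1 ")            # Bob starts writing
    alice.write("A2")           # Alice writes some more
    bob.write("B2")             # Bob writes more
    bob.close()                 # Bob closes foo
    print(foo.contents)         # B1 B2
    alice.close()               # Alice closes foo
    print(foo.contents)         # A1 A2
```

Bob's commit is visible briefly, then Alice's replaces it wholesale; the file never holds an interleaving of the two. Locking, as David says, would be the way to make one of them wait instead.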

* Re: somewhat OT query on journalling
From: Hans Reiser @ 2006-07-22 0:54 UTC
To: David Masover
Cc: Andreas Schäfer, reiserfs-list

David Masover wrote:
> Andreas Schäfer wrote:
>
>>> Don't get too excited -- the transactions probably aren't done yet.
>>> Without those, no filesystem that claims to journal data is really
>>> any better than a filesystem which only journals metadata. Even
>>> once they are implemented (or even if they are already),
>>> applications have to support them directly.
>>
>> Actually, I think transactions in a filesystem context are a bit
>> different from the transactions you know form databases. Generally
>
> Yes, generally speaking, you're entirely right. But in the case of
> Reiser4, at least for a single file, you can perform a number of
> writes and declare them a single transaction.

If we finish that code, you can. ;-)

One of the problems that we need to deal with is that we are shipping a
product pared of all non-essential functionality so that we can get it out
the door, and the website still describes the whole vision. We will do the
whole vision, but first we need to get some income flowing.