* [PATCH 00/14] hfsplus: introduce journal replay functionality
@ 2013-12-26 9:41 Vyacheslav Dubeyko
0 siblings, 0 replies; 5+ messages in thread
From: Vyacheslav Dubeyko @ 2013-12-26 9:41 UTC (permalink / raw)
To: Linux FS devel list
Cc: Al Viro, Christoph Hellwig, Hin-Tak Leung, Andrew Morton
Hi,
This patch set implements journal replay functionality in the HFS+
file system driver.
Technical Note TN1150:
"The purpose of the journal is to ensure that when a group of
related changes are being made, that either all of those changes
are actually made, or none of them are made. This is done by
gathering up all of the changes, and storing them in a separate
place (in the journal). Once the journal copy of the changes is
completely written to disk, the changes can actually be written
to their normal locations on disk. If a failure happens at that
time, the changes can simply be copied from the journal to their
normal locations. If a failure happens when the changes are being
written to the journal, but before they are marked complete, then
all of those changes are ignored."
"A group of related changes is called a transaction. When all of
the changes of a transaction have been written to their normal
locations on disk, that transaction has been committed, and is
removed from the journal. The journal may contain several
transactions. Copying changes from all transactions to their
normal locations on disk is called replaying the journal."
"In order to replay the journal, an implementation just loops
over the transactions, copying each individual block in the
transaction from the journal to its proper location on the
volume. Once those blocks have been flushed to the media
(not just the driver!), it may update the journal header to
remove the transactions."
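For reference, the on-disk journal structures TN1150 describes can be sketched in C roughly as follows. Field names, layouts, and magic values follow TN1150; this is only an illustrative sketch, not the driver's actual definitions:

```c
#include <assert.h>
#include <stdint.h>

#define JOURNAL_HEADER_MAGIC 0x4a4e4c78   /* 'JNLx' */
#define ENDIAN_MAGIC         0x12345678   /* written in the creator's byte order */

/* Journal header, located jib.offset bytes from the start of the volume. */
struct journal_header {
    uint32_t magic;        /* JOURNAL_HEADER_MAGIC */
    uint32_t endian;       /* ENDIAN_MAGIC */
    uint64_t start;        /* offset of the oldest transaction in the buffer */
    uint64_t end;          /* offset just past the newest transaction */
    uint64_t size;         /* size of journal header plus journal buffer */
    uint32_t blhdr_size;   /* bytes per block list header */
    uint32_t checksum;
    uint32_t jhdr_size;    /* sector size, in bytes */
} __attribute__((packed));

/* One run of sectors inside a transaction. */
struct block_info {
    uint64_t bnum;         /* destination sector on the volume */
    uint32_t bsize;        /* number of bytes to copy */
    uint32_t next;         /* in binfo[0]: nonzero if more block lists follow */
} __attribute__((packed));

/* A block list: header entry binfo[0] followed by data entries binfo[1..]. */
struct block_list_header {
    uint16_t max_blocks;
    uint16_t num_blocks;
    uint32_t bytes_used;
    uint32_t checksum;
    uint32_t pad;
    struct block_info binfo[1];
} __attribute__((packed));
```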
"Here are the steps to replay the journal:
1. Read the volume header into variable vhb. The volume may
have an HFS wrapper; if so, you will need to use it to
determine the location of the volume header.
2. Test the kHFSVolumeJournaledBit in the attributes field of
the volume header. If it is not set, there is no journal
to replay, and you are done.
3. Read the journal info block from the allocation block number
vhb.journalInfoBlock, into variable jib.
4. If kJIJournalNeedsInitMask is set in jib.flags, the journal
was never initialized, so there is no journal to replay.
5. Verify that kJIJournalInFSMask is set and kJIJournalOnOtherDeviceMask
is clear in jib.flags.
6. Read the journal header at jib.offset bytes from the start
of the volume, and place it in variable jhdr.
7. If jhdr.start equals jhdr.end, the journal does not have
any transactions, so there is nothing to replay.
8. Set the current offset in the journal (typically a local
variable) to the start of the journal buffer, jhdr.start.
9. While jhdr.start does not equal jhdr.end, perform the
following steps:
1. Read a block list header of jhdr.blhdr_size bytes from
the current offset in the journal into variable blhdr.
2. For each block in blhdr.binfo[1] to blhdr.binfo[blhdr.num_blocks],
inclusive, copy bsize bytes from the current offset in
the journal to sector bnum on the volume (to byte offset
bnum*jhdr.jhdr_size). Remember that jhdr_size is the
size of a sector, in bytes.
3. If blhdr.binfo[0].next is zero, you have completed the
last block list of the current transaction; set jhdr.start
to the current offset in the journal."
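The numbered steps above boil down to a short loop. Here is a minimal, illustrative sketch operating on in-memory buffers; it ignores the HFS wrapper, journal wrap-around, byte swapping, and checksum verification, and all helper names are assumptions rather than the driver's code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Struct names and layout follow TN1150 (sketch only). */
struct block_info {
    uint64_t bnum;
    uint32_t bsize;
    uint32_t next;
} __attribute__((packed));

struct block_list_header {
    uint16_t max_blocks, num_blocks;
    uint32_t bytes_used, checksum, pad;
    struct block_info binfo[1];
} __attribute__((packed));

/* Steps 8-9, transcribed literally: walk block lists from `start` to `end`
 * in the journal buffer, copying each run of bytes to sector bnum
 * (byte offset bnum * jhdr_size) on the volume. */
static void replay(uint8_t *vol, const uint8_t *jbuf,
                   uint64_t start, uint64_t end,
                   uint32_t blhdr_size, uint32_t jhdr_size)
{
    uint64_t off = start;                                   /* step 8 */

    while (start != end) {                                  /* step 9 */
        struct block_list_header blhdr;
        memcpy(&blhdr, jbuf + off, sizeof(blhdr));          /* step 9.1 */
        const uint8_t *base = jbuf + off +
                offsetof(struct block_list_header, binfo);
        off += blhdr_size;

        /* Step 9.2, following the quoted wording (binfo[1]..binfo[num_blocks]
         * inclusive).  Note that an implementation may instead count binfo[0]
         * in num_blocks and loop while i < num_blocks. */
        for (uint16_t i = 1; i <= blhdr.num_blocks; i++) {
            struct block_info bi;
            memcpy(&bi, base + (size_t)i * sizeof(bi), sizeof(bi));
            memcpy(vol + bi.bnum * jhdr_size, jbuf + off, bi.bsize);
            off += bi.bsize;
        }

        if (blhdr.binfo[0].next == 0)                       /* step 9.3 */
            start = off;
    }
}

/* Build a toy one-transaction journal (one 16-byte run of 0xAB destined
 * for sector 2) and replay it; returns the first replayed byte. */
static int run_demo(void)
{
    uint8_t vol[64], jbuf[64];
    struct {
        struct block_list_header h;
        struct block_info data;
    } __attribute__((packed)) bl;
    const uint32_t jhdr_size = 16, blhdr_size = sizeof(bl); /* 48 bytes */

    memset(vol, 0, sizeof(vol));
    memset(jbuf, 0, sizeof(jbuf));
    memset(&bl, 0, sizeof(bl));
    bl.h.num_blocks = 1;      /* one data run, binfo[1] */
    bl.h.binfo[0].next = 0;   /* last block list of the transaction */
    bl.data.bnum = 2;         /* -> byte offset 2 * jhdr_size = 32 */
    bl.data.bsize = 16;
    memcpy(jbuf, &bl, sizeof(bl));
    memset(jbuf + blhdr_size, 0xAB, 16);                    /* payload */

    replay(vol, jbuf, 0, blhdr_size + 16, blhdr_size, jhdr_size);
    return vol[32];
}
```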
With the best regards,
Vyacheslav Dubeyko.
---
Documentation/filesystems/hfsplus.txt | 4 +
fs/hfsplus/Makefile | 3 +-
fs/hfsplus/hfsplus_fs.h | 31 +-
fs/hfsplus/hfsplus_raw.h | 53 +-
fs/hfsplus/journal.c | 1207 +++++++++++++++++++++++++++++++++
fs/hfsplus/options.c | 8 +-
fs/hfsplus/part_tbl.c | 4 +-
fs/hfsplus/super.c | 42 +-
fs/hfsplus/wrapper.c | 28 +-
9 files changed, 1356 insertions(+), 24 deletions(-)
--
1.7.9.5
* Re: [PATCH 00/14] hfsplus: introduce journal replay functionality
@ 2013-12-26 15:57 Hin-Tak Leung
2013-12-27 6:39 ` Vyacheslav Dubeyko
0 siblings, 1 reply; 5+ messages in thread
From: Hin-Tak Leung @ 2013-12-26 15:57 UTC (permalink / raw)
To: slava, linux-fsdevel; +Cc: viro, hch, akpm
Hi,
It is quite a coincidence - I have also spent some hours in the last few days rebasing the Netgear-derived patch set that has been bit-rotting on my hard disk. I did a quick diff, and it seems that you have drawn some ideas from it, but yours is largely an independent effort. Anyway, here are a few things from where I am.
- It appears that you don't treat the special files (catalog, etc.) specially? Netgear's does; but I think there is a mistake in that they don't consider those files ever getting fragmented, so they were not journaling them properly.
- I am still nowhere near figuring out the issue with just running du on one funny volume I have. The driver gets confused after a few du runs, but the disk always fscks clean.
- I see I still have one outstanding patch not yet submitted - the one about folder counts on case-sensitive volumes.
I'll try to spend some time reading your patches and maybe even try them out. Will write again.
Hin-tak
On Thu, Dec 26, 2013 09:41 GMT Vyacheslav Dubeyko wrote:
> [...]
* Re: [PATCH 00/14] hfsplus: introduce journal replay functionality
2013-12-26 15:57 Hin-Tak Leung
@ 2013-12-27 6:39 ` Vyacheslav Dubeyko
0 siblings, 0 replies; 5+ messages in thread
From: Vyacheslav Dubeyko @ 2013-12-27 6:39 UTC (permalink / raw)
To: Hin-Tak Leung; +Cc: linux-fsdevel, viro, hch, akpm
On Thu, 2013-12-26 at 15:57 +0000, Hin-Tak Leung wrote:
> - It appears that you don't treat the special files (catalog, etc.) specially? Netgear's does;
> but I think there is a mistake in that they don't consider those files ever getting fragmented,
> so they were not journaling them properly.
Technical Note TN1150:
"On HFS Plus volumes, the journal info block is stored as a file (so
that its space can be properly represented in a catalog record and the
allocation bitmap). The name of that file is ".journal_info_block" and
it is stored in the volume's root directory. The journal header and
journal buffer are stored together in a different file named
".journal", also in the volume's root directory. Each of these files
are contiguous on disk (they occupy exactly one extent). An
implementation that uses the journal must prevent the files from being
accessed as normal files."
"Note:
An implementation must find the journal info block by using the
journalInfoBlock field of the volume header, not by the file name.
Similarly, an implementation must find the journal header and journal
buffer using the contents of the journal info block, not the file
name."
So I use the special structures hfsplus_journal_info_block and
hfsplus_journal_header for accessing the journal and replaying it.
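For reference, the flag checks TN1150 prescribes before replaying (steps 4-5 of its procedure) can be sketched like this; the mask values are the ones TN1150 defines, and the helper name is only an illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Journal info block flag bits, per TN1150. */
#define kJIJournalInFSMask          0x00000001  /* journal lives inside the volume */
#define kJIJournalOnOtherDeviceMask 0x00000002  /* journal on a separate device */
#define kJIJournalNeedsInitMask     0x00000004  /* journal was never initialized */

/* Returns nonzero only when the journal exists on this volume and can be
 * replayed: initialized, in-FS, and not on another device. */
static int journal_replayable(uint32_t jib_flags)
{
    if (jib_flags & kJIJournalNeedsInitMask)
        return 0;                               /* step 4: nothing to replay */
    return (jib_flags & kJIJournalInFSMask) &&  /* step 5 */
           !(jib_flags & kJIJournalOnOtherDeviceMask);
}
```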
As far as I can see, Netgear's code has many issues:
(1) It doesn't work for the case of 8KB - 64KB block sizes;
(2) I suspect that it implements journal wrapping incorrectly;
(3) It doesn't take into account that a block list can span
several file system blocks;
(4) It doesn't check the validity of transaction sequence numbers,
block checksums, and so on.
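On the checksums: TN1150 specifies a simple byte-wise function for the journal header and block list headers, computed with the on-disk checksum field itself set to zero and then stored there. It looks like this (transcribed from TN1150):

```c
#include <assert.h>

/* Journal checksum from TN1150.  The checksum field is zeroed before the
 * checksum is computed, then filled in with the result. */
static int calc_checksum(const unsigned char *ptr, int len)
{
    int i, cksum = 0;

    for (i = 0; i < len; i++, ptr++)
        cksum = (cksum << 8) ^ (cksum + *ptr);

    return ~cksum;
}
```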
> I'll try to spend some time reading your patches and maybe even try them out. Will write again.
If you have an opportunity to test the journal replay patch set, that
would be great. I have tested this implementation on volumes in various
states created under Mac OS X "Tiger" and "Snow Leopard". It still
needs testing on volumes created under Mac OS X "Lion". I don't expect
any trouble replaying the journal on HFS+ volumes created under "Lion".
But, anyway, independent testing is always a good idea.
Thanks,
Vyacheslav Dubeyko.
* Re: [PATCH 00/14] hfsplus: introduce journal replay functionality
@ 2014-01-19 1:50 Hin-Tak Leung
2014-01-20 7:54 ` Vyacheslav Dubeyko
0 siblings, 1 reply; 5+ messages in thread
From: Hin-Tak Leung @ 2014-01-19 1:50 UTC (permalink / raw)
To: slava; +Cc: Andrew Morton, Linux FS devel list
On Thu, Jan 9, 2014 06:12 GMT Vyacheslav Dubeyko wrote:
>Hi Hin-Tak,
>
>On Thu, 2013-12-26 at 14:26 -0800, Andrew Morton wrote:
>> On Thu, 26 Dec 2013 15:57:45 +0000 (GMT) Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
>>
> > > [...]
>>
>> Thanks, useful!
>>
>> Vyacheslav, I'll duck the patchset pending Hin-Tak's review. Please
>> cc me on the later resend.
>
>How do you feel about the patchset? Could you share your opinion of it?
Tested-by: Hin-Tak Leung <htl10@users.sourceforge.net>
I have given the patch set a bit of light usage, and it fsck'ed clean afterwards.
So that's the minimum it should do. I also see that you have spent a substantial
amount of effort in verifying and checking the validity and sanity of the journal
itself (which Netgear certainly didn't do). And thanks for the good work! That part
at least is very desirable and overdue, and should land in the kernel soon.
About documentation and the new "norecovery" option: should "norecovery"
imply read-only (and also state so clearly)? The meaning of the "force" option
also needs some updating - if we play back the journal, then unclean journalled
volumes would be mounted read-write after verifying/validating and journal playback,
correct? I see there is a need for "norecovery" in addition to "read-only" (by the
typical convention elsewhere, the latter will occasionally write to a journalled fs,
in the case of journal playback - so "norecovery" means "really no writes, not
a single byte, I mean it, don't even play back the journal"), but I think norecovery
should imply read-only, unless combined with "force"?
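The proposed semantics could be captured in a tiny decision helper - purely an illustration of the policy under discussion, not actual driver code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the proposed option semantics: "norecovery"
 * skips journal playback entirely, and therefore forces a read-only
 * mount, unless the user explicitly overrides with "force". */
static bool mount_read_only(bool norecovery, bool force)
{
    if (norecovery && !force)
        return true;    /* no replay allowed, so never write anything */
    return false;       /* journal replayed (or force given): rw allowed */
}
```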
A somewhat minor thing - I see a few instances of
'pr_err("journal replay is failed\n");' - the "is" isn't needed. Just English usage.
Now it still worries me that whichever way we implement journalling in the Linux
kernel, it may be correct according to the spec but not interoperate with
Apple's. This is especially relevant since a substantial portion of users
who want hfs+ journalling in their Linux kernel have an Intel Mac and are
dual-booting. Now we have 3 implementations - Apple's, this one, and
Netgear's. Even if we discount Netgear's, while each of them will create
a journal during its normal mode of operation and will play back a journal
of its own creation, we really need to test playback of journals
made by the Mac OS X darwin kernel... because I can think of this happening:
somebody has a dual-boot machine, the plug gets pulled while it is in Mac OS X,
and the boot-loader is configured to go into Linux by default after a short time-out
(or vice versa).
So again, I must thank you for spending the effort on verifying and validating the
journal entries.
On a separate/somewhat unrelated matter, I came across this bug report
during recent activities:
https://bugs.launchpad.net/ubuntu/+source/hfsprogs/+bug/680606
I think we have addressed both the 2nd and 3rd entries:
errors ("Invalid volume file count", "Invalid volume free block count")
error ("Unused node is not erased")
The first entry looks suspiciously like the problem which I have/had:
(restore a large directory from time-machine)
Invoke "rm -r" targeting the directory
- segfault occurs
- Various applications freeze, and it's impossible to cleanly shut down
- OS X Disk Utility reports that the disk can't be repaired
So the disk I manually edited and fixed can still get the kernel confused (i.e.
only the kernel gets confused; if one unmounts and reboots, the disk is still
fsck-clean), just by repeatedly running "du". I found that I need to put the
system under some memory stress: I can run "du" a dozen times on an idle,
freshly rebooted system, but it gets confused as soon as I have google-chrome
running with a few browser windows open (or mozilla/firefox). The kernel driver
can get confused if one traverses the file system quickly (with a du, or in the
above case, with an "rm -rf large-directory"), especially when the system is
relatively loaded.
Oh, there is still the folder count issue with case-sensitive HFS+ (which is,
AFAIK, just a convenience for some file system traversal usage, not actually
of any critical function), but that's a relatively harmless issue. We can deal
with it at some point.
Regards,
Hin-Tak
* Re: [PATCH 00/14] hfsplus: introduce journal replay functionality
2014-01-19 1:50 Hin-Tak Leung
@ 2014-01-20 7:54 ` Vyacheslav Dubeyko
0 siblings, 0 replies; 5+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-20 7:54 UTC (permalink / raw)
To: htl10; +Cc: Andrew Morton, Linux FS devel list
Hi Hin-Tak,
On Sun, 2014-01-19 at 01:50 +0000, Hin-Tak Leung wrote:
> Tested-by: Hin-Tak Leung <htl10@users.sourceforge.net>
>
> I have given the patch set a bit of light usage, and it fsck'ed clean afterwards.
> So that's the minimum it should do.I also see that you have spent substantial
> amount of effort in verifying and checking the validity and sanity of the journal
> itself (which Netgear certainly didn't do). And thanks for the good work! That part
> at least is very desirable and overdue and should land in the kernel soon.
>
Thank you for testing and reviewing the patchset.
> About documentation and the new "norecovery" option. Should "norecovery"
> imply read-only (and also state clearly so)?
Yes, sure. The "norecovery" option implies a read-only mount. You can
see it in the hfsplus_check_journal() method:
+ if (hfsplus_journal_empty(jh)) {
+ err = hfsplus_replay_journal_header(sb);
+ if (unlikely(err)) {
+ pr_err("journal replay is failed, please run fsck\n");
+ return err;
+ }
+ /* If they're the same, we can mount, it's clean */
+ hfs_dbg(JOURNAL, "journal is empty\n");
+ return 0;
+ } else {
+ /* Try to replay journal */
+ if (test_bit(HFSPLUS_SB_NORECOVERY, &HFSPLUS_SB(sb)->flags)) {
+ pr_info("journal replay is skiped, mounting read-only.\n");
+ sb->s_flags |= MS_RDONLY;
+ return 0;
+ }
+
+ err = hfsplus_replay_journal(sb);
+ if (unlikely(err)) {
+ pr_err("journal replay is failed, please run fsck\n");
+ return err;
+ }
+ hfs_dbg(JOURNAL, "journal replay is done\n");
+ }
> The meaning of the "force" option
> also needs some updating - if we play back journal, then unclean journalled
> volumes would be mounted read-write after verify/validating and journal playback,
> correct? I see there is a need for "norecovery" in addition to "read-only" (in the typical
> convention elsewhere, the latter will occasionally write to a journalled fs,
> in the case of journal playback - so "norecovery" is "really no write, not
> a single byte, I mean it, don't even playback journal"), but I think norecovery
> should imply read-only, unless in combination with "force"?
>
Yes, you are right about the "force" option. I need to check that
everything plays well for this option, and the documentation also needs
correcting for the "force" case. Thank you, I'll do it.
> A somewhat minor thing - I see a few instances of
> 'pr_err("journal replay is failed\n");' - the "is" isn't needed. Just English usage.
>
Yes, I'll correct it. I'll prepare the next version of the patchset
with these minor corrections ASAP.
> Now it still worries me that whichever way we implement journalling in the Linux
> kernel, it may be correct according to the spec but not interoperate with
> Apple's. This is especially relevant since a substantial portion of users
> who wants hfs+ journalling in their linux kernel is because they have an
> intel Mac and they are dual-booting. Now we have 3 implementations - apple's,
> this, and Netgear's. Even if we discount Netgear's, while each of them
> will create a journal during its normal mode of operation, will playback a journal
> of its own creation, we really need to test the case of playback of journals
> made by the Mac OS X darwin kernel... because I can think of this happening:
> somebody has a dual-boot machine, you pull the plug while it is in Mac OS X,
> and the boot-loader is configured to go into linux by default after a short time-out
> (or vice versa).
Yes, I take compatibility of journaling and other HFS+ driver
functionality in the Linux kernel with Mac OS X into account. Keeping
compatibility is very important, from my viewpoint. So you are right,
and I spend effort on checking such compatibility when I implement and
test code for the HFS+ driver.
>
> So again, I must thank you for spending the effort on verifying and validating the
> journal entries.
>
> On separate/somewhat unrelated matters, I came across this bug report in
> the recent activities:
>
> https://bugs.launchpad.net/ubuntu/+source/hfsprogs/+bug/680606
>
> I think we addressed both of the 2nd and 3rd entries:
> errors ("Invalid volume file count", "Invalid volume free block count")
> error ("Unused node is not erased")
>
> The first entry looks suspiciously like the problem which I have/had:
> (restore a large directory from time-machine)
> Invoke "rm -r" targeting the directory
> - segfault occurs
> - Various applications freeze, and it's impossible to cleanly shut down
> - OS X Disk Utility reports that the disk can't be repaired
>
> So the disk I manually edited and fixed, still can get the kernel confused (i.e
> the kernel just get confused, if one unmount and reboot, the disk is fsck-clean still),
> just by repeatedly running "du". I found that I need to put the system under
> some memory stress: I can run "du" for a dozen of times on an idle system
> freshly rebooted, but as soon as I have google-chrome running with a few browser
> windows open (or mozilla/firefox). The kernel driver can get confused if one traverses
> the file system quickly (with a du, or in the above case, with an "rm -rf large directory"),
> especially when the system is relatively loaded.
>
I haven't tried that way of reproducing the bug, so maybe I'll try to
reproduce the issue. Unfortunately, I don't know when I'll get to it,
because I will be really busy during the next several weeks.
> Oh, there is still the folder count issue with case-sensitive HFS+ (which is, AFAIK, just a
> convenience for some file system traversal usage, not actually of any critical function),
> but that's a relatively harmless issue. We can deal with it at some point.
>
Yes, sure.
Thanks,
Vyacheslav Dubeyko.