* [PATCH 00/14] hfsplus: introduce journal replay functionality
@ 2013-12-26 9:41 Vyacheslav Dubeyko
0 siblings, 0 replies; 5+ messages in thread
From: Vyacheslav Dubeyko @ 2013-12-26 9:41 UTC (permalink / raw)
To: Linux FS devel list
Cc: Al Viro, Christoph Hellwig, Hin-Tak Leung, Andrew Morton
Hi,
This patch set implements journal replay functionality in the HFS+
file system driver.
Technical Note TN1150:
"The purpose of the journal is to ensure that when a group of
related changes are being made, that either all of those changes
are actually made, or none of them are made. This is done by
gathering up all of the changes, and storing them in a separate
place (in the journal). Once the journal copy of the changes is
completely written to disk, the changes can actually be written
to their normal locations on disk. If a failure happens at that
time, the changes can simply be copied from the journal to their
normal locations. If a failure happens when the changes are being
written to the journal, but before they are marked complete, then
all of those changes are ignored."
"A group of related changes is called a transaction. When all of
the changes of a transaction have been written to their normal
locations on disk, that transaction has been committed, and is
removed from the journal. The journal may contain several
transactions. Copying changes from all transactions to their
normal locations on disk is called replaying the journal."
"In order to replay the journal, an implementation just loops
over the transactions, copying each individual block in the
transaction from the journal to its proper location on the
volume. Once those blocks have been flushed to the media
(not just the driver!), it may update the journal header to
remove the transactions."
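For reference, the on-disk journal structures TN1150 describes can be sketched in C roughly as follows. Field names, layouts, and magic values follow TN1150; this is only an illustrative sketch, not the driver's actual definitions:

```c
#include <assert.h>
#include <stdint.h>

#define JOURNAL_HEADER_MAGIC 0x4a4e4c78   /* 'JNLx' */
#define ENDIAN_MAGIC         0x12345678   /* written in the creator's byte order */

/* Journal header, located jib.offset bytes from the start of the volume. */
struct journal_header {
    uint32_t magic;        /* JOURNAL_HEADER_MAGIC */
    uint32_t endian;       /* ENDIAN_MAGIC */
    uint64_t start;        /* offset of the oldest transaction in the buffer */
    uint64_t end;          /* offset just past the newest transaction */
    uint64_t size;         /* size of journal header plus journal buffer */
    uint32_t blhdr_size;   /* bytes per block list header */
    uint32_t checksum;
    uint32_t jhdr_size;    /* sector size, in bytes */
} __attribute__((packed));

/* One run of sectors inside a transaction. */
struct block_info {
    uint64_t bnum;         /* destination sector on the volume */
    uint32_t bsize;        /* number of bytes to copy */
    uint32_t next;         /* in binfo[0]: nonzero if more block lists follow */
} __attribute__((packed));

/* A block list: header entry binfo[0] followed by data entries binfo[1..]. */
struct block_list_header {
    uint16_t max_blocks;
    uint16_t num_blocks;
    uint32_t bytes_used;
    uint32_t checksum;
    uint32_t pad;
    struct block_info binfo[1];
} __attribute__((packed));
```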
"Here are the steps to replay the journal:
1. Read the volume header into variable vhb. The volume may
have an HFS wrapper; if so, you will need to use it to
determine the location of the volume header.
2. Test the kHFSVolumeJournaledBit in the attributes field of
the volume header. If it is not set, there is no journal
to replay, and you are done.
3. Read the journal info block from the allocation block number
vhb.journalInfoBlock, into variable jib.
4. If kJIJournalNeedsInitMask is set in jib.flags, the journal
was never initialized, so there is no journal to replay.
5. Verify that kJIJournalInFSMask is set and kJIJournalOnOtherDeviceMask
is clear in jib.flags.
6. Read the journal header at jib.offset bytes from the start
of the volume, and place it in variable jhdr.
7. If jhdr.start equals jhdr.end, the journal does not have
any transactions, so there is nothing to replay.
8. Set the current offset in the journal (typically a local
variable) to the start of the journal buffer, jhdr.start.
9. While jhdr.start does not equal jhdr.end, perform the
following steps:
1. Read a block list header of jhdr.blhdr_size bytes from
the current offset in the journal into variable blhdr.
2. For each block in blhdr.binfo[1] to blhdr.binfo[blhdr.num_blocks],
inclusive, copy bsize bytes from the current offset in
the journal to sector bnum on the volume (to byte offset
bnum*jhdr.jhdr_size). Remember that jhdr_size is the
size of a sector, in bytes.
3. If blhdr.binfo[0].next is zero, you have completed the
last block list of the current transaction; set jhdr.start
to the current offset in the journal."
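The numbered steps above boil down to a short loop. Here is a minimal, illustrative sketch operating on in-memory buffers; it ignores the HFS wrapper, journal wrap-around, byte swapping, and checksum verification, and all helper names are assumptions rather than the driver's code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Struct names and layout follow TN1150 (sketch only). */
struct block_info {
    uint64_t bnum;
    uint32_t bsize;
    uint32_t next;
} __attribute__((packed));

struct block_list_header {
    uint16_t max_blocks, num_blocks;
    uint32_t bytes_used, checksum, pad;
    struct block_info binfo[1];
} __attribute__((packed));

/* Steps 8-9, transcribed literally: walk block lists from `start` to `end`
 * in the journal buffer, copying each run of bytes to sector bnum
 * (byte offset bnum * jhdr_size) on the volume. */
static void replay(uint8_t *vol, const uint8_t *jbuf,
                   uint64_t start, uint64_t end,
                   uint32_t blhdr_size, uint32_t jhdr_size)
{
    uint64_t off = start;                                   /* step 8 */

    while (start != end) {                                  /* step 9 */
        struct block_list_header blhdr;
        memcpy(&blhdr, jbuf + off, sizeof(blhdr));          /* step 9.1 */
        const uint8_t *base = jbuf + off +
                offsetof(struct block_list_header, binfo);
        off += blhdr_size;

        /* Step 9.2, following the quoted wording (binfo[1]..binfo[num_blocks]
         * inclusive).  Note that an implementation may instead count binfo[0]
         * in num_blocks and loop while i < num_blocks. */
        for (uint16_t i = 1; i <= blhdr.num_blocks; i++) {
            struct block_info bi;
            memcpy(&bi, base + (size_t)i * sizeof(bi), sizeof(bi));
            memcpy(vol + bi.bnum * jhdr_size, jbuf + off, bi.bsize);
            off += bi.bsize;
        }

        if (blhdr.binfo[0].next == 0)                       /* step 9.3 */
            start = off;
    }
}

/* Build a toy one-transaction journal (one 16-byte run of 0xAB destined
 * for sector 2) and replay it; returns the first replayed byte. */
static int run_demo(void)
{
    uint8_t vol[64], jbuf[64];
    struct {
        struct block_list_header h;
        struct block_info data;
    } __attribute__((packed)) bl;
    const uint32_t jhdr_size = 16, blhdr_size = sizeof(bl); /* 48 bytes */

    memset(vol, 0, sizeof(vol));
    memset(jbuf, 0, sizeof(jbuf));
    memset(&bl, 0, sizeof(bl));
    bl.h.num_blocks = 1;      /* one data run, binfo[1] */
    bl.h.binfo[0].next = 0;   /* last block list of the transaction */
    bl.data.bnum = 2;         /* -> byte offset 2 * jhdr_size = 32 */
    bl.data.bsize = 16;
    memcpy(jbuf, &bl, sizeof(bl));
    memset(jbuf + blhdr_size, 0xAB, 16);                    /* payload */

    replay(vol, jbuf, 0, blhdr_size + 16, blhdr_size, jhdr_size);
    return vol[32];
}
```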
With the best regards,
Vyacheslav Dubeyko.
---
Documentation/filesystems/hfsplus.txt | 4 +
fs/hfsplus/Makefile | 3 +-
fs/hfsplus/hfsplus_fs.h | 31 +-
fs/hfsplus/hfsplus_raw.h | 53 +-
fs/hfsplus/journal.c | 1207 +++++++++++++++++++++++++++++++++
fs/hfsplus/options.c | 8 +-
fs/hfsplus/part_tbl.c | 4 +-
fs/hfsplus/super.c | 42 +-
fs/hfsplus/wrapper.c | 28 +-
9 files changed, 1356 insertions(+), 24 deletions(-)
--
1.7.9.5
* Re: [PATCH 00/14] hfsplus: introduce journal replay functionality
@ 2013-12-26 15:57 Hin-Tak Leung
2013-12-27 6:39 ` Vyacheslav Dubeyko
0 siblings, 1 reply; 5+ messages in thread
From: Hin-Tak Leung @ 2013-12-26 15:57 UTC (permalink / raw)
To: slava, linux-fsdevel; +Cc: viro, hch, akpm
Hi,
It is quite a coincidence - I have also spent some hours in the last few days rebasing the Netgear-derived patch set that has been bit-rotting on my hard disk. I did a quick diff, and it seems that you have drawn some ideas from it, but yours is largely an independent effort. Anyway, here are a few things from where I am.
- It appears that you don't treat the special files (catalog, etc.) specially? Netgear's does; but I think there is a mistake in that they don't consider those files ever getting fragmented, so they were not journaling them properly.
- I am still nowhere near figuring out the issue with just running du on one funny volume I have. The driver gets confused after a few du runs, but the disk always fscks clean.
- I see I still have one outstanding patch not yet submitted - the one about folder counts on case-sensitive volumes.
I'll try to spend some time reading your patches and maybe even try them out. Will write again.
Hin-tak
On Thu, Dec 26, 2013 09:41 GMT Vyacheslav Dubeyko wrote:
> [...]
* Re: [PATCH 00/14] hfsplus: introduce journal replay functionality
2013-12-26 15:57 Hin-Tak Leung
@ 2013-12-27 6:39 ` Vyacheslav Dubeyko
0 siblings, 0 replies; 5+ messages in thread
From: Vyacheslav Dubeyko @ 2013-12-27 6:39 UTC (permalink / raw)
To: Hin-Tak Leung; +Cc: linux-fsdevel, viro, hch, akpm
On Thu, 2013-12-26 at 15:57 +0000, Hin-Tak Leung wrote:
> - It appears that you don't treat the special files (catalog, etc.) specially? Netgear's does;
> but I think there is a mistake in that they don't consider those files ever getting fragmented,
> so they were not journaling them properly.
Technical Note TN1150:
"On HFS Plus volumes, the journal info block is stored as a file (so
that its space can be properly represented in a catalog record and the
allocation bitmap). The name of that file is ".journal_info_block" and
it is stored in the volume's root directory. The journal header and
journal buffer are stored together in a different file named
".journal", also in the volume's root directory. Each of these files
are contiguous on disk (they occupy exactly one extent). An
implementation that uses the journal must prevent the files from being
accessed as normal files."
"Note:
An implementation must find the journal info block by using the
journalInfoBlock field of the volume header, not by the file name.
Similarly, an implementation must find the journal header and journal
buffer using the contents of the journal info block, not the file
name."
So I use the special structures hfsplus_journal_info_block and
hfsplus_journal_header for accessing the journal and replaying it.
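For reference, the flag checks TN1150 prescribes before replaying (steps 4-5 of its procedure) can be sketched like this; the mask values are the ones TN1150 defines, and the helper name is only an illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Journal info block flag bits, per TN1150. */
#define kJIJournalInFSMask          0x00000001  /* journal lives inside the volume */
#define kJIJournalOnOtherDeviceMask 0x00000002  /* journal on a separate device */
#define kJIJournalNeedsInitMask     0x00000004  /* journal was never initialized */

/* Returns nonzero only when the journal exists on this volume and can be
 * replayed: initialized, in-FS, and not on another device. */
static int journal_replayable(uint32_t jib_flags)
{
    if (jib_flags & kJIJournalNeedsInitMask)
        return 0;                               /* step 4: nothing to replay */
    return (jib_flags & kJIJournalInFSMask) &&  /* step 5 */
           !(jib_flags & kJIJournalOnOtherDeviceMask);
}
```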
As far as I can see, Netgear's code has many issues:
(1) It doesn't work for the case of 8KB - 64KB block sizes;
(2) I suspect that it implements journal wrapping incorrectly;
(3) It doesn't take into account that a block list can span
several file system blocks;
(4) It doesn't check the validity of transaction sequence numbers,
block checksums, and so on.
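On the checksums: TN1150 specifies a simple byte-wise function for the journal header and block list headers, computed with the on-disk checksum field itself set to zero and then stored there. It looks like this (transcribed from TN1150):

```c
#include <assert.h>

/* Journal checksum from TN1150.  The checksum field is zeroed before the
 * checksum is computed, then filled in with the result. */
static int calc_checksum(const unsigned char *ptr, int len)
{
    int i, cksum = 0;

    for (i = 0; i < len; i++, ptr++)
        cksum = (cksum << 8) ^ (cksum + *ptr);

    return ~cksum;
}
```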
> I'll try to spend some time reading your patches and maybe even try them out. Will write again.
If you have an opportunity to test the journal replay patch set, that
would be great. I have tested this implementation on volumes in various
states created under Mac OS X "Tiger" and "Snow Leopard". It still
needs testing on volumes created under Mac OS X "Lion". I don't expect
any trouble replaying the journal on HFS+ volumes created under "Lion".
But, anyway, independent testing is always a good idea.
Thanks,
Vyacheslav Dubeyko.
* Re: [PATCH 00/14] hfsplus: introduce journal replay functionality
@ 2014-01-19 1:50 Hin-Tak Leung
2014-01-20 7:54 ` Vyacheslav Dubeyko
0 siblings, 1 reply; 5+ messages in thread
From: Hin-Tak Leung @ 2014-01-19 1:50 UTC (permalink / raw)
To: slava; +Cc: Andrew Morton, Linux FS devel list
On Thu, Jan 9, 2014 06:12 GMT Vyacheslav Dubeyko wrote:
>Hi Hin-Tak,
>
>On Thu, 2013-12-26 at 14:26 -0800, Andrew Morton wrote:
>> On Thu, 26 Dec 2013 15:57:45 +0000 (GMT) Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
>>
> > > [...]
>>
>> Thanks, useful!
>>
>> Vyacheslav, I'll duck the patchset pending Hin-Tak's review. Please
>> cc me on the later resend.
>
>How do you feel about the patchset? Could you share your opinion of it?
Tested-by: Hin-Tak Leung <htl10@users.sourceforge.net>
I have given the patch set a bit of light usage, and it fsck'ed clean afterwards.
So that's the minimum it should do. I also see that you have spent a substantial
amount of effort in verifying and checking the validity and sanity of the journal
itself (which Netgear certainly didn't do). And thanks for the good work! That part
at least is very desirable and overdue, and should land in the kernel soon.
About documentation and the new "norecovery" option: should "norecovery"
imply read-only (and also state so clearly)? The meaning of the "force" option
also needs some updating - if we play back the journal, then unclean journalled
volumes would be mounted read-write after verifying/validating and journal playback,
correct? I see there is a need for "norecovery" in addition to "read-only" (by the
typical convention elsewhere, the latter will occasionally write to a journalled fs,
in the case of journal playback - so "norecovery" means "really no writes, not
a single byte, I mean it, don't even play back the journal"), but I think norecovery
should imply read-only, unless combined with "force"?
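The proposed semantics could be captured in a tiny decision helper - purely an illustration of the policy under discussion, not actual driver code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the proposed option semantics: "norecovery"
 * skips journal playback entirely, and therefore forces a read-only
 * mount, unless the user explicitly overrides with "force". */
static bool mount_read_only(bool norecovery, bool force)
{
    if (norecovery && !force)
        return true;    /* no replay allowed, so never write anything */
    return false;       /* journal replayed (or force given): rw allowed */
}
```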
A somewhat minor thing - I see a few instances of
'pr_err("journal replay is failed\n");' - the "is" isn't needed. Just English usage.
Now it still worries me that whichever way we implement journalling in the Linux
kernel, it may be correct according to the spec but not interoperate with
Apple's. This is especially relevant since a substantial portion of users
who want hfs+ journalling in their Linux kernel have an Intel Mac and are
dual-booting. Now we have 3 implementations - Apple's, this one, and
Netgear's. Even if we discount Netgear's, while each of them will create
a journal during its normal mode of operation and will play back a journal
of its own creation, we really need to test playback of journals
made by the Mac OS X darwin kernel... because I can think of this happening:
somebody has a dual-boot machine, the plug gets pulled while it is in Mac OS X,
and the boot-loader is configured to go into Linux by default after a short time-out
(or vice versa).
So again, I must thank you for spending the effort on verifying and validating the
journal entries.
On a separate/somewhat unrelated matter, I came across this bug report
during recent activities:
https://bugs.launchpad.net/ubuntu/+source/hfsprogs/+bug/680606
I think we have addressed both the 2nd and 3rd entries:
errors ("Invalid volume file count", "Invalid volume free block count")
error ("Unused node is not erased")
The first entry looks suspiciously like the problem which I have/had:
(restore a large directory from time-machine)
Invoke "rm -r" targeting the directory
- segfault occurs
- Various applications freeze, and it's impossible to cleanly shut down
- OS X Disk Utility reports that the disk can't be repaired
So the disk I manually edited and fixed can still get the kernel confused (i.e.
only the kernel gets confused; if one unmounts and reboots, the disk is still
fsck-clean), just by repeatedly running "du". I found that I need to put the
system under some memory stress: I can run "du" a dozen times on an idle,
freshly rebooted system, but it gets confused as soon as I have google-chrome
running with a few browser windows open (or mozilla/firefox). The kernel driver
can get confused if one traverses the file system quickly (with a du, or in the
above case, with an "rm -rf large-directory"), especially when the system is
relatively loaded.
Oh, there is still the folder count issue with case-sensitive HFS+ (which is,
AFAIK, just a convenience for some file system traversal usage, not actually
of any critical function), but that's a relatively harmless issue. We can deal
with it at some point.
Regards,
Hin-Tak
* Re: [PATCH 00/14] hfsplus: introduce journal replay functionality
2014-01-19 1:50 Hin-Tak Leung
@ 2014-01-20 7:54 ` Vyacheslav Dubeyko
0 siblings, 0 replies; 5+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-20 7:54 UTC (permalink / raw)
To: htl10; +Cc: Andrew Morton, Linux FS devel list
Hi Hin-Tak,
On Sun, 2014-01-19 at 01:50 +0000, Hin-Tak Leung wrote:
> Tested-by: Hin-Tak Leung <htl10@users.sourceforge.net>
>
> I have given the patch set a bit of light usage, and it fsck'ed clean afterwards.
> So that's the minimum it should do.I also see that you have spent substantial
> amount of effort in verifying and checking the validity and sanity of the journal
> itself (which Netgear certainly didn't do). And thanks for the good work! That part
> at least is very desirable and overdue and should land in the kernel soon.
>
Thank you for testing and reviewing the patchset.
> About documentation and the new "norecovery" option. Should "norecovery"
> imply read-only (and also state clearly so)?
Yes, sure. The "norecovery" option implies a read-only mount. You can
see it in the hfsplus_check_journal() method:
+ if (hfsplus_journal_empty(jh)) {
+ err = hfsplus_replay_journal_header(sb);
+ if (unlikely(err)) {
+ pr_err("journal replay is failed, please run fsck\n");
+ return err;
+ }
+ /* If they're the same, we can mount, it's clean */
+ hfs_dbg(JOURNAL, "journal is empty\n");
+ return 0;
+ } else {
+ /* Try to replay journal */
+ if (test_bit(HFSPLUS_SB_NORECOVERY, &HFSPLUS_SB(sb)->flags)) {
+ pr_info("journal replay is skiped, mounting read-only.\n");
+ sb->s_flags |= MS_RDONLY;
+ return 0;
+ }
+
+ err = hfsplus_replay_journal(sb);
+ if (unlikely(err)) {
+ pr_err("journal replay is failed, please run fsck\n");
+ return err;
+ }
+ hfs_dbg(JOURNAL, "journal replay is done\n");
+ }
> The meaning of the "force" option
> also needs some updating - if we play back journal, then unclean journalled
> volumes would be mounted read-write after verify/validating and journal playback,
> correct? I see there is a need for "norecovery" in addition to "read-only" (in the typical
> convention elsewhere, the latter will occasionally write to a journalled fs,
> in the case of journal playback - so "norecovery" is "really no write, not
> a single byte, I mean it, don't even playback journal"), but I think norecovery
> should imply read-only, unless in combination with "force"?
>
Yes, you are right about the "force" option. I need to check that
everything plays well for this option, and the documentation also needs
correcting for the "force" case. Thank you, I'll do it.
> A somewhat minor thing - I see a few instances of
> 'pr_err("journal replay is failed\n");' - the "is" isn't needed. Just English usage.
>
Yes, I'll correct it. I'll prepare the next version of the patchset
with these minor corrections ASAP.
> Now it still worries me that whichever way we implement journalling in the Linux
> kernel, it may be correct according to the spec but not interoperate with
> Apple's. This is especially relevant since a substantial portion of users
> who wants hfs+ journalling in their linux kernel is because they have an
> intel Mac and they are dual-booting. Now we have 3 implementations - apple's,
> this, and Netgear's. Even if we discount Netgear's, while each of them
> will create a journal during its normal mode of operation, will playback a journal
> of its own creation, we really need to test the case of playback of journals
> made by the Mac OS X darwin kernel... because I can think of this happening:
> somebody has a dual-boot machine, you pull the plug while it is in Mac OS X,
> and the boot-loader is configured to go into linux by default after a short time-out
> (or vice versa).
Yes, I take compatibility of journaling and other HFS+ driver
functionality in the Linux kernel with Mac OS X into account. Keeping
compatibility is very important, from my viewpoint. So you are right,
and I spend effort on checking such compatibility when I implement and
test code for the HFS+ driver.
>
> So again, I must thank you for spending the effort on verifying and validating the
> journal entries.
>
> On separate/somewhat unrelated matters, I came across this bug report in
> the recent activities:
>
> https://bugs.launchpad.net/ubuntu/+source/hfsprogs/+bug/680606
>
> I think we addressed both of the 2nd and 3rd entries:
> errors ("Invalid volume file count", "Invalid volume free block count")
> error ("Unused node is not erased")
>
> The first entry looks suspiciously like the problem which I have/had:
> (restore a large directory from time-machine)
> Invoke "rm -r" targeting the directory
> - segfault occurs
> - Various applications freeze, and it's impossible to cleanly shut down
> - OS X Disk Utility reports that the disk can't be repaired
>
> So the disk I manually edited and fixed, still can get the kernel confused (i.e
> the kernel just get confused, if one unmount and reboot, the disk is fsck-clean still),
> just by repeatedly running "du". I found that I need to put the system under
> some memory stress: I can run "du" for a dozen of times on an idle system
> freshly rebooted, but as soon as I have google-chrome running with a few browser
> windows open (or mozilla/firefox). The kernel driver can get confused if one traverses
> the file system quickly (with a du, or in the above case, with an "rm -rf large directory"),
> especially when the system is relatively loaded.
>
I haven't tried that way of reproducing the bug, so maybe I'll try to
reproduce the issue. Unfortunately, I don't know when I'll get to it,
because I will be really busy during the next several weeks.
> Oh, there is still the folder count issue with case-sensitive HFS+ (which is, AFAIK, just a
> convenience for some file system traversal usage, not actually of any critical function),
> but that's a relatively harmless issue. We can deal with it at some point.
>
Yes, sure.
Thanks,
Vyacheslav Dubeyko.