From: Hin-Tak Leung <hintak.leung@gmail.com>
To: 415fox@gmail.com, linux-fsdevel@vger.kernel.org
Subject: Fwd: [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free
Date: Tue, 30 Jun 2015 16:55:15 +0100 [thread overview]
Message-ID: <CAJMB+Nid61a1toRWFFSckEBwfWbAgJEOS1RQL2Efg2DgqhPieQ@mail.gmail.com> (raw)
In-Reply-To: <1435455570-8577-2-git-send-email-HinTak.Leung@gmail.com>
Michael Fox: FYI. If you feel like responding (and possibly add a "Tested-By:"),
please include liux-fs-devel linux-fsdevel in the reply.
---------- Forwarded message ----------
From: Hin-Tak Leung <hintak.leung@gmail.com>
Date: 28 June 2015 at 02:39
Subject: [PATCH v2] hfs,hfsplus: cache pages correctly between
bnode_create and bnode_free
To: linux-fsdevel@vger.kernel.org
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>, Sergei Antonov
<saproj@gmail.com>, Al Viro <viro@zeniv.linux.org.uk>, Christoph
Hellwig <hch@infradead.org>, Andrew Morton
<akpm@linux-foundation.org>, Vyacheslav Dubeyko <slava@dubeyko.com>,
Sougata Santra <sougata@tuxera.com>
From: Hin-Tak Leung <htl10@users.sourceforge.net>
Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and
hfs_bnode_find() for finding or creating pages corresponding to an inode) are
immediately kmap()'ed and used (both read and write) and kunmap()'ed, and
should not be page_cache_release()'ed until hfs_bnode_free().
This patch fixes a problem I first saw in July 2012: merely running "du" on a
large hfsplus-mounted directory a few times on a reasonably loaded system would
get the hfsplus driver all confused and complaining about B-tree
inconsistencies, and generates a "BUG: Bad page state". Most recently, I can
generate this problem on up-to-date Fedora 22 with shipped kernel 4.0.5, by
running "du /" (="/" + "/home" + "/mnt" + other smaller mounts) and "du /mnt"
simultaneously on two windows, where /mnt is a lightly-used QEMU VM image of
the full Mac OS X 10.9:
$ df -i / /home /mnt
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/fedora-root 3276800 551665 2725135 17% /
/dev/mapper/fedora-home 52879360 716221 52163139 2% /home
/dev/nbd0p2 4294967295 1387818 4293579477 1% /mnt
After applying the patch, I was able to run "du /" (60+ times) and
"du /mnt" (150+ times) continuously and simultaneously for 6+ hours.
There are many reports of the hfsplus driver getting confused under load and
generating "BUG: Bad page state" or other similar issues over the years. [1]
The unpatched code [2] has always been wrong since it entered the kernel tree.
The only reason why it gets away with it is that the kmap/memcpy/kunmap follow
very quickly after the page_cache_release() so the kernel has not had a chance
to reuse the memory for something else, most of the time.
The current RW driver appears to have followed the design and development of
the earlier read-only hfsplus driver [3], where-by version 0.1 (Dec 2001) had a
B-tree node-centric approach to read_cache_page()/page_cache_release() per
bnode_get()/bnode_put(), migrating towards version 0.2 (June 2002) of caching
and releasing pages per inode extents. When the current RW code first entered
the kernel [2] in 2005, there was an REF_PAGES conditional (and "//" commented
out code) to switch between B-node centric paging to inode-centric paging.
There was a mistake with the direction of one of the REF_PAGES conditionals in
__hfs_bnode_create(). In a subsequent "remove debug code" commit [4], the
read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were
removed, but a page_cache_release() was mistakenly left in (propagating the
"REF_PAGES <-> !REF_PAGE" mistake), and the commented-out page_cache_release()
in bnode_release() (which should be spanned by !REF_PAGES) was never enabled.
References:
[1]:
Michael Fox, Apr 2013
http://www.spinics.net/lists/linux-fsdevel/msg63807.html
("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'")
Sasha Levin, Feb 2015
http://lkml.org/lkml/2015/2/20/85 ("use after free")
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887
https://bugzilla.kernel.org/show_bug.cgi?id=42342
https://bugzilla.kernel.org/show_bug.cgi?id=63841
https://bugzilla.kernel.org/show_bug.cgi?id=78761
[2]:
http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db
commit d1081202f1d0ee35ab0beb490da4b65d4bc763db
Author: Andrew Morton <akpm@osdl.org>
Date: Wed Feb 25 16:17:36 2004 -0800
[PATCH] HFS rewrite
http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd
commit 91556682e0bf004d98a529bf829d339abb98bbbd
Author: Andrew Morton <akpm@osdl.org>
Date: Wed Feb 25 16:17:48 2004 -0800
[PATCH] HFS+ support
[3]:
http://sourceforge.net/projects/linux-hfsplus/
http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/
http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/
http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\
fs/hfsplus/bnode.c?r1=1.4&r2=1.5
Date: Thu Jun 6 09:45:14 2002 +0000
Use buffer cache instead of page cache in bnode.c. Cache inode extents.
[4]:
http://git.kernel.org/cgit/linux/kernel/git/\
stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d
commit a5e3985fa014029eb6795664c704953720cc7f7d
Author: Roman Zippel <zippel@linux-m68k.org>
Date: Tue Sep 6 15:18:47 2005 -0700
[PATCH] hfs: remove debug code
Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Sougata Santra <sougata@tuxera.com>
---
fs/hfs/bnode.c | 9 ++++-----
fs/hfsplus/bnode.c | 3 ---
2 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
index d3fa6bd..221719e 100644
--- a/fs/hfs/bnode.c
+++ b/fs/hfs/bnode.c
@@ -288,7 +288,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct
hfs_btree *tree, u32 cnid)
page_cache_release(page);
goto fail;
}
- page_cache_release(page);
node->page[i] = page;
}
@@ -398,11 +397,11 @@ node_error:
void hfs_bnode_free(struct hfs_bnode *node)
{
- //int i;
+ int i;
- //for (i = 0; i < node->tree->pages_per_bnode; i++)
- // if (node->page[i])
- // page_cache_release(node->page[i]);
+ for (i = 0; i < node->tree->pages_per_bnode; i++)
+ if (node->page[i])
+ page_cache_release(node->page[i]);
kfree(node);
}
diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
index 759708f..6392466 100644
--- a/fs/hfsplus/bnode.c
+++ b/fs/hfsplus/bnode.c
@@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct
hfs_btree *tree, u32 cnid)
page_cache_release(page);
goto fail;
}
- page_cache_release(page);
node->page[i] = page;
}
@@ -566,13 +565,11 @@ node_error:
void hfs_bnode_free(struct hfs_bnode *node)
{
-#if 0
int i;
for (i = 0; i < node->tree->pages_per_bnode; i++)
if (node->page[i])
page_cache_release(node->page[i]);
-#endif
kfree(node);
}
--
2.4.3
next prev parent reply other threads:[~2015-06-30 15:55 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-28 1:39 [PATCH 0/2] two patches about B-tree corruptions in hfs and hfsplus Hin-Tak Leung
2015-06-28 1:39 ` [PATCH v2] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free Hin-Tak Leung
2015-06-28 18:52 ` Sergei Antonov
2015-06-30 15:40 ` Hin-Tak Leung
2015-07-01 16:09 ` Sergei Antonov
2015-07-01 23:24 ` Hin-Tak Leung
2015-07-02 17:04 ` Sergei Antonov
2015-07-02 18:02 ` Hin-Tak Leung
2015-07-07 22:19 ` Andrew Morton
2015-07-07 23:19 ` Sergei Antonov
2015-07-08 14:58 ` Sergei Antonov
2015-07-08 15:29 ` Hin-Tak Leung
2015-08-03 13:33 ` Luc Pionchon
2015-06-30 15:55 ` Hin-Tak Leung [this message]
[not found] ` <CABbL6oanheE9JAwevN8Mrf-5o=y-e7JV2Nt4VW_-iW9Pt3jQuQ@mail.gmail.com>
[not found] ` <CABbL6oZ2vQsU9xG+vh_GkmfHHdcZvSYPcZYjUpnZq-r3b_HjPQ@mail.gmail.com>
2015-07-01 23:29 ` Hin-Tak Leung
2015-06-28 1:39 ` [PATCH] hfs: fix B-tree corruption after insertion at position 0 Hin-Tak Leung
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAJMB+Nid61a1toRWFFSckEBwfWbAgJEOS1RQL2Efg2DgqhPieQ@mail.gmail.com \
--to=hintak.leung@gmail.com \
--cc=415fox@gmail.com \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).