From: Willy Tarreau <w@1wt.eu>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>,
Sergei Antonov <saproj@gmail.com>,
Anton Altaparmakov <anton@tuxera.com>,
Sasha Levin <sasha.levin@oracle.com>,
Al Viro <viro@zeniv.linux.org.uk>,
Christoph Hellwig <hch@infradead.org>,
Vyacheslav Dubeyko <slava@dubeyko.com>,
Sougata Santra <sougata@tuxera.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Ben Hutchings <ben@decadent.org.uk>, Willy Tarreau <w@1wt.eu>
Subject: [PATCH 2.6.32 20/38] [PATCH 20/38] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free
Date: Sun, 29 Nov 2015 22:47:22 +0100 [thread overview]
Message-ID: <20151129214703.726752716@1wt.eu> (raw)
In-Reply-To: <8acf8256ccc72771a80b7851061027bc@local>
2.6.32-longterm review patch. If anyone has any objections, please let me know.
------------------
commit 7cb74be6fd827e314f81df3c5889b87e4c87c569 upstream.
Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and
hfs_bnode_find() for finding or creating pages corresponding to an inode)
are immediately kmap()'ed and used (both read and write) and kunmap()'ed,
and should not be page_cache_release()'ed until hfs_bnode_free().
This patch fixes a problem I first saw in July 2012: merely running "du"
on a large hfsplus-mounted directory a few times on a reasonably loaded
system would get the hfsplus driver all confused and complaining about
B-tree inconsistencies, and generates a "BUG: Bad page state". Most
recently, I can generate this problem on up-to-date Fedora 22 with shipped
kernel 4.0.5, by running "du /" (="/" + "/home" + "/mnt" + other smaller
mounts) and "du /mnt" simultaneously on two windows, where /mnt is a
lightly-used QEMU VM image of the full Mac OS X 10.9:
$ df -i / /home /mnt
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/fedora-root 3276800 551665 2725135 17% /
/dev/mapper/fedora-home 52879360 716221 52163139 2% /home
/dev/nbd0p2 4294967295 1387818 4293579477 1% /mnt
After applying the patch, I was able to run "du /" (60+ times) and "du
/mnt" (150+ times) continuously and simultaneously for 6+ hours.
There are many reports of the hfsplus driver getting confused under load
and generating "BUG: Bad page state" or other similar issues over the
years. [1]
The unpatched code [2] has always been wrong since it entered the kernel
tree. The only reason why it gets away with it is that the
kmap/memcpy/kunmap follow very quickly after the page_cache_release() so
the kernel has not had a chance to reuse the memory for something else,
most of the time.
The current RW driver appears to have followed the design and development
of the earlier read-only hfsplus driver [3], where-by version 0.1 (Dec
2001) had a B-tree node-centric approach to
read_cache_page()/page_cache_release() per bnode_get()/bnode_put(),
migrating towards version 0.2 (June 2002) of caching and releasing pages
per inode extents. When the current RW code first entered the kernel [2]
in 2005, there was an REF_PAGES conditional (and "//" commented out code)
to switch between B-node centric paging to inode-centric paging. There
was a mistake with the direction of one of the REF_PAGES conditionals in
__hfs_bnode_create(). In a subsequent "remove debug code" commit [4], the
read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were
removed, but a page_cache_release() was mistakenly left in (propagating
the "REF_PAGES <-> !REF_PAGE" mistake), and the commented-out
page_cache_release() in bnode_release() (which should be spanned by
!REF_PAGES) was never enabled.
References:
[1]:
Michael Fox, Apr 2013
http://www.spinics.net/lists/linux-fsdevel/msg63807.html
("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'")
Sasha Levin, Feb 2015
http://lkml.org/lkml/2015/2/20/85 ("use after free")
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887
https://bugzilla.kernel.org/show_bug.cgi?id=42342
https://bugzilla.kernel.org/show_bug.cgi?id=63841
https://bugzilla.kernel.org/show_bug.cgi?id=78761
[2]:
http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db
commit d1081202f1d0ee35ab0beb490da4b65d4bc763db
Author: Andrew Morton <akpm@osdl.org>
Date: Wed Feb 25 16:17:36 2004 -0800
[PATCH] HFS rewrite
http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd
commit 91556682e0bf004d98a529bf829d339abb98bbbd
Author: Andrew Morton <akpm@osdl.org>
Date: Wed Feb 25 16:17:48 2004 -0800
[PATCH] HFS+ support
[3]:
http://sourceforge.net/projects/linux-hfsplus/
http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/
http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/
http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\
fs/hfsplus/bnode.c?r1=1.4&r2=1.5
Date: Thu Jun 6 09:45:14 2002 +0000
Use buffer cache instead of page cache in bnode.c. Cache inode extents.
[4]:
http://git.kernel.org/cgit/linux/kernel/git/\
stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d
commit a5e3985fa014029eb6795664c704953720cc7f7d
Author: Roman Zippel <zippel@linux-m68k.org>
Date: Tue Sep 6 15:18:47 2005 -0700
[PATCH] hfs: remove debug code
Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Sougata Santra <sougata@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit dd04e674cde34f570509b9e2a6b549af89897640)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
fs/hfs/bnode.c | 9 ++++-----
fs/hfsplus/bnode.c | 9 ++++-----
2 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
index 0d20006..3136308 100644
--- a/fs/hfs/bnode.c
+++ b/fs/hfs/bnode.c
@@ -286,7 +286,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
page_cache_release(page);
goto fail;
}
- page_cache_release(page);
node->page[i] = page;
}
@@ -396,11 +395,11 @@ node_error:
void hfs_bnode_free(struct hfs_bnode *node)
{
- //int i;
+ int i;
- //for (i = 0; i < node->tree->pages_per_bnode; i++)
- // if (node->page[i])
- // page_cache_release(node->page[i]);
+ for (i = 0; i < node->tree->pages_per_bnode; i++)
+ if (node->page[i])
+ page_cache_release(node->page[i]);
kfree(node);
}
diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
index 29da657..7d75904 100644
--- a/fs/hfsplus/bnode.c
+++ b/fs/hfsplus/bnode.c
@@ -446,7 +446,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
page_cache_release(page);
goto fail;
}
- page_cache_release(page);
node->page[i] = page;
}
@@ -556,11 +555,11 @@ node_error:
void hfs_bnode_free(struct hfs_bnode *node)
{
- //int i;
+ int i;
- //for (i = 0; i < node->tree->pages_per_bnode; i++)
- // if (node->page[i])
- // page_cache_release(node->page[i]);
+ for (i = 0; i < node->tree->pages_per_bnode; i++)
+ if (node->page[i])
+ page_cache_release(node->page[i]);
kfree(node);
}
--
1.7.12.2.21.g234cd45.dirty
next prev parent reply other threads:[~2015-11-29 21:47 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <8acf8256ccc72771a80b7851061027bc@local>
2015-11-29 21:47 ` [PATCH 2.6.32 00/38] 2.6.32.69-longterm review Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 01/38] [PATCH 01/38] dcache: Handle escaped paths in prepend_path Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 03/38] [PATCH 03/38] md: use kzalloc() when bitmap is disabled Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 04/38] [PATCH 04/38] ipv6: addrconf: validate new MTU before applying it Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 05/38] [PATCH 05/38] virtio-net: drop NETIF_F_FRAGLIST Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 06/38] [PATCH 06/38] USB: whiteheat: fix potential null-deref at probe Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 07/38] [PATCH 07/38] ipc/sem.c: fully initialize sem_array before making it visible Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 08/38] [PATCH 08/38] Initialize msg/shm IPC objects before doing ipc_addid() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 10/38] [PATCH 10/38] rds: fix an integer overflow test in rds_info_getsockopt() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 11/38] [PATCH 11/38] net: Clone skb before setting peeked flag Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 12/38] [PATCH 12/38] net: Fix skb_set_peeked use-after-free bug Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 13/38] [PATCH 13/38] ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 14/38] [PATCH 14/38] devres: fix devres_get() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 15/38] [PATCH 15/38] windfarm: decrement client count when unregistering Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 16/38] [PATCH 16/38] xfs: Fix xfs_attr_leafblock definition Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 17/38] [PATCH 17/38] SUNRPC: xs_reset_transport must mark the connection as disconnected Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 18/38] [PATCH 18/38] Input: evdev - do not report errors form flush() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 19/38] [PATCH 19/38] pagemap: hide physical addresses from non-privileged users Willy Tarreau
2015-11-30 1:54 ` Ben Hutchings
2015-11-30 7:01 ` Willy Tarreau
2015-11-30 11:30 ` Willy Tarreau
2015-11-30 11:49 ` Konstantin Khlebnikov
2015-11-30 12:13 ` Willy Tarreau
2015-11-30 14:55 ` Ben Hutchings
2015-11-30 15:14 ` Willy Tarreau
2015-11-29 21:47 ` Willy Tarreau [this message]
2015-11-29 21:47 ` [PATCH 2.6.32 21/38] [PATCH 21/38] hfs: fix B-tree corruption after insertion at position 0 Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 22/38] [PATCH 22/38] x86/paravirt: Replace the paravirt nop with a bona fide empty function Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 23/38] [PATCH 23/38] RDS: verify the underlying transport exists before creating a connection Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 24/38] [PATCH 24/38] net: Fix skb csum races when peeking Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 25/38] [PATCH 25/38] net: add length argument to skb_copy_and_csum_datagram_iovec Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 26/38] [PATCH 26/38] module: Fix locking in symbol_put_addr() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 27/38] [PATCH 27/38] x86/process: Add proper bound checks in 64bit get_wchan() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 28/38] [PATCH 28/38] mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 29/38] [PATCH 29/38] tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 31/38] [PATCH 31/38] ethtool: Use kcalloc instead of kmalloc for ethtool_get_strings Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 32/38] [PATCH 32/38] HID: core: Avoid uninitialized buffer access Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 33/38] [PATCH 33/38] devres: fix a for loop bounds check Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 34/38] [PATCH 34/38] binfmt_elf: Dont clobber passed executables file header Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 35/38] [PATCH 35/38] RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in rds_tcp_data_recv Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 36/38] [PATCH 36/38] ipmr: fix possible race resulting from improper usage of IP_INC_STATS_BH() in preemptible context Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 37/38] [PATCH 37/38] net: avoid NULL deref in inet_ctl_sock_destroy() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 38/38] [PATCH 38/38] splice: sendfile() at once fails for big files Willy Tarreau
2015-11-30 1:25 ` [PATCH 2.6.32 09/38] [PATCH 09/38] xhci: fix off by one error in TRB DMA address boundary check Willy Tarreau
2015-11-30 2:04 ` [PATCH 2.6.32 30/38] [PATCH 30/38] mvsas: Fix NULL pointer dereference in mvs_slot_task_free Willy Tarreau
2015-11-30 2:42 ` [PATCH 2.6.32 00/38] 2.6.32.69-longterm review Ben Hutchings
2015-11-30 6:51 ` Willy Tarreau
2015-11-30 11:23 ` Willy Tarreau
2015-11-30 14:43 ` Ben Hutchings
2015-11-30 15:10 ` Willy Tarreau
[not found] ` <20151129214702.957590241@1wt.eu>
2015-11-30 6:44 ` [PATCH 2.6.32 02/38] [PATCH 02/38] Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount Willy Tarreau
[not found] ` <abcd326bbd6833b04b56738cd1734419@local>
2015-11-30 16:04 ` [PATCH 2.6.32 00/38] 2.6.32.69-longterm review Willy Tarreau
2015-11-30 16:04 ` [PATCH 2.6.32 39/38] vfs: Test for and handle paths that are unreachable from their mnt_root Willy Tarreau
2015-11-30 16:05 ` [PATCH 2.6.32 40/38] security: add cred argument to security_capable() Willy Tarreau
2015-11-30 16:05 ` [PATCH 2.6.32 19/38] pagemap: hide physical addresses from non-privileged users Willy Tarreau
2015-12-01 0:43 ` [PATCH 2.6.32 00/38] 2.6.32.69-longterm review Ben Hutchings
2015-12-01 6:57 ` Willy Tarreau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20151129214703.726752716@1wt.eu \
--to=w@1wt.eu \
--cc=akpm@linux-foundation.org \
--cc=anton@tuxera.com \
--cc=ben@decadent.org.uk \
--cc=hch@infradead.org \
--cc=htl10@users.sourceforge.net \
--cc=linux-kernel@vger.kernel.org \
--cc=saproj@gmail.com \
--cc=sasha.levin@oracle.com \
--cc=slava@dubeyko.com \
--cc=sougata@tuxera.com \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).