From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, stable@kernel.org,
Dave Chinner <dchinner@redhat.com>,
Al Viro <viro@zeniv.linux.org.uk>,
Aaron Lu <aaron.lu@linux.alibaba.com>
Subject: [PATCH 4.4 55/65] fs: dont scan the inode cache before SB_BORN is set
Date: Mon, 4 Feb 2019 11:36:48 +0100
Message-ID: <20190204103619.707605776@linuxfoundation.org>
In-Reply-To: <20190204103610.583715954@linuxfoundation.org>
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Chinner <dchinner@redhat.com>
commit 79f546a696bff2590169fb5684e23d65f4d9f591 upstream.
We recently had an oops reported on a 4.14 kernel in
xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage
and so the m_perag_tree lookup walked into lala land. It produces
an oops down this path during the failed mount:

  radix_tree_gang_lookup_tag+0xc4/0x130
  xfs_perag_get_tag+0x37/0xf0
  xfs_reclaim_inodes_count+0x32/0x40
  xfs_fs_nr_cached_objects+0x11/0x20
  super_cache_count+0x35/0xc0
  shrink_slab.part.66+0xb1/0x370
  shrink_node+0x7e/0x1a0
  try_to_free_pages+0x199/0x470
  __alloc_pages_slowpath+0x3a1/0xd20
  __alloc_pages_nodemask+0x1c3/0x200
  cache_grow_begin+0x20b/0x2e0
  fallback_alloc+0x160/0x200
  kmem_cache_alloc+0x111/0x4e0

The problem is that the superblock shrinker is running before the
filesystem structures it depends on have been fully set up, i.e.
the shrinker is registered in sget(), before ->fill_super() has been
called, and the shrinker can call into the filesystem before
fill_super() does its setup work. Essentially we are exposed to
both use-after-free and use-before-initialisation bugs here.

To fix this, add a check for the SB_BORN flag in super_cache_count().
In general, this flag is not set until ->fs_mount() completes
successfully, so we know that it is set after the filesystem
setup has completed. This matches the trylock_super() behaviour,
which will not let super_cache_scan() run if SB_BORN is not set, and
hence prevents the superblock shrinker from entering the
filesystem while it is being set up or after it has failed setup
and is being torn down.
Cc: stable@kernel.org
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Aaron Lu <aaron.lu@linux.alibaba.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/super.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)
--- a/fs/super.c
+++ b/fs/super.c
@@ -118,13 +118,23 @@ static unsigned long super_cache_count(s
sb = container_of(shrink, struct super_block, s_shrink);
/*
- * Don't call trylock_super as it is a potential
- * scalability bottleneck. The counts could get updated
- * between super_cache_count and super_cache_scan anyway.
- * Call to super_cache_count with shrinker_rwsem held
- * ensures the safety of call to list_lru_shrink_count() and
- * s_op->nr_cached_objects().
+ * We don't call trylock_super() here as it is a scalability bottleneck,
+ * so we're exposed to partial setup state. The shrinker rwsem does not
+ * protect filesystem operations backing list_lru_shrink_count() or
+ * s_op->nr_cached_objects(). Counts can change between
+ * super_cache_count and super_cache_scan, so we really don't need locks
+ * here.
+ *
+ * However, if we are currently mounting the superblock, the underlying
+ * filesystem might be in a state of partial construction and hence it
+ * is dangerous to access it. trylock_super() uses a MS_BORN check to
+ * avoid this situation, so do the same here. The memory barrier is
+ * matched with the one in mount_fs() as we don't hold locks here.
*/
+ if (!(sb->s_flags & MS_BORN))
+ return 0;
+ smp_rmb();
+
if (sb->s_op && sb->s_op->nr_cached_objects)
total_objects = sb->s_op->nr_cached_objects(sb, sc);
@@ -1133,6 +1143,14 @@ mount_fs(struct file_system_type *type,
sb = root->d_sb;
BUG_ON(!sb);
WARN_ON(!sb->s_bdi);
+
+ /*
+ * Write barrier is for super_cache_count(). We place it before setting
+ * MS_BORN as the data dependency between the two functions is the
+ * superblock structure contents that we just set up, not the MS_BORN
+ * flag.
+ */
+ smp_wmb();
sb->s_flags |= MS_BORN;
error = security_sb_kern_mount(sb, flags, secdata);
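
For readers less familiar with the barrier pairing used above, here is a
minimal userspace sketch of the same publish-then-check pattern. It is an
illustration only, not kernel code: struct fake_sb, FAKE_BORN, fake_mount()
and fake_cache_count() are hypothetical names, and C11 release/acquire
fences are used as an approximate (stronger) stand-in for the
smp_wmb()/smp_rmb() pair in the patch.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct super_block; not the kernel layout. */
struct fake_sb {
	atomic_uint flags;	/* carries FAKE_BORN once setup has finished */
	const long *fs_info;	/* filesystem-private data, set during setup */
};

#define FAKE_BORN 0x1u		/* assumed flag value, for illustration only */

/* Writer side, loosely analogous to mount_fs(). */
static void fake_mount(struct fake_sb *sb, const long *info)
{
	/* Finish setting up everything the reader will dereference. */
	sb->fs_info = info;

	/*
	 * Plays the role of smp_wmb(): the store to fs_info is ordered
	 * before the flag store that follows.
	 */
	atomic_thread_fence(memory_order_release);
	atomic_fetch_or_explicit(&sb->flags, FAKE_BORN, memory_order_relaxed);
}

/* Reader side, loosely analogous to super_cache_count(). */
static long fake_cache_count(struct fake_sb *sb)
{
	unsigned int flags;

	flags = atomic_load_explicit(&sb->flags, memory_order_relaxed);
	if (!(flags & FAKE_BORN))
		return 0;	/* not fully set up yet: report nothing */

	/*
	 * Plays the role of smp_rmb(): the load of fs_info below is
	 * ordered after the flag load above.
	 */
	atomic_thread_fence(memory_order_acquire);
	return *sb->fs_info;
}
```

A reader that calls fake_cache_count() before fake_mount() has run gets 0
and never touches fs_info; once the flag is observed, the fence pairing
ensures the reader also sees the fs_info pointer that was stored before
the flag was set.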