From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg KH <gregkh@linuxfoundation.org>,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk, Jian Li <jiali@redhat.com>,
Jeff Layton <jlayton@redhat.com>,
Steve French <smfrench@gmail.com>
Subject: [ 23/23] cifs: when CONFIG_HIGHMEM is set, serialize the read/write kmaps
Date: Thu, 26 Jul 2012 14:20:02 -0700 [thread overview]
Message-ID: <20120726211407.996441174@linuxfoundation.org> (raw)
In-Reply-To: <20120726211405.959857593@linuxfoundation.org>
From: Greg KH <gregkh@linuxfoundation.org>
3.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton <jlayton@redhat.com>
commit 3cf003c08be785af4bee9ac05891a15bcbff856a upstream.
[The async read code was broadened to include uncached reads in 3.5, so
the mainline patch did not apply directly. This patch is just a backport
to account for that change.]
Jian found that when he ran fsx on a 32 bit arch with a large wsize the
process and one of the bdi writeback kthreads would sometimes deadlock
with a stack trace like this:
crash> bt
PID: 2789 TASK: f02edaa0 CPU: 3 COMMAND: "fsx"
#0 [eed63cbc] schedule at c083c5b3
#1 [eed63d80] kmap_high at c0500ec8
#2 [eed63db0] cifs_async_writev at f7fabcd7 [cifs]
#3 [eed63df0] cifs_writepages at f7fb7f5c [cifs]
#4 [eed63e50] do_writepages at c04f3e32
#5 [eed63e54] __filemap_fdatawrite_range at c04e152a
#6 [eed63ea4] filemap_fdatawrite at c04e1b3e
#7 [eed63eb4] cifs_file_aio_write at f7fa111a [cifs]
#8 [eed63ecc] do_sync_write at c052d202
#9 [eed63f74] vfs_write at c052d4ee
#10 [eed63f94] sys_write at c052df4c
#11 [eed63fb0] ia32_sysenter_target at c0409a98
EAX: 00000004 EBX: 00000003 ECX: abd73b73 EDX: 012a65c6
DS: 007b ESI: 012a65c6 ES: 007b EDI: 00000000
SS: 007b ESP: bf8db178 EBP: bf8db1f8 GS: 0033
CS: 0073 EIP: 40000424 ERR: 00000004 EFLAGS: 00000246
Each task would kmap part of its address array before getting stuck, but
not enough to actually issue the write.
This patch fixes this by serializing the marshal_iov operations for
async reads and writes. The idea here is to ensure that cifs
aggressively tries to populate a request before attempting to fulfill
another one. As soon as all of the pages are kmapped for a request, then
we can unlock and allow another one to proceed.
There's no need to do this serialization on non-CONFIG_HIGHMEM arches
however, so optimize all of this out when CONFIG_HIGHMEM isn't set.
Reported-by: Jian Li <jiali@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/cifs/cifssmb.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -89,6 +89,32 @@ static struct {
/* Forward declarations */
static void cifs_readv_complete(struct work_struct *work);
+#ifdef CONFIG_HIGHMEM
+/*
+ * On arches that have high memory, kmap address space is limited. By
+ * serializing the kmap operations on those arches, we ensure that we don't
+ * end up with a bunch of threads in writeback with partially mapped page
+ * arrays, stuck waiting for kmap to come back. That situation prevents
+ * progress and can deadlock.
+ */
+static DEFINE_MUTEX(cifs_kmap_mutex);
+
+static inline void
+cifs_kmap_lock(void)
+{
+ mutex_lock(&cifs_kmap_mutex);
+}
+
+static inline void
+cifs_kmap_unlock(void)
+{
+ mutex_unlock(&cifs_kmap_mutex);
+}
+#else /* !CONFIG_HIGHMEM */
+#define cifs_kmap_lock() do { ; } while(0)
+#define cifs_kmap_unlock() do { ; } while(0)
+#endif /* CONFIG_HIGHMEM */
+
/* Mark as invalid, all open files on tree connections since they
were closed when session to server was lost */
static void mark_open_files_invalid(struct cifs_tcon *pTcon)
@@ -1557,6 +1583,7 @@ cifs_readv_receive(struct TCP_Server_Inf
eof_index = eof ? (eof - 1) >> PAGE_CACHE_SHIFT : 0;
cFYI(1, "eof=%llu eof_index=%lu", eof, eof_index);
+ cifs_kmap_lock();
list_for_each_entry_safe(page, tpage, &rdata->pages, lru) {
if (remaining >= PAGE_CACHE_SIZE) {
/* enough data to fill the page */
@@ -1606,6 +1633,7 @@ cifs_readv_receive(struct TCP_Server_Inf
page_cache_release(page);
}
}
+ cifs_kmap_unlock();
/* issue the read if we have any iovecs left to fill */
if (rdata->nr_iov > 1) {
@@ -2194,7 +2222,9 @@ cifs_async_writev(struct cifs_writedata
* and set the iov_len properly for each one. It may also set
* wdata->bytes too.
*/
+ cifs_kmap_lock();
wdata->marshal_iov(iov, wdata);
+ cifs_kmap_unlock();
cFYI(1, "async write at %llu %u bytes", wdata->offset, wdata->bytes);
prev parent reply other threads:[~2012-07-26 21:22 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-07-26 21:14 [ 00/23] 3.4.7-stable review Greg KH
2012-07-26 21:19 ` [ 01/23] md: avoid crash when stopping md array races with closing other open fds Greg Kroah-Hartman
2012-07-26 21:19 ` [ 02/23] md/raid1: close some possible races on write errors during resync Greg Kroah-Hartman
2012-07-26 21:19 ` [ 03/23] cifs: always update the inode cache with the results from a FIND_* Greg Kroah-Hartman
2012-07-26 21:19 ` [ 04/23] cifs: on CONFIG_HIGHMEM machines, limit the rsize/wsize to the kmap space Greg Kroah-Hartman
2012-07-26 21:19 ` [ 05/23] target: Clean up returning errors in PR handling code Greg Kroah-Hartman
2012-07-26 21:19 ` [ 06/23] target: Fix range calculation in WRITE SAME emulation when num blocks == 0 Greg Kroah-Hartman
2012-07-26 21:19 ` [ 07/23] ntp: Fix STA_INS/DEL clearing bug Greg Kroah-Hartman
2012-07-26 21:19 ` [ 08/23] tcm_fc: Fix crash seen with aborts and large reads Greg Kroah-Hartman
2012-07-26 21:19 ` [ 09/23] ext4: fix duplicated mnt_drop_write call in EXT4_IOC_MOVE_EXT Greg Kroah-Hartman
2012-07-26 21:19 ` [ 10/23] mm: fix lost kswapd wakeup in kswapd_stop() Greg Kroah-Hartman
2012-07-26 21:19 ` [ 11/23] HID: add battery quirk for Apple Wireless ANSI Greg Kroah-Hartman
2012-07-26 21:19 ` [ 12/23] HID: add Sennheiser BTD500USB device support Greg Kroah-Hartman
2012-07-26 21:19 ` [ 13/23] HID: multitouch: Add support for Baanto touchscreen Greg Kroah-Hartman
2012-07-26 21:19 ` [ 14/23] MIPS: Properly align the .data..init_task section Greg Kroah-Hartman
2012-07-26 21:19 ` [ 15/23] UBIFS: fix a bug in empty space fix-up Greg Kroah-Hartman
2012-07-26 21:19 ` [ 16/23] ore: Fix NFS crash by supporting any unaligned RAID IO Greg Kroah-Hartman
2012-07-26 21:19 ` [ 17/23] ore: Remove support of partial IO request (NFS crash) Greg Kroah-Hartman
2012-07-26 21:19 ` [ 18/23] pnfs-obj: dont leak objio_state if ore_write/read fails Greg Kroah-Hartman
2012-07-26 21:19 ` [ 19/23] dm thin: do not send discards to shared blocks Greg Kroah-Hartman
2012-07-26 21:19 ` [ 20/23] dm raid1: fix crash with mirror recovery and discard Greg Kroah-Hartman
2012-07-26 21:20 ` [ 21/23] dm raid1: set discard_zeroes_data_unsupported Greg Kroah-Hartman
2012-07-26 21:20 ` [ 22/23] ARM: SAMSUNG: Update default rate for xusbxti clock Greg Kroah-Hartman
2012-07-26 21:20 ` Greg Kroah-Hartman [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120726211407.996441174@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=jiali@redhat.com \
--cc=jlayton@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=smfrench@gmail.com \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox