From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg KH <gregkh@linuxfoundation.org>,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk, Jian Li <jiali@redhat.com>,
Jeff Layton <jlayton@redhat.com>,
Steve French <smfrench@gmail.com>
Subject: [ 23/23] cifs: when CONFIG_HIGHMEM is set, serialize the read/write kmaps
Date: Thu, 26 Jul 2012 14:20:02 -0700 [thread overview]
Message-ID: <20120726211407.996441174@linuxfoundation.org> (raw)
In-Reply-To: <20120726211405.959857593@linuxfoundation.org>
From: Greg KH <gregkh@linuxfoundation.org>
3.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton <jlayton@redhat.com>
commit 3cf003c08be785af4bee9ac05891a15bcbff856a upstream.
[The async read code was broadened to include uncached reads in 3.5, so
the mainline patch did not apply directly. This patch is just a backport
to account for that change.]
Jian found that when he ran fsx on a 32 bit arch with a large wsize the
process and one of the bdi writeback kthreads would sometimes deadlock
with a stack trace like this:
crash> bt
PID: 2789 TASK: f02edaa0 CPU: 3 COMMAND: "fsx"
#0 [eed63cbc] schedule at c083c5b3
#1 [eed63d80] kmap_high at c0500ec8
#2 [eed63db0] cifs_async_writev at f7fabcd7 [cifs]
#3 [eed63df0] cifs_writepages at f7fb7f5c [cifs]
#4 [eed63e50] do_writepages at c04f3e32
#5 [eed63e54] __filemap_fdatawrite_range at c04e152a
#6 [eed63ea4] filemap_fdatawrite at c04e1b3e
#7 [eed63eb4] cifs_file_aio_write at f7fa111a [cifs]
#8 [eed63ecc] do_sync_write at c052d202
#9 [eed63f74] vfs_write at c052d4ee
#10 [eed63f94] sys_write at c052df4c
#11 [eed63fb0] ia32_sysenter_target at c0409a98
EAX: 00000004 EBX: 00000003 ECX: abd73b73 EDX: 012a65c6
DS: 007b ESI: 012a65c6 ES: 007b EDI: 00000000
SS: 007b ESP: bf8db178 EBP: bf8db1f8 GS: 0033
CS: 0073 EIP: 40000424 ERR: 00000004 EFLAGS: 00000246
Each task would kmap part of its address array before getting stuck, but
not enough to actually issue the write.
This patch fixes this by serializing the marshal_iov operations for
async reads and writes. The idea here is to ensure that cifs
aggressively tries to populate a request before attempting to fulfill
another one. As soon as all of the pages are kmapped for a request, then
we can unlock and allow another one to proceed.
There's no need to do this serialization on non-CONFIG_HIGHMEM arches
however, so optimize all of this out when CONFIG_HIGHMEM isn't set.
Reported-by: Jian Li <jiali@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/cifs/cifssmb.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -89,6 +89,32 @@ static struct {
/* Forward declarations */
static void cifs_readv_complete(struct work_struct *work);
+#ifdef CONFIG_HIGHMEM
+/*
+ * On arches that have high memory, kmap address space is limited. By
+ * serializing the kmap operations on those arches, we ensure that we don't
+ * end up with a bunch of threads in writeback with partially mapped page
+ * arrays, stuck waiting for kmap to come back. That situation prevents
+ * progress and can deadlock.
+ */
+static DEFINE_MUTEX(cifs_kmap_mutex);
+
+static inline void
+cifs_kmap_lock(void)
+{
+ mutex_lock(&cifs_kmap_mutex);
+}
+
+static inline void
+cifs_kmap_unlock(void)
+{
+ mutex_unlock(&cifs_kmap_mutex);
+}
+#else /* !CONFIG_HIGHMEM */
+#define cifs_kmap_lock() do { ; } while(0)
+#define cifs_kmap_unlock() do { ; } while(0)
+#endif /* CONFIG_HIGHMEM */
+
/* Mark as invalid, all open files on tree connections since they
were closed when session to server was lost */
static void mark_open_files_invalid(struct cifs_tcon *pTcon)
@@ -1557,6 +1583,7 @@ cifs_readv_receive(struct TCP_Server_Inf
eof_index = eof ? (eof - 1) >> PAGE_CACHE_SHIFT : 0;
cFYI(1, "eof=%llu eof_index=%lu", eof, eof_index);
+ cifs_kmap_lock();
list_for_each_entry_safe(page, tpage, &rdata->pages, lru) {
if (remaining >= PAGE_CACHE_SIZE) {
/* enough data to fill the page */
@@ -1606,6 +1633,7 @@ cifs_readv_receive(struct TCP_Server_Inf
page_cache_release(page);
}
}
+ cifs_kmap_unlock();
/* issue the read if we have any iovecs left to fill */
if (rdata->nr_iov > 1) {
@@ -2194,7 +2222,9 @@ cifs_async_writev(struct cifs_writedata
* and set the iov_len properly for each one. It may also set
* wdata->bytes too.
*/
+ cifs_kmap_lock();
wdata->marshal_iov(iov, wdata);
+ cifs_kmap_unlock();
cFYI(1, "async write at %llu %u bytes", wdata->offset, wdata->bytes);
prev parent reply other threads:[~2012-07-26 21:22 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-07-26 21:14 [ 00/23] 3.4.7-stable review Greg KH
2012-07-26 21:19 ` [ 01/23] md: avoid crash when stopping md array races with closing other open fds Greg Kroah-Hartman
2012-07-26 21:19 ` [ 02/23] md/raid1: close some possible races on write errors during resync Greg Kroah-Hartman
2012-07-26 21:19 ` [ 03/23] cifs: always update the inode cache with the results from a FIND_* Greg Kroah-Hartman
2012-07-26 21:19 ` [ 04/23] cifs: on CONFIG_HIGHMEM machines, limit the rsize/wsize to the kmap space Greg Kroah-Hartman
2012-07-26 21:19 ` [ 05/23] target: Clean up returning errors in PR handling code Greg Kroah-Hartman
2012-07-26 21:19 ` [ 06/23] target: Fix range calculation in WRITE SAME emulation when num blocks == 0 Greg Kroah-Hartman
2012-07-26 21:19 ` [ 07/23] ntp: Fix STA_INS/DEL clearing bug Greg Kroah-Hartman
2012-07-26 21:19 ` [ 08/23] tcm_fc: Fix crash seen with aborts and large reads Greg Kroah-Hartman
2012-07-26 21:19 ` [ 09/23] ext4: fix duplicated mnt_drop_write call in EXT4_IOC_MOVE_EXT Greg Kroah-Hartman
2012-07-26 21:19 ` [ 10/23] mm: fix lost kswapd wakeup in kswapd_stop() Greg Kroah-Hartman
2012-07-26 21:19 ` [ 11/23] HID: add battery quirk for Apple Wireless ANSI Greg Kroah-Hartman
2012-07-26 21:19 ` [ 12/23] HID: add Sennheiser BTD500USB device support Greg Kroah-Hartman
2012-07-26 21:19 ` [ 13/23] HID: multitouch: Add support for Baanto touchscreen Greg Kroah-Hartman
2012-07-26 21:19 ` [ 14/23] MIPS: Properly align the .data..init_task section Greg Kroah-Hartman
2012-07-26 21:19 ` Greg Kroah-Hartman
2012-07-26 21:19 ` Greg Kroah-Hartman
2012-07-26 21:19 ` [ 15/23] UBIFS: fix a bug in empty space fix-up Greg Kroah-Hartman
2012-07-26 21:19 ` [ 16/23] ore: Fix NFS crash by supporting any unaligned RAID IO Greg Kroah-Hartman
2012-07-26 21:19 ` [ 17/23] ore: Remove support of partial IO request (NFS crash) Greg Kroah-Hartman
2012-07-26 21:19 ` [ 18/23] pnfs-obj: dont leak objio_state if ore_write/read fails Greg Kroah-Hartman
2012-07-26 21:19 ` [ 19/23] dm thin: do not send discards to shared blocks Greg Kroah-Hartman
2012-07-26 21:19 ` [ 20/23] dm raid1: fix crash with mirror recovery and discard Greg Kroah-Hartman
2012-07-26 21:20 ` [ 21/23] dm raid1: set discard_zeroes_data_unsupported Greg Kroah-Hartman
2012-07-26 21:20 ` [ 22/23] ARM: SAMSUNG: Update default rate for xusbxti clock Greg Kroah-Hartman
2012-07-26 21:20 ` Greg Kroah-Hartman [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120726211407.996441174@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=jiali@redhat.com \
--cc=jlayton@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=smfrench@gmail.com \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.