From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Nikos Tsironis <ntsironis@arrikto.com>,
Ilias Tsitsimpis <iliastsi@arrikto.com>,
Mike Snitzer <snitzer@redhat.com>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 3.18 47/52] dm kcopyd: Fix bug causing workqueue stalls
Date: Thu, 24 Jan 2019 20:20:11 +0100 [thread overview]
Message-ID: <20190124190150.494710956@linuxfoundation.org> (raw)
In-Reply-To: <20190124190140.879495253@linuxfoundation.org>
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit d7e6b8dfc7bcb3f4f3a18313581f67486a725b52 ]
When using kcopyd to run callbacks through dm_kcopyd_do_callback() or
submitting copy jobs with a source size of 0, the jobs are pushed
directly to the complete_jobs list, which could be under processing by
the kcopyd thread. As a result, the kcopyd thread can continue running
completed jobs indefinitely, without releasing the CPU, as long as
someone keeps submitting new completed jobs through the aforementioned
paths. Processing of work items, queued for execution on the same CPU as
the currently running kcopyd thread, is thus stalled for excessive
amounts of time, hurting performance.
Running the following test, from the device mapper test suite [1],
dmtest run --suite snapshot -n parallel_io_to_many_snaps_N
, with 8 active snapshots, we get, in dmesg, messages like the
following:
[68899.948523] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 95s!
[68899.949282] Showing busy workqueues and worker pools:
[68899.949288] workqueue events: flags=0x0
[68899.949295] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
[68899.949306] pending: vmstat_shepherd, cache_reap
[68899.949331] workqueue mm_percpu_wq: flags=0x8
[68899.949337] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[68899.949345] pending: vmstat_update
[68899.949387] workqueue dm_bufio_cache: flags=0x8
[68899.949392] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[68899.949400] pending: work_fn [dm_bufio]
[68899.949423] workqueue kcopyd: flags=0x8
[68899.949429] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[68899.949437] pending: do_work [dm_mod]
[68899.949452] workqueue kcopyd: flags=0x8
[68899.949458] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
[68899.949466] in-flight: 13:do_work [dm_mod]
[68899.949474] pending: do_work [dm_mod]
[68899.949487] workqueue kcopyd: flags=0x8
[68899.949493] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[68899.949501] pending: do_work [dm_mod]
[68899.949515] workqueue kcopyd: flags=0x8
[68899.949521] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[68899.949529] pending: do_work [dm_mod]
[68899.949541] workqueue kcopyd: flags=0x8
[68899.949547] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[68899.949555] pending: do_work [dm_mod]
[68899.949568] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=95s workers=4 idle: 27130 27223 1084
Fix this by splitting the complete_jobs list into two parts: A user
facing part, named callback_jobs, and one used internally by kcopyd,
retaining the name complete_jobs. dm_kcopyd_do_callback() and
dispatch_job() now push their jobs to the callback_jobs list, which is
spliced to the complete_jobs list once, every time the kcopyd thread
wakes up. This prevents kcopyd from hogging the CPU indefinitely and
causing workqueue stalls.
Re-running the aforementioned test:
* Workqueue stalls are eliminated
* The maximum writing time among all targets is reduced from 09m37.10s
to 06m04.85s and the total run time of the test is reduced from
10m43.591s to 7m19.199s
[1] https://github.com/jthornber/device-mapper-test-suite
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/md/dm-kcopyd.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
index 77833b65e01e..9e626e5df2f6 100644
--- a/drivers/md/dm-kcopyd.c
+++ b/drivers/md/dm-kcopyd.c
@@ -55,15 +55,17 @@ struct dm_kcopyd_client {
struct dm_kcopyd_throttle *throttle;
/*
- * We maintain three lists of jobs:
+ * We maintain four lists of jobs:
*
* i) jobs waiting for pages
* ii) jobs that have pages, and are waiting for the io to be issued.
- * iii) jobs that have completed.
+ * iii) jobs that don't need to do any IO and just run a callback
+ * iv) jobs that have completed.
*
- * All three of these are protected by job_lock.
+ * All four of these are protected by job_lock.
*/
spinlock_t job_lock;
+ struct list_head callback_jobs;
struct list_head complete_jobs;
struct list_head io_jobs;
struct list_head pages_jobs;
@@ -583,6 +585,7 @@ static void do_work(struct work_struct *work)
struct dm_kcopyd_client *kc = container_of(work,
struct dm_kcopyd_client, kcopyd_work);
struct blk_plug plug;
+ unsigned long flags;
/*
* The order that these are called is *very* important.
@@ -591,6 +594,10 @@ static void do_work(struct work_struct *work)
* list. io jobs call wake when they complete and it all
* starts again.
*/
+ spin_lock_irqsave(&kc->job_lock, flags);
+ list_splice_tail_init(&kc->callback_jobs, &kc->complete_jobs);
+ spin_unlock_irqrestore(&kc->job_lock, flags);
+
blk_start_plug(&plug);
process_jobs(&kc->complete_jobs, kc, run_complete_job);
process_jobs(&kc->pages_jobs, kc, run_pages_job);
@@ -608,7 +615,7 @@ static void dispatch_job(struct kcopyd_job *job)
struct dm_kcopyd_client *kc = job->kc;
atomic_inc(&kc->nr_jobs);
if (unlikely(!job->source.count))
- push(&kc->complete_jobs, job);
+ push(&kc->callback_jobs, job);
else if (job->pages == &zero_page_list)
push(&kc->io_jobs, job);
else
@@ -795,7 +802,7 @@ void dm_kcopyd_do_callback(void *j, int read_err, unsigned long write_err)
job->read_err = read_err;
job->write_err = write_err;
- push(&kc->complete_jobs, job);
+ push(&kc->callback_jobs, job);
wake(kc);
}
EXPORT_SYMBOL(dm_kcopyd_do_callback);
@@ -825,6 +832,7 @@ struct dm_kcopyd_client *dm_kcopyd_client_create(struct dm_kcopyd_throttle *thro
return ERR_PTR(-ENOMEM);
spin_lock_init(&kc->job_lock);
+ INIT_LIST_HEAD(&kc->callback_jobs);
INIT_LIST_HEAD(&kc->complete_jobs);
INIT_LIST_HEAD(&kc->io_jobs);
INIT_LIST_HEAD(&kc->pages_jobs);
@@ -874,6 +882,7 @@ void dm_kcopyd_client_destroy(struct dm_kcopyd_client *kc)
/* Wait for completion of all jobs submitted by this client. */
wait_event(kc->destroyq, !atomic_read(&kc->nr_jobs));
+ BUG_ON(!list_empty(&kc->callback_jobs));
BUG_ON(!list_empty(&kc->complete_jobs));
BUG_ON(!list_empty(&kc->io_jobs));
BUG_ON(!list_empty(&kc->pages_jobs));
--
2.19.1
next prev parent reply other threads:[~2019-01-24 19:23 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-24 19:19 [PATCH 3.18 00/52] 3.18.133-stable review Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 01/52] sparc32: Fix inverted invalid_frame_pointer checks on sigreturns Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 02/52] CIFS: Do not hide EINTR after sending network packets Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 03/52] cifs: Fix potential OOB access of lock element array Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 04/52] usb: cdc-acm: send ZLP for Telit 3G Intel based modems Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 05/52] USB: storage: dont insert sane sense for SPC3+ when bad sense specified Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 06/52] USB: storage: add quirk for SMI SM3350 Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 07/52] slab: alien caches must not be initialized if the allocation of the alien cache failed Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 08/52] ACPI: power: Skip duplicate power resource references in _PRx Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 09/52] i2c: dev: prevent adapter retries and timeout being set as minus value Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 10/52] crypto: cts - fix crash on short inputs Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 11/52] sunrpc: use-after-free in svc_process_common() Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 12/52] tty/ldsem: Wake up readers after timed out down_write() Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 13/52] can: gw: ensure DLC boundaries after CAN frame modification Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 14/52] media: em28xx: Fix misplaced reset of dev->v4l::field_count Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 15/52] ipv6: fix kernel-infoleak in ipv6_local_error() Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 16/52] packet: Do not leak dev refcounts on error exit Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 17/52] net: bridge: fix a bug on using a neighbour cache entry without checking its state Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 18/52] crypto: authenc - fix parsing key with misaligned rta_len Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 19/52] btrfs: wait on ordered extents on abort cleanup Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 20/52] Yama: Check for pid death before checking ancestry Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 21/52] scsi: sd: Fix cache_type_store() Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 22/52] mfd: tps6586x: Handle interrupts on suspend Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 23/52] Disable MSI also when pcie-octeon.pcie_disable on Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 24/52] omap2fb: Fix stack memory disclosure Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 25/52] media: vivid: fix error handling of kthread_run Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 26/52] media: vivid: set min width/height to a value > 0 Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 27/52] media: vb2: vb2_mmap: move lock up Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 28/52] sunrpc: handle ENOMEM in rpcb_getport_async Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 29/52] selinux: fix GPF on invalid policy Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 30/52] sctp: allocate sctp_sockaddr_entry with kzalloc Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 31/52] block/loop: Use global lock for ioctl() operation Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 32/52] drm/fb-helper: Ignore the value of fb_var_screeninfo.pixclock Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 33/52] media: vb2: be sure to unlock mutex on errors Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 34/52] r8169: Add support for new Realtek Ethernet Greg Kroah-Hartman
2019-01-24 19:19 ` [PATCH 3.18 35/52] MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 3.18 36/52] jffs2: Fix use of uninitialized delayed_work, lockdep breakage Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 3.18 37/52] pstore/ram: Do not treat empty buffers as valid Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 3.18 38/52] powerpc/pseries/cpuidle: Fix preempt warning Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 3.18 39/52] media: firewire: Fix app_info parameter type in avc_ca{,_app}_info Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 3.18 40/52] net: call sk_dst_reset when set SO_DONTROUTE Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 3.18 41/52] scsi: target: use consistent left-aligned ASCII INQUIRY data Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 3.18 42/52] clk: imx6q: reset exclusive gates on init Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 3.18 43/52] kconfig: fix memory leak when EOF is encountered in quotation Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 3.18 44/52] mmc: atmel-mci: do not assume idle after atmci_request_end Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 3.18 45/52] perf svghelper: Fix unchecked usage of strncpy() Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 3.18 46/52] perf parse-events: " Greg Kroah-Hartman
2019-01-24 19:20 ` Greg Kroah-Hartman [this message]
2019-01-24 19:20 ` [PATCH 3.18 48/52] dm snapshot: Fix excessive memory usage and workqueue stalls Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 3.18 49/52] ALSA: bebob: fix model-id of unit for Apogee Ensemble Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 3.18 50/52] sysfs: Disable lockdep for driver bind/unbind files Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 3.18 51/52] ocfs2: fix panic due to unrecovered local alloc Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 3.18 52/52] mm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smaps Greg Kroah-Hartman
2019-01-25 14:39 ` [PATCH 3.18 00/52] 3.18.133-stable review shuah
2019-01-25 23:16 ` Guenter Roeck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190124190150.494710956@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=iliastsi@arrikto.com \
--cc=linux-kernel@vger.kernel.org \
--cc=ntsironis@arrikto.com \
--cc=sashal@kernel.org \
--cc=snitzer@redhat.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.