From: Lai Jiangshan <laijs@cn.fujitsu.com>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	<linux-kernel@vger.kernel.org>
Cc: <stable@vger.kernel.org>, Tejun Heo <tj@kernel.org>,
	Eric Sandeen <sandeen@sandeen.net>,
	Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH 3.18 11/61] workqueue: fix subtle pool management issue which can stall whole worker_pool
Date: Wed, 28 Jan 2015 09:51:54 +0800
Message-ID: <54C840BA.7020207@cn.fujitsu.com>
In-Reply-To: <20150128012638.187292688@linuxfoundation.org>
On 01/28/2015 09:26 AM, Greg Kroah-Hartman wrote:
> 3.18-stable review patch. If anyone has any objections, please let me know.

I don't think this is a bug fix.
It is just a good cleanup.

>
> ------------------
>
> From: Tejun Heo <tj@kernel.org>
>
> commit 29187a9eeaf362d8422e62e17a22a6e115277a49 upstream.
>
> A worker_pool's forward progress is guaranteed by the fact that the
> last idle worker assumes the manager role to create more workers and
> summon the rescuers if creating workers doesn't succeed in a timely
> manner before proceeding to execute work items.
>
> This manager role is implemented in manage_workers(), which indicates
> whether the worker may proceed to work item execution with its return
> value. This is necessary because multiple workers may contend for the
> manager role, and, if there already is a manager, others should
> proceed to work item execution.
>
> Unfortunately, the function also indicates that the worker may proceed
> to work item execution if need_to_create_worker() is false at the head
> of the function. need_to_create_worker() tests the following
> conditions.
>
> pending work items && !nr_running && !nr_idle
>
> The first and third conditions are protected by pool->lock and thus
> won't change while holding pool->lock; however, nr_running can change
> asynchronously as other workers block and resume. While it's likely
> to be zero, as someone woke this worker up in the first place, some
> other workers could have become runnable in between, making it non-zero.
>
> If this happens, manage_workers() could return false even with zero
> nr_idle, making the worker, the last idle one, proceed to execute work
> items. If then all workers of the pool end up blocking on a resource
> which can only be released by a work item which is pending on that
> pool, the whole pool can deadlock as there's no one to create more
> workers or summon the rescuers.
>
> This patch fixes the problem by removing the early exit condition from
> maybe_create_worker() and making manage_workers() return false iff
> there's already another manager, which ensures that the last worker
> doesn't start executing work items.
>
> We could leave the early exit condition alone and just ignore the
> return value, but the only reason it was put there is that
> manage_workers() used to perform both creation and destruction of
> workers, and thus the function could be invoked while the pool was
> trying to reduce the number of workers. Now that manage_workers() is
> called only when more workers are needed, the only case in which this
> early exit condition triggers is a rare race, rendering it pointless.
>
> Tested with a simulated workload and modified workqueue code which
> trigger the pool deadlock reliably without this patch.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Eric Sandeen <sandeen@sandeen.net>
> Link: http://lkml.kernel.org/g/54B019F4.8030009@sandeen.net
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>
> ---
> kernel/workqueue.c | 25 ++++++++-----------------
> 1 file changed, 8 insertions(+), 17 deletions(-)
>
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1841,17 +1841,11 @@ static void pool_mayday_timeout(unsigned
>   * spin_lock_irq(pool->lock) which may be released and regrabbed
>   * multiple times. Does GFP_KERNEL allocations. Called only from
>   * manager.
> - *
> - * Return:
> - * %false if no action was taken and pool->lock stayed locked, %true
> - * otherwise.
>   */
> -static bool maybe_create_worker(struct worker_pool *pool)
> +static void maybe_create_worker(struct worker_pool *pool)
>  __releases(&pool->lock)
>  __acquires(&pool->lock)
>  {
> -	if (!need_to_create_worker(pool))
> -		return false;
>  restart:
>  	spin_unlock_irq(&pool->lock);
>
> @@ -1877,7 +1871,6 @@ restart:
>  	 */
>  	if (need_to_create_worker(pool))
>  		goto restart;
> -	return true;
>  }
>
>  /**
> @@ -1897,16 +1890,14 @@ restart:
>   * multiple times. Does GFP_KERNEL allocations.
>   *
>   * Return:
> - * %false if the pool don't need management and the caller can safely start
> - * processing works, %true indicates that the function released pool->lock
> - * and reacquired it to perform some management function and that the
> - * conditions that the caller verified while holding the lock before
> - * calling the function might no longer be true.
> + * %false if the pool doesn't need management and the caller can safely
> + * start processing works, %true if management function was performed and
> + * the conditions that the caller verified before calling the function may
> + * no longer be true.
>   */
>  static bool manage_workers(struct worker *worker)
>  {
>  	struct worker_pool *pool = worker->pool;
> -	bool ret = false;
>
>  	/*
>  	 * Anyone who successfully grabs manager_arb wins the arbitration
> @@ -1919,12 +1910,12 @@ static bool manage_workers(struct worker
>  	 * actual management, the pool may stall indefinitely.
>  	 */
>  	if (!mutex_trylock(&pool->manager_arb))
> -		return ret;
> +		return false;
>
> -	ret |= maybe_create_worker(pool);
> +	maybe_create_worker(pool);
>
>  	mutex_unlock(&pool->manager_arb);
> -	return ret;
> +	return true;
>  }
>
>  /**
>
>
> .
>
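
For reference, the return-value change described in the quoted changelog can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions, not the kernel code: struct pool_model, the other_manager flag (standing in for a failed mutex_trylock() on manager_arb), manage_workers_old(), manage_workers_new() and check_race_scenario() are hypothetical names, locking is omitted, and worker creation is assumed to succeed immediately.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; NOT the kernel's struct worker_pool. */
struct pool_model {
	int nr_pending;		/* pending work items */
	int nr_running;		/* changes asynchronously in the real kernel */
	int nr_idle;		/* idle workers */
	bool other_manager;	/* models a failed trylock on manager_arb */
};

/* The condition quoted in the changelog. */
static bool need_to_create_worker(const struct pool_model *p)
{
	return p->nr_pending && !p->nr_running && !p->nr_idle;
}

/* Model of the pre-patch return value: the early exit reports false. */
static bool manage_workers_old(struct pool_model *p)
{
	if (p->other_manager)
		return false;
	if (!need_to_create_worker(p))
		return false;	/* caller goes on to execute work items */
	p->nr_idle++;		/* assumption: creation succeeds at once */
	return true;
}

/* Model of the post-patch return value: false iff another manager exists. */
static bool manage_workers_new(struct pool_model *p)
{
	if (p->other_manager)
		return false;
	p->nr_idle++;		/* assumption: creation succeeds at once */
	return true;
}

/* The race from the changelog: nr_running became non-zero in between. */
static void check_race_scenario(void)
{
	struct pool_model before = { .nr_pending = 1, .nr_running = 1 };
	struct pool_model after = before;

	assert(!manage_workers_old(&before) && before.nr_idle == 0);
	assert(manage_workers_new(&after) && after.nr_idle == 1);
}
```

In this model the old variant leaves the pool with no idle worker while telling the caller to proceed, whereas the new variant returns false only when another manager already exists.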