From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, John Sperbeck <jsperbeck@google.com>,
Dennis Zhou <dennis@kernel.org>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.14 30/81] percpu: remove spurious lock dependency between percpu and sched
Date: Thu, 13 Jun 2019 10:33:13 +0200
Message-ID: <20190613075651.351802042@linuxfoundation.org>
In-Reply-To: <20190613075649.074682929@linuxfoundation.org>
[ Upstream commit 198790d9a3aeaef5792d33a560020861126edc22 ]
In free_percpu() we sometimes call pcpu_schedule_balance_work() to
queue a work item (which does a wakeup) while holding pcpu_lock.
This creates an unnecessary lock dependency between pcpu_lock and
the scheduler's pi_lock. There are other places where we call
pcpu_schedule_balance_work() without holding pcpu_lock, and this case
doesn't need to be different.
Moving the call outside the lock prevents the following lockdep splat
when running tools/testing/selftests/bpf/{test_maps,test_progs} in
sequence with lockdep enabled:
======================================================
WARNING: possible circular locking dependency detected
5.1.0-dbg-DEV #1 Not tainted
------------------------------------------------------
kworker/23:255/18872 is trying to acquire lock:
000000000bc79290 (&(&pool->lock)->rlock){-.-.}, at: __queue_work+0xb2/0x520
but task is already holding lock:
00000000e3e7a6aa (pcpu_lock){..-.}, at: free_percpu+0x36/0x260
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #4 (pcpu_lock){..-.}:
lock_acquire+0x9e/0x180
_raw_spin_lock_irqsave+0x3a/0x50
pcpu_alloc+0xfa/0x780
__alloc_percpu_gfp+0x12/0x20
alloc_htab_elem+0x184/0x2b0
__htab_percpu_map_update_elem+0x252/0x290
bpf_percpu_hash_update+0x7c/0x130
__do_sys_bpf+0x1912/0x1be0
__x64_sys_bpf+0x1a/0x20
do_syscall_64+0x59/0x400
entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #3 (&htab->buckets[i].lock){....}:
lock_acquire+0x9e/0x180
_raw_spin_lock_irqsave+0x3a/0x50
htab_map_update_elem+0x1af/0x3a0
-> #2 (&rq->lock){-.-.}:
lock_acquire+0x9e/0x180
_raw_spin_lock+0x2f/0x40
task_fork_fair+0x37/0x160
sched_fork+0x211/0x310
copy_process.part.43+0x7b1/0x2160
_do_fork+0xda/0x6b0
kernel_thread+0x29/0x30
rest_init+0x22/0x260
arch_call_rest_init+0xe/0x10
start_kernel+0x4fd/0x520
x86_64_start_reservations+0x24/0x26
x86_64_start_kernel+0x6f/0x72
secondary_startup_64+0xa4/0xb0
-> #1 (&p->pi_lock){-.-.}:
lock_acquire+0x9e/0x180
_raw_spin_lock_irqsave+0x3a/0x50
try_to_wake_up+0x41/0x600
wake_up_process+0x15/0x20
create_worker+0x16b/0x1e0
workqueue_init+0x279/0x2ee
kernel_init_freeable+0xf7/0x288
kernel_init+0xf/0x180
ret_from_fork+0x24/0x30
-> #0 (&(&pool->lock)->rlock){-.-.}:
__lock_acquire+0x101f/0x12a0
lock_acquire+0x9e/0x180
_raw_spin_lock+0x2f/0x40
__queue_work+0xb2/0x520
queue_work_on+0x38/0x80
free_percpu+0x221/0x260
pcpu_freelist_destroy+0x11/0x20
stack_map_free+0x2a/0x40
bpf_map_free_deferred+0x3c/0x50
process_one_work+0x1f7/0x580
worker_thread+0x54/0x410
kthread+0x10f/0x150
ret_from_fork+0x24/0x30
other info that might help us debug this:
Chain exists of:
&(&pool->lock)->rlock --> &htab->buckets[i].lock --> pcpu_lock
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(pcpu_lock);
                               lock(&htab->buckets[i].lock);
                               lock(pcpu_lock);
  lock(&(&pool->lock)->rlock);
*** DEADLOCK ***
3 locks held by kworker/23:255/18872:
#0: 00000000b36a6e16 ((wq_completion)events){+.+.},
at: process_one_work+0x17a/0x580
#1: 00000000dfd966f0 ((work_completion)(&map->work)){+.+.},
at: process_one_work+0x17a/0x580
#2: 00000000e3e7a6aa (pcpu_lock){..-.},
at: free_percpu+0x36/0x260
stack backtrace:
CPU: 23 PID: 18872 Comm: kworker/23:255 Not tainted 5.1.0-dbg-DEV #1
Hardware name: ...
Workqueue: events bpf_map_free_deferred
Call Trace:
dump_stack+0x67/0x95
print_circular_bug.isra.38+0x1c6/0x220
check_prev_add.constprop.50+0x9f6/0xd20
__lock_acquire+0x101f/0x12a0
lock_acquire+0x9e/0x180
_raw_spin_lock+0x2f/0x40
__queue_work+0xb2/0x520
queue_work_on+0x38/0x80
free_percpu+0x221/0x260
pcpu_freelist_destroy+0x11/0x20
stack_map_free+0x2a/0x40
bpf_map_free_deferred+0x3c/0x50
process_one_work+0x1f7/0x580
worker_thread+0x54/0x410
kthread+0x10f/0x150
ret_from_fork+0x24/0x30
Signed-off-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/percpu.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/mm/percpu.c b/mm/percpu.c
index 0c06e2f549a7..bc58bcbe4b60 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1702,6 +1702,7 @@ void free_percpu(void __percpu *ptr)
struct pcpu_chunk *chunk;
unsigned long flags;
int off;
+ bool need_balance = false;
if (!ptr)
return;
@@ -1723,7 +1724,7 @@ void free_percpu(void __percpu *ptr)
list_for_each_entry(pos, &pcpu_slot[pcpu_nr_slots - 1], list)
if (pos != chunk) {
- pcpu_schedule_balance_work();
+ need_balance = true;
break;
}
}
@@ -1731,6 +1732,9 @@ void free_percpu(void __percpu *ptr)
trace_percpu_free_percpu(chunk->base_addr, off, ptr);
spin_unlock_irqrestore(&pcpu_lock, flags);
+
+ if (need_balance)
+ pcpu_schedule_balance_work();
}
EXPORT_SYMBOL_GPL(free_percpu);
--
2.20.1
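
The pattern the patch applies (record the decision while holding the
lock, act on it only after the lock is dropped) can be sketched in
userspace C. This is an illustrative sketch only, not kernel code:
pool_lock, free_chunks, balance_requests, holding_pool_lock,
schedule_balance() and release_chunk() are hypothetical stand-ins for
pcpu_lock, the chunk slot state, the queued work item and
free_percpu(), and the "more than one free chunk" condition is a
simplification of the real check.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER; /* plays pcpu_lock */
static int free_chunks;                       /* protected by pool_lock */
static atomic_int balance_requests;           /* counts deferred work requests */
static _Thread_local bool holding_pool_lock;  /* per-thread lock tracking */

/*
 * Plays pcpu_schedule_balance_work(): in the kernel this queues work and
 * takes workqueue and scheduler locks, so the caller must not hold
 * pool_lock here.
 */
static void schedule_balance(void)
{
	assert(!holding_pool_lock);
	atomic_fetch_add(&balance_requests, 1);
}

/* Plays free_percpu(): decide under the lock, act after unlocking. */
static void release_chunk(void)
{
	bool need_balance = false;

	pthread_mutex_lock(&pool_lock);
	holding_pool_lock = true;

	free_chunks++;
	if (free_chunks > 1)
		need_balance = true;	/* only record the decision here */

	holding_pool_lock = false;
	pthread_mutex_unlock(&pool_lock);

	/* The call that takes other locks now runs with pool_lock dropped. */
	if (need_balance)
		schedule_balance();
}
```

Calling release_chunk() twice in this sketch requests balancing once,
on the second call, and the assertion in schedule_balance() holds
because the request is made only after pool_lock has been released.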