From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Li Rongqing <lirongqing@baidu.com>,
	Zhang Yu <zhangyu31@baidu.com>, Davidlohr Bueso <dbueso@suse.de>,
	Manfred Spraul <manfred@colorfullife.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Sasha Levin <sashal@kernel.org>,
	netdev@vger.kernel.org, bpf@vger.kernel.org
Subject: [PATCH AUTOSEL 4.14 04/99] ipc: prevent lockup on alloc_msg and free_msg
Date: Sat,  1 Jun 2019 09:22:11 -0400
Message-ID: <20190601132346.26558-4-sashal@kernel.org>
In-Reply-To: <20190601132346.26558-1-sashal@kernel.org>

From: Li Rongqing <lirongqing@baidu.com>

[ Upstream commit d6a2946a88f524a47cc9b79279667137899db807 ]

msgctl10 of LTP triggers the following lockup. When CONFIG_KASAN is
enabled on large-memory SMP systems, page initialization can take a
long time; if msgctl10 requests a huge block of memory, this blocks the
RCU scheduler, so release the CPU actively.

After adding cond_resched() to free_msg(), free_msg() can no longer be
called while holding a spinlock, so add each msg to a tmp list and free
it outside the spinlock.

  rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  rcu:     Tasks blocked on level-1 rcu_node (CPUs 16-31): P32505
  rcu:     Tasks blocked on level-1 rcu_node (CPUs 48-63): P34978
  rcu:     (detected by 11, t=35024 jiffies, g=44237529, q=16542267)
  msgctl10        R  running task    21608 32505   2794 0x00000082
  Call Trace:
   preempt_schedule_irq+0x4c/0xb0
   retint_kernel+0x1b/0x2d
  RIP: 0010:__is_insn_slot_addr+0xfb/0x250
  Code: 82 1d 00 48 8b 9b 90 00 00 00 4c 89 f7 49 c1 ee 03 e8 59 83 1d 00 48 b8 00 00 00 00 00 fc ff df 4c 39 eb 48 89 9d 58 ff ff ff <41> c6 04 06 f8 74 66 4c 8d 75 98 4c 89 f1 48 c1 e9 03 48 01 c8 48
  RSP: 0018:ffff88bce041f758 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
  RAX: dffffc0000000000 RBX: ffffffff8471bc50 RCX: ffffffff828a2a57
  RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff88bce041f780
  RBP: ffff88bce041f828 R08: ffffed15f3f4c5b3 R09: ffffed15f3f4c5b3
  R10: 0000000000000001 R11: ffffed15f3f4c5b2 R12: 000000318aee9b73
  R13: ffffffff8471bc50 R14: 1ffff1179c083ef0 R15: 1ffff1179c083eec
   kernel_text_address+0xc1/0x100
   __kernel_text_address+0xe/0x30
   unwind_get_return_address+0x2f/0x50
   __save_stack_trace+0x92/0x100
   create_object+0x380/0x650
   __kmalloc+0x14c/0x2b0
   load_msg+0x38/0x1a0
   do_msgsnd+0x19e/0xcf0
   do_syscall_64+0x117/0x400
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

  rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  rcu:     Tasks blocked on level-1 rcu_node (CPUs 0-15): P32170
  rcu:     (detected by 14, t=35016 jiffies, g=44237525, q=12423063)
  msgctl10        R  running task    21608 32170  32155 0x00000082
  Call Trace:
   preempt_schedule_irq+0x4c/0xb0
   retint_kernel+0x1b/0x2d
  RIP: 0010:lock_acquire+0x4d/0x340
  Code: 48 81 ec c0 00 00 00 45 89 c6 4d 89 cf 48 8d 6c 24 20 48 89 3c 24 48 8d bb e4 0c 00 00 89 74 24 0c 48 c7 44 24 20 b3 8a b5 41 <48> c1 ed 03 48 c7 44 24 28 b4 25 18 84 48 c7 44 24 30 d0 54 7a 82
  RSP: 0018:ffff88af83417738 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
  RAX: dffffc0000000000 RBX: ffff88bd335f3080 RCX: 0000000000000002
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88bd335f3d64
  RBP: ffff88af83417758 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000001 R11: ffffed13f3f745b2 R12: 0000000000000000
  R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
   is_bpf_text_address+0x32/0xe0
   kernel_text_address+0xec/0x100
   __kernel_text_address+0xe/0x30
   unwind_get_return_address+0x2f/0x50
   __save_stack_trace+0x92/0x100
   save_stack+0x32/0xb0
   __kasan_slab_free+0x130/0x180
   kfree+0xfa/0x2d0
   free_msg+0x24/0x50
   do_msgrcv+0x508/0xe60
   do_syscall_64+0x117/0x400
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Davidlohr said:
 "So after releasing the lock, the msg rbtree/list is empty and new
  calls will not see those in the newly populated tmp_msg list, and
  therefore they cannot access the delayed msg freeing pointers, which
  is good. Also the fact that the node_cache is now freed before the
  actual messages seems to be harmless as this is wanted for
  msg_insert() avoiding GFP_ATOMIC allocations, and after releasing the
  info->lock the thing is freed anyway so it should not change things"
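
The pattern Davidlohr describes (detach everything under the lock, free
it after the lock is dropped) can be sketched in userspace as follows.
This is only an illustration under stated assumptions: struct node,
struct queue, queue_push() and queue_drain() are hypothetical names, and
a pthread mutex stands in for info->lock. It is not the kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
struct node {
	struct node *next;
	int payload;
};

struct queue {
	pthread_mutex_t lock;	/* stands in for info->lock */
	struct node *head;
};

/* Add one node to the queue; returns 0 on success, -1 if malloc() fails. */
static int queue_push(struct queue *q, int payload)
{
	struct node *n = malloc(sizeof(*n));

	if (n == NULL)
		return -1;
	n->payload = payload;
	pthread_mutex_lock(&q->lock);
	n->next = q->head;
	q->head = n;
	pthread_mutex_unlock(&q->lock);
	return 0;
}

/*
 * Detach every node while holding the lock, then free the detached
 * nodes after the lock is dropped. Returns the number of nodes freed.
 */
static size_t queue_drain(struct queue *q)
{
	struct node *tmp, *next;
	size_t freed = 0;

	pthread_mutex_lock(&q->lock);
	tmp = q->head;		/* private list, no longer reachable via q */
	q->head = NULL;
	pthread_mutex_unlock(&q->lock);

	for (; tmp != NULL; tmp = next) {
		next = tmp->next;
		free(tmp);	/* potentially slow work, done unlocked */
		freed++;
	}
	return freed;
}
```

Once the lock is released the shared queue is already empty, so other
callers cannot reach the nodes that are still waiting to be freed.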

Link: http://lkml.kernel.org/r/1552029161-4957-1-git-send-email-lirongqing@baidu.com
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 ipc/mqueue.c  | 10 ++++++++--
 ipc/msgutil.c |  6 ++++++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index d240256263103..5c0ae912f2f25 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -374,7 +374,8 @@ static void mqueue_evict_inode(struct inode *inode)
 	struct user_struct *user;
 	unsigned long mq_bytes, mq_treesize;
 	struct ipc_namespace *ipc_ns;
-	struct msg_msg *msg;
+	struct msg_msg *msg, *nmsg;
+	LIST_HEAD(tmp_msg);
 
 	clear_inode(inode);
 
@@ -385,10 +386,15 @@ static void mqueue_evict_inode(struct inode *inode)
 	info = MQUEUE_I(inode);
 	spin_lock(&info->lock);
 	while ((msg = msg_get(info)) != NULL)
-		free_msg(msg);
+		list_add_tail(&msg->m_list, &tmp_msg);
 	kfree(info->node_cache);
 	spin_unlock(&info->lock);
 
+	list_for_each_entry_safe(msg, nmsg, &tmp_msg, m_list) {
+		list_del(&msg->m_list);
+		free_msg(msg);
+	}
+
 	/* Total amount of bytes accounted for the mqueue */
 	mq_treesize = info->attr.mq_maxmsg * sizeof(struct msg_msg) +
 		min_t(unsigned int, info->attr.mq_maxmsg, MQ_PRIO_MAX) *
diff --git a/ipc/msgutil.c b/ipc/msgutil.c
index 84598025a6ade..e65593742e2be 100644
--- a/ipc/msgutil.c
+++ b/ipc/msgutil.c
@@ -18,6 +18,7 @@
 #include <linux/utsname.h>
 #include <linux/proc_ns.h>
 #include <linux/uaccess.h>
+#include <linux/sched.h>
 
 #include "util.h"
 
@@ -64,6 +65,9 @@ static struct msg_msg *alloc_msg(size_t len)
 	pseg = &msg->next;
 	while (len > 0) {
 		struct msg_msgseg *seg;
+
+		cond_resched();
+
 		alen = min(len, DATALEN_SEG);
 		seg = kmalloc(sizeof(*seg) + alen, GFP_KERNEL_ACCOUNT);
 		if (seg == NULL)
@@ -176,6 +180,8 @@ void free_msg(struct msg_msg *msg)
 	kfree(msg);
 	while (seg != NULL) {
 		struct msg_msgseg *tmp = seg->next;
+
+		cond_resched();
 		kfree(seg);
 		seg = tmp;
 	}
-- 
2.20.1

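For readers unfamiliar with the shape of the alloc_msg()/free_msg()
loops touched above, here is a rough userspace sketch of a segmented
allocation that offers to yield the CPU once per segment. Assumptions:
struct seg, SEG_SIZE, alloc_chain() and free_chain() are hypothetical
names, and sched_yield() is used only as a loose analogue of
cond_resched(); the two do not have the same semantics.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdlib.h>

#define SEG_SIZE 4096	/* hypothetical per-segment payload size */

struct seg {
	struct seg *next;
	/* payload bytes follow the header */
};

/* Free a chain of segments, yielding between frees. Returns the count. */
static size_t free_chain(struct seg *seg)
{
	size_t freed = 0;

	while (seg != NULL) {
		struct seg *tmp = seg->next;

		sched_yield();	/* loose analogue of cond_resched() */
		free(seg);
		seg = tmp;
		freed++;
	}
	return freed;
}

/* Allocate len bytes as a chain of segments of at most SEG_SIZE each. */
static struct seg *alloc_chain(size_t len)
{
	struct seg *head = NULL;
	struct seg **pseg = &head;

	while (len > 0) {
		size_t alen = len < SEG_SIZE ? len : SEG_SIZE;
		struct seg *seg;

		sched_yield();	/* give other tasks a chance per segment */
		seg = malloc(sizeof(*seg) + alen);
		if (seg == NULL) {
			free_chain(head);	/* undo the partial chain */
			return NULL;
		}
		seg->next = NULL;
		*pseg = seg;
		pseg = &seg->next;
		len -= alen;
	}
	return head;
}
```

Because the free path may now give up the CPU, it must not run with a
spinlock held, which is why the mqueue.c hunk moves free_msg() out of
the locked region.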