Message-ID: <54D87FA8.60408@suse.cz>
Date: Mon, 09 Feb 2015 10:36:40 +0100
From: Vlastimil Babka
To: Konstantin Khlebnikov, linux-mm@kvack.org, Linux Kernel Mailing List
Subject: Re: BUG: stuck on mmap_sem in 3.18.6

On 02/09/2015 08:14 AM, Konstantin Khlebnikov wrote:
> Python was running under the ptrace-based sandbox "sydbox" used in an
> exherbo chroot. Kernel: 3.18.6 + my patch "mm: prevent endless growth
> of anon_vma hierarchy" (the patch seems stable).
>
> [ 4674.087780] INFO: task python:25873 blocked for more than 120 seconds.
> [ 4674.087793] Tainted: G U 3.18.6-zurg+ #158
> [ 4674.087797] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 4674.087801] python D ffff88041e2d2000 14176 25873 25630 0x00000102
> [ 4674.087817]  ffff880286247b68 0000000000000086 ffff8803d5fe6b40 0000000000012000
> [ 4674.087824]  ffff880286247fd8 0000000000012000 ffff88040c16eb40 ffff8803d5fe6b40
> [ 4674.087830]  0000000300000003 ffff8803d5fe6b40 ffff880362888e78 ffff880362888e60
> [ 4674.087836] Call Trace:
> [ 4674.087854]  [] schedule+0x29/0x70
> [ 4674.087865]  [] rwsem_down_write_failed+0x1d5/0x2f0
> [ 4674.087873]  [] call_rwsem_down_write_failed+0x13/0x20
> [ 4674.087881]  [] ? down_write+0x31/0x50
> [ 4674.087891]  [] do_coredump+0x144/0xee0
> [ 4674.087900]  [] ? pick_next_task_fair+0x397/0x450
> [ 4674.087909]  [] ? __switch_to+0x1d6/0x5f0
> [ 4674.087915]  [] ? __schedule+0x3a6/0x880
> [ 4674.087924]  [] ? klist_remove+0x40/0xd0
> [ 4674.087932]  [] get_signal+0x298/0x6b0
> [ 4674.087940]  [] do_signal+0x28/0xbb0
> [ 4674.087946]  [] ? do_send_sig_info+0x5d/0x80
> [ 4674.087955]  [] do_notify_resume+0x69/0xb0
> [ 4674.087963]  [] int_signal+0x12/0x17
>
> Maybe this guy did something wrong?

Well, he has do_coredump on the stack, so he did something wrong in
userspace? But here he is just waiting on down_write. Unless there is
some bug in do_coredump that would take the lock for read and then for
write, without an unlock in between?

> Looks like mmap_sem is locked for read:

So we have the python waiting for write, blocking all new readers
(that's how read/write locks work, right?), but itself waiting for a
prior reader to finish. The question is: who is, or was, that reader?
You could search for the mmap_sem or mm address in the stacks of the
rest of the processes, and maybe you'll find him?
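For readers following along: the queueing behaviour described above can be
sketched with a toy userspace model. This is only an illustration of the
fairness rule (once a writer queues, later readers must wait behind it),
not the kernel's actual rw_semaphore implementation, and the task names
("mystery reader", "python (coredump)", "page fault") are hypothetical
stand-ins for the tasks discussed in this thread.

```python
from collections import deque

class RWSem:
    """Toy model of rw_semaphore queueing (illustrative only).

    Readers share the lock, but once a writer is queued, later readers
    must wait behind it -- otherwise writers could starve forever.
    """
    def __init__(self):
        self.active_readers = set()
        self.writer = None
        self.waiters = deque()   # FIFO of ('r' or 'w', task name)

    def down_read(self, name):
        # A new reader may enter only if no writer holds the lock and
        # no writer is already queued ahead of it.
        if self.writer is None and not any(k == 'w' for k, _ in self.waiters):
            self.active_readers.add(name)
            return True          # acquired
        self.waiters.append(('r', name))
        return False             # blocked

    def down_write(self, name):
        if self.writer is None and not self.active_readers and not self.waiters:
            self.writer = name
            return True
        self.waiters.append(('w', name))
        return False

    def up_read(self, name):
        self.active_readers.discard(name)
        self._wake()

    def up_write(self, name):
        assert self.writer == name
        self.writer = None
        self._wake()

    def _wake(self):
        # Grant the lock to the head of the FIFO queue.
        if self.writer is not None:
            return
        if self.waiters and self.waiters[0][0] == 'w':
            if not self.active_readers:      # last reader gone: writer runs
                _, name = self.waiters.popleft()
                self.writer = name
            return
        # Head is a reader (or queue empty): wake all leading readers.
        while self.waiters and self.waiters[0][0] == 'r':
            _, name = self.waiters.popleft()
            self.active_readers.add(name)

# The situation in the report:
sem = RWSem()
assert sem.down_read("mystery reader")          # someone holds mmap_sem for read
assert not sem.down_write("python (coredump)")  # do_coredump's down_write blocks
assert not sem.down_read("page fault")          # new readers now block too
sem.up_read("mystery reader")                   # only this unblocks the writer
assert sem.writer == "python (coredump)"
```

The point of the demo is the third call: a reader that would normally get
the lock immediately is queued behind the waiting writer, which is why one
stuck reader plus one coredumping writer can stall every other user of
that mm.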