Date: Sat, 12 Dec 2015 11:23:42 -0500
From: Chris Mason
To: Linus Torvalds, Peter Zijlstra, Dave Jones, LKML, Jon Christopherson
Subject: [PATCH] lock_page() doesn't lock if __wait_on_bit_lock returns -EINTR
Message-ID: <20151212162342.GF11257@ret.masoncoding.com>

We have two reports of frequent crashes in btrfs where asserts in
clear_page_dirty_for_io() were triggering on missing page locks.

The crashes were much easier to trigger when processes were catching
Ctrl-C, and after much debugging it really looked like lock_page() was
a no-op.

This recent commit looks pretty suspect to me, and I confirmed that we
were exiting __wait_on_bit_lock() with -EINTR when it was called with
TASK_UNINTERRUPTIBLE:

commit 68985633bccb6066bf1803e316fbc6c1f5b796d6
Author: Peter Zijlstra
Date:   Tue Dec 1 14:04:04 2015 +0100

    sched/wait: Fix signal handling in bit wait helpers

Since lock_page() returns void, that -EINTR is silently dropped and the
caller carries on as though it held the page lock.

The patch below is mostly untested, and probably not the right solution.
Dave's trinity run doesn't explode immediately anymore, and I wanted to
get this out for discussion.
A quick look on the list doesn't show that anyone else has tracked this
down; sorry if it's a dup.

Reported-by: Dave Jones
Reported-by: Jon Christopherson
Signed-off-by: Chris Mason

diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index f10bd87..12f69df 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -434,6 +434,8 @@ __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
 		ret = action(&q->key);
 		if (!ret)
 			continue;
+		if (ret == -EINTR && mode == TASK_UNINTERRUPTIBLE)
+			continue;
 		abort_exclusive_wait(wq, &q->wait, mode, &q->key);
 		return ret;
 	} while (test_and_set_bit(q->key.bit_nr, q->key.flags));