Subject: Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
From: Larry Finger
To: Al Viro
Cc: LKML
Date: Tue, 24 May 2016 11:10:09 -0500
Message-ID: <57447CE1.9020207@lwfinger.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On 05/23/2016 07:18 PM, Al Viro wrote:
> On Mon, May 23, 2016 at 04:30:43PM -0500, Larry Finger wrote:
>> The mainline kernels past 4.6.0 hang when logging in. There are no
>> error messages, and the machine seems to be waiting for some event that
>> never happens.
>>
>> The problem has been bisected to commit dd254f5a382c ("fold checks into
>> iterate_and_advance()"). The bisection has been verified.
>>
>> The problem is the call from iov_iter_advance(). When I reinstated the old
>> macro with a new name and used it in that routine, the system works.
>> Obviously, the call that seems to be incorrect has some benefits. My
>> quick-and-dirty patch is attached.
>>
>> I will be willing to test any patch you prepare.
>
> Hangs where and how? A reproducer, please... This is really weird - the
> only change there is in the cases when
> * iov_iter_advance(i, n) is called with n greater than the remaining
> amount.
> It's a bug, plain and simple - old variant would've been left in
> seriously buggered state and at the very least we want to catch any such
> places for the sake of backports.
> * iov_iter_advance(i, 0) - both old and new code leave *i unchanged,
> but the old one dereferences i->iov[0], which may be pointing beyond the end
> of the array by that point. The value read from there was not used by the
> old code, at that.
>
> Could you slap WARN_ON(size > i->count) in the very beginning of
> iov_iter_advance() (the mainline variant) and see what triggers on your
> reproducer?

As I wrote earlier, i->count was greater than zero, but size was zero, which
caused the bulk of iterate_and_advance() to be skipped. For now, the following
one-line hack allows my system to boot:

diff --git a/fs/read_write.c b/fs/read_write.c
index 933b53a..d5d64d9 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -721,6 +721,7 @@ static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
 		ret += nr;
 		if (nr != iovec.iov_len)
 			break;
+		nr = max_t(ssize_t, nr, 1);
 		iov_iter_advance(iter, nr);
 	}
 

I have no idea what subtle bug in do_loop_readv_writev() is causing nr to be
zero, but it seems to have been exposed by commit dd254f5a382c.

Larry
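The hunk context in the patch already narrows things down: the loop only reaches iov_iter_advance() when nr == iovec.iov_len, so nr == 0 at that point means the current segment has zero length. The following is a small user-space C model of that loop, for illustration only. struct seg_iter, consume(), advance_old(), advance_new() and run_loop() are hypothetical names, not kernel APIs, and the ->read() callback is modeled as always transferring the whole segment. The model also rests on an assumption not stated in the thread: that the pre-dd254f5a382c macro stepped past a fully consumed (including zero-length) current segment even when n == 0, while the new code returns early when n == 0, as described above. advance_new() also prints a warning when asked to advance past the remaining count, mirroring the suggested WARN_ON(size > i->count).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for struct iov_iter over an iovec array. */
struct seg_iter {
    const size_t *len;   /* lengths of the remaining segments */
    size_t nr_segs;      /* segments left, including the current one */
    size_t offset;       /* bytes already consumed in the current segment */
    size_t count;        /* total bytes left across all segments */
};

/* Step past the current segment if it has been fully consumed. */
static void skip_if_done(struct seg_iter *it)
{
    if (it->nr_segs > 0 && it->offset == it->len[0]) {
        it->len++;
        it->nr_segs--;
        it->offset = 0;
    }
}

/* Consume n bytes; caller guarantees n <= count. The final skip_if_done()
 * runs even when n == 0, so a zero-length current segment is stepped over. */
static void consume(struct seg_iter *it, size_t n)
{
    assert(n <= it->count);
    while (n > 0 && it->nr_segs > 0) {
        size_t room = it->len[0] - it->offset;
        size_t step = n < room ? n : room;
        it->offset += step;
        it->count -= step;
        n -= step;
        if (n > 0)
            skip_if_done(it);   /* more bytes wanted: move to the next segment */
    }
    skip_if_done(it);
}

/* ASSUMED old behaviour: no early return, tail check always runs. */
static void advance_old(struct seg_iter *it, size_t n)
{
    consume(it, n);
}

/* ASSUMED new behaviour: clamp n to count and do nothing when n == 0. */
static void advance_new(struct seg_iter *it, size_t n)
{
    if (n > it->count) {
        /* user-space analogue of WARN_ON(size > i->count) */
        fprintf(stderr, "WARN: advance by %zu > count %zu\n", n, it->count);
        n = it->count;
    }
    if (n == 0)
        return;
    consume(it, n);
}

/* Model of the do_loop_readv_writev() loop. Returns the number of iterations
 * until count reaches 0, or -1 if still looping after max_iter (a hang). */
static int run_loop(const size_t *lens, size_t nsegs,
                    void (*advance)(struct seg_iter *, size_t), int max_iter)
{
    struct seg_iter it = { lens, nsegs, 0, 0 };
    for (size_t k = 0; k < nsegs; k++)
        it.count += lens[k];

    for (int iter = 0; iter < max_iter; iter++) {
        if (it.count == 0)
            return iter;
        /* segment view handed to ->read(): rest of the current segment */
        size_t want = it.len[0] - it.offset;
        if (want > it.count)
            want = it.count;
        /* ->read() is modeled as transferring everything, so nr == want and
         * the short-transfer break in the real loop never fires here */
        size_t nr = want;
        advance(&it, nr);
    }
    return -1;
}
```

For example, with segment lengths {0, 4} and a cap of 100 iterations, run_loop() returns 2 with advance_old and -1 with advance_new, while {3, 4} returns 2 with either variant. In this model the hang appears only when a zero-length segment is reached with bytes still remaining.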