From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=0.6 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FSL_HELO_FAKE,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7964DC3A59E for ; Sat, 24 Aug 2019 16:14:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3EDDE21670 for ; Sat, 24 Aug 2019 16:14:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1566663279; bh=nP0Hyrtr3QhEfYDM+zNKHZi1M+mwSZUgnfmmw+pVla8=; h=Date:From:To:Cc:Subject:References:In-Reply-To:List-ID:From; b=gEqXlSCWlL60dl6hDssOMTtjJECavZ9rCByN09gfSg2OuvAVw1hkHbOFZ4lRRojoE oKGUAioygrDIFoLfzN6dRi3MqYsiW4sYjPaRj9cdI3wqJJZJaGC/pQKq3EN/u0lvf6 7iBJY5kR9EHDobvagrrGcYfy3sIAftVo45WHe96M= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727620AbfHXQOi (ORCPT ); Sat, 24 Aug 2019 12:14:38 -0400 Received: from mail-wr1-f68.google.com ([209.85.221.68]:45698 "EHLO mail-wr1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726211AbfHXQOh (ORCPT ); Sat, 24 Aug 2019 12:14:37 -0400 Received: by mail-wr1-f68.google.com with SMTP id q12so11306818wrj.12 for ; Sat, 24 Aug 2019 09:14:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=xofA0Q+2DhD6aYBYq1PxV+QCF4qL3qO+KVop1PqB7JM=; b=AZyuTpXxEq7JH6uYUefbSjthSH8ufi9nAapjtQNJSGxN997/FENHW4s+5huDvRdBnP 453saShqM7bGeDbxi3rv9tGVncwjAaG7niX40RivDfMBWjeHOESapTp7UO5KiVSaOBDa +CLq36LVX3s2RiuMH+hFytP2CvVQ20HRp+x1SrGvbqnu6GtZ9jIGEqFwJ+LafjASaBnV XbemaWP2S8MptZrOCg4yInLElWnXmd+O7B0aJSQ+l0DVMxEV8lXdJ8c+yVjs84Ftvobe HF6xCEFqUUqNHALjCfVskjRWBe7vsITTQx1YjNvMxMzGIQrokoZ3tlbGz3ixobokEY3/ Mnrw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :references:mime-version:content-disposition:in-reply-to:user-agent; bh=xofA0Q+2DhD6aYBYq1PxV+QCF4qL3qO+KVop1PqB7JM=; b=tKoj62zSNQ1ZVU4D8k810fXAR97PigDlhpzCFNVnBnerMG4ixg1727TSj1EPUUHZ+q LfHMwUSL6ONpgnPG/oTjwVOcF7nvY2TQmOYNRvPt0VCj+UWDf3KUyY0M50ohB3pUtqGT xa7RPe3WkiVTRqk9NXz2fZSdXSmOUbDnOC8q6cb8Zhz9pil3jM+s1kvux3i3BzhcJ7/u cdvY3er236WlMa7GHUSZiX6W5hxh7EhpSWHMss+lZIrq73LNc+FiKV4eKDA//B/3OENN wQAmDW+lOkTESKPJwLqZlD0NM14TsnYWHpdPCOosmlOkbeFgj4NcLpTgKDzHirhpFeQL Vv0Q== X-Gm-Message-State: APjAAAVpOGMEQnUO8sWl8NkJRT8hjmlztxDzIaL5+fdEbg4pt/cbGE1O Cfgrudx028VniE09rRFJjJo= X-Google-Smtp-Source: APXvYqzVB9TKEfZz2yhKZYOrbQTs8bwXX2qqpU27M2L8iGJhtX4VHq4EpEiy7Ac9blpqDjwv/S1ElA== X-Received: by 2002:a5d:4d4c:: with SMTP id a12mr12183153wru.343.1566663274582; Sat, 24 Aug 2019 09:14:34 -0700 (PDT) Received: from gmail.com (2E8B0CD5.catv.pool.telekom.hu. [46.139.12.213]) by smtp.gmail.com with ESMTPSA id a19sm18583010wra.2.2019.08.24.09.14.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 24 Aug 2019 09:14:33 -0700 (PDT) Date: Sat, 24 Aug 2019 18:14:32 +0200 From: Ingo Molnar To: Linus Torvalds Cc: Tetsuo Handa , Arnd Bergmann , Greg Kroah-Hartman , Linux List Kernel Mailing , syzbot Subject: Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory. Message-ID: <20190824161432.GA25950@gmail.com> References: <1566338811-4464-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp> <20190823091636.GA10064@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * Linus Torvalds wrote: > On Fri, Aug 23, 2019 at 2:16 AM Ingo Molnar wrote: > > > > > + */ > > > + if (fatal_signal_pending(current)) { > > > + if (!(error_code & X86_PF_USER)) > > > + no_context(regs, error_code, address, 0, 0); > > > > Since the 'signal' parameter to no_context() is 0, will there be another > > signal generated? I don't think so - so the comment looks wrong to me, > > unless I'm missing something. > > The old case only handled the kernel case - which never added a signal > at all, it just failed the access (and results in EFAULT or similar, > but nobody cares since the whole point is that the process is going to > be killed). > > The *changed* case handles user space accesses too - by just returning > without any new signal being generated. The old case would fall > through to the "generate SIGSEGV", which seems pointless and wrong > (and also possibly generates misleading messages in the kernel logs). Indeed! > > Other than that, what we are skipping here if a KILL signal is pending is > > the printout of oops information if the fault is a kernel one. > > > > Not sure that's a good idea in general: carefully timing a KILL signal > > would allow the 'stealth probing' of otherwise oops generating addresses? > > That sounds really like a non-issue to me. Plus the old code ALREADY > did that exact thing. The only _new_ case is that it does is silently > for user page faults. > > > I.e. I'm not sure this hunk is necessary or even a good idea. > > As mentioned, the new code doesn't change the case you are talking about at all. > > The new code only changes the case of this happening from user space, > when it doesn't generate a pointless signal and a pointless possible > show_signal_msg() garbage for a user mode access that was denied due > to the new > > > > + if (flags & FAULT_FLAG_KILLABLE) { > > > + if (fatal_signal_pending(current)) > > > + return VM_FAULT_SIGSEGV; > > > + } > > code in handle_mm_fault(). > > And you said that new code looks fine to you, but you didn't seem to > realize that it causes stupid incorrect kernel messages to be printed > ("segfault at xyz" etc) that are completely wrong. > > Because it's not a "real" SIGSEGV. It gets repressed by the fact that > there's a fatal signal pending. > > Otherwise we'd have to add a whole new case of VM_FAULT_xyz.. Indeed :-/ At least I can report that I was able to reproduce a probable variant of the bug Tetsuo reported on a regular PC as well, with this command: dd if=/dev/mem of=/dev/null status=progress conv=noerror bs=100M The 'noerror' is needed to skip unreadable holes early in the memory map. What I noticed is that while reading regular RAM is reasonably fast even in very large chunks, accesses become very slow once they hit iomem - and delays even longer than the reported 145 seconds become possible if bs=100M is increased to say bs=1000M. With your patch applied these large read chunks are still possible and not interruptible within read_mem(), and because the source of the slowdown is the iomem access and memcpy()s, not the paging overhead, the new page fault code doesn't even trigger and the delays remain. I.e. this hypothesis is not true, on my testbox at least - and with my different testcase: > > > Also, if it takes minutes to delay killing things, that implies > > > that we're probably still faulting in pages for the read_mem(). > > > Which points to another possible thing we could do in general: just > > > don't bother to handle page faults when a fatal signal is pending. With Tetsuo's approach the delays are fixed, because the fatal signal is noticed within the 4K chunks of read_mem() immediately: [pid 1674] 1566662105.052485 write(1, "ZZZZ\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 104857600) = 104857600 <0.000008> [pid 1674] 1566662105.052512 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 104857600) = 104857600 <17.006740> [pid 1674] 1566662122.059306 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 104857600) = 104857600 <0.000009> ) = 1 <0.000010>662122.059334 write(2, "\r", 1 [pid 1674] 1566662122.059372 write(2, "3145728000 bytes (3.1 GB) copied", 323145728000 bytes (3.1 GB) copied) = 32 <0.000009> [pid 1674] 1566662122.059411 write(2, ", 18.214699 s, 173 MB/s", 23, 18.214699 s, 173 MB/s) = 23 <0.000008> [pid 1674] 1566662122.059433 read(0, [pid 1674] 1566662123.082491 +++ killed by SIGKILL +++ Note how the first 100MB chunk took 17 seconds, the second chunk was interrupted within ~2 seconds after I sent a SIGKILL. Find below a variant of Tetsuo's patch, that uses a more regular signal return pattern that you suggested in your other mail: > (Ok, in this case I think it wants > > err = -EINTR; > if (fatal_signal_pending(current)) > break; > > instead, but the point is to make it look like signal handling, just > with the special "fatal signals can sometimes be handled even when > regular signals might not make it through". Except that I think the regular pattern here is not 'break' but 'goto failed' - 'break' would just result in a partial read and 'err' gets ignored. I also agree that we should probably allow regular interrupts to interrupt /dev/mem accesses - while the 'dd' process is killable, it's not Ctrl-C-able. If I change the fatal_signal_pending() to signal_pending() it all behaves much better IMHO - assuming that no utility learned rely on non-interruptibility ... The second patch is attached below the first one. Independently of this I also agree with Tetsuo that a cond_resched() might be in order to break up this loop on non-preempt kernels. The third patch below implements this variant. Is this closer to what you had in mind? Thanks, Ingo --- drivers/char/mem.c | 9 +++++++++ 1 file changed, 9 insertions(+) Index: linux/drivers/char/mem.c =================================================================== --- linux.orig/drivers/char/mem.c +++ linux/drivers/char/mem.c @@ -175,6 +175,15 @@ static ssize_t read_mem(struct file *fil p += sz; count -= sz; read += sz; + + /* + * Reading from iomem areas of /dev/mem can be slow, + * depending on the hardware, so allow such read()s + * to be interrupted via fatal signals (SIGKILL): + */ + err = -EINTR; + if (fatal_signal_pending(current)) + goto failed; } kfree(bounce); drivers/char/mem.c | 9 +++++++++ 1 file changed, 9 insertions(+) Index: linux/drivers/char/mem.c =================================================================== --- linux.orig/drivers/char/mem.c +++ linux/drivers/char/mem.c @@ -175,6 +175,15 @@ static ssize_t read_mem(struct file *fil p += sz; count -= sz; read += sz; + + /* + * Reading from iomem areas of /dev/mem can be slow, + * depending on the hardware, so allow such read()s + * to be interruptible: + */ + err = -EINTR; + if (signal_pending(current)) + goto failed; } kfree(bounce); Index: linux/drivers/char/mem.c =================================================================== --- linux.orig/drivers/char/mem.c +++ linux/drivers/char/mem.c @@ -175,6 +175,18 @@ static ssize_t read_mem(struct file *fil p += sz; count -= sz; read += sz; + + /* + * Reading from iomem areas of /dev/mem can be slow, + * depending on the hardware, so allow such read()s + * to be interruptible and preemptible: + */ + err = -EINTR; + if (signal_pending(current)) + goto failed; + + if (need_resched()) + cond_resched(); } kfree(bounce);