Date: Wed, 2 Aug 2023 00:01:16 +0800
From: Baoquan He
To: Lorenzo Stoakes
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
	Uladzislau Rezki, linux-fsdevel@vger.kernel.org, Jiri Olsa,
	Will Deacon, Mike Galbraith, Mark Rutland,
	wangkefeng.wang@huawei.com, catalin.marinas@arm.com,
	ardb@kernel.org, David Hildenbrand, Linux regression tracking,
	regressions@lists.linux.dev, Matthew Wilcox, Liu Shixin,
	Jens Axboe, Alexander Viro, stable@vger.kernel.org
Subject: Re: [PATCH] fs/proc/kcore: reinstate bounce buffer for KCORE_TEXT regions
References: <20230731215021.70911-1-lstoakes@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

On 08/01/23 at 11:57pm, Baoquan He wrote:
> On 07/31/23 at 10:50pm, Lorenzo Stoakes wrote:
> > Some architectures do not populate the entire range categorised by
> > KCORE_TEXT, so we must ensure that the kernel address we read from is
> > valid.
> >
> > Unfortunately there is no solution currently available to do so with a
> > purely iterator solution so reinstate the bounce buffer in this instance so
> > we can use copy_from_kernel_nofault() in order to avoid page faults when
> > regions are unmapped.
> >
> > This change partly reverts commit 2e1c0170771e ("fs/proc/kcore: avoid
> > bounce buffer for ktext data"), reinstating the bounce buffer, but adapts
> > the code to continue to use an iterator.
> >
> > Fixes: 2e1c0170771e ("fs/proc/kcore: avoid bounce buffer for ktext data")
> > Reported-by: Jiri Olsa
> > Closes: https://lore.kernel.org/all/ZHc2fm+9daF6cgCE@krava
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Lorenzo Stoakes
> > ---
> >  fs/proc/kcore.c | 26 +++++++++++++++++++++++++-
> >  1 file changed, 25 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> > index 9cb32e1a78a0..3bc689038232 100644
> > --- a/fs/proc/kcore.c
> > +++ b/fs/proc/kcore.c
> > @@ -309,6 +309,8 @@ static void append_kcore_note(char *notes, size_t *i, const char *name,
> >
> >  static ssize_t read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
> >  {
> > +	struct file *file = iocb->ki_filp;
> > +	char *buf = file->private_data;
> >  	loff_t *fpos = &iocb->ki_pos;
> >  	size_t phdrs_offset, notes_offset, data_offset;
> >  	size_t page_offline_frozen = 1;
> > @@ -554,11 +556,22 @@ static ssize_t read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
> >  			fallthrough;
> >  		case KCORE_VMEMMAP:
> >  		case KCORE_TEXT:
> > +			/*
> > +			 * Sadly we must use a bounce buffer here to be able to
> > +			 * make use of copy_from_kernel_nofault(), as these
> > +			 * memory regions might not always be mapped on all
> > +			 * architectures.
> > +			 */
> > +			if (copy_from_kernel_nofault(buf, (void *)start, tsz)) {
> > +				if (iov_iter_zero(tsz, iter) != tsz) {
> > +					ret = -EFAULT;
> > +					goto out;
> > +				}
> >  			/*
> >  			 * We use _copy_to_iter() to bypass usermode hardening
> >  			 * which would otherwise prevent this operation.
> >  			 */
> > -			if (_copy_to_iter((char *)start, tsz, iter) != tsz) {
> > +			} else if (_copy_to_iter(buf, tsz, iter) != tsz) {
> >  				ret = -EFAULT;
> >  				goto out;
> >  			}
> > @@ -595,6 +608,10 @@ static int open_kcore(struct inode *inode, struct file *filp)
> >  	if (ret)
> >  		return ret;
> >
> > +	filp->private_data = kmalloc(PAGE_SIZE, GFP_KERNEL);
> > +	if (!filp->private_data)
> > +		return -ENOMEM;
> > +
> >  	if (kcore_need_update)
> >  		kcore_update_ram();
> >  	if (i_size_read(inode) != proc_root_kcore->size) {
> > @@ -605,9 +622,16 @@ static int open_kcore(struct inode *inode, struct file *filp)
> >  	return 0;
> >  }
> >
> > +static int release_kcore(struct inode *inode, struct file *file)
> > +{
> > +	kfree(file->private_data);
> > +	return 0;
> > +}
> > +
> >  static const struct proc_ops kcore_proc_ops = {
> >  	.proc_read_iter	= read_kcore_iter,
> >  	.proc_open	= open_kcore,
> > +	.proc_release	= release_kcore,
> >  	.proc_lseek	= default_llseek,
> >  };
>
> On 6.5-rc4, the failures can be reproduced stably on a arm64 machine.
> With patch applied, both makedumpfile and objdump test cases passed.
>
> And the code change looks good to me, thanks.
>
> Tested-by: Baoquan He
> Reviewed-by: Baoquan He
>
>
> ===============================================
> [root@ ~]# makedumpfile --mem-usage /proc/kcore
> The kernel version is not supported.
> The makedumpfile operation may be incomplete.
>
> TYPE            PAGES       EXCLUDABLE  DESCRIPTION
> ----------------------------------------------------------------------
> ZERO            76234       yes         Pages filled with zero
> NON_PRI_CACHE   147613      yes         Cache pages without private flag
> PRI_CACHE       3847        yes         Cache pages with private flag
> USER            15276       yes         User process pages
> FREE            15809884    yes         Free pages
> KERN_DATA       459950      no          Dumpable kernel data
>
> page size: 4096
> Total pages on system: 16512804
> Total size on system: 67636445184 Byte
>
> [root@ ~]# objdump -d --start-address=0x^C
> [root@ ~]# cat /proc/kallsyms | grep ksys_read
> ffffab3be77229d8 T ksys_readahead
> ffffab3be782a700 T ksys_read
> [root@ ~]# objdump -d --start-address=0xffffab3be782a700 --stop-address=0xffffab3be782a710 /proc/kcore
>
> /proc/kcore: file format elf64-littleaarch64
>
>
> Disassembly of section load1:
>
> ffffab3be782a700 :
> ffffab3be782a700: aa1e03e9 mov x9, x30
> ffffab3be782a704: d503201f nop
> ffffab3be782a708: d503233f paciasp
> ffffab3be782a70c: a9bc7bfd stp x29, x30, [sp, #-64]!
> objdump: error: /proc/kcore(load2) is too large (0x7bff70000000 bytes)
> objdump: Reading section load2 failed because: memory exhausted

By the way, I can still see the objdump error above saying that
/proc/kcore(load2) is too large. At the same time, the console prints the
messages below. I haven't checked whether this is an objdump issue or a
kernel one.

[ 6631.575800] __vm_enough_memory: pid: 5321, comm: objdump, not enough memory for the allocation
[ 6631.584469] __vm_enough_memory: pid: 5321, comm: objdump, not enough memory for the allocation
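
For anyone following the thread, the fallback that the quoted patch adds to
the KCORE_TEXT branch can be modelled in userspace roughly as below. This is
only a sketch under stated assumptions, not the kernel code:
copy_nofault_sim(), read_text_chunk() and BOUNCE_SIZE are hypothetical names
made up for illustration, and an "unmapped" source is modelled as a NULL
pointer. The shape is the same as in the patch: try a fallible copy into the
bounce buffer, hand zeros to the reader if that fails, otherwise hand over
the buffer contents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BOUNCE_SIZE 4096 /* hypothetical stand-in for PAGE_SIZE in the patch */

/*
 * Hypothetical userspace stand-in for copy_from_kernel_nofault():
 * returns 0 on success and nonzero when the source cannot be read.
 * An unreadable (unmapped) source is modelled here as a NULL pointer.
 */
int copy_nofault_sim(void *dst, const void *src, size_t len)
{
	if (src == NULL)
		return -1;
	memcpy(dst, src, len);
	return 0;
}

/*
 * Model of the patched KCORE_TEXT branch: copy one chunk through the
 * bounce buffer, or zero-fill the output if the source is unreadable.
 * Returns the number of bytes written to out.
 */
size_t read_text_chunk(char *out, const void *src, size_t len, char *bounce)
{
	/* The bounce buffer holds one chunk only, so len must fit in it. */
	assert(len <= BOUNCE_SIZE);

	if (copy_nofault_sim(bounce, src, len))
		memset(out, 0, len);		/* plays the role of iov_iter_zero() */
	else
		memcpy(out, bounce, len);	/* plays the role of _copy_to_iter() */

	return len;
}
```

The caller sees a full-length read either way, which matches the patch:
an unmapped range reads back as zeros instead of faulting or failing.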