From: Baoquan He <bhe@redhat.com>
To: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Uladzislau Rezki <urezki@gmail.com>,
linux-fsdevel@vger.kernel.org, Jiri Olsa <olsajiri@gmail.com>,
Will Deacon <will@kernel.org>, Mike Galbraith <efault@gmx.de>,
Mark Rutland <mark.rutland@arm.com>,
wangkefeng.wang@huawei.com, catalin.marinas@arm.com,
ardb@kernel.org, David Hildenbrand <david@redhat.com>,
Linux regression tracking <regressions@leemhuis.info>,
regressions@lists.linux.dev, Matthew Wilcox <willy@infradead.org>,
Liu Shixin <liushixin2@huawei.com>, Jens Axboe <axboe@kernel.dk>,
Alexander Viro <viro@zeniv.linux.org.uk>,
stable@vger.kernel.org
Subject: Re: [PATCH] fs/proc/kcore: reinstate bounce buffer for KCORE_TEXT regions
Date: Wed, 2 Aug 2023 00:01:16 +0800
Message-ID: <ZMksTC6pewXDgkFe@MiWiFi-R3L-srv>
In-Reply-To: <ZMkrfBDARIAYFYwz@MiWiFi-R3L-srv>
On 08/01/23 at 11:57pm, Baoquan He wrote:
> On 07/31/23 at 10:50pm, Lorenzo Stoakes wrote:
> > Some architectures do not populate the entire range categorised by
> > KCORE_TEXT, so we must ensure that the kernel address we read from is
> > valid.
> >
> > Unfortunately there is no solution currently available to do so with a
> > purely iterator solution so reinstate the bounce buffer in this instance so
> > we can use copy_from_kernel_nofault() in order to avoid page faults when
> > regions are unmapped.
> >
> > This change partly reverts commit 2e1c0170771e ("fs/proc/kcore: avoid
> > bounce buffer for ktext data"), reinstating the bounce buffer, but adapts
> > the code to continue to use an iterator.
> >
> > Fixes: 2e1c0170771e ("fs/proc/kcore: avoid bounce buffer for ktext data")
> > Reported-by: Jiri Olsa <olsajiri@gmail.com>
> > Closes: https://lore.kernel.org/all/ZHc2fm+9daF6cgCE@krava
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
> > ---
> > fs/proc/kcore.c | 26 +++++++++++++++++++++++++-
> > 1 file changed, 25 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> > index 9cb32e1a78a0..3bc689038232 100644
> > --- a/fs/proc/kcore.c
> > +++ b/fs/proc/kcore.c
> > @@ -309,6 +309,8 @@ static void append_kcore_note(char *notes, size_t *i, const char *name,
> >
> > static ssize_t read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
> > {
> > + struct file *file = iocb->ki_filp;
> > + char *buf = file->private_data;
> > loff_t *fpos = &iocb->ki_pos;
> > size_t phdrs_offset, notes_offset, data_offset;
> > size_t page_offline_frozen = 1;
> > @@ -554,11 +556,22 @@ static ssize_t read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
> > fallthrough;
> > case KCORE_VMEMMAP:
> > case KCORE_TEXT:
> > + /*
> > + * Sadly we must use a bounce buffer here to be able to
> > + * make use of copy_from_kernel_nofault(), as these
> > + * memory regions might not always be mapped on all
> > + * architectures.
> > + */
> > + if (copy_from_kernel_nofault(buf, (void *)start, tsz)) {
> > + if (iov_iter_zero(tsz, iter) != tsz) {
> > + ret = -EFAULT;
> > + goto out;
> > + }
> > /*
> > * We use _copy_to_iter() to bypass usermode hardening
> > * which would otherwise prevent this operation.
> > */
> > - if (_copy_to_iter((char *)start, tsz, iter) != tsz) {
> > + } else if (_copy_to_iter(buf, tsz, iter) != tsz) {
> > ret = -EFAULT;
> > goto out;
> > }
> > @@ -595,6 +608,10 @@ static int open_kcore(struct inode *inode, struct file *filp)
> > if (ret)
> > return ret;
> >
> > + filp->private_data = kmalloc(PAGE_SIZE, GFP_KERNEL);
> > + if (!filp->private_data)
> > + return -ENOMEM;
> > +
> > if (kcore_need_update)
> > kcore_update_ram();
> > if (i_size_read(inode) != proc_root_kcore->size) {
> > @@ -605,9 +622,16 @@ static int open_kcore(struct inode *inode, struct file *filp)
> > return 0;
> > }
> >
> > +static int release_kcore(struct inode *inode, struct file *file)
> > +{
> > + kfree(file->private_data);
> > + return 0;
> > +}
> > +
> > static const struct proc_ops kcore_proc_ops = {
> > .proc_read_iter = read_kcore_iter,
> > .proc_open = open_kcore,
> > + .proc_release = release_kcore,
> > .proc_lseek = default_llseek,
> > };
>
> On 6.5-rc4, the failures can be reproduced stably on a arm64 machine.
> With patch applied, both makedumpfile and objdump test cases passed.
>
> And the code change looks good to me, thanks.
>
> Tested-by: Baoquan He <bhe@redhat.com>
> Reviewed-by: Baoquan He <bhe@redhat.com>
>
>
> ===============================================
> [root@ ~]# makedumpfile --mem-usage /proc/kcore
> The kernel version is not supported.
> The makedumpfile operation may be incomplete.
>
> TYPE PAGES EXCLUDABLE DESCRIPTION
> ----------------------------------------------------------------------
> ZERO 76234 yes Pages filled with zero
> NON_PRI_CACHE 147613 yes Cache pages without private flag
> PRI_CACHE 3847 yes Cache pages with private flag
> USER 15276 yes User process pages
> FREE 15809884 yes Free pages
> KERN_DATA 459950 no Dumpable kernel data
>
> page size: 4096
> Total pages on system: 16512804
> Total size on system: 67636445184 Byte
>
> [root@ ~]# objdump -d --start-address=0x^C
> [root@ ~]# cat /proc/kallsyms | grep ksys_read
> ffffab3be77229d8 T ksys_readahead
> ffffab3be782a700 T ksys_read
> [root@ ~]# objdump -d --start-address=0xffffab3be782a700 --stop-address=0xffffab3be782a710 /proc/kcore
>
> /proc/kcore: file format elf64-littleaarch64
>
>
> Disassembly of section load1:
>
> ffffab3be782a700 <load1+0x41a700>:
> ffffab3be782a700: aa1e03e9 mov x9, x30
> ffffab3be782a704: d503201f nop
> ffffab3be782a708: d503233f paciasp
> ffffab3be782a70c: a9bc7bfd stp x29, x30, [sp, #-64]!
> objdump: error: /proc/kcore(load2) is too large (0x7bff70000000 bytes)
> objdump: Reading section load2 failed because: memory exhausted
By the way, I still see the objdump error above saying that /proc/kcore
(load2) is too large, and at the same time the console prints the messages
below. I haven't checked whether this is an objdump issue or a kernel one.
[ 6631.575800] __vm_enough_memory: pid: 5321, comm: objdump, not enough memory for the allocation
[ 6631.584469] __vm_enough_memory: pid: 5321, comm: objdump, not enough memory for the allocation
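
For anyone following the thread, the fallback logic in the quoted patch can
be modelled in userspace roughly as below. This is only a sketch under
stated assumptions, not the kernel code: copy_nofault_model() and
read_chunk_model() are hypothetical names, an "unmapped" source is modelled
as a NULL pointer, and plain memcpy()/memset() stand in for
copy_from_kernel_nofault(), _copy_to_iter() and iov_iter_zero().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BOUNCE_SIZE 4096 /* the patch allocates one PAGE_SIZE buffer per open file */

/*
 * Hypothetical stand-in for copy_from_kernel_nofault(): returns 0 on
 * success and non-zero if the source cannot be read. An unreadable
 * source is modelled here as a NULL pointer (an assumption of this sketch).
 */
static int copy_nofault_model(void *dst, const void *src, size_t len)
{
	if (src == NULL)
		return -1;
	memcpy(dst, src, len);
	return 0;
}

/*
 * Copy one chunk to the caller's buffer via the bounce buffer.
 * If the source cannot be read, the output is zero-filled instead,
 * which mirrors the iov_iter_zero() branch in the patch.
 * Returns the number of bytes written to out.
 */
static size_t read_chunk_model(char *out, const void *src, size_t len,
			       char *bounce)
{
	assert(bounce != NULL);

	/* Never copy more than the bounce buffer can hold. */
	if (len > BOUNCE_SIZE)
		len = BOUNCE_SIZE;

	if (copy_nofault_model(bounce, src, len))
		memset(out, 0, len);      /* models iov_iter_zero() */
	else
		memcpy(out, bounce, len); /* models _copy_to_iter() */

	return len;
}
```

The point of the intermediate buffer is that the possibly faulting read
targets memory the reader owns, so a failure can be detected and replaced
with zeroes before anything is handed to the destination.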