From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	the arch/x86 maintainers <x86@kernel.org>
Subject: Re: [RFC PATCH] Fix: x86 unaligned __memcpy to/from virtual memory
Date: Wed, 24 Jun 2015 18:49:07 +0000 (UTC)
Message-ID: <609198255.2568.1435171747039.JavaMail.zimbra@efficios.com>
In-Reply-To: <CA+55aFwy0cN+3z5-4Oy4LGVaB=PnfSsyUcA+xU68K=0aKeOB9Q@mail.gmail.com>

----- On Jun 24, 2015, at 1:00 PM, Linus Torvalds torvalds@linux-foundation.org wrote:

> On Wed, Jun 24, 2015 at 9:14 AM, Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
>> When trying to change memory allocation from kmalloc to vmalloc to
>> handle memory fragmentation for reallocation of a growing string within
>> a kernel module, our testsuite started to trigger kernel OOPS. It
>> triggers when the string is copied into a ring buffer using memcpy,
>> piece-wise.
> 
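The piece-wise copy into the ring buffer can be pictured with the userspace sketch below. It is not LTTng code. ring_write_piecewise(), RB_PAGE_SIZE and the rule that each copy is clipped at a destination page boundary are assumptions made for illustration, and plain arrays stand in for ring-buffer pages. Under that assumption, a write that starts 0x25 bytes into a 4096-byte page gets a first chunk of 0x1000 - 0x25 = 0xfdb bytes. That value happens to equal the length visible in RBX, R14 and R15 of the oops later in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the ring buffer is backed by separate 4 KiB pages. */
#define RB_PAGE_SIZE 4096u

/*
 * Hypothetical helper, not LTTng's API: copy 'len' bytes from 'src' into the
 * page array 'pages', starting at logical byte offset 'pos'. Each memcpy() is
 * clipped at the end of the current destination page, so one logical write
 * turns into several smaller copies. Returns the number of memcpy() calls.
 */
static size_t ring_write_piecewise(unsigned char **pages, size_t pos,
                                   const unsigned char *src, size_t len)
{
    size_t calls = 0;

    assert(src != NULL || len == 0);
    while (len > 0) {
        size_t offset = pos % RB_PAGE_SIZE;
        size_t chunk = RB_PAGE_SIZE - offset;   /* room left in this page */

        if (chunk > len)
            chunk = len;
        memcpy(pages[pos / RB_PAGE_SIZE] + offset, src, chunk);
        src += chunk;
        pos += chunk;
        len -= chunk;
        calls++;
    }
    return calls;
}
```

With pos = 0x25 and a 5000-byte source, this issues two copies: 0xfdb bytes into the first page, then the remaining 941 bytes at the start of the second.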
> I hate your patch, just because it doesn't make sense. The "when
> non-aligned, don't do movsq" might make sense for performance, but it
> does *not* make sense for correctness.
> 
> Why would "rep movsq" trigger the oops, but memcpy_orig not? I think
> the fundamental bug is something else.
> 
> I don't see *what* the bug is, though.
> 
> Very odd.
> 
> x86 people, can you see anything there? It does look like
> vmalloc_fault() *should* have triggered, so why didn't it? The address
> is definitely in the VMALLOC_START/END range, and the error code is
> 0000, so why didn't vmalloc_fault() handle this?
> 
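The checks Linus refers to can be sketched as a small userspace model. The error-code bits follow the x86 #PF error code. The TASK_SIZE_MAX, VMALLOC_START and VMALLOC_END values are assumptions (x86-64, 4-level paging, no KASLR, around v4.1), and vmalloc_fault_eligible() is a hypothetical simplification of the path in arch/x86/mm/fault.c, not a copy of it. An error code of 0000 means a not-present page, a read access, and kernel mode, which is exactly the kind of fault that gets handed to vmalloc_fault().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits (pushed by the CPU on #PF). */
#define PF_PROT  (1u << 0) /* 0: page not present, 1: protection violation */
#define PF_WRITE (1u << 1) /* 0: read,             1: write */
#define PF_USER  (1u << 2) /* 0: kernel mode,      1: user mode */
#define PF_RSVD  (1u << 3) /* reserved bit set in a paging-structure entry */
#define PF_INSTR (1u << 4) /* instruction fetch */

/* Assumed x86-64 values around v4.1 (4-level paging, no KASLR). */
#define TASK_SIZE_MAX_ASSUMED 0x00007ffffffff000ull
#define VMALLOC_START_ASSUMED 0xffffc90000000000ull
#define VMALLOC_END_ASSUMED   0xffffe8ffffffffffull

/*
 * Simplified model: does a fault at 'address' with 'error_code' get past the
 * kernel-space check, the error-code filter, and vmalloc_fault()'s initial
 * range check?
 */
static bool vmalloc_fault_eligible(uint64_t address, unsigned int error_code)
{
    if (address < TASK_SIZE_MAX_ASSUMED)      /* user-space address */
        return false;
    if (error_code & (PF_RSVD | PF_USER | PF_PROT))
        return false;                         /* only kernel-mode not-present faults */
    return address >= VMALLOC_START_ASSUMED && address < VMALLOC_END_ASSUMED;
}
```

For the oops below (CR2 = 0xffffc9000746e000, error code 0000), the model returns true, which agrees with the DEBUG line showing that vmalloc_fault() was reached. Passing these checks does not guarantee success, though. As far as we can tell from the code of that era, vmalloc_fault() also returns -1 when the reference kernel page tables have no present entry for the address.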
>> This points to arch/x86/lib/memcpy_64.S:__memcpy rep movsq instruction.
>> This could be reproduced on my Lenovo x240 laptop (i7 CPU), and within a
>> virtual machine running on an Intel(R) Xeon(R) CPU E5-2630 v3 host.
>> Interestingly, with the VM having the rep_good flag (but not erms), the issue
>> triggers. However, if the VM has both rep_good and erms flags, the issue does
>> not trigger.
> 
> With ERMS, I think we end up using just "rep movsb" instead. But there
> should be absolutely no difference in fault patterns.
> 
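The flag dependence described above can be modelled as follows. As we understand the v4.1 arch/x86/lib/memcpy_64.S, memcpy/__memcpy begins with an ALTERNATIVE_2 that is patched at boot according to X86_FEATURE_REP_GOOD and X86_FEATURE_ERMS. The enum and select_memcpy() below are hypothetical names for that choice, not kernel code. The 5-byte NOP (0f 1f 44 00 00) just before the mov sequence in the oops Code: line is consistent with the REP_GOOD variant having been patched in.

```c
#include <assert.h>
#include <stdbool.h>

enum memcpy_variant {
    MEMCPY_ORIG,      /* memcpy_orig: open-coded mov loop */
    MEMCPY_REP_MOVSQ, /* rep movsq for len / 8 qwords, then rep movsb for len % 8 */
    MEMCPY_ERMS,      /* memcpy_erms: a single rep movsb */
};

/*
 * Model of the boot-time ALTERNATIVE_2 patch at the top of memcpy:
 * the default is a jump to memcpy_orig; REP_GOOD replaces it with NOPs so
 * the rep movsq body runs; ERMS, applied last, replaces it with a jump to
 * memcpy_erms and therefore wins when both flags are set.
 */
static enum memcpy_variant select_memcpy(bool rep_good, bool erms)
{
    if (erms)
        return MEMCPY_ERMS;
    if (rep_good)
        return MEMCPY_REP_MOVSQ;
    return MEMCPY_ORIG;
}
```

select_memcpy(true, false) yields the rep movsq path that oopsed in the rep_good-only VM, while select_memcpy(true, true) yields the ERMS path, where no oops was seen.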
> I see the QEMU part, is this just regular kvm?

Yes, this is just regular kvm.

> Could you add a debug
> printk to the vmalloc_fault() caller and then reproduce the oops? It
> shouldn't trigger enough to be a horrible logging problem.

Here is the output. I added the printk just after the initial range
check in vmalloc_fault(). Oddly, the fault happens on an aligned source
address (RSI = 0xffffc9000746e000, equal to CR2); it is the destination
(RDI = 0xffff8802355b3025) that is unaligned.
Let me know if you need more info.
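The alignment observation can be checked directly against RSI (source) and RDI (destination) in the register dump below. The helper misalignment() and the 4096-byte page size are assumptions for illustration; the helper simply masks the low bits of an address.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u

/* Byte offset of 'addr' past the previous n-byte boundary (n a power of two). */
static unsigned int misalignment(uint64_t addr, unsigned int n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return (unsigned int)(addr & (uint64_t)(n - 1));
}
```

Applied to the dump, the source is both 8-byte and page aligned (offset 0 in each case), while the destination is 5 bytes past an 8-byte boundary and 0x25 bytes into its page. Since 0x25 + 0xfdb = 0x1000, a 0xfdb-byte copy to that destination would end exactly at its page boundary.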

[   53.084521] DEBUG: vmalloc_fault at address 0xffffc9000746e000
[   53.085460] BUG: unable to handle kernel paging request at ffffc9000746e000
[   53.085460] IP:
[   53.090220]  [<ffffffff81316f12>] __memcpy+0x12/0x20
[   53.090220] PGD 236c92067 PUD 236c93067 PMD 22e840067 PTE 0
[   53.090220] Oops: 0000 [#1] SMP 
[   53.090220] Modules linked in: lttng_probe_workqueue(O) lttng_probe_vmscan(O) lttng_probe_udp(O) lttng_probe_timer(O) lttng_probe_sunrpc(O) lttng_probe_statedump(O) lttng_probe_sock(O) lttng_probe_skb(O) lttng_probe_signal(O) lttng_probe_scsi(O) lttng_probe_sched(O) lttng_probe_regmap(O) lttng_probe_rcu(O) lttng_probe_random(O) lttng_probe_power(O) lttng_probe_net(O) lttng_probe_napi(O) lttng_probe_module(O) lttng_probe_kmem(O) lttng_probe_jbd2(O) lttng_probe_irq(O) lttng_probe_ext4(O) lttng_probe_compaction(O) lttng_probe_block(O) lttng_types(O) lttng_ring_buffer_metadata_mmap_client(O) lttng_ring_buffer_client_mmap_overwrite(O) lttng_ring_buffer_client_mmap_discard(O) lttng_ring_buffer_metadata_client(O) lttng_ring_buffer_client_overwrite(O) lttng_ring_buffer_client_discard(O) lttng_tracer(O) lttng_statedump(O) lttng_kprobes(O) lttng_lib_ring_buffer(O) lttng_kretprobes(O) virtio_blk virtio_net virtio_pci virtio_ring virtio [last unloaded: lttng_statedump]
[   53.090220] CPU: 4 PID: 3532 Comm: lttng-consumerd Tainted: G           O    4.1.0+ #10
[   53.090220] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[   53.090220] task: ffff880235355aa0 ti: ffff8800bb6d0000 task.ti: ffff8800bb6d0000
[   53.090220] RIP: 0010:[<ffffffff81316f12>]  [<ffffffff81316f12>] __memcpy+0x12/0x20
[   53.090220] RSP: 0018:ffff8800bb6d3da0  EFLAGS: 00010206
[   53.090220] RAX: ffff8802355b3025 RBX: 0000000000000fdb RCX: 00000000000001fb
[   53.090220] RDX: 0000000000000003 RSI: ffffc9000746e000 RDI: ffff8802355b3025
[   53.090220] RBP: ffff8800bb6d3db8 R08: ffff880231cd7200 R09: 0000000000000025
[   53.090220] R10: 0000000000000000 R11: 0000000000001000 R12: ffff8800bb6d3dc8
[   53.090220] R13: ffff88022e437400 R14: 0000000000000fdb R15: 0000000000000fdb
[   53.090220] FS:  00007f24d8bbc700(0000) GS:ffff880237280000(0000) knlGS:0000000000000000
[   53.090220] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   53.090220] CR2: ffffc9000746e000 CR3: 00000000ba6d6000 CR4: 00000000000006e0
[   53.090220] Stack:
[   53.090220]  ffffffffa05ac797 ffff8802334fb300 ffff8802334fb350 ffff8800bb6d3e48
[   53.090220]  ffffffffa0473060 ffff88022e437400 0000000000000000 0000000000000fdb
[   53.090220]  ffffffff00000001 ffff880231cd7200 0000000000000fdb 0000000000000025
[   53.090220] Call Trace:
[   53.090220]  [<ffffffffa05ac797>] ? lttng_event_write+0x87/0xb0 [lttng_ring_buffer_metadata_client]
[   53.090220]  [<ffffffffa0473060>] lttng_metadata_output_channel+0xd0/0x120 [lttng_tracer]
[   53.090220]  [<ffffffffa04755f9>] lttng_metadata_ring_buffer_ioctl+0x79/0xd0 [lttng_tracer]
[   53.090220]  [<ffffffff8117ba10>] do_vfs_ioctl+0x2e0/0x4e0
[   53.090220]  [<ffffffff812b35c7>] ? file_has_perm+0x87/0xa0
[   53.090220]  [<ffffffff8117bc91>] SyS_ioctl+0x81/0xa0
[   53.090220]  [<ffffffff818bbd37>] tracesys_phase2+0x84/0x89
[   53.090220] Code: 5b 5d c3 66 0f 1f 44 00 00 e8 6b fc ff ff eb e1 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 f3 
[   53.090220] RIP  [<ffffffff81316f12>] __memcpy+0x12/0x20
[   53.090220]  RSP <ffff8800bb6d3da0>
[   53.090220] CR2: ffffc9000746e000
[   53.090220] ---[ end trace 850d7bf1b41647ee ]---
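Reading the register dump against the __memcpy bytes in the Code: line narrows down where the copy stopped. The sketch below decodes those bytes by hand and models the rep movsq path in userspace C. movsq_regs, memcpy_prologue(), qwords_done() and memcpy_movsq_model() are hypothetical names for illustration, not kernel functions. Assuming the requested length is the 0xfdb held in RBX, R14 and R15 (consistent with RDX = 0xfdb & 7 = 3), the prologue sets RCX = 0xfdb >> 3 = 0x1fb. The dump still shows RCX = 0x1fb and RSI equal to CR2, so rep movsq appears to have faulted on its very first 8-byte load from the source, before anything was copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hand decoding of the bytes around <f3> in the oops Code: line
 * (__memcpy starts at the 5-byte NOP 0f 1f 44 00 00):
 *   48 89 f8     mov  %rdi,%rax   ; return value = dst
 *   48 89 d1     mov  %rdx,%rcx   ; rcx = len
 *   48 c1 e9 03  shr  $3,%rcx     ; rcx = len / 8
 *   83 e2 07     and  $7,%edx     ; edx = len % 8
 *   f3 48 a5     rep movsq        ; faulting instruction, __memcpy+0x12
 *   89 d1        mov  %edx,%ecx
 *   f3 a4        rep movsb
 *   c3           ret
 */
struct movsq_regs {
    uint64_t rcx; /* qwords still to copy with rep movsq */
    uint64_t rdx; /* tail bytes for the final rep movsb */
};

/* Register state right before rep movsq for a copy of 'len' bytes. */
static struct movsq_regs memcpy_prologue(uint64_t len)
{
    struct movsq_regs r = { len >> 3, len & 7 };
    return r;
}

/* Qwords already moved if rep movsq stopped with 'rcx_now' remaining. */
static uint64_t qwords_done(uint64_t len, uint64_t rcx_now)
{
    return (len >> 3) - rcx_now;
}

/* Userspace C rendering of the same strategy (illustration only). */
static void *memcpy_movsq_model(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t qwords = len >> 3, tail = len & 7;

    while (qwords--) {          /* rep movsq: one 8-byte move per iteration */
        memcpy(d, s, 8);
        d += 8;
        s += 8;
    }
    while (tail--)              /* rep movsb: the remaining 0..7 bytes */
        *d++ = *s++;
    return dst;                 /* like %rax, the original destination */
}
```

Because the faulting access is a read (error code 0000) from the page-aligned source, whose PTE is shown as 0, this reading is hard to square with destination alignment being the root cause.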



-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Thread overview: 9+ messages
2015-06-24 16:14 [RFC PATCH] Fix: x86 unaligned __memcpy to/from virtual memory Mathieu Desnoyers
2015-06-24 17:00 ` Linus Torvalds
2015-06-24 18:49   ` Mathieu Desnoyers [this message]
2015-06-24 18:53     ` H. Peter Anvin
2015-06-24 19:15     ` Linus Torvalds
2015-06-24 23:54       ` Mathieu Desnoyers
2015-06-25  0:33         ` Mathieu Desnoyers
2015-06-25  0:37         ` Linus Torvalds
2015-06-25 12:58           ` Mathieu Desnoyers
