From: Daniel Borkmann <daniel@iogearbox.net>
To: lkp@lists.01.org
Subject: Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf
Date: Thu, 09 Mar 2017 18:51:03 +0100
Message-ID: <58C19607.6000605@iogearbox.net>
In-Reply-To: <alpine.DEB.2.20.1703091547460.3521@nanos>

On 03/09/2017 03:49 PM, Thomas Gleixner wrote:
> On Thu, 9 Mar 2017, Daniel Borkmann wrote:
>> On 03/09/2017 02:10 PM, Thomas Gleixner wrote:
>>> On Thu, 9 Mar 2017, Daniel Borkmann wrote:
>>>> With regard to CPA_FLUSHTLB that Linus mentioned, when I investigated
>>>> code paths in change_page_attr_set_clr(), I did see that CPA_FLUSHTLB
>>>> was set each time we switched attrs and a cpa_flush_range() was
>>>> performed (with the correct number of pages and cache set to 0). That
>>>> would be a __flush_tlb_all() eventually.
>>>>
>>>> Hmm, it indeed might seem likely that this could be an emulation bug.
>>>
>>> Which variant of __flush_tlb_all() is used when the test fails?
>>>
>>> Check for the following flags in /proc/cpuinfo: pge invpcid
>>
>> I added the following and booted with both variants:
>>
>> printk("X86_FEATURE_PGE:%u\n",     static_cpu_has(X86_FEATURE_PGE));
>> printk("X86_FEATURE_INVPCID:%u\n", static_cpu_has(X86_FEATURE_INVPCID));
>>
>> "-cpu host" gives:
>>
>> [    8.326117] X86_FEATURE_PGE:1
>> [    8.326381] X86_FEATURE_INVPCID:1
>>
>> "-cpu kvm64" gives:
>>
>> [    8.517069] X86_FEATURE_PGE:1
>> [    8.517393] X86_FEATURE_INVPCID:0
>
> That's the one which fails. So it's using the CR4 based flushing. Just ran
> a test on a physical system with PGE=1 and INVPCID=0. Works fine.
>
> Emulation problem?

So in the git qemu code base (target/i386/helper.c), cr3 vs cr4 looks
like the following, both sharing the tlb_flush() itself:

void cpu_x86_update_cr3(CPUX86State *env, target_ulong new_cr3)
{
     X86CPU *cpu = x86_env_get_cpu(env);

     env->cr[3] = new_cr3;
     if (env->cr[0] & CR0_PG_MASK) {
         qemu_log_mask(CPU_LOG_MMU,
                         "CR3 update: CR3=" TARGET_FMT_lx "\n", new_cr3);
         tlb_flush(CPU(cpu));
     }
}

void cpu_x86_update_cr4(CPUX86State *env, uint32_t new_cr4)
{
     X86CPU *cpu = x86_env_get_cpu(env);
     uint32_t hflags;

#if defined(DEBUG_MMU)
     printf("CR4 update: %08x -> %08x\n", (uint32_t)env->cr[4], new_cr4);
#endif
     if ((new_cr4 ^ env->cr[4]) &
         (CR4_PGE_MASK | CR4_PAE_MASK | CR4_PSE_MASK |
          CR4_SMEP_MASK | CR4_SMAP_MASK | CR4_LA57_MASK)) {
         tlb_flush(CPU(cpu));
     }

     [...]
}

I added some debugging around __native_flush_tlb_global_irq_disabled()
and, if I understand it correctly, the idea of the cr4-based variant is
that we need to toggle X86_CR4_PGE in order to trigger a TLB flush.
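
To make the dependency on a changing bit concrete, here is a minimal
sketch of that flush rule as a toy model. It is not QEMU or kernel code:
struct toy_cpu, toy_write_cr4() and toy_selfcheck() are hypothetical
names made up for illustration, and the mask only covers three of the
bits that cpu_x86_update_cr4() checks.

```c
#include <assert.h>
#include <stdint.h>

/* CR4 bit values as defined by the x86 architecture. */
#define X86_CR4_PSE 0x00000010u
#define X86_CR4_PAE 0x00000020u
#define X86_CR4_PGE 0x00000080u

/* Hypothetical toy CPU: current CR4 value plus a flush counter. */
struct toy_cpu {
    uint32_t cr4;
    unsigned int flushes;
};

/*
 * Toy CR4 write: count a TLB flush only if one of the paging-related
 * bits differs between the old and the new value, mirroring the
 * (new_cr4 ^ env->cr[4]) & mask test quoted above.
 */
static void toy_write_cr4(struct toy_cpu *cpu, uint32_t new_cr4)
{
    const uint32_t flush_mask = X86_CR4_PGE | X86_CR4_PAE | X86_CR4_PSE;

    if ((new_cr4 ^ cpu->cr4) & flush_mask)
        cpu->flushes++;
    cpu->cr4 = new_cr4;
}

/* Small self-check of the two cases that matter here. */
static void toy_selfcheck(void)
{
    struct toy_cpu cpu = { .cr4 = X86_CR4_PGE | X86_CR4_PSE, .flushes = 0 };

    /* Writing back the identical value changes no bit: no flush. */
    toy_write_cr4(&cpu, cpu.cr4);
    assert(cpu.flushes == 0);

    /* Clearing PGE and restoring it changes the bit twice: two flushes. */
    toy_write_cr4(&cpu, cpu.cr4 & ~X86_CR4_PGE);
    toy_write_cr4(&cpu, cpu.cr4 | X86_CR4_PGE);
    assert(cpu.flushes == 2);
}
```

In this model a write that leaves every masked bit unchanged is a no-op
for the TLB, so a flush sequence only works if its intermediate value
really differs from the current one.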

What I see is that the original cr4 is 0x610, i.e. X86_CR4_PGE (0x80)
is not set in it. The cpu_tlbstate.cr4 is consistent with
native_read_cr4(), and since cr4 is != 0, the comment in
native_read_cr4() tells me that cr4 seems to be supported. So
clearing PGE changes nothing, and we end up writing ...

   native_write_cr4(0x610);
   native_write_cr4(0x610);

... twice, and this just doesn't trigger the desired TLB flush. I
changed the code into the following ...

         cr4 = this_cpu_read(cpu_tlbstate.cr4);
         /* clear PGE */
-       native_write_cr4(cr4 & ~X86_CR4_PGE);
+       native_write_cr4(cr4 ^ X86_CR4_PGE);
         /* write old PGE again and flush TLBs */
         native_write_cr4(cr4);

... and the test cases seem to be working for me now with "-cpu kvm64",
so that seems to trigger the TLB flush we were missing.
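
For reference, the arithmetic behind the two variants can be checked in
isolation. This is only a sketch of the bit operations on the value
reported above; cr4_pge_cleared(), cr4_pge_toggled() and
check_reported_value() are hypothetical helper names, not kernel
functions.

```c
#include <assert.h>
#include <stdint.h>

#define X86_CR4_PGE 0x00000080u  /* bit 7 of CR4 */

/* Intermediate value of the original code: PGE forced to 0. */
static uint32_t cr4_pge_cleared(uint32_t cr4)
{
    return cr4 & ~X86_CR4_PGE;
}

/* Intermediate value of the changed code: PGE inverted. */
static uint32_t cr4_pge_toggled(uint32_t cr4)
{
    return cr4 ^ X86_CR4_PGE;
}

/* Check the value observed in the failing guest. */
static void check_reported_value(void)
{
    const uint32_t cr4 = 0x610;

    /* PGE is not set in 0x610 ... */
    assert((cr4 & X86_CR4_PGE) == 0);
    /* ... so clearing it yields the same value (two identical writes), */
    assert(cr4_pge_cleared(cr4) == 0x610);
    /* ... while toggling it yields a different intermediate value. */
    assert(cr4_pge_toggled(cr4) == 0x690);
}
```

With the XOR, the intermediate write always differs from cr4 in the PGE
bit, whatever its initial state. Note that when PGE starts out clear,
this briefly sets it; whether that is acceptable is part of the question
below.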

I don't know enough about x86 internals to tell whether the change is
sane, but it seems to help at least for qemu, fwiw. ;) Thoughts?

Thanks,
Daniel
