From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S932121AbdESAo1 (ORCPT <rfc822;w@1wt.eu>);
        Thu, 18 May 2017 20:44:27 -0400
Received: from mx2.suse.de ([195.135.220.15]:55797 "EHLO mx1.suse.de"
        rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
        id S1753652AbdESAoR (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 18 May 2017 20:44:17 -0400
Date: Fri, 19 May 2017 02:44:14 +0200
From: "Luis R. Rodriguez" <mcgrof@kernel.org>
To: Kees Cook <keescook@chromium.org>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>,
        Stephen Smalley <sds@tycho.nsa.gov>, Ingo Molnar <mingo@kernel.org>,
        Andy Lutomirski <luto@amacapital.net>,
        Michal Hocko <mhocko@kernel.org>, Vlastimil Babka <vbabka@suse.cz>,
        Andrew Morton <akpm@linux-foundation.org>,
        "Eric W. Biederman" <ebiederm@xmission.com>,
        Mateusz Guzik <mguzik@redhat.com>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: next-20170515: WARNING: CPU: 0 PID: 1 at
 arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
Message-ID: <20170519004414.GD8951@wotan.suse.de>
References: <20170515220650.GD17314@wotan.suse.de>
 <20170515221505.GE17314@wotan.suse.de>
 <CAGXu5j+2Vgi77xvgOvnSKnEHPooJDPO3sWPbBWY-spfK=kWj7Q@mail.gmail.com>
 <CAB=NE6WAsJtpssSu-j_A93cWF==3NyHD_0f6N0-1OgKThwmwGg@mail.gmail.com>
 <CAGXu5j+WdmJtsS-Wrn9tdXPes0kGEGKeS4Zh8Kmp4QCP9gQEFA@mail.gmail.com>
 <20170517164017.GP17314@wotan.suse.de>
 <CAGXu5jL+5x2Ba_nLrbn-oeu43qPinTS=ZPeXre-s3e-s2--jEg@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAGXu5jL+5x2Ba_nLrbn-oeu43qPinTS=ZPeXre-s3e-s2--jEg@mail.gmail.com>
User-Agent: Mutt/1.6.0 (2016-04-01)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 17, 2017 at 10:53:06AM -0700, Kees Cook wrote:
> On Wed, May 17, 2017 at 9:40 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > Yes, but I had killed that boot session again, so upon my next boot
> > I had a different layout, the ASLR gap was much larger:
> >
> > ---[ Modules ]---
> > 0xffffffffc0000000-0xffffffffc01b0000        1728K                               pte
> > 0xffffffffc01b0000-0xffffffffc01b1000           4K     RW                 GLB x  pte
> > 0xffffffffc01b1000-0xffffffffc01b2000           4K                               pte
> > 0xffffffffc01b2000-0xffffffffc01c6000          80K     ro                 GLB x  pte
> > 0xffffffffc01c6000-0xffffffffc01cc000          24K     ro                 GLB NX pte
> > 0xffffffffc01cc000-0xffffffffc01d5000          36K     RW                 GLB NX pte
> >
> > As you can guess if we follow similar pattern the RW hole is the one this boot
> > warned about:
> >
> > [    1.450483] x86/mm: Found insecure W+X mapping at address ffffffffc01b0000/0xffffffffc01b0000
> > [    1.451280] ------------[ cut here ]------------
> > [    1.451721] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
> > [    1.452499] Modules linked in:
> > [    1.452791] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc1-next-20170515+ #145
> >
> > I checked and indeed 0xffffffffc01b2000 is part of a module, it was not the first one
> > on the /proc/modules list but then again /proc/modules does not seem to have a specific
> > order other than perhaps being pegged into a linked list of modules once they go live,
> > and it seems its typically output backwards from when that happened, sorting that
> > by address we get:
> 
> Right, sorry, I'd expect it at the bottom of the list in
> /proc/modules, but that's fine, it's there.
> 
> >
> > root@piggy:~# cat /proc/modules | sort -k 6 | head -3
> > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> > mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E)
> > scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffc01df000 (E)
> >
> > And this then seems to be the first module loaded:
> >
> > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> >
> > The output of dmesg seems to confirm this as per the list of modules sorted
> > as per above.
> >
> >> Something touched the module gap and left is RW+x...
> >
> > Lemme try booting with e1000 renamed to e1000.ko.ignore and see how that goes.
> 
> Is it possible a module got loaded before e1000 and then unloaded?
> That seems odd, but maybe unload isn't cleaning up?
> 
> >> Are you able to bisect this?
> >
> > This issue has been present for a while so since I recall this I might be
> > able to reduce the number of needed target kernels to bisect. Lemme tinker
> > a bit and if no clear culprit comes up then will try bisect.
> 
> Okay, thanks!

Sorry to report that this issue has been present since the feature was first
added, and it is still present today. *But* it may also be a configuration
issue, given I have booted this guest *without* this issue ...

So:

git checkout -b WX e1a58320a38dfa72be48a0f1a3a92273663ba6db

That boots with the warning. To help debug further I've minimized my modules
to only a few: scsi_mod, e1000, libata.

I suspect at this point this is not the fault of a particular module but
instead just an accounting semantic (>= or <= on an edge) but let's see.

I now boot 4.3.0-rc3 at commit e1a58320a38df ("x86/mm: Warn on W^X
mappings") and I get:

[    0.949435] ------------[ cut here ]------------                             
[    0.949992] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x635/0x7e0()
[    0.950996] x86/mm: Found insecure W+X mapping at address ffffffffc0000000/0xffffffffc0000000
[    0.951814] Modules linked in:                                               
[    0.952123] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc3-FINAL-TEST-WITH-WX-NOFLOPPY+ #365
[    0.952929] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[    0.954033]  0000000000000000 000000001f722925 ffff88013a5d7d40 ffffffff812ff335
[    0.954742]  ffff88013a5d7d88 ffff88013a5d7d78 ffffffff81079be2 ffff88013a5d7e90
[    0.955522]  0000000000000000 0000000000000004 0000000000000000 0000000000000000
[    0.956256] Call Trace:                                                      
[    0.956496]  [<ffffffff812ff335>] dump_stack+0x44/0x5f                       
[    0.956953]  [<ffffffff81079be2>] warn_slowpath_common+0x82/0xc0             
[    0.957519]  [<ffffffff81079c7c>] warn_slowpath_fmt+0x5c/0x80                
[    0.958066]  [<ffffffff8106c155>] note_page+0x635/0x7e0                      
[    0.958595]  [<ffffffff8106c5eb>] ptdump_walk_pgd_level_core+0x2eb/0x410     
[    0.959219]  [<ffffffff8106c7b7>] ptdump_walk_pgd_level_checkwx+0x17/0x20    
[    0.959856]  [<ffffffff8106260d>] mark_rodata_ro+0xed/0x100                  
[    0.960372]  [<ffffffff815aa7d0>] ? rest_init+0x80/0x80                      
[    0.960869]  [<ffffffff815aa7ed>] kernel_init+0x1d/0xe0                      
[    0.961358]  [<ffffffff815b798f>] ret_from_fork+0x3f/0x70                    
[    0.961900]  [<ffffffff815aa7d0>] ? rest_init+0x80/0x80                      
[    0.962389] ---[ end trace 6125ebcb24c9e3d0 ]---                             
[    0.962822] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.         
                                                                                
                                                                                
---[ High Kernel Mapping ]---                                                   
0xffffffff80000000-0xffffffff81000000          16M                               pmd
0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
0xffffffff81600000-0xffffffff81a00000           4M     ro         PSE     GLB NX pmd
0xffffffff81a00000-0xffffffff81c00000           2M     RW                 GLB NX pte
0xffffffff81c00000-0xffffffff82200000           6M     RW         PSE     GLB NX pmd
0xffffffff82200000-0xffffffff82400000           2M     RW                 GLB NX pte
0xffffffff82400000-0xffffffffc0000000         988M                               pmd
---[ Modules ]---                                                               
0xffffffffc0000000-0xffffffffc0001000           4K     RW                 GLB x  pte
0xffffffffc0001000-0xffffffffc0002000           4K                               pte
0xffffffffc0002000-0xffffffffc0039000         220K     RW                 GLB x  pte
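
To pick the RW+x entries out of a longer dump, a minimal sketch follows. It
assumes the dump text uses the column layout shown above (range, size, flags,
level), and find_wx is a hypothetical helper name, not a kernel interface.

```python
import re

# Matches lines such as:
# 0xffffffffc0000000-0xffffffffc0001000   4K   RW   GLB x  pte
RANGE_RE = re.compile(r"^0x([0-9a-f]+)-0x([0-9a-f]+)\s+(\S+)\s*(.*)$")


def find_wx(dump_text):
    """Return (start, end, size) for every entry flagged both RW and x."""
    hits = []
    for line in dump_text.splitlines():
        m = RANGE_RE.match(line.strip())
        if not m:
            continue  # skip section headers such as "---[ Modules ]---"
        start = int(m.group(1), 16)
        end = int(m.group(2), 16)
        # Remaining columns: optional flags followed by the level (pte, pmd, ...)
        flags = m.group(4).split()
        # "NX" is a separate token, so checking for "x" does not match it
        if "RW" in flags and "x" in flags:
            hits.append((start, end, m.group(3)))
    return hits
```

It only reports what is in the text it is given, so the result depends on
when the dump was taken.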

root@piggy:~# cat /proc/modules | sort -k 6 | head -3                           
scsi_mod 221979 4 sg,sd_mod,sr_mod,libata, Live 0xffffffffc0002000 (E)          
e1000 127757 0 - Live 0xffffffffc004d000 (E)                                    
libata 229931 2 ata_generic,ata_piix, Live 0xffffffffc0076000 (E) 

So that 4K RW mapping looks suspect: it seems to get allocated right on the
edge for some particular reason, and it also happens to sit at the boundary
with the high kernel mapping. Could this be a boundary semantic issue?

For instance, can it be that since 0xffffffffc0002000 is given to the first
module by the allocator, scsi_mod, and since that address is *technically*
part of two boundaries, we get a splat?

0xffffffffc0001000-0xffffffffc0002000           4K                               pte
0xffffffffc0002000-0xffffffffc0039000         220K     RW                 GLB x  pte
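
One way to check the boundary theory against the numbers above is to treat
each module as a half-open range [base, base + size), rounded up to a page,
and ask which module owns a given address. This is only a sketch: it assumes
the second /proc/modules field is the byte size of the region starting at the
listed address, and parse_modules / owner are hypothetical helper names.

```python
PAGE_SIZE = 4096  # assumption: 4K pages, as in the pte entries above


def parse_modules(text):
    """Parse /proc/modules text into (name, base, end) sorted by base."""
    mods = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # not a module line
        name = fields[0]
        size = int(fields[1])
        base = int(fields[5], 16)
        mods.append((name, base, base + size))
    return sorted(mods, key=lambda m: m[1])


def owner(mods, addr):
    """Return the module whose page-rounded [base, end) contains addr."""
    for name, base, end in mods:
        # Round the end up to the next page boundary
        end_aligned = -(-end // PAGE_SIZE) * PAGE_SIZE
        if base <= addr < end_aligned:
            return name
    return None
```

With the three modules listed above, scsi_mod's page-rounded range comes out
as 0xffffffffc0002000-0xffffffffc0039000, which matches the 220K entry, and
0xffffffffc0000000 is owned by none of them.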

  Luis