From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753652AbbCBKRL (ORCPT ); Mon, 2 Mar 2015 05:17:11 -0500 Received: from service87.mimecast.com ([91.220.42.44]:42021 "EHLO service87.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750750AbbCBKRH convert rfc822-to-8bit (ORCPT ); Mon, 2 Mar 2015 05:17:07 -0500 Message-ID: <54F438AE.7010301@arm.com> Date: Mon, 02 Mar 2015 10:17:18 +0000 From: Sudeep Holla User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0 MIME-Version: 1.0 To: Andre Przywara , Borislav Petkov CC: Sudeep Holla , "linux-kernel@vger.kernel.org" , Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , "x86@kernel.org" Subject: Re: [PATCH RFT v2] x86: move cacheinfo sysfs to generic cacheinfo infrastructure References: <20150224175749.GC3575@pd.tnic> <1425249564-28347-1-git-send-email-andre.przywara@arm.com> In-Reply-To: <1425249564-28347-1-git-send-email-andre.przywara@arm.com> X-OriginalArrivalTime: 02 Mar 2015 10:17:02.0650 (UTC) FILETIME=[06BA91A0:01D054D2] X-MC-Unique: 115030210170401201 Content-Type: text/plain; charset=WINDOWS-1252; format=flowed Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Andre, On 01/03/15 22:39, Andre Przywara wrote: > On Tue, 24 Feb 2015 18:57:49 +0100, Borislav Petkov wrote: >> On Mon, Feb 23, 2015 at 06:14:25PM +0000, Sudeep Holla wrote: >>> - Rebased on v4.0-rc1 >>> - Fixed lockdep warning reported by Borislav >> You probably have fixed the lockdep splat but not the NULL pointer >> dereference which was there in the first mail I sent you. I'd suggest >> you find an AMD box with an L3 and do some testing yourself before >> sending out another version. > > So I gave that a spin on my old PhenomII X6 box, and indeed there is a > NULL pointer dereference. I tracked it down to > __cache_amd_cpumap_setup(), where the sibling map for the L3 is > populated. When the function is called for the first CPU, subsequent > CPU's cacheinfo structures obviously haven't been initialized yet, so > the struct cpu_cacheinfo we obtain for the other CPUs is empty > (num_levels and num_leaves are 0). Dereferencing info_list + index is > then a bad idea. Thanks a lot for spending time on debugging this and providing the fix. > I fixed it by simply skipping the sibling map setup in this case (see > the patch below). Later we revisit this code path because the most > outer loop iterates over all CPUs anyway, so the fields are valid then > and the sibling map is build up properly. Not sure if that is a proper > fix (should we check for info_list being NULL?, is num_levels the > right value?), but I don't feel like looking into this code deeper > than I already did (Sudeep should be decorated for doing that!). > > Boris, can you try this on a Bulldozer class machine? > > Sudeep, if Boris acknowledges this, can you merge those two lines (or > something similar) in your patch and repost it? I can give it a try > then again. > I am fine squashing this change into the patch if Boris is fine with the change. >> >> Also, when testing, do a snapshot of the sysfs cache hierarchy by >> doing: >> >> grep . -EriIn /sys/devices/system/cpu/cpu?/cache/* >> before your changes and after and check for differences. >> >> We can't allow ourselves any differences because this is an API. > > But as Sudeep mentioned, this is an orthogonal issue not directly > related to this patch. > Indeed I see a regression between 3.19.0 and a clean 4.0.0-rc1 in this > respect, the diff is like: > -cpu5/cache/index3/shared_cpu_map:00000000,0000003f > +cpu5/cache/index3/shared_cpu_map:3f > (for each cpu and index) > > Can someone confirm (and investigate) this? > I saw this even on ARM64 platforms with 4.0-rc1 and did narrow down to series of formatting functions changes introduced by Tejun Heo, mainly commit 4a0792b0e7a6("bitmap: use %*pb[l] to print bitmaps including cpumasks and nodemasks") Regards, Sudeep