From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CD1ACC197A0 for ; Mon, 20 Nov 2023 21:22:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231204AbjKTVWQ (ORCPT ); Mon, 20 Nov 2023 16:22:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36554 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232453AbjKTVWH (ORCPT ); Mon, 20 Nov 2023 16:22:07 -0500 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6641E1AA for ; Mon, 20 Nov 2023 13:22:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1700515322; x=1732051322; h=message-id:date:mime-version:subject:to:cc:references: from:in-reply-to:content-transfer-encoding; bh=j8B1ITWEeAkXv6+ECAk3eA35WeV+XBuBQ2iHt/ZPIRE=; b=iJ0gN0oWB9eydN3czY0ij9jfkKK8/z7zo4nciTqWHVR5zXGBIPqW9SlE o27c/4Q2pOpfONftDF6jEovVaoxMLWx/U4xcy5F4sbvCGMJhgAfpUhBwY 7uLjGxuWoZI02xbkjQp7G90AS6kJFAw2gBmVU0laKL6USmLhFr/2hdZvh HhXz2HQSwp46DjPqiihswQEqhWdYS6TzZXO93KPu1gdKiq3qR80RpIDmw o7QVezJsH327GPODA+2IWZ3pFC6+MvK21QyPYNqMXM9aSKDTpK/bf/v// DVltEKHHaKC7VMk7DQmvj7vx6hIsvb22j0k5V2fDT8D50KFjFLxmA1/jJ Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="391488628" X-IronPort-AV: E=Sophos;i="6.04,214,1695711600"; d="scan'208";a="391488628" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Nov 2023 13:22:02 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="832403594" X-IronPort-AV: E=Sophos;i="6.04,214,1695711600"; d="scan'208";a="832403594" Received: from linux.intel.com ([10.54.29.200]) by fmsmga008.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Nov 2023 13:22:01 -0800 Received: from [10.213.161.18] (kliang2-mobl1.ccr.corp.intel.com [10.213.161.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by linux.intel.com (Postfix) with ESMTPS id 183AC580D93; Mon, 20 Nov 2023 13:22:00 -0800 (PST) Message-ID: <7caf86b8-f050-4d0f-8aba-e2d725a0ab64@linux.intel.com> Date: Mon, 20 Nov 2023 16:21:59 -0500 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] perf/x86/intel/uncore: Fix NULL pointer dereference issue in upi_fill_topology() Content-Language: en-US To: Alexander Antonov , peterz@infradead.org, linux-kernel@vger.kernel.org Cc: kyle.meyer@hpe.com, alexey.v.bayduraev@linux.intel.com References: <20231115151327.1874060-1-alexander.antonov@linux.intel.com> <50ce6fce-c2fc-4392-b405-5c9a7a93f061@linux.intel.com> From: "Liang, Kan" In-Reply-To: <50ce6fce-c2fc-4392-b405-5c9a7a93f061@linux.intel.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2023-11-20 2:49 p.m., Alexander Antonov wrote: > > On 11/15/2023 8:00 PM, Liang, Kan wrote: >> >> On 2023-11-15 10:13 a.m., alexander.antonov@linux.intel.com wrote: >>> From: Alexander Antonov >>> >>> The NULL dereference happens inside upi_fill_topology() procedure in >>> case of disabling one of the sockets on the system. >>> >>> For example, if you disable the 2nd socket on a 4-socket system then >>> uncore_max_dies() returns 3 and inside pmu_alloc_topology() memory will >>> be allocated only for 3 sockets and stored in type->topology. >>> In discover_upi_topology() memory is accessed by socket id from >>> CPUNODEID >>> registers which contain physical ids (from 0 to 3) and on the line: >>> >>>      upi = &type->topology[nid][idx]; >>> >>> out-of-bound access will happen and the 'upi' pointer will be passed to >>> upi_fill_topology() where it will be dereferenced. >>> >>> To avoid this issue update the code to convert physical socket id to >>> logical socket id in discover_upi_topology() before accessing memory. >>> >>> Fixes: f680b6e6062e ("perf/x86/intel/uncore: Enable UPI topology >>> discovery for Icelake Server") >>> Reported-by: Kyle Meyer >>> Tested-by: Kyle Meyer >>> Signed-off-by: Alexander Antonov >>> --- >>>   arch/x86/events/intel/uncore_snbep.c | 10 ++++++++-- >>>   1 file changed, 8 insertions(+), 2 deletions(-) >>> >>> diff --git a/arch/x86/events/intel/uncore_snbep.c >>> b/arch/x86/events/intel/uncore_snbep.c >>> index 8250f0f59c2b..49bc27ab26ad 100644 >>> --- a/arch/x86/events/intel/uncore_snbep.c >>> +++ b/arch/x86/events/intel/uncore_snbep.c >>> @@ -5596,7 +5596,7 @@ static int discover_upi_topology(struct >>> intel_uncore_type *type, int ubox_did, i >>>       struct pci_dev *ubox = NULL; >>>       struct pci_dev *dev = NULL; >>>       u32 nid, gid; >>> -    int i, idx, ret = -EPERM; >>> +    int i, idx, lgc_pkg, ret = -EPERM; >>>       struct intel_uncore_topology *upi; >>>       unsigned int devfn; >>>   @@ -5614,8 +5614,13 @@ static int discover_upi_topology(struct >>> intel_uncore_type *type, int ubox_did, i >>>           for (i = 0; i < 8; i++) { >>>               if (nid != GIDNIDMAP(gid, i)) >>>                   continue; >>> +            lgc_pkg = topology_phys_to_logical_pkg(i); >>> +            if (lgc_pkg < 0) { >>> +                ret = -EPERM; >>> +                goto err; >>> +            } >> In the snbep_pci2phy_map_init(), there are similar codes to find the >> logical die id. Can we factor a common function for both of them? >> >> Thanks, >> Kan > Hi Kan, > > Thank you for your comment. > Yes, I think we can factor out the common loop where GIDNIDMAP is being > checked. > But inside snbep_pci2phy_map_init() we have a bit different procedure which > also does the following: > > if (topology_max_die_per_package() > 1) >     die_id = i; > > I think that having this code, at least, in our case could bring us to the > same issue which we are trying to fix. But of course we could > parametrize this checking. The topology_max_die_per_package() > 1 means there are more that 1 die in a socket. AFAIK, it only happens on the Cascade Lake AP. Did you observe it in the ICX? Thanks, Kan > > What do you think? > > Thanks, > Alexander >> >>>               for (idx = 0; idx < type->num_boxes; idx++) { >>> -                upi = &type->topology[nid][idx]; >>> +                upi = &type->topology[lgc_pkg][idx]; >>>                   devfn = PCI_DEVFN(dev_link0 + idx, >>> ICX_UPI_REGS_ADDR_FUNCTION); >>>                   dev = >>> pci_get_domain_bus_and_slot(pci_domain_nr(ubox->bus), >>>                                     ubox->bus->number, >>> @@ -5626,6 +5631,7 @@ static int discover_upi_topology(struct >>> intel_uncore_type *type, int ubox_did, i >>>                           goto err; >>>                   } >>>               } >>> +            break; >>>           } >>>       } >>>   err: >>> >>> base-commit: 9bacdd8996c77c42ca004440be610692275ff9d0