From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-wm1-f54.google.com (mail-wm1-f54.google.com [209.85.128.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 65176283FCD for ; Mon, 12 Jan 2026 12:49:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.54 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768222197; cv=none; b=RI5v1k8IMao3d5rpxnVyd7+hlhFZi/p7EFaWroD631HKt+h4fpueerdXO0xzQGM9Pb1ptSojB4mvBqnQgPqU9qkw0alIgXyH4CRFGp0kPJeF5dms5XARJow3oVg+jsRbTeZDswFDaxj2Duxl4pUrl8iD9nS/9MlOmmG7UbhQvT4= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768222197; c=relaxed/simple; bh=k6CuEvapSJzan3n5FbnxsmbcgLhFr7TYoO6c3KMexFU=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=lp2UAtuUd1RR8ZGIIyg5EzMlyi/11Rl6Kxw9psqqinS18Ojrnln6zzOth/UWBA0jgrWapMhUfV7S2ZFCfqpzi8cpiz4ILJp/6v13G61eAwMNXOvZ5cezAnTG/SdqyeA73UU4XuvSDUYKiOVZPAEYwgbtO4Eom3oZ7Xz8ufQTO+U= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=I+OJ5vxo; arc=none smtp.client-ip=209.85.128.54 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="I+OJ5vxo" Received: by mail-wm1-f54.google.com with SMTP id 5b1f17b1804b1-477632d9326so38371105e9.1 for ; Mon, 12 Jan 2026 04:49:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1768222194; x=1768826994; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=OdnGGL/RXOJtS5w4R7RZPmgrr+FeiavZ0n9LQeOAkZQ=; b=I+OJ5vxob8nHvD9KdJv+ug7wmTIvVCWQOt2LCvNImLAveybzCh4R/BJHJjwk2Fc8Yi wfeeAi9qV2HOeofqsdXZjT5zst6lfQfWXL+IvMqdsLH5tPiitB7SzC5EFpWpALxzt7fr GqE4NE8fhxAFhFjCxOiUCasMJT5VxcjyUxMvHOiGfU7vf2JeSD9BD+osPZuQAp7lKOdq z+RnSAxitMRMMLUgZKwTNC69ZM4uijJ2BPXlANMOm0Coe43Q+tMAQIMPFHSxx/GxuIyx lzESN3I1xIwIX89wJBJk2MzcGzu4i5pw1dHHbUuNOWOkBvIDjoEKWmE3UzOU88Ru0Wkz nTZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1768222194; x=1768826994; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-gg:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=OdnGGL/RXOJtS5w4R7RZPmgrr+FeiavZ0n9LQeOAkZQ=; b=jvU75eV72Dzt4gWHOCd62W98IAzsux9I6khmrRduUhiMRKDESXtzoVursJDlnQB9IF mgyMMFcXC6CM1HOWyEwJPNaarpGvPluROmgGyeiKsF873+111ILMddvEnRqqepmdwbGQ 0C08mwBGEu8jO6/Mu8equ0Qg9X5Z2YUyqDeCZVPdgM+wJDpBaf4Byse3ygZVXOLnWzzp QVZIHf6rdH8Ss1bX+lIPjfjJYgXM2cczOdmi3k3MjjAsHcqFH2/GgDPNSQPmNq3p6xqx B1AzIb7zkeJ7m8itdbpj+D8qKJIAr/4f3hE7E1euKaORfZegkRF4IMS9dIRHtbwE/zBc Islg== X-Forwarded-Encrypted: i=1; AJvYcCVg0Ua05GfXcJyrEoV/RxZAfKVVvyuvDZpK1iHx18Xl1vqWwFShPEk5Rp1A/S/2iYI7TekhuKCoWeiXkg==@vger.kernel.org X-Gm-Message-State: AOJu0Ywq4xYgL8BAaC2sMJWyZuM40lQ2j7XqsC8/Maqsn0TvFjC26Nwn 950DOzW3CUBmOXFrmuxFquYEedhQaq2B2woEgeZV3jUWnddAn8OiDgk0 X-Gm-Gg: AY/fxX4gD+p8OT7XeTe3KrK7jWuBuqMJt8MbwZspKctrt5vem0w14eD6vKO3Wqlbyoa 9ZQm/GXIkg5dDywylVIQpRm5nYcrJP1WK0ORSoKRhaWR4xZeR7livB2qe+uz5qzoCAYefd+Jh2X AmjAFDir2mEayHsy9QsnR731hI6zOoUzTv0LjHZfnSAXI95VrKC6DLyg4ZStD2GoS6oXvKZMNpn HaifTEnCJ4Dpdh2qoMdTThkO5l5wskivYKeAa+5LdhDUOG8wF0CLLHKnquiaMwx5tJ+qMGWcQAC sw7JNSaSDTnCJkOyDBMBgeXLfUrtfqlR5UzL9nPlJ9gHSwkOTjqSJZL09Andmcyq1Jb66ZVBg2D hnvefoHnxLQ9Ha4PEt/N0DnHkp61vxFaLMJp7yx5q7hctCK6KUdr7lPMRGzy13hvNGA5coXNuhl 90NEH9OGbv6aSZ++MVXV5jJkDdml0ORMEiL/bKR+xM/mPgXshnyBEM6ghNvHEw6ZTreYKHFpDzf Q== X-Google-Smtp-Source: AGHT+IHPYcxcbvPgNwHAnOLaq9BDJKoiZKTVLzIp9Vnbuwu0Fww8JDn5hMZo3lQq/ca9sxMAf7a0aQ== X-Received: by 2002:a05:600c:3e8e:b0:477:93f7:bbc5 with SMTP id 5b1f17b1804b1-47d84b18596mr207867735e9.10.1768222193302; Mon, 12 Jan 2026 04:49:53 -0800 (PST) Received: from orome (p200300e41f0ffa00f22f74fffe1f3a53.dip0.t-ipconnect.de. [2003:e4:1f0f:fa00:f22f:74ff:fe1f:3a53]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-47d7f418538sm353730975e9.5.2026.01.12.04.49.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 12 Jan 2026 04:49:52 -0800 (PST) Date: Mon, 12 Jan 2026 13:49:50 +0100 From: Thierry Reding To: Ketan Patil Cc: krzk@kernel.org, jonathanh@nvidia.com, linux-kernel@vger.kernel.org, linux-tegra@vger.kernel.org Subject: Re: [PATCH v5 4/4] memory: tegra: Add MC error logging support for Tegra264 Message-ID: References: <20251219114354.2727906-1-ketanp@nvidia.com> <20251219114354.2727906-5-ketanp@nvidia.com> Precedence: bulk X-Mailing-List: linux-tegra@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="hu2ktmx2rhux7m3y" Content-Disposition: inline In-Reply-To: <20251219114354.2727906-5-ketanp@nvidia.com> --hu2ktmx2rhux7m3y Content-Type: text/plain; protected-headers=v1; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Subject: Re: [PATCH v5 4/4] memory: tegra: Add MC error logging support for Tegra264 MIME-Version: 1.0 On Fri, Dec 19, 2025 at 11:43:54AM +0000, Ketan Patil wrote: > In Tegra264, different components from memory subsystems like Memory > Controller Fabric (MCF), HUB, HUB Common (HUBC), Side Band Shim (SBS) > and MC Channels have different interrupt lines for receiving memory > controller error interrupts. >=20 > Add support for logging memory controller errors on Tegra264. > - Add MC error handling flow for MCF, HUB, HUBC, SBS and MC Channel. > - Each of these components have different interrupt lines for MC error. > - Register interrupt handlers for interrupts from these different MC > components. > - There is no global interrupt status register in Tegra264 unlike older > Tegra chips. > - There are common interrupt status registers in case of MCF, HUB, HUBC > from which figure out the slice number or hub number responsible for > generating interrupt and then read interrupt status register to find out > type of violation. > - Introduce new SoC specific fields in tegra_mc_soc like interrupt mask > values for MCF, HUB, HUBC etc., which are SOC specific. >=20 > Signed-off-by: Ketan Patil > --- > drivers/memory/tegra/mc.c | 49 +++- > drivers/memory/tegra/mc.h | 78 +++++- > drivers/memory/tegra/tegra186.c | 10 +- > drivers/memory/tegra/tegra194.c | 2 + > drivers/memory/tegra/tegra20.c | 10 +- > drivers/memory/tegra/tegra234.c | 2 + > drivers/memory/tegra/tegra264.c | 439 +++++++++++++++++++++++++++++++- > drivers/memory/tegra/tegra30.c | 2 + > include/soc/tegra/mc.h | 7 + > 9 files changed, 569 insertions(+), 30 deletions(-) Hi Ketan, I share Jon's concerns about this becoming overwhelmingly complex to understand and review. Let me try to provide a few ideas below on how we could potentially improve this. > diff --git a/drivers/memory/tegra/mc.c b/drivers/memory/tegra/mc.c [...] > @@ -976,11 +976,36 @@ static int tegra_mc_probe(struct platform_device *p= dev) > } > } > =20 > - if (mc->soc->num_channels) > - mc_ch_writel(mc, MC_BROADCAST_CHANNEL, mc->soc->intmask, > - MC_INTMASK); > - else > - mc_writel(mc, mc->soc->intmask, MC_INTMASK); > + if (mc->soc->ops->num_interrupts > 1) { > + /* Unmask MCF interrupts */ > + mc_ch_writel(mc, MC_BROADCAST_CHANNEL, mc->soc->mcf_intmask, MCF_INTM= ASK_0); > + mc_ch_writel(mc, MC_BROADCAST_CHANNEL, mc->soc->mcf_intmask, > + MCF_INTPRIORITY_0); > + > + /* Unmask HUB and HUBC interrupts */ > + mc_ch_writel(mc, MC_BROADCAST_CHANNEL, mc->soc->hub_intmask, > + MSS_HUB_INTRMASK_0); > + mc_ch_writel(mc, MC_BROADCAST_CHANNEL, mc->soc->hub_intmask, > + MSS_HUB_INTRPRIORITY_0); > + mc_ch_writel(mc, MC_BROADCAST_CHANNEL, mc->soc->hubc_intmask, > + MSS_HUB_HUBC_INTMASK_0); > + mc_ch_writel(mc, MC_BROADCAST_CHANNEL, mc->soc->hubc_intmask, > + MSS_HUB_HUBC_INTPRIORITY_0); > + > + /* Unmask SBS interrupts */ > + mc_ch_writel(mc, MC_BROADCAST_CHANNEL, mc->soc->sbs_intmask, > + MSS_SBS_INTMASK_0); > + > + /* Unmask MC channel interrupt */ > + mc_ch_writel(mc, MC_BROADCAST_CHANNEL, mc->soc->mc_ch_intmask, > + MC_CH_INTMASK_0); > + } else { > + if (mc->soc->num_channels) > + mc_ch_writel(mc, MC_BROADCAST_CHANNEL, mc->soc->intmask, > + MC_INTMASK); > + else > + mc_writel(mc, mc->soc->intmask, MC_INTMASK); > + } This isn't all that bad, but I think we could probably extract this into a per-SoC operation that is called from here instead. If for nothing else it would help with the indentation level. Something like mc->ops->setup_interrupts() or whatever other flavour you prefer should do. > } > =20 > if (mc->soc->reset_ops) { > diff --git a/drivers/memory/tegra/mc.h b/drivers/memory/tegra/mc.h > index 06ae3dd37a47..fabbebf8a36c 100644 > --- a/drivers/memory/tegra/mc.h > +++ b/drivers/memory/tegra/mc.h > @@ -25,6 +25,7 @@ > #define MC_INT_DECERR_MTS BIT(16) > #define MC_INT_DECERR_GENERALIZED_CARVEOUT BIT(17) > #define MC_INT_DECERR_ROUTE_SANITY BIT(20) > +#define MC_INT_DECERR_ROUTE_SANITY_GIC_MSI BIT(21) > =20 > #define MC_INTMASK 0x04 > #define MC_GART_ERROR_REQ 0x30 > @@ -63,6 +64,60 @@ > =20 > #define MC_GLOBAL_INTSTATUS 0xf24 > =20 > +/* Registers for MSS HUB */ > +#define MSS_HUB_GLOBAL_INTSTATUS_0 0x6000 > +#define MSS_HUBC_INTR BIT(0) > + > +#define MSS_HUB_HUBC_INTSTATUS_0 0x6008 > +#define MSS_HUB_INTRSTATUS_0 0x600c > +#define MSS_HUB_HUBC_INTMASK_0 0x6010 > +#define MSS_HUB_HUBC_SCRUB_DONE_INTMASK BIT(0) > + > +#define MSS_HUB_HUBC_INTPRIORITY_0 0x6014 > +#define MSS_HUB_INTRMASK_0 0x6018 > +#define MSS_HUB_COALESCER_ERR_INTMASK BIT(0) > +#define MSS_HUB_SMMU_BYPASS_ALLOW_ERR_INTMASK BIT(1) > +#define MSS_HUB_ILLEGAL_TBUGRP_ID_INTMASK BIT(2) > +#define MSS_HUB_MSI_ERR_INTMASK BIT(3) > +#define MSS_HUB_POISON_RSP_INTMASK BIT(4) > +#define MSS_HUB_RESTRICTED_ACCESS_ERR_INTMASK BIT(5) > +#define MSS_HUB_RESERVED_PA_ERR_INTMASK BIT(6) > + > +#define MSS_HUB_INTRPRIORITY_0 0x601c > +#define MSS_HUB_SMMU_BYPASS_ALLOW_ERR_STATUS_0 0x6020 > +#define MSS_HUB_MSI_ERR_STATUS_0 0x6024 > +#define MSS_HUB_POISON_RSP_STATUS_0 0x6028 > +#define MSS_HUB_COALESCE_ERR_STATUS_0 0x60e0 > +#define MSS_HUB_COALESCE_ERR_ADR_HI_0 0x60e4 > +#define MSS_HUB_COALESCE_ERR_ADR_0 0x60e8 > +#define MSS_HUB_RESTRICTED_ACCESS_ERR_STATUS_0 0x638c > +#define MSS_HUB_RESERVED_PA_ERR_STATUS_0 0x6390 > +#define MSS_HUB_ILLEGAL_TBUGRP_ID_ERR_STATUS_0 0x63b0 > + > +/* Registers for MC Channel */ > +#define MC_CH_INTSTATUS_0 0x82d4 > +#define MC_CH_INTMASK_0 0x82d8 > +#define WCAM_ERR_INTMASK BIT(19) > + > +#define MC_ERR_GENERALIZED_CARVEOUT_STATUS_1_0 0xbc74 > + > +/* Registers for MCF */ > +#define MCF_COMMON_INTSTATUS0_0_0 0xce04 > +#define MCF_INTSTATUS_0 0xce2c > +#define MCF_INTMASK_0 0xce30 > +#define MCF_INTPRIORITY_0 0xce34 > + > +/* Registers for SBS */ > +#define MSS_SBS_INTSTATUS_0 0xec08 > +#define MSS_SBS_INTMASK_0 0xec0c > +#define MSS_SBS_FILL_FIFO_ISO_OVERFLOW_INTMASK BIT(0) > +#define MSS_SBS_FILL_FIFO_SISO_OVERFLOW_INTMASK BIT(1) > +#define MSS_SBS_FILL_FIFO_NISO_OVERFLOW_INTMASK BIT(2) > + > +/* Bit field of MC_ERR_ROUTE_SANITY_STATUS_0 register */ > +#define MC_ERR_ROUTE_SANITY_RW BIT(12) > +#define MC_ERR_ROUTE_SANITY_SEC BIT(13) > + > /* Bit field of MC_ERR_STATUS_0 register */ > #define MC_ERR_STATUS_RW BIT(16) > #define MC_ERR_STATUS_SECURITY BIT(17) > @@ -70,12 +125,22 @@ > #define MC_ERR_STATUS_WRITABLE BIT(26) > #define MC_ERR_STATUS_READABLE BIT(27) > =20 > +#define MC_ERR_STATUS_GSC_ADR_HI_MASK 0xffff > +#define MC_ERR_STATUS_GSC_ADR_HI_SHIFT 16 > +#define MC_ERR_STATUS_RT_ADR_HI_SHIFT 15 > + > #define MC_ERR_STATUS_TYPE_SHIFT 28 > #define MC_ERR_STATUS_TYPE_INVALID_SMMU_PAGE (0x6 << 28) > #define MC_ERR_STATUS_TYPE_MASK (0x7 << 28) > +#define MC_ERR_STATUS_RT_TYPE_MASK (0xf << 28) > +#define MC_ERR_STATUS_RT_TYPE_SHIFT 28 > =20 > #define MC_ERR_STATUS_ADR_HI_SHIFT 20 > -#define MC_ERR_STATUS_ADR_HI_MASK 0x3 > + > +#define ERR_GENERALIZED_APERTURE_ID_SHIFT 0 > +#define ERR_GENERALIZED_APERTURE_ID_MASK 0x1F > +#define ERR_GENERALIZED_CARVEOUT_APERTURE_ID_SHIFT 5 > +#define ERR_GENERALIZED_CARVEOUT_APERTURE_ID_MASK 0x1F > =20 > #define MC_EMEM_ARB_CFG_CYCLES_PER_UPDATE(x) ((x) & 0x1ff) > #define MC_EMEM_ARB_CFG_CYCLES_PER_UPDATE_MASK 0x1ff > @@ -188,15 +253,18 @@ extern const struct tegra_mc_ops tegra30_mc_ops; > =20 > #if defined(CONFIG_ARCH_TEGRA_186_SOC) || \ > defined(CONFIG_ARCH_TEGRA_194_SOC) || \ > - defined(CONFIG_ARCH_TEGRA_234_SOC) || \ > - defined(CONFIG_ARCH_TEGRA_264_SOC) > + defined(CONFIG_ARCH_TEGRA_234_SOC) > extern const struct tegra_mc_ops tegra186_mc_ops; > #endif > =20 > irqreturn_t tegra30_mc_handle_irq(int irq, void *data); > extern const irq_handler_t tegra30_mc_irq_handlers[]; > -extern const char * const tegra_mc_status_names[32]; > -extern const char * const tegra_mc_error_names[8]; > +extern const char * const tegra20_mc_status_names[32]; > +extern const char * const tegra20_mc_error_names[8]; I nearly wonder if these should be extracted into a per-SoC field like mc->ops->status_names and mc->ops->error_names. I think we didn't do this at the time because they were isolated to individual SoC generations via the error handling functions, but if they are exposed like this in the header, might as well abstract them out for a bit more cleanliness, even if it's not going to be a great win. > +int tegra186_mc_probe(struct tegra_mc *mc); > +int tegra186_mc_probe_device(struct tegra_mc *mc, struct device *dev); > +int tegra186_mc_resume(struct tegra_mc *mc); > +void tegra186_mc_remove(struct tegra_mc *mc); Looks like the only reason why we need to export these is so that we can override the IRQ handlers for Tegra264. Have you considered maybe moving the IRQ handlers into tegra_mc_soc? It looks to me like that would be an equally good fit now. It's no longer really an "op" in the sense that we use the others, but rather just another collection of pointers that are different from chip to chip. > diff --git a/drivers/memory/tegra/tegra186.c b/drivers/memory/tegra/tegra= 186.c > @@ -916,5 +916,7 @@ const struct tegra_mc_soc tegra186_mc_soc =3D { > .ch_intmask =3D 0x0000000f, > .global_intstatus_channel_shift =3D 0, > .mc_regs =3D &tegra20_mc_regs, > + .mc_addr_hi_mask =3D 0x3, Can we perhaps derive the new values for this from the existing num_address_bits? Also, is this the correct value for Tegra186? We use the same value for Tegra20 > +static const char *const tegra264_mc_status_names[32] =3D { > + [6] =3D "EMEM address decode error", > + [8] =3D "Security violation", > + [12] =3D "VPR violation", > + [13] =3D "Secure carveout violation", > + [16] =3D "MTS carveout violation", > + [17] =3D "Generalized carveout violation", > + [20] =3D "Route Sanity error", > + [21] =3D "GIC_MSI error", > +}; Comparing this to tegra20_mc_status_names[] there's significant overlap. So I think maybe a better way to deal with this would be to complete the list of status names and use them across all SoC generations. If for some reason we really want to exclude certain ones for a given SoC generation, it might be better to use some sort of validity mask or whatever, rather than use different subsets of strings here. I'm curious: if we look at something like "EMEM arbitration error" (9) on Tegra20, which is not available here for Tegra264. Does it mean that these errors don't exist on Tegra264 (i.e. they cannot happen)? We have a few in tegra20_mc_status_names[] that no longer apply for even Tegra30 (such as the "GART page fault" (7)). I guess what I'm asking is if there is a potential problem with listing all of the possible errors in one table, since there's evidently no conflict in the enumeration values. > +/* > + * MC instance aperture mapping for hubc registers > + */ > +static const int mc_hubc_aperture_number[5] =3D { > + 7, 8, 9, 10, 11 > +}; I don't fully understand what this means. In the code this array is only indexed into with constants from the individual IRQ callbacks, no loops are iterating over this. So I think there's potential for improvement here. For example, the indexed nature of these arrays doesn't provide any context as to what they are. It's merely mapping one integer to another, so it's both error prone and difficult to understand/review. Instead, maybe we can define what a HUBC aperture is. If it's currently only an "ID" or index, maybe it can be as simple as: struct tegra_mc_aperture { unsigned int index; }; static const struct tegra_mc_aperture tegra264_mc_aperture_disp =3D { .index =3D 7, }; static const struct tegra_mc_aperture tegra264_mc_aperture_system =3D { .index =3D 8, }; static const struct tegra_mc_aperture tegra264_mc_aperture_vision =3D { .index =3D 9, }; ... static const struct tegra_mc_aperture tegra264_mc_apertures =3D { &tegra264_mc_aperture_disp, &tegra264_mc_aperture_system, &tegra264_mc_aperture_vision, ... }; static const struct tegra_mc_soc tegra264_mc_soc =3D { ... .apertures =3D tegra264_mc_apertures, .num_apertures =3D ARRAY_SIZE(tegra264_mc_apertures), ... }; This looks verbose at first and perhaps somewhat unnecessary, but I think it adds so much more extra context. And I'm sure we'll run into other data that can be added into these structures. > +static void mcf_log_fault(struct tegra_mc *mc, u32 channel, unsigned lon= g mcf_ch_intstatus) First a couple of observations. > +{ > + unsigned int bit; This could be called something like "interrupt" or "irq" to better reflect what it does. It's also slightly confusing that we first iterate over these bits in mcf_ch_intstatus to then convert back to a mask so that we can match against the MC_INT_* definitions. It would probably be better to do something similar to the apertures above and collate all of the related information into some sort of structure. > + > + for_each_set_bit(bit, &mcf_ch_intstatus, 32) { > + const char *error =3D tegra264_mc_status_names[bit] ?: "unknown"; > + u32 intmask =3D BIT(bit); > + u32 status_reg, status1_reg =3D 0, addr_reg, addr_hi_reg =3D 0; > + u32 addr_val, value, client_id, i, addr_hi_shift =3D 0, addr_hi_mask = =3D 0, status1; > + const char *direction, *secure; > + const char *client =3D "unknown", *desc =3D "NA"; > + phys_addr_t addr =3D 0; > + bool is_gsc =3D false, err_type_valid =3D false, err_rt_type_valid =3D= false; > + u8 type; > + u32 mc_rw_bit =3D MC_ERR_STATUS_RW, mc_sec_bit =3D MC_ERR_STATUS_SECUR= ITY; > + > + switch (intmask) { > + case MC_INT_DECERR_EMEM: > + status_reg =3D mc->soc->mc_regs->mc_err_status; > + addr_reg =3D mc->soc->mc_regs->mc_err_add; > + addr_hi_reg =3D mc->soc->mc_regs->mc_err_add_hi; > + err_type_valid =3D true; > + break; > + > + case MC_INT_SECURITY_VIOLATION: > + status_reg =3D mc->soc->mc_regs->mc_err_status; > + addr_reg =3D mc->soc->mc_regs->mc_err_add; > + addr_hi_reg =3D mc->soc->mc_regs->mc_err_add_hi; > + err_type_valid =3D true; > + break; These two are exactly the same, so they could be combined. > + > + case MC_INT_DECERR_VPR: > + status_reg =3D mc->soc->mc_regs->mc_err_vpr_status; > + addr_reg =3D mc->soc->mc_regs->mc_err_vpr_add; > + addr_hi_shift =3D MC_ERR_STATUS_ADR_HI_SHIFT; > + addr_hi_mask =3D mc->soc->mc_addr_hi_mask; > + break; > + > + case MC_INT_SECERR_SEC: > + status_reg =3D mc->soc->mc_regs->mc_err_sec_status; > + addr_reg =3D mc->soc->mc_regs->mc_err_sec_add; > + addr_hi_shift =3D MC_ERR_STATUS_ADR_HI_SHIFT; > + addr_hi_mask =3D mc->soc->mc_addr_hi_mask; > + break; > + > + case MC_INT_DECERR_MTS: > + status_reg =3D mc->soc->mc_regs->mc_err_mts_status; > + addr_reg =3D mc->soc->mc_regs->mc_err_mts_add; > + addr_hi_shift =3D MC_ERR_STATUS_ADR_HI_SHIFT; > + addr_hi_mask =3D mc->soc->mc_addr_hi_mask; > + break; > + > + case MC_INT_DECERR_GENERALIZED_CARVEOUT: > + status_reg =3D mc->soc->mc_regs->mc_err_gen_co_status; > + status1_reg =3D MC_ERR_GENERALIZED_CARVEOUT_STATUS_1_0; > + addr_reg =3D mc->soc->mc_regs->mc_err_gen_co_add; > + addr_hi_shift =3D MC_ERR_STATUS_GSC_ADR_HI_SHIFT; > + addr_hi_mask =3D MC_ERR_STATUS_GSC_ADR_HI_MASK; > + is_gsc =3D true; > + break; > + > + case MC_INT_DECERR_ROUTE_SANITY: > + status_reg =3D mc->soc->mc_regs->mc_err_route_status; > + addr_reg =3D mc->soc->mc_regs->mc_err_route_add; > + addr_hi_shift =3D MC_ERR_STATUS_RT_ADR_HI_SHIFT; > + addr_hi_mask =3D mc->soc->mc_addr_hi_mask; > + mc_sec_bit =3D MC_ERR_ROUTE_SANITY_SEC; > + mc_rw_bit =3D MC_ERR_ROUTE_SANITY_RW; > + err_rt_type_valid =3D true; > + break; > + > + case MC_INT_DECERR_ROUTE_SANITY_GIC_MSI: > + status_reg =3D mc->soc->mc_regs->mc_err_route_status; > + addr_reg =3D mc->soc->mc_regs->mc_err_route_add; > + addr_hi_shift =3D MC_ERR_STATUS_RT_ADR_HI_SHIFT; > + addr_hi_mask =3D mc->soc->mc_addr_hi_mask; > + mc_sec_bit =3D MC_ERR_ROUTE_SANITY_SEC; > + mc_rw_bit =3D MC_ERR_ROUTE_SANITY_RW; > + err_rt_type_valid =3D true; > + break; These last two are also the same and can be combined. > + > + default: > + dev_err_ratelimited(mc->dev, "Incorrect MC interrupt mask\n"); > + return; > + } > + > + value =3D mc_ch_readl(mc, channel, status_reg); > + if (addr_hi_reg) { > + addr =3D mc_ch_readl(mc, channel, addr_hi_reg); > + } else { > + if (!is_gsc) { > + addr =3D ((value >> addr_hi_shift) & addr_hi_mask); > + } else { > + status1 =3D mc_ch_readl(mc, channel, status1_reg); > + addr =3D ((status1 >> addr_hi_shift) & addr_hi_mask); > + } > + } > + > + addr <<=3D 32; > + addr_val =3D mc_ch_readl(mc, channel, addr_reg); > + addr |=3D addr_val; > + > + if (value & mc_rw_bit) > + direction =3D "write"; > + else > + direction =3D "read"; > + > + if (value & mc_sec_bit) > + secure =3D "secure"; > + else > + secure =3D "non-secure"; > + > + client_id =3D value & mc->soc->client_id_mask; > + for (i =3D 0; i < mc->soc->num_clients; i++) { > + if (mc->soc->clients[i].id =3D=3D client_id) { > + client =3D mc->soc->clients[i].name; > + break; > + } > + } > + > + if (err_type_valid) { > + type =3D (value & mc->soc->mc_err_status_type_mask) >> > + MC_ERR_STATUS_TYPE_SHIFT; > + desc =3D tegra264_mc_error_names[type]; > + } else if (err_rt_type_valid) { > + type =3D (value & MC_ERR_STATUS_RT_TYPE_MASK) >> > + MC_ERR_STATUS_RT_TYPE_SHIFT; > + desc =3D tegra_rt_error_names[type]; > + } > + > + dev_err_ratelimited(mc->dev, "%s: %s %s @%pa: %s (%s)\n", > + client, secure, direction, &addr, error, desc); > + if (is_gsc) { > + dev_err_ratelimited(mc->dev, "gsc_apr_id=3D%u gsc_co_apr_id=3D%u\n", > + ((status1 >> ERR_GENERALIZED_APERTURE_ID_SHIFT) > + & ERR_GENERALIZED_APERTURE_ID_MASK), > + ((status1 >> ERR_GENERALIZED_CARVEOUT_APERTURE_ID_SHIFT) > + & ERR_GENERALIZED_CARVEOUT_APERTURE_ID_MASK)); > + } > + } > + > + /* clear interrupts */ > + mc_ch_writel(mc, channel, mcf_ch_intstatus, MCF_INTSTATUS_0); > +} To further refactor this, here's another observation: this function is split up into three steps: 1) collect a number of register offsets, bit positions and flags, 2) perform some operations based on these parameters, extract additional information and 3) generate a report of the collected information. I think one option for refactoring this would be to make this more data driven. This is somewhat difficult because we have these SoC parameters that we cannot easily put into another structure, but I think we can employ a trick to get the same results. If we look at the above sequence of steps, we can split this into three (or possibly even two or just one) callbacks per type of interrupt: struct tegra_mc_error_handler { unsigned int id; const char *name; void (*get_offsets)(...); void (*collect_info)(...); void (*report)(...); }; If we can make the output of .collect_info() generic enough, we could probably get rid of .report() and make that completely generic, taking the generic output as input. We might even be able to make .get_offsets() optional and have =2Ecollect_info() handle everything itself. We can then define a table such as this: static const struct tegra_mc_error_handler[] =3D { { .id =3D MC_INT_DECERR_EMEM_BIT, .name =3D "external memory decode error", .collect_info =3D tegra_mc_emem_collect, }, ... }; That said, we might even be able to derive the tegra_mc_error_names from that new table instead of keeping a separate one. With all of the above, we now have the means to simply iterate over the bits set in the top-level interrupt status, find the handler corresponding to each one and run the appropriate sequence. Moving the code for this into separate functions will decrease the complexity of each function and make this a *lot* easier to review and understand in my opinion. We will need to take some care to reduce duplication (since we have a few cases in the switch above that share the same content, I suspect we'll be able to create .collect_info() callbacks that can be reused multiple times, or we might be able to split these callbacks into smaller pieces that can be more easily reused). > + > +static irqreturn_t handle_mcf_irq(int irq, void *data) > +{ > + struct tegra_mc *mc =3D data; > + unsigned long mcf_common_intstat, mcf_intstatus; > + unsigned int slice; > + > + /* Read MCF_COMMON_INTSTATUS0_0_0 from MCB block */ > + mcf_common_intstat =3D mc_ch_readl(mc, MC_BROADCAST_CHANNEL, MCF_COMMON= _INTSTATUS0_0_0); > + if (mcf_common_intstat =3D=3D 0) { > + dev_err(mc->dev, "No interrupt in MCF\n"); I'd think this would be better as a warning than an error. > + return IRQ_NONE; > + } > + > + for_each_set_bit(slice, &mcf_common_intstat, 32) { > + /* Find out the slice number on which interrupt occurred */ > + if (slice > 4) { This looks like it should be parameterized from the start. > + dev_err(mc->dev, "Invalid value in registeer MCF_COMMON_INTSTATUS0_0_= 0\n"); Make this more useful by adding the invalid value to the message. Also, no need to be specific about the register name, since nobody will know what that means. We also know quite specifically why the value is invalid. A better error message would be: dev_err(mc->dev, "slice index out of bounds: %u\n", slice); > + return IRQ_NONE; > + } > + > + mcf_intstatus =3D mc_ch_readl(mc, slice, MCF_INTSTATUS_0); > + if (mcf_intstatus !=3D 0) > + mcf_log_fault(mc, slice, mcf_intstatus); > + } > + > + return IRQ_HANDLED; > +} > + > +static void hub_log_fault(struct tegra_mc *mc, u32 hub, unsigned long hu= b_intstat) > +{ > + unsigned int bit; > + > + for_each_set_bit(bit, &hub_intstat, 32) { > + const char *error =3D tegra_hub_status_names[bit] ?: "unknown"; > + u32 intmask =3D BIT(bit), client_id; > + const char *client =3D "unknown"; > + u32 status_reg, addr_reg =3D 0, addr_hi_reg =3D 0; > + u32 value, addr_val, i; > + phys_addr_t addr =3D 0; > + > + switch (intmask) { > + case MSS_HUB_COALESCER_ERR_INTMASK: > + status_reg =3D MSS_HUB_COALESCE_ERR_STATUS_0; > + addr_reg =3D MSS_HUB_COALESCE_ERR_ADR_0; > + addr_hi_reg =3D MSS_HUB_COALESCE_ERR_ADR_HI_0; > + break; > + > + case MSS_HUB_SMMU_BYPASS_ALLOW_ERR_INTMASK: > + status_reg =3D MSS_HUB_SMMU_BYPASS_ALLOW_ERR_STATUS_0; > + break; > + > + case MSS_HUB_ILLEGAL_TBUGRP_ID_INTMASK: > + status_reg =3D MSS_HUB_ILLEGAL_TBUGRP_ID_ERR_STATUS_0; > + break; > + > + case MSS_HUB_MSI_ERR_INTMASK: > + status_reg =3D MSS_HUB_MSI_ERR_STATUS_0; > + break; > + > + case MSS_HUB_POISON_RSP_INTMASK: > + status_reg =3D MSS_HUB_POISON_RSP_STATUS_0; > + break; > + > + case MSS_HUB_RESTRICTED_ACCESS_ERR_INTMASK: > + status_reg =3D MSS_HUB_RESTRICTED_ACCESS_ERR_STATUS_0; > + break; > + > + case MSS_HUB_RESERVED_PA_ERR_INTMASK: > + status_reg =3D MSS_HUB_RESERVED_PA_ERR_STATUS_0; > + break; > + > + default: > + dev_err_ratelimited(mc->dev, "Incorrect HUB interrupt mask\n"); > + return; > + } > + > + value =3D mc_ch_readl(mc, hub, status_reg); > + if (addr_reg) { > + addr =3D mc_ch_readl(mc, hub, addr_hi_reg); > + addr <<=3D 32; > + addr_val =3D mc_ch_readl(mc, hub, addr_reg); > + addr |=3D addr_val; > + } > + > + client_id =3D value & mc->soc->client_id_mask; > + for (i =3D 0; i < mc->soc->num_clients; i++) { > + if (mc->soc->clients[i].id =3D=3D client_id) { > + client =3D mc->soc->clients[i].name; > + break; > + } > + } > + > + dev_err_ratelimited(mc->dev, "%s: @%pa: %s status:%u\n", > + client, &addr, error, value); > + } > + > + /* clear interrupts */ > + mc_ch_writel(mc, hub, hub_intstat, MSS_HUB_INTRSTATUS_0); > +} Similar comments apply to this as to mcf_log_fault(). > + > +static irqreturn_t handle_hub_irq(int irq, void *data, int mc_hubc_apert= ure_number) > +{ > + struct tegra_mc *mc =3D data; > + unsigned long hub_global_intstat, hub_intstat, hub_interrupted =3D 0; > + unsigned int hub_global_mask =3D 0x7F00, hub_global_shift =3D 8, hub; I think you can shorten a lot of these variables by dropping the hub_ prefix. It's in a handle_*hub*_irq() function, so you don't need extra context. > + > + /* Read MSS_HUB_GLOBAL_INTSTATUS_0 from mc_hubc_aperture_number block */ > + hub_global_intstat =3D mc_ch_readl(mc, mc_hubc_aperture_number, MSS_HUB= _GLOBAL_INTSTATUS_0); We seem to have different names for the same thing here. We use the "memory controller channel read register" function and pass the aperture number in as the channel number. Maybe we should settle on one name and use that going forward? Or are these different things historically? Are we reusing channels from earlier chips for apertures on newer chips? > + if (hub_global_intstat =3D=3D 0) { > + dev_err(mc->dev, "No interrupt in HUB/HUBC\n"); Again, this can probably just be a warning, no need for an error. Thierry --hu2ktmx2rhux7m3y Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEiOrDCAFJzPfAjcif3SOs138+s6EFAmlk7eoACgkQ3SOs138+ s6FX0g/8C1FEPWb9GcPei+/fyauRvxEvonCIbA7bGcOwtA0uS1C4zpz2UZ11B7xi 1z9UGGp7cKWYUO/aHXYAF8P4Sq37vxEysr18mpP/tDuDzrSYKgiDHv8/zKzWS11S B9TPCPJpxfy8nS95d5kjYAuoE2VN5FJ1QFidM69BTmnGV67TicnVFO7clJqv26h8 FLbKyudWgXGz8El1AKWUcpjqAnUIelqTWTmsU9T+XwcEqg+h7q3SG7QkRyLHpU/A PH/rqPKkI80nWT87vC6f6EdYn9zoAY6gMnVNwEWmXMwK01E6Z6yI2FebsJ/67v+5 Fd5BXMcjRQTOMqRgvrBsIf3vq5Uih0nTch6sD681etf74sf0xIuNI8bg0Xv9WJR3 sadn6P8WGgtLxrrcuMjqWsbTqzsiOTukSwg+WItQGG9kzY3QxkOeJ7BKe703z+Kw fUDoj3UOtq3Kiixdm75TJv/NZ74MeE17GZASjV2WRe277lv24PaCvKd18SvARVUf vxpmg+KH3shcbuFIEOHE+nCWAlIcIr0BcTNwSCvMoXzXigj2nPiwm+yhorTLI2Wr FUPJYWVL/ct87rTQeyo54NFbUjEBocD2EjwWIJ1WmVdbP5ekER5YLhu1qnUSpTS0 YnHK2PfpWk3qZwFkJdRHZEkuNp3CJvNcLB6RvRjbcoBMh2IYpQk= =9fBt -----END PGP SIGNATURE----- --hu2ktmx2rhux7m3y--