Date: Mon, 22 Jan 2024 10:36:58 -0800
From: Jakub Kicinski
To: Matthieu Baerts
Cc: Eric Dumazet, Netdev, LKML
Subject: Re: Kernel panic in netif_rx_internal after v6 pings between netns
Message-ID: <20240122103658.592962d1@kernel.org>
References: <98724dcd-ddf3-4f78-a386-f966ffbc9528@kernel.org> <65c4f6a2-207f-45e0-9ec3-bad81a05b196@kernel.org> <5340b60d-a09a-4865-a648-d1a45e9e6d5f@kernel.org> <20240122092804.3535b652@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 22 Jan 2024 19:22:42 +0100 Matthieu Baerts wrote:
> > Somewhat related. What do you do currently to ignore crashes?
>
> I was wondering why you wanted to ignore crashes :) ... but then I saw
> the new "Test ignored" and "Crashes ignored" sections on the status
> page. Just to be sure: you don't want to report issues that have not
> been introduced by the new patches, right?

Initially, yes, but going forward I bet we'll always see crashes and
breakage introduced downstream. So we need some knobs to selectively
silence failing things. In an ideal world we'd also have some form of
"last seen" stat displayed to know when to retire these entries..

> We don't need to do that on MPTCP side:
> - either it is a new crash with patches that are in review and that's
> not impacting others → we test each series individually, not a batch of
> series.
> - or there are issues with recent patches, not in netdev yet → we fix,
> or revert.
> - or there is an issue elsewhere, like the kernel panic we reported
> here: usually I try to quickly apply a workaround, e.g. applying a fix,
> or a revert.
> I don't think we ever had an issue really impacting us
> where we couldn't find a quick solution in one or two days. With the
> panic we reported here, ~15% of the tests had an issue, that's "OK" to
> have that for a few days/weeks.
>
> With fewer tests and a smaller community, it is easier for us to just
> say on the ML and weekly meetings: "this is a known issue, please ignore
> for the moment". But if possible, I try to add a workaround/fix in our
> repo used by the CI and devs (not upstreamed).
>
> For NIPA CI, do you want to do like with the build and compare with a
> reference? Or multiple ones to take into account unstable tests? Or
> maintain a list of known issues (I think you started to do that,
> probably safer/easier for the moment)?

Exactly - where we can, a before/after diff is the best. We do that for
all static checker / building kind of tests. But for selftests I'm not
sure how effective and applicable that is. Even the stack trace I posted
here happens somewhat unreliably :( We can try to develop more
intelligent ways going forward, obviously :)

> > I was seeing a lot of:
> > https://netdev-2.bots.linux.dev/vmksft-net-mp/results/431181/vm-crash-thr0-2
> >
> > So I hacked up this function to filter the crash from NIPA CI:
> > https://github.com/kuba-moo/nipa/blob/master/contest/remote/lib/vm.py#L50
> > It tries to get the first 5 function names from the stack, to form
> > a "fingerprint". But I seem to recall a discussion at LPC's testing
> > track that there are existing solutions for generating fingerprints.
> > Are you aware of any?
>
> No, sorry. But I guess they are using that with syzkaller, no?
>
> I have to admit that crashes (or warnings) are quite rare, so there was
> no need to have an automation there. But if it is easy to have a
> fingerprint, I will be interested as well, it can help for the tracking:
> to find occurrences of crashes/warnings that are very hard to reproduce.

Indeed, I'll keep my ear to the ground.
I believe it was discussed in relation to KCIDB.
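
For illustration, the fingerprint idea quoted above (the first 5 function
names from the stack) could be sketched roughly as below. This is not the
code in NIPA's vm.py. The function name, the regex, the choice to skip
unreliable "? func" frames, and the sample trace are all assumptions made
up for the example.

```python
import re

# Matches a stack frame such as "netif_rx_internal+0x42/0x100" and captures
# the symbol name. A leading "? " marks a frame the unwinder is unsure about.
FRAME_RE = re.compile(r"(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def crash_fingerprint(log: str, depth: int = 5) -> str:
    """Join the first `depth` reliable function names after 'Call Trace:'."""
    names = []
    in_trace = False
    for line in log.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            continue  # e.g. "<TASK>" markers or register dumps
        if match.group(1):
            continue  # assumption: ignore unreliable "? func" frames
        names.append(match.group(2))
        if len(names) == depth:
            break
    return ":".join(names)


# Made-up console output, only meant to exercise the function.
SAMPLE = """\
[   12.1] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   12.2] RIP: 0010:netif_rx_internal+0x42/0x100
[   12.3] Call Trace:
[   12.4]  <TASK>
[   12.5]  ? __die+0x1f/0x70
[   12.6]  __netif_rx+0x10/0x60
[   12.7]  loopback_xmit+0xd5/0x120
[   12.8]  dev_hard_start_xmit+0x5a/0x1b0
[   12.9]  __dev_queue_xmit+0x7d1/0xd40
[   13.0]  ip6_finish_output2+0x2a4/0x5c0
[   13.1]  ip6_output+0x6d/0x120
[   13.2]  </TASK>
"""

print(crash_fingerprint(SAMPLE))
```

Dropping the offsets and the "?" frames is meant to keep the fingerprint
stable across rebuilds, at the cost of merging crashes that only differ
deeper in the stack.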