From: Rainer Weikusat
To: Mathias Krause
Cc: Rainer Weikusat, Jason Baron, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Eric Wong, Eric Dumazet, Alexander Viro, Davide Libenzi, Davidlohr Bueso, Olivier Mauras, PaX Team, Linus Torvalds, peterz@infradead.org, davem@davemloft.net
Subject: Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket
In-Reply-To: (Mathias Krause's message of "Wed, 30 Sep 2015 13:55:57 +0200")
References: <20150913195354.GA12352@jig.fritz.box> <20150914023949.GA15012@dcvr.yhbt.net> <560AE202.4020402@akamai.com> <87612skwfb.fsf@doppelsaurus.mobileactivedefense.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)
Date: Wed, 30 Sep 2015 14:25:58 +0100
Message-ID: <87pp10t4wp.fsf@doppelsaurus.mobileactivedefense.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Mathias Krause writes:

> On 30 September 2015 at 12:56, Rainer Weikusat wrote:
>> Mathias Krause writes:
>>> On 29 September 2015 at 21:09, Jason Baron wrote:
>>>> However, if we call connect on socket 's', to connect to a new socket 'o2', we
>>>> drop the reference on the original socket 'o'. Thus, we can now close socket
>>>> 'o' without unregistering from epoll. Then, when we either close the ep
>>>> or unregister 'o', we end up with this list corruption.
>>>> Thus, this is not a race per se, but can be triggered sequentially.
>>>
>>> Sounds profound, but the reproducer calls connect only once per
>>> socket. So there is no "connect to a new socket", no?
>>> But w/e, see below.
>>
>> In case you want some information on this: This is a kernel warning I
>> could trigger (more than once) on the single day I could so far spend
>> looking into this (3.2.54 kernel):
>>
>> Sep 15 19:37:19 doppelsaurus kernel: WARNING: at lib/list_debug.c:53 list_del+0x9/0x30()
>> Sep 15 19:37:19 doppelsaurus kernel: Hardware name: 500-330nam
>> Sep 15 19:37:19 doppelsaurus kernel: list_del corruption. prev->next should be ffff88022c38f078, but was dead000000100100
>> [snip]
>
> Is that with Jason's patch or a vanilla v3.2.54?

That's a kernel warning which occurred repeatedly (among other "link
pointer disorganization" warnings) when, a while ago, I tested the
"program with unknown behaviour" you wrote against the kernel I'm
currently supporting (as I already wrote in the original mail).
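The sequential scenario Jason describes in the quoted text (connect 's' to 'o', register 's' with epoll, reconnect 's' to 'o2', close 'o', then remove 's' from the epoll set) can be sketched as a short Linux-only Python program. This is a hypothetical illustration, not the reproducer Mathias refers to. The helper name `reconnect_then_close_peer`, the abstract socket names, and the use of `EPOLLOUT` are all assumptions. `EPOLLOUT` is requested because, in `unix_dgram_poll()` of kernels from that era, a write-readiness poll on a connected datagram socket also waits on the peer's queue. On a kernel affected by this bug, the final `EPOLL_CTL_DEL` may corrupt kernel memory, so run it only on a patched kernel or in a disposable VM.

```python
import os
import select
import socket


def reconnect_then_close_peer(tag=""):
    """Hypothetical helper replaying the sequence from the quoted analysis.

    s connects to o, s is added to an epoll set, s reconnects to o2, o is
    closed, and only then is s removed from the epoll set. Returns the step
    names in order once all of them have completed (Linux only: abstract
    AF_UNIX names and select.epoll).
    """
    suffix = ("%d-%s" % (os.getpid(), tag)).encode()
    addr_o = b"\0epoll-demo-o-" + suffix    # leading NUL: abstract namespace
    addr_o2 = b"\0epoll-demo-o2-" + suffix
    steps = []

    o = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    o2 = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    ep = select.epoll()
    try:
        o.bind(addr_o)
        o2.bind(addr_o2)

        s.connect(addr_o)            # s -> o; o is not connected back to s
        steps.append("connect o")
        # Assumption: on 3.x kernels a write-readiness poll of s also
        # queues the epoll waiter on o's peer_wait queue.
        ep.register(s.fileno(), select.EPOLLOUT)
        steps.append("epoll add")
        s.connect(addr_o2)           # reconnect: s drops its reference to o
        steps.append("connect o2")
        o.close()                    # o is released while s stays registered
        steps.append("close o")
        ep.unregister(s.fileno())    # EPOLL_CTL_DEL on s
        steps.append("epoll del")
    finally:
        ep.close()
        for sock in (s, o2, o):
            sock.close()             # close() is a no-op on a closed socket
    return steps


if __name__ == "__main__":
    print(reconnect_then_close_peer("demo"))
```

On a fixed kernel every system call succeeds and the full list of steps is returned. On an affected kernel the interesting outcome is not the return value but whether a `list_del corruption` warning like the one quoted above appears in the kernel log.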