Subject: Re: Glibc recvmsg from kernel netlink socket hangs forever
To: Herbert Xu
References: <20150925043653.GA29111@roeck-us.net> <20150925045853.GA5286@gondor.apana.org.au> <5604DCD2.4090600@roeck-us.net> <20150925155511.GA9575@gondor.apana.org.au>
Cc: Steven Schlansker, linux-kernel@vger.kernel.org, Eric Dumazet, netdev@vger.kernel.org
From: Guenter Roeck
Message-ID: <560614F6.6050601@roeck-us.net>
Date: Fri, 25 Sep 2015 20:45:58 -0700
In-Reply-To: <20150925155511.GA9575@gondor.apana.org.au>
X-Mailing-List: linux-kernel@vger.kernel.org

Herbert,

On 09/25/2015 08:55 AM, Herbert Xu wrote:
> On Thu, Sep 24, 2015 at 10:34:10PM -0700, Guenter Roeck wrote:
>>
>> Any idea what may be needed for 4.1 ?
>> I am currently trying https://patchwork.ozlabs.org/patch/473041/,
>
> This patch should not make any difference on 4.1 and later because
> 4.1 is where I rewrote rhashtable resizing and it should work (or
> if it is broken then the latest kernel should be broken too).
>
>> but I have no idea if that will help with the problem we are seeing there.
>
> Having looked at your message again I don't think the issue I
> alluded to is relevant since the symptom there ought to be a
> straight kernel lock-up as opposed to just a user-space one because
> you will end up with the kernel sending a message to itself.
>
> And the fact that 4.2 works is more indicative as the bug is
> present in both 4.1 and 4.2.
>
> I'll try to reproduce this in 4.1 as time permits but no promises.
>

After applying your two patches, I don't see the problem in 4.1 anymore.
We'll run the system through regression; the complete cycle may take
a couple of weeks. I'll let you know if we find any further problems.

If you submit additional patches in that area, it would be great if you
can Cc: me.

Thanks,
Guenter