From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754691AbcBEP2j (ORCPT <rfc822;w@1wt.eu>);
	Fri, 5 Feb 2016 10:28:39 -0500
Received: from parrot.pmhahn.de ([88.198.50.102]:57047 "EHLO parrot.pmhahn.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754574AbcBEP2d (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Fri, 5 Feb 2016 10:28:33 -0500
Subject: Re: Bug 4.1.16: self-detected stall in net/unix/?
To: Hannes Frederic Sowa <hannes@stressinduktion.org>,
        Sasha Levin <sasha.levin@oracle.com>,
        Rainer Weikusat <rweikusat@mobileactivedefense.com>,
        Andrey Vagin <avagin@openvz.org>, Aaron Conole <aconole@bytheb.org>,
        "David S. Miller" <davem@davemloft.net>, linux-kernel@vger.kernel.org,
        Eric Dumazet <edumazet@google.com>
References: <56B0D88A.1020609@pmhahn.de>
 <56B15B32.1020701@stressinduktion.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
From: Philipp Hahn <pmhahn@pmhahn.de>
X-Enigmail-Draft-Status: N1110
Message-ID: <56B4BF9D.9070609@pmhahn.de>
Date: Fri, 5 Feb 2016 16:28:29 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Icedove/38.5.0
MIME-Version: 1.0
In-Reply-To: <56B15B32.1020701@stressinduktion.org>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello Hannes,

thank you for your previous reply.

Am 03.02.2016 um 02:43 schrieb Hannes Frederic Sowa:
> On 02.02.2016 17:25, Philipp Hahn wrote:
>> we recently updated our kernel to 4.1.16 + patch for "unix: properly
>> account for FDs passed over unix sockets" and have since then
>> self-detected stalls triggered by the Samba daemon:
...
>> We have not yet been able to reproduce the hang, but going back to our
>> previous kernel 4.1.12 makes the problem go away.
> 
> Can you remove the patch "unix: properly account for FDs passed over
> unix sockets" and see if the problem still happens?

I will try.
The problem is that I can't trigger the bug reliably. It always happens
to "smbd", but I don't know the triggering condition.

> I couldn't quickly see any problems with your added patch. I currently
> suspect a tight loop because of a SOCK_DEAD flag set but the socket not
> removed from unix_socket_table or the vfs. Hmmm...

All occurrences of the bug show _raw_spin_lock() as the sleeping
function, never anything other thus far.
I have seen the bug both on amd64 and i386.

If it would spin in
> 1097 restart:
> 1098 »···»···other = unix_find_other(net, sunaddr, alen, sock->type, hash, &err);
> 1099 »···»···if (!other)
> 1100 »···»···»···goto out;
> 1101 
> 1102 »···»···unix_state_double_lock(sk, other);
> 1103 
> 1104 »···»···/* Apparently VFS overslept socket death. Retry. */
> 1105 »···»···if (sock_flag(other, SOCK_DEAD)) {
> 1106 »···»···»···unix_state_double_unlock(sk, other);
> 1107 »···»···»···sock_put(other);
> 1108 »···»···»···goto restart;
> 1109 »···»···}

I would expect some other function to show up in the trace, but it's
always the call chain "unix_dgram_connect -> unix_state_double_lock ->
_raw_spin_lock".

The only case I see is that unix_find_other() returns the same "other"
again and again.

Q: How will that dead "other" be garbage collected finally? The kernel is
> # CONFIG_PREEMPT_NONE is not set
> CONFIG_PREEMPT_VOLUNTARY=y
> # CONFIG_PREEMPT is not set


During my hunt I found the following difference between 4.4 and 4.1.16
in net/unix/af_unix.c:

> commit 1586a5877db9eee313379738d6581bc7c6ffb5e3
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Fri Oct 23 10:59:16 2015 -0700
> 
>     af_unix: do not report POLLOUT on listeners
>     
>     poll(POLLOUT) on a listener should not report fd is ready for
>     a write().
>     
>     This would break some applications using poll() and pfd.events = -1,
>     as they would not block in poll()
>     
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Reported-by: Alan Burlison <Alan.Burlison@oracle.com>
>     Tested-by: Eric Dumazet <edumazet@google.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
...
> -static inline int unix_writable(struct sock *sk)
> +static int unix_writable(const struct sock *sk)
>  {
> -       return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
> +       return sk->sk_state != TCP_LISTEN &&
> +              (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
>  }

Q: That *TCP*_LISTEN in net/*unix*/ looks strange to my eyes, but I'm
not a kernel guru (yet).

There is one hunk in 5c77e26862ce604edea05b3442ed765e9756fe0f, which
uses the result of that function, which might explain the difference,
why it shows with 4.1.16 but not with 4.4?

> @@ -2245,14 +2388,16 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
>         writable = unix_writable(sk);
> -       other = unix_peer_get(sk);
> -       if (other) {
> -               if (unix_peer(other) != sk) {
> -                       sock_poll_wait(file, &unix_sk(other)->peer_wait, wait);
> -                       if (unix_recvq_full(other))
> -                               writable = 0;
> -               }
> -               sock_put(other);
> +       if (writable) {
> +               unix_state_lock(sk);
> +
> +               other = unix_peer(sk);
> +               if (other && unix_peer(other) != sk &&
> +                   unix_recvq_full(other) &&
> +                   unix_dgram_peer_wake_me(sk, other))
> +                       writable = 0;
> +
> +               unix_state_unlock(sk);
>         }
>         if (writable)


I will for now build a new kernel with
> $ git log --oneline  v4.1.12..v4.1.17 -- net/unix
> dc6b0ec unix: properly account for FDs passed over unix sockets
> cc01a0a af_unix: Revert 'lock_interruptible' in stream receive code
> 5c77e26 unix: avoid use-after-free in ep_remove_wait_queue
reverted to see if it still happens. The "middle" patch seems harmless,
as it only changes a code path for STREAMS, while the bug triggers with
DGRAMS only.

> The stack trace is rather unreliable, maybe something completely
> different happend. Do you happend to see better reports?

So far they look all the same.
Anything more I can do to prepare for collection better information next
time I get that bug?

Thanks again.

Philipp