All of lore.kernel.org
 help / color / mirror / Atom feed
From: Philipp Hahn <pmhahn@pmhahn.de>
To: Hannes Frederic Sowa <hannes@stressinduktion.org>,
	Sasha Levin <sasha.levin@oracle.com>,
	Rainer Weikusat <rweikusat@mobileactivedefense.com>,
	Andrey Vagin <avagin@openvz.org>,
	Aaron Conole <aconole@bytheb.org>,
	"David S. Miller" <davem@davemloft.net>,
	linux-kernel@vger.kernel.org, Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: Bug 4.1.16: self-detected stall in net/unix/?
Date: Fri, 5 Feb 2016 16:28:29 +0100	[thread overview]
Message-ID: <56B4BF9D.9070609@pmhahn.de> (raw)
In-Reply-To: <56B15B32.1020701@stressinduktion.org>

Hello Hannes,

thank you for your previous reply.

Am 03.02.2016 um 02:43 schrieb Hannes Frederic Sowa:
> On 02.02.2016 17:25, Philipp Hahn wrote:
>> we recently updated our kernel to 4.1.16 + patch for "unix: properly
>> account for FDs passed over unix sockets" and have since then
>> self-detected stalls triggered by the Samba daemon:
...
>> We have not yet been able to reproduce the hang, but going back to our
>> previous kernel 4.1.12 makes the problem go away.
> 
> Can you remove the patch "unix: properly account for FDs passed over
> unix sockets" and see if the problem still happens?

I will try.
The problem is that I can't trigger the bug reliably. It always happens
to "smbd", but I don't know the triggering condition.

> I couldn't quickly see any problems with your added patch. I currently
> suspect a tight loop because of a SOCK_DEAD flag set but the socket not
> removed from unix_socket_table or the vfs. Hmmm...

All occurrences of the bug show _raw_spin_lock() as the sleeping
function, never anything other thus far.
I have seen the bug both on amd64 and i386.

If it would spin in
> 1097 restart:
> 1098 »···»···other = unix_find_other(net, sunaddr, alen, sock->type, hash, &err);
> 1099 »···»···if (!other)
> 1100 »···»···»···goto out;
> 1101 
> 1102 »···»···unix_state_double_lock(sk, other);
> 1103 
> 1104 »···»···/* Apparently VFS overslept socket death. Retry. */
> 1105 »···»···if (sock_flag(other, SOCK_DEAD)) {
> 1106 »···»···»···unix_state_double_unlock(sk, other);
> 1107 »···»···»···sock_put(other);
> 1108 »···»···»···goto restart;
> 1109 »···»···}

I would expect some other function to show up in the trace, but it's
always the call chain "unix_dgram_connect -> unix_state_double_lock ->
_raw_spin_lock".

The only case I see is that unix_find_other() returns the same "other"
again and again.

Q: How will that dead "other" be garbage collected finally? The kernel is
> # CONFIG_PREEMPT_NONE is not set
> CONFIG_PREEMPT_VOLUNTARY=y
> # CONFIG_PREEMPT is not set


During my hunt I found the following difference between 4.4 and 4.1.16
in net/unix/af_unix.c:

> commit 1586a5877db9eee313379738d6581bc7c6ffb5e3
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Fri Oct 23 10:59:16 2015 -0700
> 
>     af_unix: do not report POLLOUT on listeners
>     
>     poll(POLLOUT) on a listener should not report fd is ready for
>     a write().
>     
>     This would break some applications using poll() and pfd.events = -1,
>     as they would not block in poll()
>     
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Reported-by: Alan Burlison <Alan.Burlison@oracle.com>
>     Tested-by: Eric Dumazet <edumazet@google.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
...
> -static inline int unix_writable(struct sock *sk)
> +static int unix_writable(const struct sock *sk)
>  {
> -       return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
> +       return sk->sk_state != TCP_LISTEN &&
> +              (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
>  }

Q: That *TCP*_LISTEN in net/*unix*/ looks strange to my eyes, but I'm
not a kernel guru (yet).

There is one hunk in 5c77e26862ce604edea05b3442ed765e9756fe0f, which
uses the result of that function, which might explain the difference,
why it shows with 4.1.16 but not with 4.4?

> @@ -2245,14 +2388,16 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
>         writable = unix_writable(sk);
> -       other = unix_peer_get(sk);
> -       if (other) {
> -               if (unix_peer(other) != sk) {
> -                       sock_poll_wait(file, &unix_sk(other)->peer_wait, wait);
> -                       if (unix_recvq_full(other))
> -                               writable = 0;
> -               }
> -               sock_put(other);
> +       if (writable) {
> +               unix_state_lock(sk);
> +
> +               other = unix_peer(sk);
> +               if (other && unix_peer(other) != sk &&
> +                   unix_recvq_full(other) &&
> +                   unix_dgram_peer_wake_me(sk, other))
> +                       writable = 0;
> +
> +               unix_state_unlock(sk);
>         }
>         if (writable)


I will for now build a new kernel with
> $ git log --oneline  v4.1.12..v4.1.17 -- net/unix
> dc6b0ec unix: properly account for FDs passed over unix sockets
> cc01a0a af_unix: Revert 'lock_interruptible' in stream receive code
> 5c77e26 unix: avoid use-after-free in ep_remove_wait_queue
reverted to see if it still happens. The "middle" patch seems harmless,
as it only changes a code path for STREAMS, while the bug triggers with
DGRAMS only.

> The stack trace is rather unreliable, maybe something completely
> different happend. Do you happend to see better reports?

So far they look all the same.
Anything more I can do to prepare for collection better information next
time I get that bug?

Thanks again.

Philipp

  reply	other threads:[~2016-02-05 15:28 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-02 16:25 Bug 4.1.16: self-detected stall in net/unix/? Philipp Hahn
2016-02-03  1:43 ` Hannes Frederic Sowa
2016-02-05 15:28   ` Philipp Hahn [this message]
2016-02-11 13:47     ` Philipp Hahn
2016-02-11 15:55       ` Rainer Weikusat
2016-02-11 17:03         ` Ben Hutchings
2016-02-11 17:40           ` Rainer Weikusat
2016-02-11 17:54             ` Rainer Weikusat
2016-02-11 18:31             ` Rainer Weikusat
2016-02-11 19:37               ` [PATCH net] af_unix: Guard against other == sk in unix_dgram_sendmsg Rainer Weikusat
2016-02-12  9:19                 ` Philipp Hahn
2016-02-12 13:25                   ` Rainer Weikusat
2016-02-12 19:54                     ` Ben Hutchings
2016-02-12 20:17                       ` Rainer Weikusat
2016-02-12 20:47                         ` Ben Hutchings
2016-02-12 20:59                           ` Rainer Weikusat
2016-02-16 17:54                 ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56B4BF9D.9070609@pmhahn.de \
    --to=pmhahn@pmhahn.de \
    --cc=aconole@bytheb.org \
    --cc=avagin@openvz.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hannes@stressinduktion.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rweikusat@mobileactivedefense.com \
    --cc=sasha.levin@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.