Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jason Baron <jbaron@akamai.com>
To: Michal Kubecek <mkubecek@suse.cz>,
	Mathias Krause <minipli@googlemail.com>
Cc: netdev@vger.kernel.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Eric Wong <normalperson@yhbt.net>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Rainer Weikusat <rweikusat@mobileactivedefense.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Davide Libenzi <davidel@xmailserver.org>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Olivier Mauras <olivier@mauras.ch>,
	PaX Team <pageexec@freemail.hu>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"davem@davemloft.net" <davem@davemloft.net>
Subject: Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket
Date: Wed, 30 Sep 2015 22:55:59 -0400	[thread overview]
Message-ID: <560CA0BF.7090608@akamai.com> (raw)
In-Reply-To: <20150930073410.GA7339@unicorn.suse.cz>

On 09/30/2015 03:34 AM, Michal Kubecek wrote:
> On Wed, Sep 30, 2015 at 07:54:29AM +0200, Mathias Krause wrote:
>> On 29 September 2015 at 21:09, Jason Baron <jbaron@akamai.com> wrote:
>>> However, if we call connect on socket 's', to connect to a new socket 'o2', we
>>> drop the reference on the original socket 'o'. Thus, we can now close socket
>>> 'o' without unregistering from epoll. Then, when we either close the ep
>>> or unregister 'o', we end up with this list corruption. Thus, this is not a
>>> race per se, but can be triggered sequentially.
>>
>> Sounds profound, but the reproducers calls connect only once per
>> socket. So there is no "connect to a new socket", no?
> 
> I believe there is another scenario: 'o' becomes SOCK_DEAD while 's' is
> still connected to it. This is detected by 's' in unix_dgram_sendmsg()
> so that 's' releases its reference on 'o' and 'o' can be freed. If this
> happens before 's' is unregistered, we get use-after-free as 'o' has
> never been unregistered. And as the interval between freeing 'o' and
> unregistering 's' can be quite long, there is a chance for the memory to
> be reused. This is what one of our customers has seen:
> 
>     [exception RIP: _raw_spin_lock_irqsave+156]
>     RIP: ffffffff8040f5bc  RSP: ffff8800e929de78  RFLAGS: 00010082
>     RAX: 000000000000a32c  RBX: ffff88003954ab80  RCX: 0000000000001000
>     RDX: 00000000f2320000  RSI: 000000000000f232  RDI: ffff88003954ab80
>     RBP: 0000000000005220   R8: dead000000100100   R9: 0000000000000000
>     R10: 00007fff1a284960  R11: 0000000000000246  R12: 0000000000000000
>     R13: ffff8800e929de8c  R14: 000000000000000e  R15: 0000000000000000
>     ORIG_RAX: ffffffffffffffff  CS: 10000e030  SS: e02b
>  #8 [ffff8800e929de70] _raw_spin_lock_irqsave at ffffffff8040f5a9
>  #9 [ffff8800e929deb0] remove_wait_queue at ffffffff8006ad09
> #10 [ffff8800e929ded0] ep_unregister_pollwait at ffffffff80170043
> #11 [ffff8800e929def0] ep_remove at ffffffff80170073
> #12 [ffff8800e929df10] sys_epoll_ctl at ffffffff80171453
> #13 [ffff8800e929df80] system_call_fastpath at ffffffff80417553
> 
> In this case, crash happened on unregistering 's' which had null peer
> (i.e. not reconnected but rather disconnected) but there were still two
> items in the list, the other pointing to an unallocated page which has
> apparently been modified in between.
> 
> IMHO unix_dgram_disonnected() could be the place to handle this issue:
> it is called from both places where we disconnect from a peer (dead peer
> detection in unix_dgram_sendmsg() and reconnect in unix_dgram_connect())
> just before the reference to peer is released. I'm not familiar with the
> epoll implementation so I'm still trying to find what exactly needs to
> be done to unregister the peer at this moment.
> 

Indeed that is a path as well. The patch I posted here deals with that
case as well. It does a remove_wait_queue() in that case.

unix_dgram_disconnected() gets called as you point out when we are removing
the remote peer, and I have a remove_wait_queue() in that case. The patch
I posted converts us back to a polling() against a single wait queue.

The wait structure that epoll()/select()/poll() adds to the peer wait
queue is really opaque to the unix code. The normal pattern is for
epoll()/select()/poll() to do the unregister, not the socket/fd that
we are waiting on. Further we could not just release all of the wait
queues in unix_dgram_disconnected() b/c there could be multiple waiters
there. So the POLLFREE thing really has to be done from the
unix_sock_destructor() path, since it going to free the entire queue.

In addition, I think that removing the the wait queue from
unix_dgram_disconnected() will still be broken, b/c we would need to
re-add to the remote peer via subsequent poll(), to get events if the
socket was re-connected to a new peer.

Thanks,

-Jason

next prev parent reply	other threads:[~2015-10-01  2:56 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-13 19:53 List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket Mathias Krause
2015-09-14  2:39 ` Eric Wong
2015-09-29 18:09   ` Mathias Krause
2015-09-29 19:09     ` Jason Baron
2015-09-30  5:54       ` Mathias Krause
2015-09-30  7:34         ` Michal Kubecek
2015-10-01  2:55           ` Jason Baron [this message]
2015-09-30 10:56         ` Rainer Weikusat
2015-09-30 11:55           ` Mathias Krause
2015-09-30 13:25             ` Rainer Weikusat
2015-09-30 13:38               ` Mathias Krause
2015-09-30 13:51                 ` Rainer Weikusat
2015-10-01  2:39         ` Jason Baron
2015-10-01 10:33           ` Rainer Weikusat
2015-10-01 12:10             ` Rainer Weikusat
2015-10-01 12:58               ` Rainer Weikusat
2015-09-15 17:07 ` Rainer Weikusat
2015-09-15 18:15   ` Mathias Krause

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=560CA0BF.7090608@akamai.com \
    --to=jbaron@akamai.com \
    --cc=dave@stgolabs.net \
    --cc=davem@davemloft.net \
    --cc=davidel@xmailserver.org \
    --cc=eric.dumazet@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=minipli@googlemail.com \
    --cc=mkubecek@suse.cz \
    --cc=netdev@vger.kernel.org \
    --cc=normalperson@yhbt.net \
    --cc=olivier@mauras.ch \
    --cc=pageexec@freemail.hu \
    --cc=peterz@infradead.org \
    --cc=rweikusat@mobileactivedefense.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.