From: Michal Kubecek <mkubecek@suse.cz>
To: Mathias Krause <minipli@googlemail.com>
Cc: Jason Baron <jbaron@akamai.com>,
netdev@vger.kernel.org,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Eric Wong <normalperson@yhbt.net>,
Eric Dumazet <eric.dumazet@gmail.com>,
Rainer Weikusat <rweikusat@mobileactivedefense.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Davide Libenzi <davidel@xmailserver.org>,
Davidlohr Bueso <dave@stgolabs.net>,
Olivier Mauras <olivier@mauras.ch>,
PaX Team <pageexec@freemail.hu>,
Linus Torvalds <torvalds@linux-foundation.org>,
"peterz@infradead.org" <peterz@infradead.org>,
"davem@davemloft.net" <davem@davemloft.net>
Subject: Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket
Date: Wed, 30 Sep 2015 09:34:10 +0200 [thread overview]
Message-ID: <20150930073410.GA7339@unicorn.suse.cz> (raw)
In-Reply-To: <CA+rthh9XU+5oN9Ua-+8XPF9Q0y03FZyxds53sxUjsvsJN2kaJA@mail.gmail.com>
On Wed, Sep 30, 2015 at 07:54:29AM +0200, Mathias Krause wrote:
> On 29 September 2015 at 21:09, Jason Baron <jbaron@akamai.com> wrote:
> > However, if we call connect on socket 's', to connect to a new socket 'o2', we
> > drop the reference on the original socket 'o'. Thus, we can now close socket
> > 'o' without unregistering from epoll. Then, when we either close the ep
> > or unregister 'o', we end up with this list corruption. Thus, this is not a
> > race per se, but can be triggered sequentially.
>
> Sounds profound, but the reproducers calls connect only once per
> socket. So there is no "connect to a new socket", no?
I believe there is another scenario: 'o' becomes SOCK_DEAD while 's' is
still connected to it. This is detected by 's' in unix_dgram_sendmsg()
so that 's' releases its reference on 'o' and 'o' can be freed. If this
happens before 's' is unregistered, we get use-after-free as 'o' has
never been unregistered. And as the interval between freeing 'o' and
unregistering 's' can be quite long, there is a chance for the memory to
be reused. This is what one of our customers has seen:
[exception RIP: _raw_spin_lock_irqsave+156]
RIP: ffffffff8040f5bc RSP: ffff8800e929de78 RFLAGS: 00010082
RAX: 000000000000a32c RBX: ffff88003954ab80 RCX: 0000000000001000
RDX: 00000000f2320000 RSI: 000000000000f232 RDI: ffff88003954ab80
RBP: 0000000000005220 R8: dead000000100100 R9: 0000000000000000
R10: 00007fff1a284960 R11: 0000000000000246 R12: 0000000000000000
R13: ffff8800e929de8c R14: 000000000000000e R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 10000e030 SS: e02b
#8 [ffff8800e929de70] _raw_spin_lock_irqsave at ffffffff8040f5a9
#9 [ffff8800e929deb0] remove_wait_queue at ffffffff8006ad09
#10 [ffff8800e929ded0] ep_unregister_pollwait at ffffffff80170043
#11 [ffff8800e929def0] ep_remove at ffffffff80170073
#12 [ffff8800e929df10] sys_epoll_ctl at ffffffff80171453
#13 [ffff8800e929df80] system_call_fastpath at ffffffff80417553
In this case, crash happened on unregistering 's' which had null peer
(i.e. not reconnected but rather disconnected) but there were still two
items in the list, the other pointing to an unallocated page which has
apparently been modified in between.
IMHO unix_dgram_disonnected() could be the place to handle this issue:
it is called from both places where we disconnect from a peer (dead peer
detection in unix_dgram_sendmsg() and reconnect in unix_dgram_connect())
just before the reference to peer is released. I'm not familiar with the
epoll implementation so I'm still trying to find what exactly needs to
be done to unregister the peer at this moment.
> That bug triggers since commit 3c73419c09 "af_unix: fix 'poll for
> write'/ connected DGRAM sockets". That's v2.6.26-rc7, as noted in the
> reproducer.
Sounds likely as this is the commit that introduced unix_dgram_poll()
with the code which adds the "asymmetric peer" to monitor its queue
state. More precisely, the asymmetricity check has been added by
ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets")
shortly after that.
Michal Kubecek
next prev parent reply other threads:[~2015-09-30 7:34 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-13 19:53 List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket Mathias Krause
2015-09-14 2:39 ` Eric Wong
2015-09-29 18:09 ` Mathias Krause
2015-09-29 19:09 ` Jason Baron
2015-09-30 5:54 ` Mathias Krause
2015-09-30 7:34 ` Michal Kubecek [this message]
2015-10-01 2:55 ` Jason Baron
2015-09-30 10:56 ` Rainer Weikusat
2015-09-30 11:55 ` Mathias Krause
2015-09-30 13:25 ` Rainer Weikusat
2015-09-30 13:38 ` Mathias Krause
2015-09-30 13:51 ` Rainer Weikusat
2015-10-01 2:39 ` Jason Baron
2015-10-01 10:33 ` Rainer Weikusat
2015-10-01 12:10 ` Rainer Weikusat
2015-10-01 12:58 ` Rainer Weikusat
2015-09-15 17:07 ` Rainer Weikusat
2015-09-15 18:15 ` Mathias Krause
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150930073410.GA7339@unicorn.suse.cz \
--to=mkubecek@suse.cz \
--cc=dave@stgolabs.net \
--cc=davem@davemloft.net \
--cc=davidel@xmailserver.org \
--cc=eric.dumazet@gmail.com \
--cc=jbaron@akamai.com \
--cc=linux-kernel@vger.kernel.org \
--cc=minipli@googlemail.com \
--cc=netdev@vger.kernel.org \
--cc=normalperson@yhbt.net \
--cc=olivier@mauras.ch \
--cc=pageexec@freemail.hu \
--cc=peterz@infradead.org \
--cc=rweikusat@mobileactivedefense.com \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox