Re: BUG: corrupted list in p9_read_work

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Dominique Martinet <asmadeus@codewreck.org>
To: syzbot <syzbot+2222c34dc40b515f30dc@syzkaller.appspotmail.com>
Cc: davem@davemloft.net, ericvh@gmail.com,
	linux-kernel@vger.kernel.org, lucho@ionkov.net,
	netdev@vger.kernel.org, rminnich@sandia.gov,
	syzkaller-bugs@googlegroups.com,
	v9fs-developer@lists.sourceforge.net
Subject: Re: BUG: corrupted list in p9_read_work
Date: Tue, 9 Oct 2018 04:09:49 +0200	[thread overview]
Message-ID: <20181009020949.GA29622@nautica> (raw)
In-Reply-To: <000000000000fddb150577c15af6@google.com>

syzbot wrote on Mon, Oct 08, 2018:
> syzbot has found a reproducer for the following crash on:
> 
> HEAD commit:    0854ba5ff5c9 Merge git://git.kernel.org/pub/scm/linux/kern..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1514ec06400000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=88e9a8a39dc0be2d
> dashboard link: https://syzkaller.appspot.com/bug?extid=2222c34dc40b515f30dc
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=10b91685400000
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+2222c34dc40b515f30dc@syzkaller.appspotmail.com
> 
> list_del corruption, ffff88019ae36ee8->next is LIST_POISON1
> (dead000000000100)
> ------------[ cut here ]------------
> [...]
>  list_del include/linux/list.h:125 [inline]
>  p9_read_work+0xab6/0x10e0 net/9p/trans_fd.c:379

Hmm this looks very much like the report from
syzbot+735d926e9d1317c3310c@syzkaller.appspotmail.com 
which should have been fixed by Tomas in 9f476d7c540cb
("net/9p/trans_fd.c: fix race by holding the lock")...

It looks like another double list_del, looking at the code again there
actually are other ways this could happen around connection errors.
For example,
 - p9_read_work receives something and lookup works... meanwhile
 - p9_write_work fails to write and calls p9_conn_cancel, which deletes
from the req_list without waiting for other works to finish (could also
happen in p9_poll_mux)
 - p9_read_work finishes processing the read and deletes from list again

For this one the simplest fix would probably be to just not
list_del/call p9_client_cb at all if m->r?req->status isn't
REQ_STATUS_ERROR in p9_read_work after the "got new packet" debug print,
and frankly I think that's saner so I'll send a patch shortly doing
that, but I have zero confidence there aren't similar bugs around, the
tcp code is so messy... Most of the syzbot reports recently have been
around trans_fd which I don't think is used much in real life, and this
is not really motivating (i.e. I think it would probably need a more
extensive rewrite but nobody cares) :/

Dmitry, on that note, do you think syzbot could possibly test other
transports somehow? rdma or virtio cannot be faked as easily as passing
a fd around, but I'd be very interested in seeing these flayed a bit.

(I'm also curious what logic is used to generate the syz tests, the
write$P9_Rxx replies have nothing to do with what the client would
expect so it probably doesn't test very far; this test in particular
does not even get past the initial P9_TVERSION that the client would
expect immediately after mount, so it's basically only testing logic
around packet handling on error... Or if we're accepting a RREADDIR in
reply to TVERSION we have bigger problems, and now I'm looking at it I
think we just might never check that....... I'll look at that for the
next cycle)

Back to the current patch, since as I said I am not confident this is a
good enough fix for the current bug, will I get notified if the bug
happens again once the patch hits linux-next with the Reported-by tag ?
(I don't have the setup necessary to run a syz repro as there is no C
repro, and won't have much time to do that setup sorry)

> FS-Cache: N-cookie d=000000000a092700 n=00000000d8ee0022
> FS-Cache: N-key=[10] '34323935303034313132'
> FS-Cache: Duplicate cookie detected
> FS-Cache: O-cookie c=00000000911358e4 [p=000000006545c95d fl=222 nc=0 na=1]
> FS-Cache: O-cookie d=000000000a092700 n=000000007635356b
> FS-Cache: O-key=[10] '
> [...]

(on an unrelated topic, I got these FS-Cache warnings quite often when
testing with cache enabled and have no idea what they mean. I don't
normally use cache so haven't spent time looking at it, but I find these
rather worrying... If someone having a clue reads this, I'd love to hear
what they could mean and what we should look at)

Thanks,
-- 
Dominique

next prev parent reply	other threads:[~2018-10-09  2:10 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-16  5:59 BUG: corrupted list in p9_read_work syzbot
2018-10-09  1:07 ` syzbot
2018-10-09  2:09   ` Dominique Martinet [this message]
2018-10-09  4:05     ` [PATCH 1/2] 9p/trans_fd: abort p9_read_work if req status changed Dominique Martinet
2018-10-09  4:05       ` [PATCH 2/2] 9p/trans_fd: put worker reqs on destroy Dominique Martinet
2018-10-09 13:19         ` Tomas Bortoli
2018-10-15 10:46           ` Dominique Martinet
2018-10-10 14:03     ` BUG: corrupted list in p9_read_work Dmitry Vyukov
2018-10-10 14:40       ` Dominique Martinet
2018-10-10 14:51         ` Dmitry Vyukov
2018-10-10 15:58           ` Dominique Martinet
2018-10-11 12:33             ` Dmitry Vyukov
2018-10-11 13:10               ` Dominique Martinet
2018-10-11 13:27                 ` Dmitry Vyukov
2018-10-11 13:40                   ` Dmitry Vyukov
2018-10-11 14:28                     ` 9p/RDMA for syzkaller (Was: BUG: corrupted list in p9_read_work) Dominique Martinet
2018-10-12 14:42                       ` Dmitry Vyukov
2018-10-11 14:19                   ` Dominique Martinet
2018-10-12 14:50                     ` Dmitry Vyukov
2018-10-12 15:08                       ` Dominique Martinet
2018-11-17  8:46                         ` Dominique Martinet
2018-11-20 11:20                           ` Dmitry Vyukov
2018-11-20 11:28                             ` Dominique Martinet
2018-10-10 14:29     ` BUG: corrupted list in p9_read_work Dmitry Vyukov
2018-10-10 14:48       ` Dominique Martinet
2018-10-10 14:49         ` syzbot
2018-10-10 16:00           ` Dominique Martinet
2018-10-10 16:02             ` syzbot
2018-10-10 16:10             ` Dominique Martinet
2018-10-10 16:29               ` syzbot
2018-10-10 16:36               ` Dmitry Vyukov
2018-10-10 22:55                 ` Dominique Martinet
2018-10-10 14:42     ` Dmitry Vyukov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181009020949.GA29622@nautica \
    --to=asmadeus@codewreck.org \
    --cc=davem@davemloft.net \
    --cc=ericvh@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lucho@ionkov.net \
    --cc=netdev@vger.kernel.org \
    --cc=rminnich@sandia.gov \
    --cc=syzbot+2222c34dc40b515f30dc@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=v9fs-developer@lists.sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.