From: Vlad Yasevich <vladislav.yasevich@hp.com>
To: linux-sctp@vger.kernel.org
Subject: Re: BUG in sctp crashes sles10sp2 kernel
Date: Tue, 23 Dec 2008 19:23:11 +0000
Message-ID: <49513A9F.7010707@hp.com>
In-Reply-To: <20081211145209.GB5236@dhcp35.suse.cz>

Vlad Yasevich wrote:
> 
> At this point, I am starting to think that this is a race, but I am not sure
> what/who is racing what/who.
> 
> When I add an SCTP level setsockopt call in the accept() loop of the server, I
> get 4+ hours of normal operation (I killed the test at this point).  It doesn't
> matter what the socket option does.  As a test, I used SCTP_DISABLE_FRAGMENTS
> with a value of 0 which is essentially a no-op with locks around it, and it worked.
> 
> The few crashes I've received on 2.6.28-rc6 seem to always emanate from
> the retransmission timeout.  After poking around the crash dump, I see
> the following:
> 
> crash> sctp_transport.packet 0xffff88013dd830d8
>   packet = {
>     source_port = 10003,
>     destination_port = 36107,
>     vtag = 4043516048,
>     chunk_list = {
>       next = 0xffff88013c395e80,
>       prev = 0xffff88013c395e80
>     },
>     ...
> crash> struct sctp_chunk 0xffff88013c395e80
> struct sctp_chunk {
>   list = {
>     next = 0xffff88013c395e80,
>     prev = 0xffff88013c395e80
>   },
>   refcnt = {
>     counter = 2
>   },
>   transmitted_list = {
>     next = 0xffff88013dd83228,
>     prev = 0xffff88013dd83228
>   },
> 
> 
> Note that the transmitted_list is good (it points back to the association).
> However, the list{} in the sctp chunk points to itself, while chunk_list in
> the packet also points to it.  This results in an infinite iteration over the same
> chunk while trying to copy it into the transmission skb and triggers the skb
> overflow that we BUG() with.
> 
> I am going to see if I can poison the chunk->list from the start and see who
> dies.
> 
> -vlad
> 
> p.s.  the crash I am seeing is with locks added around packet->chunk_list
> manipulations.
> 

Ok, I was able to prove that there is a race condition accessing the packet and
the chunk_list.  The way to do this is by adding a "void *last_thread" to the
sctp_packet structure and then using the following code:

in sctp_packet_init:
	packet->last_thread = NULL;

in sctp_packet_free:
	packet->last_thread = (void *)0xdeadbeef; /* to catch errors */

in sctp_packet_append:

	spin_lock_bh(&packet->lock);
	if (packet->last_thread && packet->last_thread != current) {
		/* print warning with interesting info. I printed the packet */
		BUG();
	}
	packet->last_thread = current;
	...
	spin_unlock_bh(&packet->lock);

in sctp_packet_reset:
	/* after the loop to free chunks */
	packet->last_thread = NULL;


In my builds with 2.6.28-rc6 + my patches, it tripped the BUG in sctp_packet_append() with two
different threads accessing the same packet structure.  Looking in crash, sure enough, one CPU was
spinning on a lock in packet_reset, while the other CPU was holding the lock while adding a chunk.

As you can tell, you need locks enabled around the chunk list handling.

I can't do much more right now since my company has shut down for the holidays; I'll pick this up
again in the new year.  Meanwhile, if you still have access to equipment, you can do some looking
to see if you can figure out the race.

-vlad
