Re: [Lksctp-developers] Is this a possible race condition.

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Vlad Yasevich <vladislav.yasevich@hp.com>
To: linux-sctp@vger.kernel.org
Subject: Re: [Lksctp-developers] Is this a possible race condition.
Date: Mon, 21 Apr 2008 19:41:33 +0000	[thread overview]
Message-ID: <480CEDED.3080608@hp.com> (raw)
In-Reply-To: <480CB247.1090902@hp.com>

Neil Horman wrote:
>>
> I believe you.  I completely agree with your observation that there is a small
> window during which our two accounting values are mismatches, and as a result we
> might call schedule_timeout without actually having to.  My assertion (which may
> well be wrong) is that the way the waitqueue api is constructed makes that
> irrelevant (or at least it should).
> 
>> The only difference between the 2 is that second accounting done by
>> sctp_packet_transmit().  It changes sk_wmem_alloc, but not assoc.sndbuf_used.
>> Thus in one case, there is an extra change against the send buffer that
>> causes us to exceed it temporarily and thus get scheduled out.  Since we
>> never wake up, I am now starting to think that the wake-up code is
>> called while the application is still running.  So the race is between
>> a wake-up and a call to schedule_timeout.
>>
> Yes, I think this is exactly what you described above, that we enter
> sctp_wait_for_sndbuf because we detect that we are out of sendbuffer space, and
> that this happens in parallel with a call to sctp_write_space, and its use of
> wake_up[_interruptible].  My assertion is that that should be, generally
> speaking, ok.  The pthread library has a very simmilar problem in userspace,
> in which they do in fact use a mutex to protect against this exact race (see
> pthread_cond_wait and pthread_cond_signal).  The kernel however, has no
> mechanism to atomically manipulate a spinlock or mutex, and remove oneself from
> the scheduling queue).  So in the kernel we use queues.  My assertion (or
> perhaps assumption), is that prepare_to_wait_exclusive (which is called from
> sctp_wait_for_sndbuf) adds the task in question to the wait queue, and sets its
> state to TASK_UNITERRUPTIBLE, or some other 'sleeping' state.  At this point the
> wait mechanism is 'armed' for lack of a better term.  Even if the call to
> wake_up happens between this point and the call to schedule[_timeout], all that
> will (or should) happen is that you will be removed from the wait_queue that it
> was added to, its task state will be reset to TASK_RUNNING, and it will be
> rescheduled, without needing to wait for the timeout.  Thats why the call to
> prepare_to_wait_exclusive is made in sctp_wait_for_sndbuf prior to us checking
> association state, write space, etc.  Its safe to do there because, even if any
> of those data values are changed right after we read them, the schedule_timeout
> will effectively just fall through, and we'll break out of the loop on the next
> iteration.  Thats why I suggested that you experiment with setting the sndtimeo
> to a low value, that would break you out of the loop, and confirm thats where
> you're deadlocking, and why.

Will try it.

> 
> So In summary, I'm saying that I completely agree with you that the way we
> charge our skbs in the transmit path can lead to us calling wake_up on a given
> associations wait_queue.  I just don't think that doing so is a problem, because
> of the way the kernels wait_queues guard against the possiblility of a given
> task scheduling itself away forever.  Please take a look and correct me if I'm
> wrong, but thats how I read the wake up code.

Ok,  here is a possible race:

CPU1                                                CPU2
prepare_to_wait_exclusive()
    wait->flags |= WQ_FLAG_EXCLUSIVE;
    spin_lock_irqsave(&q->lock, flags);
    if (list_empty(&wait->task_list))
        __add_wait_queue_tail(q, wait);
	 					   __sctp_write_space()
                                                        waitqueue_active()
                                                        wake_up_interruptible();

    set_current_state().
                                                   

This one should be possible.  The waitqueue_active() checks the 'q' list without
a lock.  So, it's possible for it to succeed and attempt wake-up before the process
state has changed.

> 
> That all being said, I wonder if we're not seeing a race in which the return
> value of sctp_wspace toggles to a value that might cause a wake_up, and then
> back to a value that causes a subsequent sleep.  You mentioned that your test
> application runs on each interface of the system.  Is it multithreaded, or does
> it form multiple processes and share a socket by any chance?  What if the
> following were to happen:

There is no socket sharing.  It's a stand alone app that just gets run multiple times.

-vlad

next prev parent reply	other threads:[~2008-04-21 19:41 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-04-21 15:27 [Lksctp-developers] Is this a possible race condition Vlad Yasevich
2008-04-21 18:12 ` Neil Horman
2008-04-21 19:41 ` Vlad Yasevich [this message]
2008-04-21 21:38 ` David Miller
2008-04-22 11:07 ` Neil Horman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=480CEDED.3080608@hp.com \
    --to=vladislav.yasevich@hp.com \
    --cc=linux-sctp@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.