netdev.vger.kernel.org archive mirror
From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: "David S. Miller" <davem@redhat.com>
Cc: vatsa@in.ibm.com, netdev@oss.sgi.com,
	linux-kernel@vger.kernel.org, dipankar@in.ibm.com
Subject: Re: [RFC] Use RCU for tcp_ehash lookup
Date: Thu, 2 Sep 2004 09:31:50 -0700
Message-ID: <20040902163149.GB1258@us.ibm.com>
In-Reply-To: <20040901224108.3b2d692d.davem@redhat.com>

On Wed, Sep 01, 2004 at 10:41:08PM -0700, David S. Miller wrote:
> On Tue, 31 Aug 2004 18:29:41 +0530
> Srivatsa Vaddagiri <vatsa@in.ibm.com> wrote:
> > - Biggest problem I had converting over to RCU was the refcount race between
> >   sock_put and sock_hold. sock_put might see the refcount go to zero and decide
> >   to free the object, while on some other CPU, sock_get's are pending against
> >   the same object. The patch handles the race by deciding to free the object
> >   only from the RCU callback.
> 
> That's exactly what I was concerned about when I saw that you had attempted
> this change.  It is incredibly important for state changes and updates to
> be seen as atomic by the packet input processing engine.  It would be illegal
> for a cpu running TCP input to see a socket in two tables at the same time
> (for example, in the main established area and in the second half for TIME_WAIT
> buckets).
> 
> If the visibility of the socket is wrong, sockets could erroneously
> be reset during the transition from established to TIME_WAIT state.
> Beware!

If the usage is too write-intensive, then RCU will certainly be less
likely to work well.  But there is nothing quite like actually trying
it to see how it works.  ;-)

That aside, it -is- possible to make such state changes appear atomic,
even when moving elements from one list to another.  One way of doing
this is to atomically replace the element with a "tombstone" element.
Normal pointer writes suffice.  The "tombstone" is set up so that searches
for the outgoing element will stall (e.g., spin or sleep, depending
on the environment).  The element is moved to its destination list.
At this point, searches for the element in the old list will still
stall, while searches for the element in the new list will succeed.
The tombstone is now marked so that CPUs stalled on it resume, but
report failure to find the element in the old list.
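
To make the sequence concrete, here is a rough userspace sketch of the
steps above.  It is not taken from the patch under discussion: the names
(struct node, lookup_once(), replace_with_tombstone(), and so on) are
made up for illustration, C11 atomics stand in for what kernel code
would more likely do with rcu_assign_pointer() and rcu_dereference(),
the writer is assumed to hold whatever lock serializes updates, and a
reader that is already sitting on the element while it moves is ignored.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
enum tomb_state { NOT_TOMB = 0, TOMB_STALL, TOMB_GONE };
enum result { FOUND, NOT_FOUND, STALL };

struct node {
    int key;
    _Atomic int state;              /* one of enum tomb_state */
    struct node *_Atomic next;
};

struct list {
    struct node *_Atomic head;
};

/* One lockless pass over the list.  Returns STALL if the key is
 * currently hidden behind a tombstone that has not been released. */
static enum result lookup_once(struct list *l, int key, struct node **out)
{
    struct node *n;

    for (n = atomic_load(&l->head); n != NULL; n = atomic_load(&n->next)) {
        if (n->key != key)
            continue;
        switch (atomic_load(&n->state)) {
        case TOMB_STALL:
            return STALL;           /* move in progress */
        case TOMB_GONE:
            return NOT_FOUND;       /* element now lives elsewhere */
        default:
            *out = n;
            return FOUND;
        }
    }
    return NOT_FOUND;
}

/* Reader: spin while stalled (a real reader might sleep instead). */
static struct node *lookup(struct list *l, int key)
{
    struct node *n = NULL;
    enum result r;

    while ((r = lookup_once(l, key, &n)) == STALL)
        ;                           /* busy-wait on the tombstone */
    return r == FOUND ? n : NULL;
}

/* Writer: insert at the head of a list. */
static void list_push(struct list *l, struct node *n)
{
    atomic_store(&n->next, atomic_load(&l->head));
    atomic_store(&l->head, n);
}

/* Writer, step 1: replace elem with a stalling tombstone using a
 * single pointer write.  Returns -1 if elem is not on the list. */
static int replace_with_tombstone(struct list *l, struct node *elem,
                                  struct node *tomb)
{
    struct node *_Atomic *link = &l->head;
    struct node *n;

    while ((n = atomic_load(link)) != NULL && n != elem)
        link = &n->next;
    if (n == NULL)
        return -1;
    tomb->key = elem->key;
    atomic_store(&tomb->state, TOMB_STALL);
    atomic_store(&tomb->next, atomic_load(&elem->next));
    atomic_store(link, tomb);       /* publish the tombstone */
    return 0;
}

/* Writer, step 3: let stalled readers resume and report failure. */
static void release_tombstone(struct node *tomb)
{
    atomic_store(&tomb->state, TOMB_GONE);
}
```

The writer would call replace_with_tombstone() on the old list, then
list_push() the element onto the new list (step 2), then
release_tombstone().  Unlinking and freeing the tombstone itself would
have to wait for a grace period, which this sketch leaves out.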

Of course, this approach makes writes more expensive than they otherwise
would be, so, again, RCU is best for read-intensive uses.  ;-)

This data structure is not very read-intensive because short-lived
TCP connections are quite common, right?  Or am I missing the finer
points of this data structure's workings?

						Thanx, Paul

Thread overview: 11+ messages
2004-08-31 12:59 [RFC] Use RCU for tcp_ehash lookup Srivatsa Vaddagiri
2004-08-31 13:04 ` Srivatsa Vaddagiri
2004-08-31 13:12 ` Srivatsa Vaddagiri
2004-08-31 13:54 ` Andi Kleen
2004-09-01 11:36   ` Srivatsa Vaddagiri
2004-09-02  5:45     ` David S. Miller
2004-09-02 21:19     ` Andi Kleen
2004-09-02  5:43   ` David S. Miller
2004-09-02  5:41 ` David S. Miller
2004-09-02 14:04   ` Srivatsa Vaddagiri
2004-09-02 16:31   ` Paul E. McKenney [this message]
