netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Grover <andy.grover@gmail.com>
To: Evgeniy Polyakov <zbr@ioremap.net>
Cc: netdev@vger.kernel.org, rdreier@cisco.com,
	rds-devel@oss.oracle.com, general@lists.openfabrics.org
Subject: [ofa-general] Re: [PATCH 01/21] RDS: Socket interface
Date: Wed, 28 Jan 2009 20:02:49 -0800	[thread overview]
Message-ID: <c0a09e5c0901282002k3e1083ebifb7fb31d15d93b8e@mail.gmail.com> (raw)
In-Reply-To: <20090127120840.GC2646@ioremap.net>

On Tue, Jan 27, 2009 at 4:08 AM, Evgeniy Polyakov <zbr@ioremap.net> wrote:
> Hi Andy.

Hi Evgeniy thanks for your time in reviewing.

>> +/* this is just used for stats gathering :/ */
>
> Shouldn't this be some kind of per-cpu data?
> Global list of all sockets? This does not scale, maybe it should be
> groupped into hash table or be per-device?

sch mentioned this too... is socket creation often a bottleneck? If so
we can certainly improve scalability here.

In any case, this is in the code to support a listing of RDS sockets
via the rds-info utility. Instead of having our own custom program to
list rds sockets we probably want to export an interface so netstat
will list them. Unfortunately netstat seems to be hardcoded to look
for particular entries in /proc/net, so both rds and netstat would
need to be updated before this would work, and RDS's custom
socket-listing interface dropped.

>> +static int rds_release(struct socket *sock)
>> +{
>> +     struct sock *sk = sock->sk;
>> +     struct rds_sock *rs;
>> +     unsigned long flags;
>> +
>> +     if (sk == NULL)
>> +             goto out;
>> +
>> +     rs = rds_sk_to_rs(sk);
>> +
>> +     sock_orphan(sk);
>
> Why is it needed getting socket is about to be freed?

from the comments above that code:

 * We have to be careful about racing with the incoming path.  sock_orphan()
 * sets SOCK_DEAD and we use that as an indicator to the rx path that new
 * messages shouldn't be queued.

Is that an appropriate usage of sock_orphan()?

> Does RDS sockets work with high number of creation/destruction
> workloads?

I'd guess from your comments that performance probably wouldn't be great :)

>> +static unsigned int rds_poll(struct file *file, struct socket *sock,
>> +                          poll_table *wait)
>> +{
>> +     struct sock *sk = sock->sk;
>> +     struct rds_sock *rs = rds_sk_to_rs(sk);
>> +     unsigned int mask = 0;
>> +     unsigned long flags;
>> +
>> +     poll_wait(file, sk->sk_sleep, wait);
>> +
>> +     poll_wait(file, &rds_poll_waitq, wait);
>> +
>
> Are you absolutely sure that provided poll_table callback
> will not do the bad things here? It is quite unusual to add several
> different queues into the same head in the poll callback.
> And shouldn't rds_poll_waitq be lock protected here?

I don't know. I looked into the poll_wait code a little and it
appeared to be designed to allow multiple.

I'm not very strong in this area and would love some more expert input here.

>> +     read_lock_irqsave(&rs->rs_recv_lock, flags);
>> +     if (!rs->rs_cong_monitor) {
>> +             /* When a congestion map was updated, we signal POLLIN for
>> +              * "historical" reasons. Applications can also poll for
>> +              * WRBAND instead. */
>> +             if (rds_cong_updated_since(&rs->rs_cong_track))
>> +                     mask |= (POLLIN | POLLRDNORM | POLLWRBAND);
>> +     } else {
>> +             spin_lock(&rs->rs_lock);
>
> Is there a possibility to have lock iteraction problem with above
> rs_recv_lock read lock?

I didn't see anywhere where they were being acquired in reverse order,
or simultaneously. This is the kind of thing that lockdep would find
immediately, right? I think I've got that turned on but I'll double
check.

>> +#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 24)
>
> This should be dropped in the mainline tree.

yup.

> Hash table with the appropriate size will have faster lookup/access
> times btw.

No doubt. Definitely want to make this improvement at some point.

>> +static struct rds_sock *rds_bind_tree_walk(__be32 addr, __be16 port,
>> +                                        struct rds_sock *insert)
>> +{
>> +     struct rb_node **p = &rds_bind_tree.rb_node;
>> +     struct rb_node *parent = NULL;
>> +     struct rds_sock *rs;
>> +     u64 cmp;
>> +     u64 needle = ((u64)be32_to_cpu(addr) << 32) | be16_to_cpu(port);
>> +
>> +     while (*p) {
>> +             parent = *p;
>> +             rs = rb_entry(parent, struct rds_sock, rs_bound_node);
>> +
>> +             cmp = ((u64)be32_to_cpu(rs->rs_bound_addr) << 32) |
>> +                   be16_to_cpu(rs->rs_bound_port);
>> +
>> +             if (needle < cmp)
>
> Should it use wrapping logic if some field overflows?

Sorry, please explain?

>> +     rdsdebug("returning rs %p for %u.%u.%u.%u:%u\n", rs, NIPQUAD(addr),
>> +             ntohs(port));
>
> Iirc there is a new %pi4 or similar format id.

Yup, will do.

Thanks again.

Regards -- Andy

  reply	other threads:[~2009-01-29  4:02 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-27  2:17 [ofa-general] [PATCH 0/21] Reliable Datagram Sockets (RDS) Andy Grover
2009-01-27  2:17 ` [ofa-general] [PATCH 01/21] RDS: Socket interface Andy Grover
2009-01-27  3:46   ` Stephen Hemminger
2009-01-29  3:17     ` [ofa-general] " Andrew Grover
2009-01-27  4:11   ` David Miller
2009-01-29 20:22     ` [ofa-general] ***SPAM*** " Andrew Grover
2009-01-27 12:08   ` Evgeniy Polyakov
2009-01-29  4:02     ` Andrew Grover [this message]
2009-01-29 16:24       ` [ofa-general] " Evgeniy Polyakov
2009-01-27  2:17 ` [ofa-general] [PATCH 02/21] RDS: Main header file Andy Grover
2009-01-27  7:34   ` Rémi Denis-Courmont
2009-01-27 19:27     ` [ofa-general] " Andrew Grover
2009-01-27 13:05   ` Evgeniy Polyakov
2009-01-27 19:23     ` [ofa-general] ***SPAM*** " Andrew Grover
2009-01-27 19:24       ` Steve Wise
2009-01-27  2:17 ` [PATCH 03/21] RDS: Congestion-handling code Andy Grover
2009-01-27  3:48   ` Stephen Hemminger
2009-01-27 19:15     ` Andrew Grover
2009-01-27 13:10   ` Evgeniy Polyakov
2009-01-27 19:10     ` Andrew Grover
2009-01-28 22:57   ` Roland Dreier
2009-01-29  2:39     ` [ofa-general] " Andy Grover
2009-01-27  2:17 ` [PATCH 04/21] RDS: Transport code Andy Grover
2009-01-27 13:18   ` Evgeniy Polyakov
2009-01-27 19:36     ` Andrew Grover
2009-01-27 21:56       ` [ofa-general] " Evgeniy Polyakov
2009-01-27 22:15         ` [ofa-general] ***SPAM*** " Andrew Grover
2009-01-27  2:17 ` [ofa-general] [PATCH 05/21] RDS: Info and stats Andy Grover
2009-01-27 13:28   ` Evgeniy Polyakov
2009-01-27  2:17 ` [PATCH 06/21] RDS: Connection handling Andy Grover
2009-01-27 13:34   ` Evgeniy Polyakov
2009-01-27 13:47     ` Oliver Neukum
2009-01-27 13:51       ` Evgeniy Polyakov
2009-01-27 16:28       ` [ofa-general] " Steve Wise
2009-01-29  3:03         ` ***SPAM*** " Andrew Grover
2009-01-29  8:03           ` Evgeniy Polyakov
2009-01-27  2:17 ` [PATCH 07/21] RDS: loopback Andy Grover
2009-01-27  2:17 ` [PATCH 08/21] RDS: sysctls Andy Grover
2009-01-27  2:17 ` [PATCH 09/21] RDS: Message parsing Andy Grover
2009-01-27  2:17 ` [PATCH 10/21] RDS: send.c Andy Grover
2009-01-27  2:17 ` [PATCH 11/21] RDS: recv.c Andy Grover
2009-01-27  2:17 ` [PATCH 12/21] RDS: RDMA support Andy Grover
2009-01-27  2:17 ` [ofa-general] [PATCH 13/21] RDS/IB: Infiniband transport Andy Grover
2009-01-27  2:17 ` [PATCH 14/21] RDS/IB: Ring-handling code Andy Grover
2009-01-27  2:17 ` [PATCH 15/21] RDS/IB: Implement RDMA ops using FMRs Andy Grover
2009-01-27  2:17 ` [PATCH 16/21] RDS/IB: Implement IB-specific datagram send Andy Grover
2009-01-27  2:17 ` [PATCH 17/21] RDS/IB: Receive datagrams via IB Andy Grover
2009-01-29  0:05   ` [ofa-general] " Roland Dreier
2009-01-29  2:20     ` Andy Grover
2009-01-29 21:02       ` Olaf Kirch
2009-01-29 21:47         ` [ofa-general] " Roland Dreier
2009-01-27  2:17 ` [PATCH 18/21] RDS/IB: Stats and sysctls Andy Grover
2009-01-27  2:17 ` [PATCH 19/21] RDS: Documentation Andy Grover
2009-01-27  2:17 ` [PATCH 20/21] RDS: Kconfig and Makefile Andy Grover
2009-01-28 22:59   ` Roland Dreier
2009-01-29  2:19     ` [ofa-general] " Andy Grover
2009-01-29  5:14       ` Roland Dreier
2009-01-27  2:17 ` [PATCH 21/21] RDS: Add AF and PF #defines for RDS sockets Andy Grover
2009-01-27  7:27   ` Rémi Denis-Courmont
2009-01-27 19:31     ` [ofa-general] " Andrew Grover
2009-01-27 15:34 ` [ofa-general] [PATCH 0/21] Reliable Datagram Sockets (RDS) Steve Wise
2009-01-27 19:29   ` ***SPAM*** " Andrew Grover
2009-01-28 22:37 ` Roland Dreier
2009-01-29  1:29   ` [ofa-general] " Andy Grover

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c0a09e5c0901282002k3e1083ebifb7fb31d15d93b8e@mail.gmail.com \
    --to=andy.grover@gmail.com \
    --cc=general@lists.openfabrics.org \
    --cc=netdev@vger.kernel.org \
    --cc=rdreier@cisco.com \
    --cc=rds-devel@oss.oracle.com \
    --cc=zbr@ioremap.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).