From: Evgeniy Polyakov <zbr@ioremap.net>
To: Andrew Grover <andy.grover@gmail.com>
Cc: netdev@vger.kernel.org, rdreier@cisco.com,
rds-devel@oss.oracle.com, general@lists.openfabrics.org
Subject: [ofa-general] Re: [PATCH 01/21] RDS: Socket interface
Date: Thu, 29 Jan 2009 19:24:25 +0300 [thread overview]
Message-ID: <20090129162425.GA31947@ioremap.net> (raw)
In-Reply-To: <c0a09e5c0901282002k3e1083ebifb7fb31d15d93b8e@mail.gmail.com>
Hi Andy.
On Wed, Jan 28, 2009 at 08:02:49PM -0800, Andrew Grover (andy.grover@gmail.com) wrote:
> Hi Evgeniy thanks for your time in reviewing.
No problem :)
> >> +/* this is just used for stats gathering :/ */
> >
> > Shouldn't this be some kind of per-cpu data?
> > Global list of all sockets? This does not scale, maybe it should be
> > groupped into hash table or be per-device?
>
> sch mentioned this too... is socket creation often a bottleneck? If so
> we can certainly improve scalability here.
It depends on the workload, but and it becomes a noticeble portion of
the overhead for multi-client short-living connections. Likely this
sockets will not be used for web-server like load, but something similar
will clearly show a bottleneck with this list.
> In any case, this is in the code to support a listing of RDS sockets
> via the rds-info utility. Instead of having our own custom program to
> list rds sockets we probably want to export an interface so netstat
> will list them. Unfortunately netstat seems to be hardcoded to look
> for particular entries in /proc/net, so both rds and netstat would
> need to be updated before this would work, and RDS's custom
> socket-listing interface dropped.
Other sockets use similar technique, but they are groupped into hash
table, so if you think that amount of socket will be noticebly large or
they will be frequently created and removed, it may worth pushing them
into the hash table.
> Is that an appropriate usage of sock_orphan()?
It is, I missed the process context socket detouch, things are correct.
> > Does RDS sockets work with high number of creation/destruction
> > workloads?
>
> I'd guess from your comments that performance probably wouldn't be great :)
Something tells me the same :)
> >> +static unsigned int rds_poll(struct file *file, struct socket *sock,
> >> + poll_table *wait)
> >> +{
> >> + struct sock *sk = sock->sk;
> >> + struct rds_sock *rs = rds_sk_to_rs(sk);
> >> + unsigned int mask = 0;
> >> + unsigned long flags;
> >> +
> >> + poll_wait(file, sk->sk_sleep, wait);
> >> +
> >> + poll_wait(file, &rds_poll_waitq, wait);
> >> +
> >
> > Are you absolutely sure that provided poll_table callback
> > will not do the bad things here? It is quite unusual to add several
> > different queues into the same head in the poll callback.
> > And shouldn't rds_poll_waitq be lock protected here?
>
> I don't know. I looked into the poll_wait code a little and it
> appeared to be designed to allow multiple.
>
> I'm not very strong in this area and would love some more expert input here.
It depends on how poll_table was initialized and how its callback
(invoked from the poll_wait()) operates with the given queue and head.
If you introduce own polling, some care has to be taken there for the
ordering of the wait queues and what their callbacks return when polling
even found.
For example with the own initialization it is possible that with
multiple queues are registered in the same table, only one of them
will be awakened (its callback invoked).
If you just hook into existing machinery things should be ok though, so
this is just something to pay attention and does not show a bug.
> >> + read_lock_irqsave(&rs->rs_recv_lock, flags);
> >> + if (!rs->rs_cong_monitor) {
> >> + /* When a congestion map was updated, we signal POLLIN for
> >> + * "historical" reasons. Applications can also poll for
> >> + * WRBAND instead. */
> >> + if (rds_cong_updated_since(&rs->rs_cong_track))
> >> + mask |= (POLLIN | POLLRDNORM | POLLWRBAND);
> >> + } else {
> >> + spin_lock(&rs->rs_lock);
> >
> > Is there a possibility to have lock iteraction problem with above
> > rs_recv_lock read lock?
>
> I didn't see anywhere where they were being acquired in reverse order,
> or simultaneously. This is the kind of thing that lockdep would find
> immediately, right? I think I've got that turned on but I'll double
> check.
If lockdep entered the bad race, then yes, it will fire this up.
I just wondered that we spin lock under the read lock, so some bad
iteration with writelock may happen.
> >> + cmp = ((u64)be32_to_cpu(rs->rs_bound_addr) << 32) |
> >> + be16_to_cpu(rs->rs_bound_port);
> >> +
> >> + if (needle < cmp)
> >
> > Should it use wrapping logic if some field overflows?
>
> Sorry, please explain?
I meant that if there is an unsigned overflow this will suddenly become
a small number, so network timestamping comparison logic can be used,
but apparently neither address nor port are changed during the lifetime,
so nothing special is needed.
--
Evgeniy Polyakov
next prev parent reply other threads:[~2009-01-29 16:24 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-01-27 2:17 [ofa-general] [PATCH 0/21] Reliable Datagram Sockets (RDS) Andy Grover
2009-01-27 2:17 ` [ofa-general] [PATCH 01/21] RDS: Socket interface Andy Grover
2009-01-27 3:46 ` Stephen Hemminger
2009-01-29 3:17 ` [ofa-general] " Andrew Grover
2009-01-27 4:11 ` David Miller
2009-01-29 20:22 ` [ofa-general] ***SPAM*** " Andrew Grover
2009-01-27 12:08 ` Evgeniy Polyakov
2009-01-29 4:02 ` [ofa-general] " Andrew Grover
2009-01-29 16:24 ` Evgeniy Polyakov [this message]
2009-01-27 2:17 ` [ofa-general] [PATCH 02/21] RDS: Main header file Andy Grover
2009-01-27 7:34 ` Rémi Denis-Courmont
2009-01-27 19:27 ` [ofa-general] " Andrew Grover
2009-01-27 13:05 ` Evgeniy Polyakov
2009-01-27 19:23 ` [ofa-general] ***SPAM*** " Andrew Grover
2009-01-27 19:24 ` Steve Wise
2009-01-27 2:17 ` [PATCH 03/21] RDS: Congestion-handling code Andy Grover
2009-01-27 3:48 ` Stephen Hemminger
2009-01-27 19:15 ` Andrew Grover
2009-01-27 13:10 ` Evgeniy Polyakov
2009-01-27 19:10 ` Andrew Grover
2009-01-28 22:57 ` Roland Dreier
2009-01-29 2:39 ` [ofa-general] " Andy Grover
2009-01-27 2:17 ` [PATCH 04/21] RDS: Transport code Andy Grover
2009-01-27 13:18 ` Evgeniy Polyakov
2009-01-27 19:36 ` Andrew Grover
2009-01-27 21:56 ` [ofa-general] " Evgeniy Polyakov
2009-01-27 22:15 ` [ofa-general] ***SPAM*** " Andrew Grover
2009-01-27 2:17 ` [ofa-general] [PATCH 05/21] RDS: Info and stats Andy Grover
2009-01-27 13:28 ` Evgeniy Polyakov
2009-01-27 2:17 ` [PATCH 06/21] RDS: Connection handling Andy Grover
2009-01-27 13:34 ` Evgeniy Polyakov
2009-01-27 13:47 ` Oliver Neukum
2009-01-27 13:51 ` Evgeniy Polyakov
2009-01-27 16:28 ` [ofa-general] " Steve Wise
2009-01-29 3:03 ` ***SPAM*** " Andrew Grover
2009-01-29 8:03 ` Evgeniy Polyakov
2009-01-27 2:17 ` [PATCH 07/21] RDS: loopback Andy Grover
2009-01-27 2:17 ` [PATCH 08/21] RDS: sysctls Andy Grover
2009-01-27 2:17 ` [PATCH 09/21] RDS: Message parsing Andy Grover
2009-01-27 2:17 ` [PATCH 10/21] RDS: send.c Andy Grover
2009-01-27 2:17 ` [PATCH 11/21] RDS: recv.c Andy Grover
2009-01-27 2:17 ` [PATCH 12/21] RDS: RDMA support Andy Grover
2009-01-27 2:17 ` [ofa-general] [PATCH 13/21] RDS/IB: Infiniband transport Andy Grover
2009-01-27 2:17 ` [PATCH 14/21] RDS/IB: Ring-handling code Andy Grover
2009-01-27 2:17 ` [PATCH 15/21] RDS/IB: Implement RDMA ops using FMRs Andy Grover
2009-01-27 2:17 ` [PATCH 16/21] RDS/IB: Implement IB-specific datagram send Andy Grover
2009-01-27 2:17 ` [PATCH 17/21] RDS/IB: Receive datagrams via IB Andy Grover
2009-01-29 0:05 ` [ofa-general] " Roland Dreier
2009-01-29 2:20 ` Andy Grover
2009-01-29 21:02 ` Olaf Kirch
2009-01-29 21:47 ` [ofa-general] " Roland Dreier
2009-01-27 2:17 ` [PATCH 18/21] RDS/IB: Stats and sysctls Andy Grover
2009-01-27 2:17 ` [PATCH 19/21] RDS: Documentation Andy Grover
2009-01-27 2:17 ` [PATCH 20/21] RDS: Kconfig and Makefile Andy Grover
2009-01-28 22:59 ` Roland Dreier
2009-01-29 2:19 ` [ofa-general] " Andy Grover
2009-01-29 5:14 ` Roland Dreier
2009-01-27 2:17 ` [PATCH 21/21] RDS: Add AF and PF #defines for RDS sockets Andy Grover
2009-01-27 7:27 ` Rémi Denis-Courmont
2009-01-27 19:31 ` [ofa-general] " Andrew Grover
2009-01-27 15:34 ` [ofa-general] [PATCH 0/21] Reliable Datagram Sockets (RDS) Steve Wise
2009-01-27 19:29 ` ***SPAM*** " Andrew Grover
2009-01-28 22:37 ` Roland Dreier
2009-01-29 1:29 ` [ofa-general] " Andy Grover
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090129162425.GA31947@ioremap.net \
--to=zbr@ioremap.net \
--cc=andy.grover@gmail.com \
--cc=general@lists.openfabrics.org \
--cc=netdev@vger.kernel.org \
--cc=rdreier@cisco.com \
--cc=rds-devel@oss.oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).