netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Evgeniy Polyakov <zbr@ioremap.net>
To: Andy Grover <andy.grover@oracle.com>
Cc: rdreier@cisco.com, rds-devel@oss.oracle.com,
	general@lists.openfabrics.org, netdev@vger.kernel.org
Subject: Re: [PATCH 01/21] RDS: Socket interface
Date: Tue, 27 Jan 2009 15:08:40 +0300	[thread overview]
Message-ID: <20090127120840.GC2646@ioremap.net> (raw)
In-Reply-To: <1233022678-9259-2-git-send-email-andy.grover@oracle.com>

Hi Andy.

On Mon, Jan 26, 2009 at 06:17:38PM -0800, Andy Grover (andy.grover@oracle.com) wrote:
> +/* this is just used for stats gathering :/ */

Shouldn't this be some kind of per-cpu data?

> +static DEFINE_SPINLOCK(rds_sock_lock);
> +static unsigned long rds_sock_count;
> +static LIST_HEAD(rds_sock_list);
> +DECLARE_WAIT_QUEUE_HEAD(rds_poll_waitq);

Global list of all sockets? This does not scale, maybe it should be
groupped into hash table or be per-device?

> +static int rds_release(struct socket *sock)
> +{
> +	struct sock *sk = sock->sk;
> +	struct rds_sock *rs;
> +	unsigned long flags;
> +
> +	if (sk == NULL)
> +		goto out;
> +
> +	rs = rds_sk_to_rs(sk);
> +
> +	sock_orphan(sk);

Why is it needed getting socket is about to be freed?

> +	/* Note - rds_clear_recv_queue grabs rs_recv_lock, so
> +	 * that ensures the recv path has completed messing
> +	 * with the socket. */
> +	rds_clear_recv_queue(rs);
> +	rds_cong_remove_socket(rs);
> +	rds_remove_bound(rs);
> +	rds_send_drop_to(rs, NULL);
> +	rds_rdma_drop_keys(rs);
> +	rds_notify_queue_get(rs, NULL);
> +
> +	spin_lock_irqsave(&rds_sock_lock, flags);
> +	list_del_init(&rs->rs_item);
> +	rds_sock_count--;
> +	spin_unlock_irqrestore(&rds_sock_lock, flags);

Does RDS sockets work with high number of creation/destruction
workloads?
> +static unsigned int rds_poll(struct file *file, struct socket *sock,
> +			     poll_table *wait)
> +{
> +	struct sock *sk = sock->sk;
> +	struct rds_sock *rs = rds_sk_to_rs(sk);
> +	unsigned int mask = 0;
> +	unsigned long flags;
> +
> +	poll_wait(file, sk->sk_sleep, wait);
> +
> +	poll_wait(file, &rds_poll_waitq, wait);
> +

Are you absolutely sure that provided poll_table callback
will not do the bad things here? It is quite unusual to add several
different queues into the same head in the poll callback.
And shouldn't rds_poll_waitq be lock protected here?

> +	read_lock_irqsave(&rs->rs_recv_lock, flags);
> +	if (!rs->rs_cong_monitor) {
> +		/* When a congestion map was updated, we signal POLLIN for
> +		 * "historical" reasons. Applications can also poll for
> +		 * WRBAND instead. */
> +		if (rds_cong_updated_since(&rs->rs_cong_track))
> +			mask |= (POLLIN | POLLRDNORM | POLLWRBAND);
> +	} else {
> +		spin_lock(&rs->rs_lock);

Is there a possibility to have lock iteraction problem with above
rs_recv_lock read lock?

> +#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 24)

This should be dropped in the mainline tree.

> +/*
> + * XXX this probably still needs more work.. no INADDR_ANY, and rbtrees aren't
> + * particularly zippy.
> + *
> + * This is now called for every incoming frame so we arguably care much more
> + * about it than we used to.
> + */
> +static DEFINE_SPINLOCK(rds_bind_lock);
> +static struct rb_root rds_bind_tree = RB_ROOT;

Hash table with the appropriate size will have faster lookup/access
times btw.

> +static struct rds_sock *rds_bind_tree_walk(__be32 addr, __be16 port,
> +					   struct rds_sock *insert)
> +{
> +	struct rb_node **p = &rds_bind_tree.rb_node;
> +	struct rb_node *parent = NULL;
> +	struct rds_sock *rs;
> +	u64 cmp;
> +	u64 needle = ((u64)be32_to_cpu(addr) << 32) | be16_to_cpu(port);
> +
> +	while (*p) {
> +		parent = *p;
> +		rs = rb_entry(parent, struct rds_sock, rs_bound_node);
> +
> +		cmp = ((u64)be32_to_cpu(rs->rs_bound_addr) << 32) |
> +		      be16_to_cpu(rs->rs_bound_port);
> +
> +		if (needle < cmp)

Should it use wrapping logic if some field overflows?

> +	rdsdebug("returning rs %p for %u.%u.%u.%u:%u\n", rs, NIPQUAD(addr),
> +		ntohs(port));

Iirc there is a new %pi4 or similar format id.

-- 
	Evgeniy Polyakov

  parent reply	other threads:[~2009-01-27 12:08 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-27  2:17 [ofa-general] [PATCH 0/21] Reliable Datagram Sockets (RDS) Andy Grover
2009-01-27  2:17 ` [ofa-general] [PATCH 01/21] RDS: Socket interface Andy Grover
2009-01-27  3:46   ` Stephen Hemminger
2009-01-29  3:17     ` [ofa-general] " Andrew Grover
2009-01-27  4:11   ` David Miller
2009-01-29 20:22     ` [ofa-general] ***SPAM*** " Andrew Grover
2009-01-27 12:08   ` Evgeniy Polyakov [this message]
2009-01-29  4:02     ` [ofa-general] " Andrew Grover
2009-01-29 16:24       ` Evgeniy Polyakov
2009-01-27  2:17 ` [ofa-general] [PATCH 02/21] RDS: Main header file Andy Grover
2009-01-27  7:34   ` Rémi Denis-Courmont
2009-01-27 19:27     ` [ofa-general] " Andrew Grover
2009-01-27 13:05   ` Evgeniy Polyakov
2009-01-27 19:23     ` [ofa-general] ***SPAM*** " Andrew Grover
2009-01-27 19:24       ` Steve Wise
2009-01-27  2:17 ` [PATCH 03/21] RDS: Congestion-handling code Andy Grover
2009-01-27  3:48   ` Stephen Hemminger
2009-01-27 19:15     ` Andrew Grover
2009-01-27 13:10   ` Evgeniy Polyakov
2009-01-27 19:10     ` Andrew Grover
2009-01-28 22:57   ` Roland Dreier
2009-01-29  2:39     ` [ofa-general] " Andy Grover
2009-01-27  2:17 ` [PATCH 04/21] RDS: Transport code Andy Grover
2009-01-27 13:18   ` Evgeniy Polyakov
2009-01-27 19:36     ` Andrew Grover
2009-01-27 21:56       ` [ofa-general] " Evgeniy Polyakov
2009-01-27 22:15         ` [ofa-general] ***SPAM*** " Andrew Grover
2009-01-27  2:17 ` [ofa-general] [PATCH 05/21] RDS: Info and stats Andy Grover
2009-01-27 13:28   ` Evgeniy Polyakov
2009-01-27  2:17 ` [PATCH 06/21] RDS: Connection handling Andy Grover
2009-01-27 13:34   ` Evgeniy Polyakov
2009-01-27 13:47     ` Oliver Neukum
2009-01-27 13:51       ` Evgeniy Polyakov
2009-01-27 16:28       ` [ofa-general] " Steve Wise
2009-01-29  3:03         ` ***SPAM*** " Andrew Grover
2009-01-29  8:03           ` Evgeniy Polyakov
2009-01-27  2:17 ` [PATCH 07/21] RDS: loopback Andy Grover
2009-01-27  2:17 ` [PATCH 08/21] RDS: sysctls Andy Grover
2009-01-27  2:17 ` [PATCH 09/21] RDS: Message parsing Andy Grover
2009-01-27  2:17 ` [PATCH 10/21] RDS: send.c Andy Grover
2009-01-27  2:17 ` [PATCH 11/21] RDS: recv.c Andy Grover
2009-01-27  2:17 ` [PATCH 12/21] RDS: RDMA support Andy Grover
2009-01-27  2:17 ` [ofa-general] [PATCH 13/21] RDS/IB: Infiniband transport Andy Grover
2009-01-27  2:17 ` [PATCH 14/21] RDS/IB: Ring-handling code Andy Grover
2009-01-27  2:17 ` [PATCH 15/21] RDS/IB: Implement RDMA ops using FMRs Andy Grover
2009-01-27  2:17 ` [PATCH 16/21] RDS/IB: Implement IB-specific datagram send Andy Grover
2009-01-27  2:17 ` [PATCH 17/21] RDS/IB: Receive datagrams via IB Andy Grover
2009-01-29  0:05   ` [ofa-general] " Roland Dreier
2009-01-29  2:20     ` Andy Grover
2009-01-29 21:02       ` Olaf Kirch
2009-01-29 21:47         ` [ofa-general] " Roland Dreier
2009-01-27  2:17 ` [PATCH 18/21] RDS/IB: Stats and sysctls Andy Grover
2009-01-27  2:17 ` [PATCH 19/21] RDS: Documentation Andy Grover
2009-01-27  2:17 ` [PATCH 20/21] RDS: Kconfig and Makefile Andy Grover
2009-01-28 22:59   ` Roland Dreier
2009-01-29  2:19     ` [ofa-general] " Andy Grover
2009-01-29  5:14       ` Roland Dreier
2009-01-27  2:17 ` [PATCH 21/21] RDS: Add AF and PF #defines for RDS sockets Andy Grover
2009-01-27  7:27   ` Rémi Denis-Courmont
2009-01-27 19:31     ` [ofa-general] " Andrew Grover
2009-01-27 15:34 ` [ofa-general] [PATCH 0/21] Reliable Datagram Sockets (RDS) Steve Wise
2009-01-27 19:29   ` ***SPAM*** " Andrew Grover
2009-01-28 22:37 ` Roland Dreier
2009-01-29  1:29   ` [ofa-general] " Andy Grover

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090127120840.GC2646@ioremap.net \
    --to=zbr@ioremap.net \
    --cc=andy.grover@oracle.com \
    --cc=general@lists.openfabrics.org \
    --cc=netdev@vger.kernel.org \
    --cc=rdreier@cisco.com \
    --cc=rds-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).