All of lore.kernel.org
 help / color / mirror / Atom feed
From: Evgeniy Polyakov <zbr@ioremap.net>
To: Andy Grover <andy.grover@oracle.com>
Cc: rdreier@cisco.com, rds-devel@oss.oracle.com,
	general@lists.openfabrics.org, netdev@vger.kernel.org
Subject: Re: [PATCH 01/21] RDS: Socket interface
Date: Tue, 27 Jan 2009 15:08:40 +0300	[thread overview]
Message-ID: <20090127120840.GC2646@ioremap.net> (raw)
In-Reply-To: <1233022678-9259-2-git-send-email-andy.grover@oracle.com>

Hi Andy.

On Mon, Jan 26, 2009 at 06:17:38PM -0800, Andy Grover (andy.grover@oracle.com) wrote:
> +/* this is just used for stats gathering :/ */

Shouldn't this be some kind of per-cpu data?

> +static DEFINE_SPINLOCK(rds_sock_lock);
> +static unsigned long rds_sock_count;
> +static LIST_HEAD(rds_sock_list);
> +DECLARE_WAIT_QUEUE_HEAD(rds_poll_waitq);

Global list of all sockets? This does not scale, maybe it should be
groupped into hash table or be per-device?

> +static int rds_release(struct socket *sock)
> +{
> +	struct sock *sk = sock->sk;
> +	struct rds_sock *rs;
> +	unsigned long flags;
> +
> +	if (sk == NULL)
> +		goto out;
> +
> +	rs = rds_sk_to_rs(sk);
> +
> +	sock_orphan(sk);

Why is it needed getting socket is about to be freed?

> +	/* Note - rds_clear_recv_queue grabs rs_recv_lock, so
> +	 * that ensures the recv path has completed messing
> +	 * with the socket. */
> +	rds_clear_recv_queue(rs);
> +	rds_cong_remove_socket(rs);
> +	rds_remove_bound(rs);
> +	rds_send_drop_to(rs, NULL);
> +	rds_rdma_drop_keys(rs);
> +	rds_notify_queue_get(rs, NULL);
> +
> +	spin_lock_irqsave(&rds_sock_lock, flags);
> +	list_del_init(&rs->rs_item);
> +	rds_sock_count--;
> +	spin_unlock_irqrestore(&rds_sock_lock, flags);

Does RDS sockets work with high number of creation/destruction
workloads?
> +static unsigned int rds_poll(struct file *file, struct socket *sock,
> +			     poll_table *wait)
> +{
> +	struct sock *sk = sock->sk;
> +	struct rds_sock *rs = rds_sk_to_rs(sk);
> +	unsigned int mask = 0;
> +	unsigned long flags;
> +
> +	poll_wait(file, sk->sk_sleep, wait);
> +
> +	poll_wait(file, &rds_poll_waitq, wait);
> +

Are you absolutely sure that provided poll_table callback
will not do the bad things here? It is quite unusual to add several
different queues into the same head in the poll callback.
And shouldn't rds_poll_waitq be lock protected here?

> +	read_lock_irqsave(&rs->rs_recv_lock, flags);
> +	if (!rs->rs_cong_monitor) {
> +		/* When a congestion map was updated, we signal POLLIN for
> +		 * "historical" reasons. Applications can also poll for
> +		 * WRBAND instead. */
> +		if (rds_cong_updated_since(&rs->rs_cong_track))
> +			mask |= (POLLIN | POLLRDNORM | POLLWRBAND);
> +	} else {
> +		spin_lock(&rs->rs_lock);

Is there a possibility to have lock iteraction problem with above
rs_recv_lock read lock?

> +#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 24)

This should be dropped in the mainline tree.

> +/*
> + * XXX this probably still needs more work.. no INADDR_ANY, and rbtrees aren't
> + * particularly zippy.
> + *
> + * This is now called for every incoming frame so we arguably care much more
> + * about it than we used to.
> + */
> +static DEFINE_SPINLOCK(rds_bind_lock);
> +static struct rb_root rds_bind_tree = RB_ROOT;

Hash table with the appropriate size will have faster lookup/access
times btw.

> +static struct rds_sock *rds_bind_tree_walk(__be32 addr, __be16 port,
> +					   struct rds_sock *insert)
> +{
> +	struct rb_node **p = &rds_bind_tree.rb_node;
> +	struct rb_node *parent = NULL;
> +	struct rds_sock *rs;
> +	u64 cmp;
> +	u64 needle = ((u64)be32_to_cpu(addr) << 32) | be16_to_cpu(port);
> +
> +	while (*p) {
> +		parent = *p;
> +		rs = rb_entry(parent, struct rds_sock, rs_bound_node);
> +
> +		cmp = ((u64)be32_to_cpu(rs->rs_bound_addr) << 32) |
> +		      be16_to_cpu(rs->rs_bound_port);
> +
> +		if (needle < cmp)

Should it use wrapping logic if some field overflows?

> +	rdsdebug("returning rs %p for %u.%u.%u.%u:%u\n", rs, NIPQUAD(addr),
> +		ntohs(port));

Iirc there is a new %pi4 or similar format id.

-- 
	Evgeniy Polyakov

  parent reply	other threads:[~2009-01-27 12:08 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-27  2:17 [ofa-general] [PATCH 0/21] Reliable Datagram Sockets (RDS) Andy Grover
2009-01-27  2:17 ` [ofa-general] [PATCH 01/21] RDS: Socket interface Andy Grover
2009-01-27  3:46   ` Stephen Hemminger
2009-01-29  3:17     ` [ofa-general] " Andrew Grover
2009-01-27  4:11   ` David Miller
2009-01-29 20:22     ` [ofa-general] ***SPAM*** " Andrew Grover
2009-01-27 12:08   ` Evgeniy Polyakov [this message]
2009-01-29  4:02     ` [ofa-general] " Andrew Grover
2009-01-29 16:24       ` Evgeniy Polyakov
2009-01-27  2:17 ` [ofa-general] [PATCH 02/21] RDS: Main header file Andy Grover
2009-01-27  7:34   ` Rémi Denis-Courmont
2009-01-27 19:27     ` [ofa-general] " Andrew Grover
2009-01-27 13:05   ` Evgeniy Polyakov
2009-01-27 19:23     ` [ofa-general] ***SPAM*** " Andrew Grover
2009-01-27 19:24       ` Steve Wise
2009-01-27  2:17 ` [PATCH 03/21] RDS: Congestion-handling code Andy Grover
2009-01-27  3:48   ` Stephen Hemminger
2009-01-27 19:15     ` Andrew Grover
2009-01-27 13:10   ` Evgeniy Polyakov
2009-01-27 19:10     ` Andrew Grover
2009-01-28 22:57   ` Roland Dreier
2009-01-29  2:39     ` [ofa-general] " Andy Grover
2009-01-27  2:17 ` [PATCH 04/21] RDS: Transport code Andy Grover
2009-01-27 13:18   ` Evgeniy Polyakov
2009-01-27 19:36     ` Andrew Grover
2009-01-27 21:56       ` [ofa-general] " Evgeniy Polyakov
2009-01-27 22:15         ` [ofa-general] ***SPAM*** " Andrew Grover
2009-01-27  2:17 ` [ofa-general] [PATCH 05/21] RDS: Info and stats Andy Grover
2009-01-27 13:28   ` Evgeniy Polyakov
2009-01-27  2:17 ` [PATCH 06/21] RDS: Connection handling Andy Grover
2009-01-27 13:34   ` Evgeniy Polyakov
2009-01-27 13:47     ` Oliver Neukum
2009-01-27 13:51       ` Evgeniy Polyakov
2009-01-27 16:28       ` [ofa-general] " Steve Wise
2009-01-29  3:03         ` ***SPAM*** " Andrew Grover
2009-01-29  8:03           ` Evgeniy Polyakov
2009-01-27  2:17 ` [PATCH 07/21] RDS: loopback Andy Grover
2009-01-27  2:17 ` [PATCH 08/21] RDS: sysctls Andy Grover
2009-01-27  2:17 ` [PATCH 09/21] RDS: Message parsing Andy Grover
2009-01-27  2:17 ` [PATCH 10/21] RDS: send.c Andy Grover
2009-01-27  2:17 ` [PATCH 11/21] RDS: recv.c Andy Grover
2009-01-27  2:17 ` [PATCH 12/21] RDS: RDMA support Andy Grover
2009-01-27  2:17 ` [ofa-general] [PATCH 13/21] RDS/IB: Infiniband transport Andy Grover
2009-01-27  2:17 ` [PATCH 14/21] RDS/IB: Ring-handling code Andy Grover
2009-01-27  2:17 ` [PATCH 15/21] RDS/IB: Implement RDMA ops using FMRs Andy Grover
2009-01-27  2:17 ` [PATCH 16/21] RDS/IB: Implement IB-specific datagram send Andy Grover
2009-01-27  2:17 ` [PATCH 17/21] RDS/IB: Receive datagrams via IB Andy Grover
2009-01-29  0:05   ` [ofa-general] " Roland Dreier
2009-01-29  2:20     ` Andy Grover
2009-01-29 21:02       ` Olaf Kirch
2009-01-29 21:47         ` [ofa-general] " Roland Dreier
2009-01-27  2:17 ` [PATCH 18/21] RDS/IB: Stats and sysctls Andy Grover
2009-01-27  2:17 ` [PATCH 19/21] RDS: Documentation Andy Grover
2009-01-27  2:17 ` [PATCH 20/21] RDS: Kconfig and Makefile Andy Grover
2009-01-28 22:59   ` Roland Dreier
2009-01-29  2:19     ` [ofa-general] " Andy Grover
2009-01-29  5:14       ` Roland Dreier
2009-01-27  2:17 ` [PATCH 21/21] RDS: Add AF and PF #defines for RDS sockets Andy Grover
2009-01-27  7:27   ` Rémi Denis-Courmont
2009-01-27 19:31     ` [ofa-general] " Andrew Grover
2009-01-27 15:34 ` [ofa-general] [PATCH 0/21] Reliable Datagram Sockets (RDS) Steve Wise
2009-01-27 19:29   ` ***SPAM*** " Andrew Grover
2009-01-28 22:37 ` Roland Dreier
2009-01-29  1:29   ` [ofa-general] " Andy Grover

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090127120840.GC2646@ioremap.net \
    --to=zbr@ioremap.net \
    --cc=andy.grover@oracle.com \
    --cc=general@lists.openfabrics.org \
    --cc=netdev@vger.kernel.org \
    --cc=rdreier@cisco.com \
    --cc=rds-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.