From: Stephen Hemminger <shemminger@vyatta.com>
To: Andy Grover <andy.grover@oracle.com>
Cc: rdreier@cisco.com, rds-devel@oss.oracle.com,
	general@lists.openfabrics.org, netdev@vger.kernel.org
Subject: Re: [PATCH 03/21] RDS: Congestion-handling code
Date: Mon, 26 Jan 2009 19:48:20 -0800
Message-ID: <20090126194820.41cdb7f5@extreme>
In-Reply-To: <1233022678-9259-4-git-send-email-andy.grover@oracle.com>

On Mon, 26 Jan 2009 18:17:40 -0800
Andy Grover <andy.grover@oracle.com> wrote:

> RDS handles per-socket congestion by updating peers with a complete
> congestion map (8KB). This code keeps track of these maps for itself
> and ones received from peers.
> 
> Signed-off-by: Andy Grover <andy.grover@oracle.com>
> ---
>  drivers/infiniband/ulp/rds/cong.c |  424 +++++++++++++++++++++++++++++++++++++
>  1 files changed, 424 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/infiniband/ulp/rds/cong.c
> 
> diff --git a/drivers/infiniband/ulp/rds/cong.c b/drivers/infiniband/ulp/rds/cong.c
> new file mode 100644
> index 0000000..b7c49d2
> --- /dev/null
> +++ b/drivers/infiniband/ulp/rds/cong.c
> @@ -0,0 +1,424 @@
> +/*
> + * Copyright (c) 2007 Oracle.  All rights reserved.
> + *
> + * This software is available to you under a choice of one of two
> + * licenses.  You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * OpenIB.org BSD license below:
> + *
> + *     Redistribution and use in source and binary forms, with or
> + *     without modification, are permitted provided that the following
> + *     conditions are met:
> + *
> + *      - Redistributions of source code must retain the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer.
> + *
> + *      - Redistributions in binary form must reproduce the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer in the documentation and/or other materials
> + *        provided with the distribution.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + *
> + */
> +#include <linux/types.h>
> +#include <linux/rbtree.h>
> +
> +#include "rds.h"
> +
> +/*
> + * This file implements the receive side of the unconventional congestion
> + * management in RDS.
> + *
> + * Messages waiting in the receive queue on the receiving socket are accounted
> + * against the socket's SO_RCVBUF option value.  Only the payload bytes in the
> + * message are accounted for.  If the number of bytes queued equals or exceeds
> + * rcvbuf then the socket is congested.  All sends attempted to this socket's
> + * address should block or return -EWOULDBLOCK.
> + *
> + * Applications are expected to be reasonably tuned such that this situation
> + * very rarely occurs.  An application encountering this "back-pressure" is
> + * considered a bug.
> + *
> + * This is implemented by having each node maintain bitmaps which indicate
> + * which ports on bound addresses are congested.  As the bitmap changes it is
> + * sent through all the connections which terminate in the local address of the
> + * bitmap which changed.
> + *
> + * The bitmaps are allocated as connections are brought up.  This avoids
> + * allocation in the interrupt handling path which queues messages on sockets.
> + * The dense bitmaps let transports send the entire bitmap on any bitmap change
> + * reasonably efficiently.  This is much easier to implement than some
> + * finer-grained communication of per-port congestion.  The sender does a very
> + * inexpensive bit test to test if the port it's about to send to is congested
> + * or not.
> + */
> +
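
For anyone following along, the scheme described in the comment above can
be sketched in userspace C roughly as follows. This is only an illustration
under stated assumptions, not the code from cong.c: the names cong_map,
cong_set_bit, cong_clear_bit, cong_test_bit and rcv_is_congested are
hypothetical, and ports are taken in host byte order. The 8KB size comes
from the patch description; 8192 bytes * 8 bits gives 65536 bits, one per
16-bit port.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; the real RDS structures are not shown here. */
#define CONG_MAP_BYTES 8192u                        /* 8KB, per the patch description */
#define CONG_MAP_BITS  (CONG_MAP_BYTES * CHAR_BIT)  /* 65536 when CHAR_BIT is 8 */

struct cong_map {
	uint8_t bits[CONG_MAP_BYTES];   /* one bit per port on a bound address */
};

/* Start with every port marked uncongested. */
static void cong_map_init(struct cong_map *map)
{
	assert(map != NULL);
	memset(map->bits, 0, sizeof(map->bits));
}

/* Receive side: a socket is congested once queued payload bytes reach rcvbuf. */
static bool rcv_is_congested(size_t queued_bytes, size_t rcvbuf)
{
	return queued_bytes >= rcvbuf;
}

/* Receive side: mark a port congested in the local map. */
static void cong_set_bit(struct cong_map *map, uint16_t port)
{
	map->bits[port / CHAR_BIT] |= (uint8_t)(1u << (port % CHAR_BIT));
}

/* Receive side: mark a port uncongested again once the queue drains. */
static void cong_clear_bit(struct cong_map *map, uint16_t port)
{
	map->bits[port / CHAR_BIT] &= (uint8_t)~(1u << (port % CHAR_BIT));
}

/* Send side: the inexpensive bit test done before sending to a port. */
static bool cong_test_bit(const struct cong_map *map, uint16_t port)
{
	return ((map->bits[port / CHAR_BIT] >> (port % CHAR_BIT)) & 1u) != 0;
}
```

In this sketch a sender would call cong_test_bit() on its copy of the
peer's map and block or fail with -EWOULDBLOCK when the bit is set. Shipping
the whole map on every change trades bandwidth for simplicity, as the
quoted comment says.
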
> +/*
> + * Interaction with poll is a tad tricky. We want all processes stuck in
> + * poll to wake up and check whether a congested destination became uncongested.
> + * The really sad thing is we have no idea which destinations the application
> + * wants to send to - we don't even know which rds_connections are involved.
> + * So until we implement a more flexible rds poll interface, we have to make
> + * do with this:
> + * We maintain a global counter that is incremented each time a congestion map
> + * update is received. Each rds socket tracks this value, and if rds_poll
> + * finds that the saved generation number is smaller than the global generation
> + * number, it wakes up the process.
> + */
> +static atomic_t		rds_cong_generation = ATOMIC_INIT(0);
> +
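
The generation-counter trick for poll described above might look something
like the following in userspace C11. Again, this is a sketch under
assumptions and not what the patch does line for line: cong_generation
stands in for rds_cong_generation, and sock_state, cong_map_updated and
poll_should_wake are made-up names. The quoted comment compares with
"smaller than"; the sketch uses "not equal" on an unsigned counter so that
wraparound is harmless, which is a choice made here rather than something
taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Global counter, bumped on every congestion map update (hypothetical name). */
static atomic_uint cong_generation = 0;

/* Per-socket state: the last generation this socket has observed. */
struct sock_state {
	unsigned int seen_generation;
};

/* Called whenever a congestion map update is received from a peer. */
static void cong_map_updated(void)
{
	atomic_fetch_add(&cong_generation, 1u);
}

/*
 * Called from the poll path: returns true if any map update arrived since
 * this socket last looked, and records the generation it has now seen.
 */
static bool poll_should_wake(struct sock_state *s)
{
	unsigned int now;

	assert(s != NULL);
	now = atomic_load(&cong_generation);
	if (s->seen_generation == now)
		return false;
	s->seen_generation = now;
	return true;
}
```

Note that this wakes every poller on every map update, whether or not the
destination it cares about changed state, which is exactly the limitation
the comment admits to.
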
> +/*
> + * Congestion monitoring
> + */
> +static LIST_HEAD(rds_cong_monitor);
> +static DEFINE_RWLOCK(rds_cong_monitor_lock);
> +
> +/*
> + * Yes, a global lock.  It's used so infrequently that it's worth keeping it
> + * global to simplify the locking.  It's only used in the following
> + * circumstances:
> + *
> + *  - on connection buildup to associate a conn with its maps
> + *  - on map changes to inform conns of a new map to send
> + *
> + *  It's sadly ordered under the socket callback lock and the connection lock.
> + *  Receive paths can mark ports congested from interrupt context so the
> + *  lock masks interrupts.
> + */

So this is starting to look like another "Oracle special", like AIO
and HugeTLB: one that comes with lots of caveats and restrictions on the
application.

