From: Andi Kleen <andi@firstfloor.org>
To: Christoph Lameter <cl@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>,
	David Miller <davem@davemloft.net>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Add PGM protocol support to the IP stack
Date: Mon, 22 Mar 2010 19:53:10 +0100
Message-ID: <20100322185310.GA20695@one.firstfloor.org>
In-Reply-To: <alpine.DEB.2.00.1003221300180.17230@router.home>

On Mon, Mar 22, 2010 at 01:07:37PM -0500, Christoph Lameter wrote:
> > >         B. PGM over UDP
> > >
> > >                 fd = socket(AF_INET, SOCK_RDM, IPPROTO_UDP)
> > >
> > >         C. PGM over SHM (?)
> > >
> > >                 fd = socket(AF_UNIX, SOCK_RDM, 0)
> >
> > Not sure how that should work.
> 
> Multiple processes would communicate via shm segments. Maybe defer to the
> future but its an important operation mode as the systems grow bigger and bigger.
> SHM segment would have to contain some sort of ring buffer that the
> receivers could tap into. But that mode has not really been thought
> through.

AF_UNIX is not SHM today.

Is the only point to avoid one copy (user1 -> kernel -> user2 becoming
user1 -> user2)? Not sure that is really worth it. Don't you need another
copy into the reliability buffer anyway?

Letting the kernel parse a data structure in user-defined memory is also
always somewhat tricky.

But in principle AF_INET over localhost should not be much less efficient
than AF_UNIX, so you can probably drop it for now (unless you need special
AF_UNIX features like credentials).

> > >
> > >         Packet sizes are determined by the number of  packets in a single sendmsg() unless
> >
> > Number of bytes surely?
> 
> Sorry yes you are right.
> 
> > >         overridden by the RM_SET_MESSAGE_BOUNDARY socket option.
> >
> > That's unusual to have such a option (except the MTU). What is it good for?
> 
> No idea why it was implemented. It can be used to use send() for portions
> of a message. Triggers the send() only when all bytes have been provided.
> Probably necessary if one wants to have very long (megabytes) messages.

Those could be a problem for kernel memory consumption. One would need
to be very careful to have a good memory management scheme for the socket
in place.
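
To make the concern concrete, here is a rough userspace sketch of one
possible scheme: the total size of a multi-send() message is announced up
front and checked against a per-socket cap before any memory is committed.
Everything here (struct pending_msg, pending_begin(), pending_append(),
pending_free(), the limit parameter) is made up for illustration. It is not
how RM_SET_MESSAGE_BOUNDARY is implemented anywhere, just an assumption
about what "careful" could look like.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-socket state for a message built from several send() calls. */
struct pending_msg {
	char   *data;      /* staging buffer, NULL when idle */
	size_t  len;       /* bytes accumulated so far */
	size_t  expected;  /* total size announced for this message */
};

/* Start a message; refuse up front if it would exceed the per-socket cap. */
int pending_begin(struct pending_msg *m, size_t expected, size_t limit)
{
	assert(m != NULL);
	m->data = NULL;
	m->len = 0;
	m->expected = 0;
	if (expected == 0 || expected > limit)
		return -EMSGSIZE;
	m->data = malloc(expected);
	if (m->data == NULL)
		return -ENOMEM;
	m->expected = expected;
	return 0;
}

/* Add one send()'s worth of bytes: 1 = complete, 0 = need more, <0 = error. */
int pending_append(struct pending_msg *m, const void *buf, size_t n)
{
	assert(m != NULL && m->data != NULL);
	if (n > m->expected - m->len)
		return -EMSGSIZE;	/* would overrun the announced size */
	memcpy(m->data + m->len, buf, n);
	m->len += n;
	return m->len == m->expected;
}

/* Release the staging buffer and return to the idle state. */
void pending_free(struct pending_msg *m)
{
	assert(m != NULL);
	free(m->data);
	m->data = NULL;
	m->len = 0;
	m->expected = 0;
}
```

The design choice is that the allocation is bounded when the message is
announced, so a sender cannot grow the buffer without limit by trickling in
more bytes than it declared.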

> > >
> > >         A. Setting the window size / rate.
> > >
> > >                 struct pgm_send_window x;
> > >                 x.RateKbitsPerSec = 56;
> > >                 x.WindowSizeInMsecs = 60000;
> > >                 x.WindowSizeinBytes = 10000000;
> > >
> > >                 setsockopt(fd, SOCK_RDM, RM_RATE_WINDOW_SIZE, &x, sizeof(x));
> > >
> > >                 Default is sending at 56Kbps with a buffer of 10 Megabytes and buffering for a minute.
> >
> > That's a very large buffer for a socket. It would be better to use the usual
> > auto shrinking/increasing mechanisms.
> 
> Reliable multicast protocols have a defined time period / "reliabilty
> buffer" so that they can resend a message that was missed for a time
> period. It is customary to either specify a time period or define the size
> of the "reliability buffer".

One problem then is memory management. What happens when a process opens
100 of those sockets and fills them all?

I guess you would still need a suitable global limit like TCP has.
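
A minimal userspace model of such a global limit, using C11 atomics, might
look like the following. The names pgm_mem_used, pgm_mem_limit,
pgm_mem_charge() and pgm_mem_uncharge() and the 64 MiB figure are
assumptions for the sketch. Real kernel socket memory accounting is more
involved than this.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical protocol-wide accounting shared by all sockets. */
static atomic_size_t pgm_mem_used;
static const size_t pgm_mem_limit = (size_t)64 << 20;	/* assumed 64 MiB cap */

/* Try to reserve bytes of reliability buffer; false if over the global cap. */
bool pgm_mem_charge(size_t bytes)
{
	size_t old = atomic_load(&pgm_mem_used);

	do {
		/* old never exceeds the limit, so the subtraction cannot wrap */
		if (bytes > pgm_mem_limit - old)
			return false;
	} while (!atomic_compare_exchange_weak(&pgm_mem_used, &old, old + bytes));
	return true;
}

/* Give back a reservation, e.g. when a socket is closed. */
void pgm_mem_uncharge(size_t bytes)
{
	assert(bytes <= atomic_load(&pgm_mem_used));
	atomic_fetch_sub(&pgm_mem_used, bytes);
}
```

With the 10000000 byte window from the example above, six sockets fit under
this assumed cap and the seventh pgm_mem_charge() fails, so that socket has
to fall back to a smaller window or return an error instead of pinning more
memory.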

> Never used it. I'd rather skip for now. Maybe later.
> 
> >
> > > /* Socket API structures (established by M$DN) */
> > > struct pgm_receiver_stats {
> > >         u64     NumODataPacketsReceived;        /* Number of ODATA (original) sequences */
> >
> > It's difficult to maintain 64 bit counters on 32bit hosts on all targets.
> > But I guess it would be ok to only fill in 32bit in this case.
> 
> 32 bit counters have the awful habit of overflowing.

There's just no portable atomic64_t. OK, maybe you can use the socket lock
to synchronize all the counts if they are only per socket.
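
As a userspace illustration of that idea, with a pthread mutex standing in
for the socket lock: struct sock_stats and the sock_stats_*() helpers are
hypothetical names, and only one counter is shown. Both the update and the
read take the lock, so a 32bit host never sees a half-written 64 bit value.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical per-socket statistics guarded by one lock. */
struct sock_stats {
	pthread_mutex_t lock;		/* stands in for the socket lock */
	uint64_t odata_packets;		/* models NumODataPacketsReceived */
};

void sock_stats_init(struct sock_stats *s)
{
	assert(s != NULL);
	pthread_mutex_init(&s->lock, NULL);
	s->odata_packets = 0;
}

/* Update path: plain 64 bit add while holding the lock. */
void sock_stats_add(struct sock_stats *s, uint64_t n)
{
	pthread_mutex_lock(&s->lock);
	s->odata_packets += n;
	pthread_mutex_unlock(&s->lock);
}

/* Read path: take the same lock so the two halves are read consistently. */
uint64_t sock_stats_read(struct sock_stats *s)
{
	uint64_t v;

	pthread_mutex_lock(&s->lock);
	v = s->odata_packets;
	pthread_mutex_unlock(&s->lock);
	return v;
}
```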

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

