Kernel sends multicast groups to sockets that did not subscribe to the MC group


* Kernel sends multicast groups to sockets that did not subscribe to the MC group
@ 2009-04-13 21:06 Christoph Lameter
  2009-04-14 13:25 ` Neil Horman
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Lameter @ 2009-04-13 21:06 UTC (permalink / raw)
  To: netdev

I ran two processes listening to two multicast groups:

One listening for 239.0.0.51 and the other on 239.0.0.50. Both are
listening on the same port.

If I send a multicast message to 239.0.0.50 then both receive it. Why is
the process listening to 239.0.0.51 receiving the multicast message sent
to 239.0.0.50?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Kernel sends multicast groups to sockets that did not subscribe to the MC group
  2009-04-13 21:06 Kernel sends multicast groups to sockets that did not subscribe to the MC group Christoph Lameter
@ 2009-04-14 13:25 ` Neil Horman
  2009-04-14 13:53   ` Christoph Lameter
  0 siblings, 1 reply; 28+ messages in thread
From: Neil Horman @ 2009-04-14 13:25 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: netdev

On Mon, Apr 13, 2009 at 05:06:08PM -0400, Christoph Lameter wrote:
> I ran two processes listening to two multicast groups:
> 
> One listening for 239.0.0.51 and the other on 239.0.0.50. Both are
> listening on the same port.
> 
> If I send a multicast message to 239.0.0.50 then both receive it. Why is
> the process listening to 239.0.0.51 receiving the multicast message sent
> to 239.0.0.50?
> 
> 
It's correct behavior.  I had the same misunderstanding a few months back, but
Dave Stevens set me right:
http://kerneltrap.org/index.php?q=mailarchive/linux-netdev/2008/7/11/2430904

Basically, it's handled the exact same way that unicast addresses are handled;
it's just a bit counterintuitive since group membership modifications are made
through a particular socket descriptor.  Despite that, multicast groups are
treated as being owned by the system, rather than by a socket.  So if you have a
situation in which you do a group membership add to 239.0.0.50 port 1024 on
socket A, and then do a bind on socket B to INADDR_ANY port 1024, socket B gets
those multicast frames that socket A subscribed to.  There's a link in the thread
above to a few papers that describe the operation in detail.

Regards
Neil
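
The delivery rule described in this message can be modeled in a few lines of
userspace C. This is only a sketch of the behavior as stated above, not kernel
code: the names model_sock, ip4 and model_delivers are invented for
illustration, and the model assumes the host has already joined the destination
group through some socket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model; none of these names exist in the kernel. */
#define MODEL_INADDR_ANY 0u

struct model_sock {
	uint32_t bound_addr;	/* address given to bind(), host byte order */
	uint16_t bound_port;	/* port given to bind() */
};

/* Build an IPv4 address in host byte order from its four octets. */
uint32_t ip4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}

/*
 * Simplified rule: a datagram the host has accepted is delivered to every
 * socket whose bound port matches and whose bound address is either
 * INADDR_ANY or equal to the destination address. Which socket performed
 * the join plays no role, because membership belongs to the system.
 */
bool model_delivers(const struct model_sock *s, uint32_t dst_addr,
		    uint16_t dst_port)
{
	assert(s != NULL);
	if (s->bound_port != dst_port)
		return false;
	return s->bound_addr == MODEL_INADDR_ANY || s->bound_addr == dst_addr;
}
```

With two sockets bound to INADDR_ANY port 1024, model_delivers() is true for
both when a datagram arrives for 239.0.0.50 port 1024, which matches the
observation in the original report.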

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Kernel sends multicast groups to sockets that did not subscribe to the MC group
  2009-04-14 13:25 ` Neil Horman
@ 2009-04-14 13:53   ` Christoph Lameter
  2009-04-14 18:27     ` Neil Horman
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Lameter @ 2009-04-14 13:53 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, David Miller

On Tue, 14 Apr 2009, Neil Horman wrote:

> > If I send a multicast message to 239.0.0.50 then both receive it. Why is
> > the process listening to 239.0.0.51 receiving the multicast message sent
> > to 239.0.0.50?
> Its correct behavior.  I had the same misunderstanding a few months back, but
> Dave stevens set me right:
> http://kerneltrap.org/index.php?q=mailarchive/linux-netdev/2008/7/11/2430904

Well, it's traditional behavior. The join operation occurs on a socket, so
it is surprising that this means I join all multicast groups on an
interface. The reason given is that all unixes since BSD were as
braindead. It's still crappy behavior, since multiple applications that
subscribe to different groups but listen on the same port will get all
traffic of the other apps delivered to them.
It's trivial to fix, as your patch shows, since we already have the per-socket
mc lists required to support IGMPv3.

Maybe we can get this merged by adding a mc configuration variable
that switches between interface and socket based multicast subscriptions.

/proc/sys/net/mc_socket_based_join

Defaults to 0 and can be set to 1 to get sane behavior?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Kernel sends multicast groups to sockets that did not subscribe to the MC group
  2009-04-14 13:53   ` Christoph Lameter
@ 2009-04-14 18:27     ` Neil Horman
  2009-04-14 18:33       ` Christoph Lameter
  2009-04-14 18:48       ` [PATCH] Multicast: Avoid useless duplication of multicast messages Christoph Lameter
  0 siblings, 2 replies; 28+ messages in thread
From: Neil Horman @ 2009-04-14 18:27 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: netdev, David Miller

On Tue, Apr 14, 2009 at 09:53:50AM -0400, Christoph Lameter wrote:
> On Tue, 14 Apr 2009, Neil Horman wrote:
> 
> > > If I send a multicast message to 239.0.0.50 then both receive it. Why is
> > > the process listening to 239.0.0.51 receiving the multicast message sent
> > > to 239.0.0.50?
> > Its correct behavior.  I had the same misunderstanding a few months back, but
> > Dave stevens set me right:
> > http://kerneltrap.org/index.php?q=mailarchive/linux-netdev/2008/7/11/2430904
> 
> Well its traditional behavior. The join operation occurs on a socket so
> it is surprising that this means I join all multicast groups on an
> interface. The reason given is that all unixes since BSD were as
> braindead. Still crappy behavior since multiple applications that
> subscribe to different groups but listen on the same port will get all
> traffic of the other apps delivered to them.

The only reason that happens is because the apps themselves are broken.  The
only way an application would get messages from unexpected multicast addresses
is if it joined a group, and then bound the socket to INADDR_ANY, rather than to
the multicast group and port that it joined.  And if it does that, it has to
be written to detect and cope with malformed data from unexpected hosts, lest it
be vulnerable to any number of bugs.

It works exactly the same way with unicast UDP.  If an application receives on a
socket that is bound to INADDR_ANY, it needs to be especially careful in parsing
the data that it receives, since there is no transport layer validation of the
sending client's status (the way there is with TCP or SCTP).  If host A has a
socket on an application bound to INADDR_ANY and is receiving data from host B,
nothing is stopping host C from starting to send whatever garbage it wants to
host A as well, and it's up to the application to sort that out.  It's exactly
the same with multicast.  It's just that people assume it works in a certain way
(like I did), and it doesn't.

> Its trivial to fix as your patch shows since we already have per socket mc
> lists required to support IGMPv3.
> 
Yes, we can change the code, and it's not hard; the question is: why?  It would
make the use of multicast a bit more intuitive, yes, but I would be concerned
about applications which expect this behavior.  They would all break with this
change.  I can certainly envision an application listening on multiple multicast
groups, and as a matter of simplification, binding to INADDR_ANY, and validating
any received data to toss messages from groups they don't want in user space.  I
suppose there's some advantage in doing the filtering in kernel space to avoid
the extraneous copy_to_user, but I'm not sure that's always feasible, as an
application might not know at any given moment what multicast groups it needs to
receive on.
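
Filtering by group in user space requires the destination address of each
datagram, which recvfrom() does not report. On Linux, one way to obtain it is
the IP_PKTINFO socket option together with recvmsg(). The sketch below shows
only the ancillary-data parsing step; it assumes Linux with glibc, and the
helper names dst_addr_from_cmsg and is_for_group are made up for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* exposes struct in_pktinfo on glibc */
#endif
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Scan the control messages of a msghdr filled in by recvmsg() and copy out
 * the destination address recorded by IP_PKTINFO. The socket must have had
 * IP_PKTINFO enabled via setsockopt(fd, IPPROTO_IP, IP_PKTINFO, ...) first.
 * Returns 0 when the address was found, -1 otherwise.
 */
int dst_addr_from_cmsg(struct msghdr *msg, struct in_addr *dst)
{
	struct cmsghdr *c;

	assert(msg != NULL && dst != NULL);
	for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
		if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
			struct in_pktinfo info;

			/* CMSG_DATA may be unaligned for the struct: copy it. */
			memcpy(&info, CMSG_DATA(c), sizeof(info));
			*dst = info.ipi_addr;	/* destination from the IP header */
			return 0;
		}
	}
	return -1;
}

/* Return 1 if the datagram was addressed to the wanted group, else 0. */
int is_for_group(struct msghdr *msg, struct in_addr wanted)
{
	struct in_addr dst;

	if (dst_addr_from_cmsg(msg, &dst) != 0)
		return 0;
	return dst.s_addr == wanted.s_addr;
}
```

An application bound to INADDR_ANY could call is_for_group() after each
recvmsg() and drop datagrams for groups it did not join, at the cost of the
extra copy to user space mentioned above.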

> Maybe we can get this merged by adding a mc configuration variable
> that switches between interface and socket based multicast subscriptions.
> 
> /proc/sys/net/mc_socket_based_join
> 
Possible, but I'd still be worried about the above.  Using a switch like this is
global (or at least per net namespace), and prevents a mix of apps written to
the 'new model' and the current model.  A prctl option or an additional socket
option might be more palatable.  I think if you could find some standard,
specification or common practice documenting that multicast works the way
you 'expect' on other systems, this might get more traction.  I've not found
anything to that effect though (although I've not looked very hard).

Neil


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Kernel sends multicast groups to sockets that did not subscribe to the MC group
  2009-04-14 18:27     ` Neil Horman
@ 2009-04-14 18:33       ` Christoph Lameter
  2009-04-14 20:01         ` Neil Horman
  2009-04-14 18:48       ` [PATCH] Multicast: Avoid useless duplication of multicast messages Christoph Lameter
  1 sibling, 1 reply; 28+ messages in thread
From: Christoph Lameter @ 2009-04-14 18:33 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, David Miller

On Tue, 14 Apr 2009, Neil Horman wrote:

> The only reason that happens is because the apps themselves are broken.  The
> only way an application would get messages from unexpected Multicast addresses
> is if it joined a group, and then bound the socket to INADDR_ANY, rather than to
> the multicast group and port that it joined to.  And if it does that, it has to
> be written to detect and cope with malformed data from unexpected hosts, lest it
> be vulnerable to any number of bugs.

This occurs here if two applications on a machine bind to the
different MC groups but the same port. Applications need to bind to the
same port since MC traffic has a port number included. Do I need to bind
to the IP address of the NIC? What does INADDR_ANY have to do with it?

> It works exactly the same way with unicast UDP.  If an application receives on a
> socket that is bound to INADDR_ANY, it needs to be especially careful in parsing
> the data that it receives, since there is no transport layer validation of the
> sending clients status (the way there is with tcp or sctp).  If host A has a
> socket on an application bound to INADDR_ANY and is receiving data from host B,
> nothing is stopping host C from starting to send whatever garbage it wants to
> host A as well, and its up to the application to sort that out.  Its exactly the
> same with multicast.  Its just that people assume it works in a certain way
> (like I did), and it doesn't.

Again what does this have to do with INADDR_ANY? You are talking about UDP
sockets? In that case the sorting out usually happens based on the source
address anyways.

> Yes, we can change the code, and its not hard, the question is: why?  It would
> make the use of multicast a bit more intuitive, yes, but I would be concerned
> about applications which expect this behavior.  They would all break with this
> change.  I can certainly envision an application listening on multiple multicast
> groups, and as a matter of simplification, binding to INADDR_ANY, and validating
> any received data to toss messages from groups they don't want in user space.  I
> suppose theres some advantage in doing the filtering in kernel space to avoid
> the extraneous copy_to_user, but I'm not sure thats always feasible, As an
> application might not know at any given moment what multicast groups it needs to
> receive on.

Please read the initial message that started this thread.

If an application listens on multiple multicast groups then it needs to
perform join operations otherwise the switch will not forward the
multicast groups to the host.
(Just ignoring the INADDR_ANY bits since I do not know what this would
have to do with the issue at hand)

> Possible, but I'd still be worried about the above.  Using a switch like this is
> global (or at least per net namespace), and prevents a mix of apps written to
> the 'new model' and the current model.  A prctl option or an additional socket
> option might be more palatable.  I think if you could find some standard,
> specification or common practice documenting that multicast works the way
> you 'expect' on other systems, this might get more traction.  I've not found
> anything to that effect though (although I've not looked very hard).

I cannot envision that there would be many applications having made any
use of the current situation where all mc traffic to a port is forwarded
to the multiple applications that may have subscribed to disjunct sets of
multicast groups.

One could envision having two processes that open the same socket/port and
coordinate: The first joins the multicast groups and then continues in a
monitoring role whereas the second actually processes the data. But then
the data is forwarded to both processes and one of them is not processing
it. So it's fundamentally bad behavior. I would even suggest making
socket-based filtering the default (as in other OSes).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Kernel sends multicast groups to sockets that did not subscribe to the MC group
  2009-04-14 18:33       ` Christoph Lameter
@ 2009-04-14 20:01         ` Neil Horman
  2009-04-14 20:16           ` Christoph Lameter
  0 siblings, 1 reply; 28+ messages in thread
From: Neil Horman @ 2009-04-14 20:01 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: netdev, David Miller

On Tue, Apr 14, 2009 at 02:33:53PM -0400, Christoph Lameter wrote:
> On Tue, 14 Apr 2009, Neil Horman wrote:
> 
> > The only reason that happens is because the apps themselves are broken.  The
> > only way an application would get messages from unexpected Multicast addresses
> > is if it joined a group, and then bound the socket to INADDR_ANY, rather than to
> > the multicast group and port that it joined to.  And if it does that, it has to
> > be written to detect and cope with malformed data from unexpected hosts, lest it
> > be vulnerable to any number of bugs.
> 
> This occurs here if two applications on a machine bind to the
> different MC groups but the same port. Applications need to bind to the
> same port since MC traffic has a port number included. Do I need to bind
> to the IP address of the NIC? What does INADDR_ANY have to do with it?
> 
Let's be clear here: when you say bind, are you referring to the bind call
specifically, or the setsockopt(IP_ADD_MEMBERSHIP, ... ) call?  I'm referring
to the former.  From your initial description, you're referring to the latter.
If you check inet_bind, you'll note that the socket's rcv_saddr is set to the
bound address, which is then later used in udp_v4_mcast_next to filter on which
sockets should receive which frames, so by binding (via the bind syscall), you
can make a socket filter which mcast address it receives.  I'm fairly certain in
your test, you're referring to membership management, via IP_ADD_MEMBERSHIP,
which is different from bind.

INADDR_ANY comes into play here because application writers tend to be lazy
(and who can blame them :) ).  Nominally when you use unicast UDP, you bind a
socket to an interface to tell it that you want to receive frames from that
interface (using the interface's source IP address).  If the application wants
to receive frames on all interfaces, it just binds to INADDR_ANY.  When an app
developer uses multicast, they likely say something to themselves along the
lines of "I'd like to receive multicast on all interfaces, so I'll just bind to
INADDR_ANY".  The problem is that they haven't really taken into consideration
what bind does.  The high level behavior is that it attaches a socket to an
interface, restricting packet reception to only that interface (and the man page
for ip doesn't do much to correct that).  The actual behavior however is that
bind places a filter on a socket, allowing reception only of frames destined to
the specified address on that socket.  This is made intuitive by the fact that
unicast IP addresses typically are bound to an interface (I'll ignore the system
ownership/interface ownership discussion here), so bind works like you expect.
But with multicast, there is no interface that the address is bound to, so the
behavior is less intuitive, but still perfectly functional, if you properly
understand how bind filters received packets.
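
The difference between the two kinds of bind discussed above can be sketched
with the ordinary sockets API. The code assumes Linux or another POSIX system;
make_bind_addr and open_mcast_receiver are hypothetical helper names, and
joining on the default interface (INADDR_ANY in imr_interface) is a choice made
for this example only.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* exposes struct ip_mreq on glibc */
#endif
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Fill in the address handed to bind(). With bind_to_group != 0 the socket is
 * bound to the multicast group itself, so only datagrams destined to that
 * group match; otherwise it is bound to INADDR_ANY and matches every
 * destination address on the port. Returns 0, or -1 for a malformed group.
 */
int make_bind_addr(struct sockaddr_in *sa, const char *group, uint16_t port,
		   int bind_to_group)
{
	struct in_addr grp;

	assert(sa != NULL && group != NULL);
	if (inet_pton(AF_INET, group, &grp) != 1)
		return -1;
	memset(sa, 0, sizeof(*sa));
	sa->sin_family = AF_INET;
	sa->sin_port = htons(port);
	sa->sin_addr.s_addr = bind_to_group ? grp.s_addr : htonl(INADDR_ANY);
	return 0;
}

/* Open a UDP socket, bind it, and join the group. Returns the fd or -1. */
int open_mcast_receiver(const char *group, uint16_t port, int bind_to_group)
{
	struct sockaddr_in sa;
	struct ip_mreq mreq;
	int one = 1;
	int fd;

	if (make_bind_addr(&sa, group, port, bind_to_group) != 0)
		return -1;
	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	/* SO_REUSEADDR lets several processes share the same UDP port. */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0 ||
	    bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
		goto fail;

	/* The join is separate from bind(): it makes the host accept the group. */
	memset(&mreq, 0, sizeof(mreq));
	mreq.imr_multiaddr = sa.sin_addr;
	if (!bind_to_group)
		inet_pton(AF_INET, group, &mreq.imr_multiaddr);
	mreq.imr_interface.s_addr = htonl(INADDR_ANY);
	if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) < 0)
		goto fail;
	return fd;
fail:
	close(fd);
	return -1;
}
```

Calling open_mcast_receiver("239.0.0.51", 1024, 1) in one process and
open_mcast_receiver("239.0.0.50", 1024, 1) in another would, under the
behavior described here, keep each process from seeing the other group's
traffic, while passing 0 as the last argument reproduces the reported setup.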

> > It works exactly the same way with unicast UDP.  If an application receives on a
> > socket that is bound to INADDR_ANY, it needs to be especially careful in parsing
> > the data that it receives, since there is no transport layer validation of the
> > sending clients status (the way there is with tcp or sctp).  If host A has a
> > socket on an application bound to INADDR_ANY and is receiving data from host B,
> > nothing is stopping host C from starting to send whatever garbage it wants to
> > host A as well, and its up to the application to sort that out.  Its exactly the
> > same with multicast.  Its just that people assume it works in a certain way
> > (like I did), and it doesn't.
> 
> Again what does this have to do with INADDR_ANY? You are talking about UDP
> sockets? In that case the sorting out usually happens based on the source
> address anyways.
> 
See above.  It has everything to do with how bind works.  If you bind to
INADDR_ANY (which by the way is the default binding if you don't call bind on a
socket from your application), you are implicitly bound to receive frames for
all destination addresses in the system, which is why you get the behavior you
are seeing.  If you want to restrict which multicast addresses you receive, bind
to them.

> > Yes, we can change the code, and its not hard, the question is: why?  It would
> > make the use of multicast a bit more intuitive, yes, but I would be concerned
> > about applications which expect this behavior.  They would all break with this
> > change.  I can certainly envision an application listening on multiple multicast
> > groups, and as a matter of simplification, binding to INADDR_ANY, and validating
> > any received data to toss messages from groups they don't want in user space.  I
> > suppose theres some advantage in doing the filtering in kernel space to avoid
> > the extraneous copy_to_user, but I'm not sure thats always feasible, As an
> > application might not know at any given moment what multicast groups it needs to
> > receive on.
> 
> Please read the initial message that started this thread.
> 
> If an application listens on multiple multicast groups then it needs to
> perform join operations otherwise the switch will not forward the
> multicast groups to the host.
> (Just ignoring the INADDR_ANY bits since I do not know what this would
> have to do with the issue at hand)
> 
If you don't yet understand what bind has to do with how this works, please read
above.

> > Possible, but I'd still be worried about the above.  Using a switch like this is
> > global (or at least per net namespace), and prevents a mix of apps written to
> > the 'new model' and the current model.  A prctl option or an additional socket
> > option might be more palatable.  I think if you could find some standard,
> > specification or common practice documenting that multicast works the way
> > you 'expect' on other systems, this might get more traction.  I've not found
> > anything to that effect though (although I've not looked very hard).
> 
> I cannot envision that there would be many applications having made any
> use of the current situation where all mc traffic to a port is forwarded
> to the multiple applications that may have subscribed to disjunct sets of
> multicast groups.
> 
This has been the behavior of multicast UDP in the Linux network stack from its
creation, from what I can see.  I wouldn't be so quick to assume that changing
it won't bring people out of the woodwork.  Setting that aside however, not
breaking anybody isn't in my mind a sufficient reason to change this.  Despite
your inability to see how bind fits into this mechanism, trust for the moment
that bind provides the ability for a socket to filter which multicast group
messages are delivered to it.  Bearing that in mind, what additional value does
the mechanism that you are proposing provide?

> One could envision having two processes that open the same socket/port and
> coordinate: The first joins the multicast groups and then continues in a
> monitoring role whereas the second actually processes the data. But then
> the data is forwarded to both processes and one of them is not processing
> it. So its fundamentally bad behavior. I would even suggest to make the
> socket based filtering the default (as in other OSes).
> 
That's a poorly written application.  I refer you again to the implementation of
bind (see inet_bind and udp_v4_mcast_next for details of its inner workings).

Neil

> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Kernel sends multicast groups to sockets that did not subscribe to the MC group
  2009-04-14 20:01         ` Neil Horman
@ 2009-04-14 20:16           ` Christoph Lameter
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Lameter @ 2009-04-14 20:16 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, David Miller

On Tue, 14 Apr 2009, Neil Horman wrote:

> Lets be clear here, when you say, bind, are you referring to the bind call
> specifically?  or the setsockopt(IP_ADD_MEMBERSHIP, ... ) call?  I'm referring
> to the former.  From your initial description, you're referring to the latter.
> If you check inet_bind, you'll note that the sockets rcv_saddr is set to the
> bound address, which is then later used in udp_v4_mcast_next to filter on which
> sockets should receive which frames, so by binding (via the bind syscall), you
> can make a socket filter which mcast address it receives.  I'm fairly certain in
> your test, you're referring to membership management, via IP_ADD_MEMBERSHIP,
> which is different from bind.

We are both referring to the bind call, which does not make sense in this
context since we want to filter multicast traffic to a certain multicast
address. This is *not* to be confused with the interface address.

> See above.  It has everything to do with how bind works.  If you bind to
> INADDR_ANY (which by the way is the default binding if you don't call bind on a
> socket from your application), you are implicitly bound to receive frames from
> all destination addresses in the system, which is why you get the behavior you
> are seeing.  If you want to restrict which multicast addresses you receive, bind
> to them.

There is only a single multicast address you can "bind" to here. What you
are suggesting here is discouraged because traffic you then send out
from the socket will have the multicast address as its origin.

> If you don't yet understand what bind has to do with how this works, please read
> above.

You are messing around with binding because the multicast filtering
does not work?

> > One could envision having two processes that open the same socket/port and
> > coordinate: The first joins the multicast groups and then continues in a
> > monitoring role whereas the second actually processes the data. But then
> > the data is forwarded to both processes and one of them is not processing
> > it. So its fundamentally bad behavior. I would even suggest to make the
> > socket based filtering the default (as in other OSes).
> >
> Thats a poorly written application.  I refer you again to the implementation of
> bind (see inet_bind and udp_v4_mcast_next for details of its inner workings)

Which you should not abuse for multicast filtering. These are hacks to
deal with the problem that the socket multicast lists are not used for
filtering.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-14 18:27     ` Neil Horman
  2009-04-14 18:33       ` Christoph Lameter
@ 2009-04-14 18:48       ` Christoph Lameter
  2009-04-14 20:44         ` Neil Horman
  2009-04-15 22:16         ` David Stevens
  1 sibling, 2 replies; 28+ messages in thread
From: Christoph Lameter @ 2009-04-14 18:48 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, David Miller

Neil: Could you test this?


Subject: Multicast: Avoid useless duplication of multicast messages

If two processes open the same port as a multicast socket and then
join two different multicast groups then traffic for both multicast groups
is forwarded to both processes. This means that applications will get surprising
data that they did not ask for. Applications will have to filter these out in
order to work correctly if multiple apps run on the same system.

These are pretty strange semantics but they have been around since the
beginning of multicast support on Unix systems.

Add an option

	igmp_mc_socket_based_filtering

that is off by default so that the default behavior stays as is.

If one wants to have sane multicast behavior for the above case
then this option can be set. Thereupon applications will not get
additional traffic forwarded to them if they happen to run on a host
where another application also receives multicast traffic from a
different multicast group.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 Documentation/networking/ip-sysctl.txt |   10 ++++++++++
 include/linux/igmp.h                   |    1 +
 include/linux/sysctl.h                 |    1 +
 net/ipv4/igmp.c                        |    6 +++---
 net/ipv4/sysctl_net_ipv4.c             |    8 ++++++++
 5 files changed, 23 insertions(+), 3 deletions(-)

Index: linux-2.6/net/ipv4/igmp.c
===================================================================
--- linux-2.6.orig/net/ipv4/igmp.c	2009-04-14 13:03:14.000000000 -0500
+++ linux-2.6/net/ipv4/igmp.c	2009-04-14 13:11:38.000000000 -0500
@@ -1419,7 +1419,7 @@ static struct in_device *ip_mc_find_dev(
  */
 int sysctl_igmp_max_memberships __read_mostly = IP_MAX_MEMBERSHIPS;
 int sysctl_igmp_max_msf __read_mostly = IP_MAX_MSF;
-
+int sysctl_igmp_mc_socket_based_filtering = 0;

 static int ip_mc_del1_src(struct ip_mc_list *pmc, int sfmode,
 	__be32 *psfsrc)
@@ -2187,7 +2187,7 @@ int ip_mc_sf_allow(struct sock *sk, __be
 	struct ip_sf_socklist *psl;
 	int i;

-	if (!ipv4_is_multicast(loc_addr))
+	if (ipv4_is_lbcast(loc_addr) || !ipv4_is_multicast(loc_addr))
 		return 1;

 	for (pmc=inet->mc_list; pmc; pmc=pmc->next) {
@@ -2196,7 +2196,7 @@ int ip_mc_sf_allow(struct sock *sk, __be
 			break;
 	}
 	if (!pmc)
-		return 1;
+		return !sysctl_igmp_mc_socket_based_filtering;
 	psl = pmc->sflist;
 	if (!psl)
 		return pmc->sfmode == MCAST_EXCLUDE;
Index: linux-2.6/include/linux/igmp.h
===================================================================
--- linux-2.6.orig/include/linux/igmp.h	2009-04-14 13:13:14.000000000 -0500
+++ linux-2.6/include/linux/igmp.h	2009-04-14 13:41:14.000000000 -0500
@@ -150,6 +150,7 @@ static inline struct igmpv3_query *

 extern int sysctl_igmp_max_memberships;
 extern int sysctl_igmp_max_msf;
+extern int sysctl_igmp_mc_socket_based_filtering;

 struct ip_sf_socklist
 {
Index: linux-2.6/include/linux/sysctl.h
===================================================================
--- linux-2.6.orig/include/linux/sysctl.h	2009-04-14 13:15:57.000000000 -0500
+++ linux-2.6/include/linux/sysctl.h	2009-04-14 13:16:49.000000000 -0500
@@ -435,6 +435,7 @@ enum
 	NET_TCP_ALLOWED_CONG_CONTROL=123,
 	NET_TCP_MAX_SSTHRESH=124,
 	NET_TCP_FRTO_RESPONSE=125,
+	NET_IPV4_IGMP_MC_SOCKET_BASED_FILTERING=126,
 };

 enum {
Index: linux-2.6/net/ipv4/sysctl_net_ipv4.c
===================================================================
--- linux-2.6.orig/net/ipv4/sysctl_net_ipv4.c	2009-04-14 13:13:53.000000000 -0500
+++ linux-2.6/net/ipv4/sysctl_net_ipv4.c	2009-04-14 13:15:44.000000000 -0500
@@ -408,6 +408,14 @@ static struct ctl_table ipv4_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.ctl_name	= NET_IPV4_IGMP_MC_SOCKET_BASED_FILTERING,
+		.procname	= "igmp_mc_socket_based_filtering",
+		.data		= &sysctl_igmp_mc_socket_based_filtering,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},

 #endif
 	{
Index: linux-2.6/Documentation/networking/ip-sysctl.txt
===================================================================
--- linux-2.6.orig/Documentation/networking/ip-sysctl.txt	2009-04-14 13:48:09.000000000 -0500
+++ linux-2.6/Documentation/networking/ip-sysctl.txt	2009-04-14 13:53:10.000000000 -0500
@@ -611,6 +611,16 @@ igmp_max_memberships - INTEGER
 	Change the maximum number of multicast groups we can subscribe to.
 	Default: 20

+igmp_mc_socket_based_filtering - INTEGER
+	Use the list of subscribed multicast addresses to filter the traffic
+	going to a multicast socket. If set to zero then multicast traffic
+	is forwarded to any socket subscribed to a port number ignoring the
+	list of multicast groups that a socket has been subscribed to. This mode
+	is the default since it has been done that way in the past.
+	If set to one then only multicast traffic of the multicast groups
+	that a socket has joined is forwarded to the socket.
+	Default: 0
+
 conf/interface/*  changes special settings per interface (where "interface" is
 		  the name of your network interface)
 conf/all/*	  is special, changes the settings for all interfaces
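
The effect of the two changed lines in ip_mc_sf_allow can be illustrated with a
small userspace model. It is a simplification written for this thread, not the
kernel function: all names are hypothetical, addresses are in host byte order,
and per-socket source filters are left out (a group joined with plain
IP_ADD_MEMBERSHIP is treated as always allowed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the sysctl the patch introduces. */
int model_socket_based_filtering = 0;

/* 224.0.0.0/4 is the IPv4 multicast range. */
bool model_is_multicast(uint32_t addr)
{
	return (addr & 0xf0000000u) == 0xe0000000u;
}

/* 255.255.255.255 is the limited broadcast address. */
bool model_is_lbcast(uint32_t addr)
{
	return addr == 0xffffffffu;
}

/*
 * joined[] lists the groups this socket joined itself. Returns true when a
 * datagram sent to dst may be delivered to the socket.
 */
bool model_sf_allow(const uint32_t *joined, size_t njoined, uint32_t dst)
{
	size_t i;

	assert(joined != NULL || njoined == 0);
	if (model_is_lbcast(dst) || !model_is_multicast(dst))
		return true;	/* not multicast: membership is irrelevant */

	for (i = 0; i < njoined; i++)
		if (joined[i] == dst)
			return true;	/* this socket joined the group */

	/* Not joined: old behavior allows it, the new switch rejects it. */
	return !model_socket_based_filtering;
}
```

In this model a socket that joined only 239.0.0.51 still receives traffic for
239.0.0.50 while the switch is 0 and stops receiving it once the switch is 1.
Note also that 255.255.255.255 lies outside 224.0.0.0/4, so in the model the
added broadcast test never changes the result.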

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-14 18:48       ` [PATCH] Multicast: Avoid useless duplication of multicast messages Christoph Lameter
@ 2009-04-14 20:44         ` Neil Horman
  2009-04-14 21:45           ` Christoph Lameter
  2009-04-15 22:16         ` David Stevens
  1 sibling, 1 reply; 28+ messages in thread
From: Neil Horman @ 2009-04-14 20:44 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: netdev, David Miller

On Tue, Apr 14, 2009 at 02:48:48PM -0400, Christoph Lameter wrote:
> Neil: Could you test this?
> 
I can and will, although it will take me a few days to get a system I can play
with it on.  I really don't think it needs much testing as it clearly provides
the functionality that you describe.

That said, I still disagree with the addition of this switch, as it's
superfluous.  Programmatically, an application can already do what you want
without this change.  If you provide me with the test application that you've
been working with, I can demonstrate exactly how.

Regards
Neil

> 
> Subject: Multicast: Avoid useless duplication of multicast messages
> 
> If two processes open the same port as a multicast socket and then
> join two different multicast groups then traffic for both multicast groups
> is forwarded to either process. This means that application will get surprising
> data that they did not ask for. Applications will have to filter these out in
> order to work correctly if multiple apps run on the same system.
> 
> These are pretty strange semantics but they have been around since the
> beginning of multicast support on Unix systems.
> 
> Add an option
> 
> 	igmp_mc_socket_based_filtering
> 
> that is off by default so that the default behavior stays as is.
> 
> If one wants to have sane multicast behavior for the above case
> then this option can be set. Thereupon applications will not get
> additional traffic forwarded to them if they happen to run on a host
> where another application also receives multicast traffic from a
> different multicast group.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> ---
>  Documentation/networking/ip-sysctl.txt |   10 ++++++++++
>  include/linux/igmp.h                   |    1 +
>  include/linux/sysctl.h                 |    1 +
>  net/ipv4/igmp.c                        |    6 +++---
>  net/ipv4/sysctl_net_ipv4.c             |    8 ++++++++
>  5 files changed, 23 insertions(+), 3 deletions(-)
> 
> Index: linux-2.6/net/ipv4/igmp.c
> ===================================================================
> --- linux-2.6.orig/net/ipv4/igmp.c	2009-04-14 13:03:14.000000000 -0500
> +++ linux-2.6/net/ipv4/igmp.c	2009-04-14 13:11:38.000000000 -0500
> @@ -1419,7 +1419,7 @@ static struct in_device *ip_mc_find_dev(
>   */
>  int sysctl_igmp_max_memberships __read_mostly = IP_MAX_MEMBERSHIPS;
>  int sysctl_igmp_max_msf __read_mostly = IP_MAX_MSF;
> -
> +int sysctl_igmp_mc_socket_based_filtering = 0;
> 
>  static int ip_mc_del1_src(struct ip_mc_list *pmc, int sfmode,
>  	__be32 *psfsrc)
> @@ -2187,7 +2187,7 @@ int ip_mc_sf_allow(struct sock *sk, __be
>  	struct ip_sf_socklist *psl;
>  	int i;
> 
> -	if (!ipv4_is_multicast(loc_addr))
> +	if (ipv4_is_lbcast(loc_addr) || !ipv4_is_multicast(loc_addr))
>  		return 1;
> 
>  	for (pmc=inet->mc_list; pmc; pmc=pmc->next) {
> @@ -2196,7 +2196,7 @@ int ip_mc_sf_allow(struct sock *sk, __be
>  			break;
>  	}
>  	if (!pmc)
> -		return 1;
> +		return !sysctl_igmp_mc_socket_based_filtering;
>  	psl = pmc->sflist;
>  	if (!psl)
>  		return pmc->sfmode == MCAST_EXCLUDE;
> Index: linux-2.6/include/linux/igmp.h
> ===================================================================
> --- linux-2.6.orig/include/linux/igmp.h	2009-04-14 13:13:14.000000000 -0500
> +++ linux-2.6/include/linux/igmp.h	2009-04-14 13:41:14.000000000 -0500
> @@ -150,6 +150,7 @@ static inline struct igmpv3_query *
> 
>  extern int sysctl_igmp_max_memberships;
>  extern int sysctl_igmp_max_msf;
> +extern int sysctl_igmp_mc_socket_based_filtering;
> 
>  struct ip_sf_socklist
>  {
> Index: linux-2.6/include/linux/sysctl.h
> ===================================================================
> --- linux-2.6.orig/include/linux/sysctl.h	2009-04-14 13:15:57.000000000 -0500
> +++ linux-2.6/include/linux/sysctl.h	2009-04-14 13:16:49.000000000 -0500
> @@ -435,6 +435,7 @@ enum
>  	NET_TCP_ALLOWED_CONG_CONTROL=123,
>  	NET_TCP_MAX_SSTHRESH=124,
>  	NET_TCP_FRTO_RESPONSE=125,
> +	NET_IPV4_IGMP_MC_SOCKET_BASED_FILTERING=126,
>  };
> 
>  enum {
> Index: linux-2.6/net/ipv4/sysctl_net_ipv4.c
> ===================================================================
> --- linux-2.6.orig/net/ipv4/sysctl_net_ipv4.c	2009-04-14 13:13:53.000000000 -0500
> +++ linux-2.6/net/ipv4/sysctl_net_ipv4.c	2009-04-14 13:15:44.000000000 -0500
> @@ -408,6 +408,14 @@ static struct ctl_table ipv4_table[] = {
>  		.mode		= 0644,
>  		.proc_handler	= proc_dointvec
>  	},
> +	{
> +		.ctl_name	= NET_IPV4_IGMP_MC_SOCKET_BASED_FILTERING,
> +		.procname	= "igmp_mc_socket_based_filtering",
> +		.data		= &sysctl_igmp_mc_socket_based_filtering,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec
> +	},
> 
>  #endif
>  	{
> Index: linux-2.6/Documentation/networking/ip-sysctl.txt
> ===================================================================
> --- linux-2.6.orig/Documentation/networking/ip-sysctl.txt	2009-04-14 13:48:09.000000000 -0500
> +++ linux-2.6/Documentation/networking/ip-sysctl.txt	2009-04-14 13:53:10.000000000 -0500
> @@ -611,6 +611,16 @@ igmp_max_memberships - INTEGER
>  	Change the maximum number of multicast groups we can subscribe to.
>  	Default: 20
> 
> +igmp_mc_socket_based_filtering - INTEGER
> +	Use the list of subscribed multicast addresses to filter the traffic
> +	going to a multicast socket. If set to zero then multicast traffic
> +        is forwarded to any socket subscribed to a port number ignoring the
> +        list of multicast groups that a socket has been subscribed to. This mode
> +        is the default since it has been done that way in the past.
> +	If set to one then only multicast traffic of the multicast groups
> +        that a socket has joined are forwarded to the socket.
> +	Default: 0
> +
>  conf/interface/*  changes special settings per interface (where "interface" is
>  		  the name of your network interface)
>  conf/all/*	  is special, changes the settings for all interfaces
> 
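
As a rough illustration of what the patched check decides, here is a user-space model (not kernel code) of the ip_mc_sf_allow() change quoted above. The names model_sock, model_is_multicast and model_mc_allow are hypothetical. Source filters and interface matching are left out, and addresses are kept in host byte order for readability.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a socket: just the list of groups it has joined. */
struct model_sock {
    const uint32_t *groups;   /* joined group addresses, host byte order */
    size_t ngroups;
};

/* 224.0.0.0/4 is the IPv4 multicast range. */
int model_is_multicast(uint32_t addr)
{
    return (addr & 0xf0000000u) == 0xe0000000u;
}

/*
 * Returns 1 if a datagram addressed to dst may be delivered to sk.
 * socket_based_filtering models the proposed sysctl:
 *   0 = deliver even if the socket never joined dst (existing behavior)
 *   1 = deliver only if the socket joined dst
 */
int model_mc_allow(const struct model_sock *sk, uint32_t dst,
                   int socket_based_filtering)
{
    size_t i;

    if (!model_is_multicast(dst))
        return 1;                       /* unicast is not affected */

    for (i = 0; i < sk->ngroups; i++)
        if (sk->groups[i] == dst)
            return 1;                   /* joined: any-source join accepts */

    return !socket_based_filtering;     /* not joined: depends on the switch */
}
```

With the switch at 0, a socket that joined only 239.0.0.51 still accepts a datagram sent to 239.0.0.50; with the switch at 1 it does not.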


* Re: [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-14 20:44         ` Neil Horman
@ 2009-04-14 21:45           ` Christoph Lameter
  2009-04-15 11:07             ` Neil Horman
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Lameter @ 2009-04-14 21:45 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, David Miller

On Tue, 14 Apr 2009, Neil Horman wrote:

> That said, I still disagree with the addition of this switch, as its
> superfolous.  Programatically an application can do what you want without this
> change already .If you provide me with the test application that you've been working with, I can
> demonstrate exactly how.

Ok. The first application listens to multicast groups A and B. The second one
uses C and D. All use port 4711 for communication, which is the
standard for a certain application. Both applications are very latency
sensitive. You do not want the data to be duplicated into the user space
of both processes.

Both processes need to reply to messages on the
multicast groups. The receiver always needs the source IP address for
authentication; otherwise messages are ignored.

How are you going to do this? It's trivial to do with this patch and one
socket in each process listening to port 4711 that subscribes to the
necessary multicast groups. Plus, one problem is that some of the
applications here are only available in binary form and they were naively
coded assuming that the socket layer would not need exotic tricks to
convince it to do the right thing.


* Re: [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-14 21:45           ` Christoph Lameter
@ 2009-04-15 11:07             ` Neil Horman
  2009-04-15 12:51               ` Christoph Lameter
  0 siblings, 1 reply; 28+ messages in thread
From: Neil Horman @ 2009-04-15 11:07 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: netdev, David Miller

On Tue, Apr 14, 2009 at 05:45:58PM -0400, Christoph Lameter wrote:
> On Tue, 14 Apr 2009, Neil Horman wrote:
> 
> > That said, I still disagree with the addition of this switch, as its
> > superfolous.  Programatically an application can do what you want without this
> > change already .If you provide me with the test application that you've been working with, I can
> > demonstrate exactly how.
> 
> Ok. First applications listens to multicast groups A and B. Second one
> uses C and D. All use a port 4711 for communication which is the
> standard for a certain application. Both applications are very latency
> sensitive. You do not want the data to be duplicated into the user space
> of both processes.
> 
> Both processes need to reply to messages on the
> multicast groups. The receiver always needs the source IP address for
> authentication otherwise messages are ignored
> 
> How are you going to do this? 
I'm going to write each application to use 2 sockets, one bound to each
multicast group.  That's the way it works now.  I think you missed the obvious in
your construction of this example.

> Its trivial to do with this patch and one
> socket in each process listening to port 4711 that subscribes to the
> necessary multicast groups. 
It's trivial without the patch as well.

> Plus one the problem is that some of the
> applications here are only available in binary form and they were naively
> coded assuming that the socket layer would not need exotic tricks to
> convince it to do the right thing.
What?  You have binary-only software that's written for Linux and it
malfunctions?  This has been the accepted and standard multicast behavior on
Linux and various other Unices for decades.  What you are describing is a
vendor bug in your application, something you need to either contact the vendor
about, or find a new vendor.  They made an application based on an assumption,
and their assumption was wrong; they need to fix it.  We could do them a favor,
I suppose, but I don't see why.

It boils down to this:  This is the way multicast subscriptions have worked in
BSD, Linux, and presumably various other Unix and non-Unix operating systems for
lord only knows how long.  Provide some documentation that shows it's in
violation of a newer standard, or that it is common practice to behave
differently on another OS (such that including this directive would make porting
easier).  As it stands currently, this patch only serves to create a crutch to
perpetuate misunderstandings about how the behavior currently works.

Neil


* Re: [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-15 11:07             ` Neil Horman
@ 2009-04-15 12:51               ` Christoph Lameter
  2009-04-15 14:22                 ` Neil Horman
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Lameter @ 2009-04-15 12:51 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, David Miller

On Wed, 15 Apr 2009, Neil Horman wrote:

> > How are you going to do this?
> I'm going to write each application to use 2 sockets, one bound to each
> multicast group.  Thats the way it works now.  I think you missed the obvious in
> your construction of this example.

Ok, it could be done with binding. But you would need 3 sockets: one per MC
group, each bound to its group, and then one for the replies (hmmm...
looks like you could use SO_BINDTODEVICE on one socket to get around the
third one --there is even an exception case for this in inet_bind causing
more weird semantics-- but then the application needs to know the device
name of the NIC, argh).

> > Its trivial to do with this patch and one
> > socket in each process listening to port 4711 that subscribes to the
> > necessary multicast groups.
> Its trivial without the patch as well.

I do not see how you can justify making such a statement.

> It boils down to this:  This is the way multicast subscriptions have worked in
> bsd, linux, and presumably various other unix and non-unix operating systems for
> lord only knows how long.  Provide some documentation that shows its in
> violation of a newer standard, or that it is common practice to behave
> differently on another OS (such that including this directive would make porting
> easier).  As it stands currently, this patch only serves to create a crutch to
> perpetuate misundersandings about how the behavior currently works.

The way things work is counterintuitive and leads to weird code constructs
with the application having to manage multiple sockets because weird
semantics have developed over the years.


* Re: [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-15 12:51               ` Christoph Lameter
@ 2009-04-15 14:22                 ` Neil Horman
  2009-04-15 14:41                   ` Vlad Yasevich
  0 siblings, 1 reply; 28+ messages in thread
From: Neil Horman @ 2009-04-15 14:22 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: netdev, David Miller

On Wed, Apr 15, 2009 at 08:51:35AM -0400, Christoph Lameter wrote:
> On Wed, 15 Apr 2009, Neil Horman wrote:
> 
> > > How are you going to do this?
> > I'm going to write each application to use 2 sockets, one bound to each
> > multicast group.  Thats the way it works now.  I think you missed the obvious in
> > your construction of this example.
> 
> Ok it could be done with binding. But you would need 3 sockets. One per MC
> groups bound to a MC group each and then one for the replies (hmmm...
> looks like you could use SO_BINDTODEVICE on one socket to get around the
Depending on your setup, 2 is perfectly sufficient.  In fact 1 can be sufficient
if you want to filter in your application, but we've been over that.
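
For the single-socket case, filtering in the application needs the destination address of each datagram. A hedged sketch, assuming Linux: the IP_PKTINFO socket option makes recvmsg() return a control message carrying that address. The helper names enable_pktinfo and dest_addr_from_msg are hypothetical.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Ask the kernel to attach an IP_PKTINFO control message to each datagram. */
int enable_pktinfo(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &one, sizeof(one));
}

/*
 * Pull the destination address out of the control data filled in by
 * recvmsg(). Returns 0 and stores the address in *dst, or -1 if no
 * IP_PKTINFO message is present.
 */
int dest_addr_from_msg(struct msghdr *msg, struct in_addr *dst)
{
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
            struct in_pktinfo pi;

            /* Copy out: CMSG_DATA is not guaranteed to be aligned. */
            memcpy(&pi, CMSG_DATA(c), sizeof(pi));
            *dst = pi.ipi_addr;     /* destination address from the IP header */
            return 0;
        }
    }
    return -1;
}
```

The receive loop would compare the returned address against the groups this process cares about and drop everything else, at the cost of the copy to user space that the patch is trying to avoid.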

> third one --there is even an exception case for this in inet_bind causing
> more weird semantics-- but then the application needs to know the device
> name of the NIC, argh)
Of course it does, but that's zero incremental cost, since you need to know the
device name anyway when specifying the ifindex to the join request.

> > > Its trivial to do with this patch and one
> > > socket in each process listening to port 4711 that subscribes to the
> > > necessary multicast groups.
> > Its trivial without the patch as well.
> 
> I do not see how you can justify making such a statement.
> 
I find it justified because I don't see an application using 2 or 3 sockets and a poll or
select call as anything more than trivial.  If you find that to be non-trivial,
perhaps a refresher programming course might be in order for you?
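
A small sketch of the "2 or 3 sockets and a poll" pattern mentioned here, using only POSIX poll(). The helper name wait_readable and the MAX_SOCKS limit are assumptions made for the example.

```c
#include <poll.h>
#include <stddef.h>

#define MAX_SOCKS 8   /* arbitrary cap for this sketch */

/*
 * Wait until one of the n descriptors is readable.
 * Returns the index of the first readable descriptor, or -1 on
 * timeout, error, or an unsupported n.
 */
int wait_readable(const int *fds, size_t n, int timeout_ms)
{
    struct pollfd pfds[MAX_SOCKS];
    size_t i;

    if (n == 0 || n > MAX_SOCKS)
        return -1;

    for (i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    if (poll(pfds, (nfds_t)n, timeout_ms) <= 0)
        return -1;

    for (i = 0; i < n; i++)
        if (pfds[i].revents & POLLIN)
            return (int)i;

    return -1;
}
```

The caller would pass the per-group sockets and then call recvfrom() on the descriptor at the returned index.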

> > It boils down to this:  This is the way multicast subscriptions have worked in
> > bsd, linux, and presumably various other unix and non-unix operating systems for
> > lord only knows how long.  Provide some documentation that shows its in
> > violation of a newer standard, or that it is common practice to behave
> > differently on another OS (such that including this directive would make porting
> > easier).  As it stands currently, this patch only serves to create a crutch to
> > perpetuate misundersandings about how the behavior currently works.
> 
> The way things work is counterintuitive and leads to weird code constructs
> with the application having to manage multiple sockets because weird
> semantics have developed over the years.

The way things work is counterintuitive to _you_ (as it was to me a few months
ago).  That aside, I came to understand how this actually works, how it has
worked for decades, and how programmers have successfully written applications
that use this model over that time period.  Can we modify the model?  Sure.
Should we?  I certainly don't see any need, given that it does little except
change the model.  For those who understand it, it's completely useless.  I'm
willing to concede that I'm wrong, but not without some modicum of evidence that
this change will benefit existing applications.  If some other operating system
adheres to the model you expect it to, perhaps this has legs, but I don't know
of any that do.  The current model, even if counterintuitive, is well defined,
well understood, and documented.  I fail to see how adding an alternate,
undocumented model (that may itself be counterintuitive to all the developers
who have developed under the current model) adds anything significant.

Neil



* Re: [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-15 14:22                 ` Neil Horman
@ 2009-04-15 14:41                   ` Vlad Yasevich
  2009-04-15 15:57                     ` Neil Horman
  2009-04-15 21:42                     ` David Stevens
  0 siblings, 2 replies; 28+ messages in thread
From: Vlad Yasevich @ 2009-04-15 14:41 UTC (permalink / raw)
  To: Neil Horman; +Cc: Christoph Lameter, netdev, David Miller, David Stevens

Neil Horman wrote:
> On Wed, Apr 15, 2009 at 08:51:35AM -0400, Christoph Lameter wrote:
>> On Wed, 15 Apr 2009, Neil Horman wrote:
>>
>>>> How are you going to do this?
>>> I'm going to write each application to use 2 sockets, one bound to each
>>> multicast group.  Thats the way it works now.  I think you missed the obvious in
>>> your construction of this example.
>> Ok it could be done with binding. But you would need 3 sockets. One per MC
>> groups bound to a MC group each and then one for the replies (hmmm...
>> looks like you could use SO_BINDTODEVICE on one socket to get around the
> Depending on your setup, 2 is perfectly sufficient.  In fact 1 can be sufficient
> if you want to filter in your application, but we've been over that.
> 
>> third one --there is even an exception case for this in inet_bind causing
>> more weird semantics-- but then the application needs to know the device
>> name of the NIC, argh)
> Of course it does, but thats zero incremental cost, since you need to know the
> device name anyway, when specifying the ifindex to the join request.
> 
>>>> Its trivial to do with this patch and one
>>>> socket in each process listening to port 4711 that subscribes to the
>>>> necessary multicast groups.
>>> Its trivial without the patch as well.
>> I do not see how you can justify making such a statement.
>>
> I find it justified because I don't see an application using 2 or 3 sockets and a poll or
> select call as anything more than trivial.  If you find that to be non-trivial,
> perhaps a refresher programming course might be in order for you?
> 
>>> It boils down to this:  This is the way multicast subscriptions have worked in
>>> bsd, linux, and presumably various other unix and non-unix operating systems for
>>> lord only knows how long.  Provide some documentation that shows its in
>>> violation of a newer standard, or that it is common practice to behave
>>> differently on another OS (such that including this directive would make porting
>>> easier).  As it stands currently, this patch only serves to create a crutch to
>>> perpetuate misundersandings about how the behavior currently works.
>> The way things work is counterintuitive and leads to weird code constructs
>> with the application having to manage multiple sockets because weird
>> semantics have developed over the years.
> 
> The way things work is counterintuitive to _you_ (is it was to me a few months
> ago).  That asside, I came to understand how this actually works, how it has
> worked for decades, and how programmers have successfully written applications
> that use this model over that time period.  Can we modify the model?  Sure.
> Should we?  I certainly don't see any need, given that it does little except
> change the model.  For those who understand it, its compltely useless.  I'm
> willing to concede that I'm wrong, but not without some modicum of evidence that
> this change will benefit existing applications.  If some other operating system
> adheres to the model you expect it to, perhaps this has legs, but I don't know
> of any that do.  The current model, even if counter intuitive, is well defined,
> well understood, and documented.  I fail to see how adding an alternate,
> undocumented model (that may itself be counterintuitive to all the developers
> who have developed under the current model) adds anything significant.
> 
> Neil

Hi Neil

This has been somewhat bugging me for a while, so I went digging.

Here is a rather pertinent text that points out that we "might" have a bug.
RFC 4607:

4.2.  Requirements on the Host IP Module

   An incoming datagram destined to an SSM address MUST be delivered by
   the IP module to all sockets that have indicated (via Subscribe) a
   desire to receive data that matches the datagram's source address,
   destination address, and arriving interface.  It MUST NOT be
   delivered to other sockets.


Additionally, RFC 3678 describes IP_ADD_MEMBERSHIP as an 'any-source group'
and is allowed by the SSM spec.  This is also how it is implemented in the kernel.
However, we do not appear to perform the filtering required by the above quoted
section 4.2.  In particular, if we fail to match the 'datagram's destination address',
we deliver the packet, which I believe is in violation of the "MUST NOT" above.

I've CC'd Dave Stevens, since I'd like to hear his opinion regarding this text.

Thanks
-vlad


* Re: [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-15 14:41                   ` Vlad Yasevich
@ 2009-04-15 15:57                     ` Neil Horman
  2009-04-15 16:07                       ` Vlad Yasevich
  2009-04-15 21:42                     ` David Stevens
  1 sibling, 1 reply; 28+ messages in thread
From: Neil Horman @ 2009-04-15 15:57 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: Christoph Lameter, netdev, David Miller, David Stevens

On Wed, Apr 15, 2009 at 10:41:03AM -0400, Vlad Yasevich wrote:
> Neil Horman wrote:
> > On Wed, Apr 15, 2009 at 08:51:35AM -0400, Christoph Lameter wrote:
> >> On Wed, 15 Apr 2009, Neil Horman wrote:
> >>
> >>>> How are you going to do this?
> >>> I'm going to write each application to use 2 sockets, one bound to each
> >>> multicast group.  Thats the way it works now.  I think you missed the obvious in
> >>> your construction of this example.
> >> Ok it could be done with binding. But you would need 3 sockets. One per MC
> >> groups bound to a MC group each and then one for the replies (hmmm...
> >> looks like you could use SO_BINDTODEVICE on one socket to get around the
> > Depending on your setup, 2 is perfectly sufficient.  In fact 1 can be sufficient
> > if you want to filter in your application, but we've been over that.
> > 
> >> third one --there is even an exception case for this in inet_bind causing
> >> more weird semantics-- but then the application needs to know the device
> >> name of the NIC, argh)
> > Of course it does, but thats zero incremental cost, since you need to know the
> > device name anyway, when specifying the ifindex to the join request.
> > 
> >>>> Its trivial to do with this patch and one
> >>>> socket in each process listening to port 4711 that subscribes to the
> >>>> necessary multicast groups.
> >>> Its trivial without the patch as well.
> >> I do not see how you can justify making such a statement.
> >>
> > I find it justified because I don't see an application using 2 or 3 sockets and a poll or
> > select call as anything more than trivial.  If you find that to be non-trivial,
> > perhaps a refresher programming course might be in order for you?
> > 
> >>> It boils down to this:  This is the way multicast subscriptions have worked in
> >>> bsd, linux, and presumably various other unix and non-unix operating systems for
> >>> lord only knows how long.  Provide some documentation that shows its in
> >>> violation of a newer standard, or that it is common practice to behave
> >>> differently on another OS (such that including this directive would make porting
> >>> easier).  As it stands currently, this patch only serves to create a crutch to
> >>> perpetuate misundersandings about how the behavior currently works.
> >> The way things work is counterintuitive and leads to weird code constructs
> >> with the application having to manage multiple sockets because weird
> >> semantics have developed over the years.
> > 
> > The way things work is counterintuitive to _you_ (is it was to me a few months
> > ago).  That asside, I came to understand how this actually works, how it has
> > worked for decades, and how programmers have successfully written applications
> > that use this model over that time period.  Can we modify the model?  Sure.
> > Should we?  I certainly don't see any need, given that it does little except
> > change the model.  For those who understand it, its compltely useless.  I'm
> > willing to concede that I'm wrong, but not without some modicum of evidence that
> > this change will benefit existing applications.  If some other operating system
> > adheres to the model you expect it to, perhaps this has legs, but I don't know
> > of any that do.  The current model, even if counter intuitive, is well defined,
> > well understood, and documented.  I fail to see how adding an alternate,
> > undocumented model (that may itself be counterintuitive to all the developers
> > who have developed under the current model) adds anything significant.
> > 
> > Neil
> 
> Hi Neil
> 
> This has been somewhat bugging me for a while, so I went digging.
> 
> Here is a rather pertinent text that points out that we "might" have a bug.
> RFC 4607:
> 
> 4.2.  Requirements on the Host IP Module
> 
>    An incoming datagram destined to an SSM address MUST be delivered by
>    the IP module to all sockets that have indicated (via Subscribe) a
>    desire to receive data that matches the datagram's source address,
>    destination address, and arriving interface.  It MUST NOT be
>    delivered to other sockets.
> 
I'll let David respond more fully, since I'm not familiar with this RFC, but a
quick read would suggest that (from the abstract), this only applies to a subset
of addresses, which are not being used in the application in question here.
From what I read, the RFC defines an extension to the sockets API which allows
you to subscribe to a multicast group from a specific source, using one of the
reserved multicast ranges provided in the abstract.  It appears that we support
this RFC via the IP_ADD_SOURCE_MEMBERSHIP socket option.  Now, if we allow
sockets that issue IP_ADD_SOURCE_MEMBERSHIP calls to receive datagrams from
multicast addresses within the range defined by the RFC from other sources that
they have not subscribed to, yes we have a bug, but that's not overly relevant, I
think, to Christoph's problem, since he's using the any-source model, and its
corresponding addresses.  Switching to the specific-source model would solve his
immediate problem here that we've been debating, but would likely introduce a
new set, in that he would then have to write his app to subscribe to the myriad
of sources that are sending to that multicast group.
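
For reference, a hedged sketch of a source-specific join with IP_ADD_SOURCE_MEMBERSHIP, assuming Linux and glibc's struct ip_mreq_source. The helper names fill_source_mreq and join_source_group are hypothetical, and 232.1.1.1 and 192.0.2.10 in the usage note are placeholder addresses (232.0.0.0/8 is the SSM range from RFC 4607).

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Build a (source, group) request. Returns 0, or -1 on a bad address. */
int fill_source_mreq(const char *group, const char *source,
                     struct ip_mreq_source *mreq)
{
    memset(mreq, 0, sizeof(*mreq));
    mreq->imr_interface.s_addr = htonl(INADDR_ANY);  /* kernel picks the NIC */
    if (inet_pton(AF_INET, group, &mreq->imr_multiaddr) != 1)
        return -1;
    if (inet_pton(AF_INET, source, &mreq->imr_sourceaddr) != 1)
        return -1;
    return 0;
}

/* Subscribe fd to traffic for group that originates from source only. */
int join_source_group(int fd, const char *group, const char *source)
{
    struct ip_mreq_source mreq;

    if (fill_source_mreq(group, source, &mreq) < 0)
        return -1;
    return setsockopt(fd, IPPROTO_IP, IP_ADD_SOURCE_MEMBERSHIP,
                      &mreq, sizeof(mreq));
}
```

A receiver would call join_source_group(fd, "232.1.1.1", "192.0.2.10") once per sender it wants to hear, which is the per-source bookkeeping referred to above.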

> 
> Additionally, RFC 3678 describes IP_ADD_MEMBERSHIP as an 'any-source group'
> and is allowed by the SSM spec.  This is also how it is implemented in the kernel.
> However, we do not appear to perform the filtering required by the above quoted
> section 4.2.  
Very true, so we may have a bug in the SSM model, but again, that's not what
Christoph is using; he's using the any-source model, with group addresses unrelated to
the SSM RFC.

> In particular, if we fail to match the 'datagram's destination address',
> we deliver the packet, which I believe is in violation of the "MUST NOT" above.
> 
I think only if the SSM model is used via the socket extensions the RFC
describes.  If Christoph's app is subscribing via IP_ADD_SOURCE_MEMBERSHIP, then
yes, we have a problem.  But everything I've read says he uses the standard, any-source
IP_ADD_MEMBERSHIP option, which I think makes assertions from RFC 4607 void.
Christoph, are you using IP_ADD_SOURCE_MEMBERSHIP?

Neil

> I've CC'd Dave Stevens, since I'd like to hear his opinion regarding this text.
> 
> Thanks
> -vlad
> 


* Re: [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-15 15:57                     ` Neil Horman
@ 2009-04-15 16:07                       ` Vlad Yasevich
  2009-04-15 16:38                         ` Neil Horman
  0 siblings, 1 reply; 28+ messages in thread
From: Vlad Yasevich @ 2009-04-15 16:07 UTC (permalink / raw)
  To: Neil Horman; +Cc: Christoph Lameter, netdev, David Miller, David Stevens

Neil Horman wrote:
> On Wed, Apr 15, 2009 at 10:41:03AM -0400, Vlad Yasevich wrote:
>> Neil Horman wrote:
>>> On Wed, Apr 15, 2009 at 08:51:35AM -0400, Christoph Lameter wrote:
>>>> On Wed, 15 Apr 2009, Neil Horman wrote:
>>>>
>>>>>> How are you going to do this?
>>>>> I'm going to write each application to use 2 sockets, one bound to each
>>>>> multicast group.  Thats the way it works now.  I think you missed the obvious in
>>>>> your construction of this example.
>>>> Ok it could be done with binding. But you would need 3 sockets. One per MC
>>>> groups bound to a MC group each and then one for the replies (hmmm...
>>>> looks like you could use SO_BINDTODEVICE on one socket to get around the
>>> Depending on your setup, 2 is perfectly sufficient.  In fact 1 can be sufficient
>>> if you want to filter in your application, but we've been over that.
>>>
>>>> third one --there is even an exception case for this in inet_bind causing
>>>> more weird semantics-- but then the application needs to know the device
>>>> name of the NIC, argh)
>>> Of course it does, but thats zero incremental cost, since you need to know the
>>> device name anyway, when specifying the ifindex to the join request.
>>>
>>>>>> Its trivial to do with this patch and one
>>>>>> socket in each process listening to port 4711 that subscribes to the
>>>>>> necessary multicast groups.
>>>>> Its trivial without the patch as well.
>>>> I do not see how you can justify making such a statement.
>>>>
>>> I find it justified because I don't see an application using 2 or 3 sockets and a poll or
>>> select call as anything more than trivial.  If you find that to be non-trivial,
>>> perhaps a refresher programming course might be in order for you?
>>>
>>>>> It boils down to this:  This is the way multicast subscriptions have worked in
>>>>> bsd, linux, and presumably various other unix and non-unix operating systems for
>>>>> lord only knows how long.  Provide some documentation that shows its in
>>>>> violation of a newer standard, or that it is common practice to behave
>>>>> differently on another OS (such that including this directive would make porting
>>>>> easier).  As it stands currently, this patch only serves to create a crutch to
>>>>> perpetuate misundersandings about how the behavior currently works.
>>>> The way things work is counterintuitive and leads to weird code constructs
>>>> with the application having to manage multiple sockets because weird
>>>> semantics have developed over the years.
>>> The way things work is counterintuitive to _you_ (is it was to me a few months
>>> ago).  That asside, I came to understand how this actually works, how it has
>>> worked for decades, and how programmers have successfully written applications
>>> that use this model over that time period.  Can we modify the model?  Sure.
>>> Should we?  I certainly don't see any need, given that it does little except
>>> change the model.  For those who understand it, its compltely useless.  I'm
>>> willing to concede that I'm wrong, but not without some modicum of evidence that
>>> this change will benefit existing applications.  If some other operating system
>>> adheres to the model you expect it to, perhaps this has legs, but I don't know
>>> of any that do.  The current model, even if counter intuitive, is well defined,
>>> well understood, and documented.  I fail to see how adding an alternate,
>>> undocumented model (that may itself be counterintuitive to all the developers
>>> who have developed under the current model) adds anything significant.
>>>
>>> Neil
>> Hi Neil
>>
>> This has been somewhat bugging me for a while, so I went digging.
>>
>> Here is a rather pertinent text that points out that we "might" have a bug.
>> RFC 4607:
>>
>> 4.2.  Requirements on the Host IP Module
>>
>>    An incoming datagram destined to an SSM address MUST be delivered by
>>    the IP module to all sockets that have indicated (via Subscribe) a
>>    desire to receive data that matches the datagram's source address,
>>    destination address, and arriving interface.  It MUST NOT be
>>    delivered to other sockets.
>>
> I'll let David respond more fully, since I'm not familiar with this RFC, but a
> quick read would suggest that (from the abstract), this only applies to a subset
> of addresses, which are not being used in the application in question here.
> From what I read, the RFC defines an extenstion to the sockets api which allows
> you to subscribe to a multicast group from a specific source, using one of the
> reserved muticast ranges provided in the abstract.  It appears that we support
> this RFC via the IP_ADD_SOURCE_MEMBERSHIP socket option.  Now, if we allow
> sockets that issue IP_ADD_SOURCE_MEMBERSHIP calls to receive datagrams from
> multicast addresses within the range defined by the rfc from other sources that
> they have not subscribed to, yes we have a bug, but thats not overly relevant I
> think to Christophs problem, since he's using the any-source model, and its
> corresponding addresses.  Switching to the specific-source model would solve his
> immeidate problem here that we've been debating, but would likely introduce a
> new set, in that he would then have to write his app to subscribe to the myrriad
> of sources that are sending to that multicast group.

The problem is that if an application follows an IP_ADD_MEMBERSHIP call with
an IP_BLOCK_SOURCE call, thus extending the exclude list, we would still
deliver packets that don't match the multicast destination.  That violates the
above SSM requirement.  It appears to be an API bug that allows for a violation
of the protocol specification.
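
The call sequence in question can be sketched as follows, assuming Linux and glibc. The helper name join_then_block is hypothetical; it only shows the two setsockopt() calls and does not demonstrate the delivery behavior itself.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Any-source join of group, followed by blocking a single source.
 * Returns 0 on success, -1 on a bad address or a failed setsockopt().
 */
int join_then_block(int fd, const char *group, const char *blocked_source)
{
    struct ip_mreq mreq;
    struct ip_mreq_source block;

    memset(&mreq, 0, sizeof(mreq));
    memset(&block, 0, sizeof(block));

    if (inet_pton(AF_INET, group, &mreq.imr_multiaddr) != 1 ||
        inet_pton(AF_INET, blocked_source, &block.imr_sourceaddr) != 1)
        return -1;

    mreq.imr_interface.s_addr = htonl(INADDR_ANY);
    block.imr_multiaddr = mreq.imr_multiaddr;
    block.imr_interface = mreq.imr_interface;

    /* Any-source join: the socket's source list for the group starts empty. */
    if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) < 0)
        return -1;

    /* Add one sender to the exclude list for that group. */
    return setsockopt(fd, IPPROTO_IP, IP_BLOCK_SOURCE, &block, sizeof(block));
}
```

After this sequence the socket carries a source filter for the joined group, yet a datagram addressed to a group it never joined finds no entry in the socket's list and is still delivered.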

> 
>> Additionally, RFC 3678 describes IP_ADD_MEMBERSHIP as an 'any-source group'
>> and is allowed by the SSM spec.  This is also how it is implemented in the kernel.
>> However, we do not appear to perform the filtering required by the above quoted
>> section 4.2.  
> Very true, so we may have a bug in the SSM model, but again, thats not what
> Christoph is using, its the any-source model, using group address unrelated to
> the ssm RFC.
> 

See above.  IP_ADD_MEMBERSHIP is also part of the SSM model since it can be
followed with IP_BLOCK_SOURCE.  They have to work together, but the socket matching
code skips the filter if it can't find the multicast group in the socket's list.

-vlad

> In particular, if we fail to match the 'datagram's destination address',
>> we deliver the packet, which I believe is in violation of the "MUST NOT" above.
>>
> I think only if the SSM model is used via the socket extensions the RFC
> describes.  If Christophs app is subscribing via IP_ADD_SOURCE_MEMBERSHIP, then
> yes, we have a problem.  But everything I've read says he uses the standard, any-source
> IP_ADD_MEMBERSHIP option which I think makes assertions from RFC 4607 void.
> Christoph, are you using IP_ADD_SOURCE_MEMBERSHIP?
> 
> Neil
> 
>> I've CC'd Dave Stevens, since I'd like to hear his opinion regarding this text.
>>
>> Thanks
>> -vlad
>>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-15 16:07                       ` Vlad Yasevich
@ 2009-04-15 16:38                         ` Neil Horman
  2009-04-15 17:19                           ` Vlad Yasevich
  0 siblings, 1 reply; 28+ messages in thread
From: Neil Horman @ 2009-04-15 16:38 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: Christoph Lameter, netdev, David Miller, David Stevens

On Wed, Apr 15, 2009 at 12:07:23PM -0400, Vlad Yasevich wrote:
> Neil Horman wrote:
> > On Wed, Apr 15, 2009 at 10:41:03AM -0400, Vlad Yasevich wrote:
> >> Neil Horman wrote:
> >>> On Wed, Apr 15, 2009 at 08:51:35AM -0400, Christoph Lameter wrote:
> >>>> On Wed, 15 Apr 2009, Neil Horman wrote:
> >>>>
> >>>>>> How are you going to do this?
> >>>>> I'm going to write each application to use 2 sockets, one bound to each
> >>>>> multicast group.  Thats the way it works now.  I think you missed the obvious in
> >>>>> your construction of this example.
> >>>> Ok it could be done with binding. But you would need 3 sockets. One per MC
> >>>> groups bound to a MC group each and then one for the replies (hmmm...
> >>>> looks like you could use SO_BINDTODEVICE on one socket to get around the
> >>> Depending on your setup, 2 is perfectly sufficient.  In fact 1 can be sufficient
> >>> if you want to filter in your application, but we've been over that.
> >>>
> >>>> third one --there is even an exception case for this in inet_bind causing
> >>>> more weird semantics-- but then the application needs to know the device
> >>>> name of the NIC, argh)
> >>> Of course it does, but thats zero incremental cost, since you need to know the
> >>> device name anyway, when specifying the ifindex to the join request.
> >>>
> >>>>>> Its trivial to do with this patch and one
> >>>>>> socket in each process listening to port 4711 that subscribes to the
> >>>>>> necessary multicast groups.
> >>>>> Its trivial without the patch as well.
> >>>> I do not see how you can justify making such a statement.
> >>>>
> >>> I find it justified because I don't see an application using 2 or 3 sockets and a poll or
> >>> select call as anything more than trivial.  If you find that to be non-trivial,
> >>> perhaps a refresher programming course might be in order for you?
> >>>
> >>>>> It boils down to this:  This is the way multicast subscriptions have worked in
> >>>>> bsd, linux, and presumably various other unix and non-unix operating systems for
> >>>>> lord only knows how long.  Provide some documentation that shows its in
> >>>>> violation of a newer standard, or that it is common practice to behave
> >>>>> differently on another OS (such that including this directive would make porting
> >>>>> easier).  As it stands currently, this patch only serves to create a crutch to
> >>>>> perpetuate misundersandings about how the behavior currently works.
> >>>> The way things work is counterintuitive and leads to weird code constructs
> >>>> with the application having to manage multiple sockets because weird
> >>>> semantics have developed over the years.
> >>> The way things work is counterintuitive to _you_ (is it was to me a few months
> >>> ago).  That asside, I came to understand how this actually works, how it has
> >>> worked for decades, and how programmers have successfully written applications
> >>> that use this model over that time period.  Can we modify the model?  Sure.
> >>> Should we?  I certainly don't see any need, given that it does little except
> >>> change the model.  For those who understand it, its compltely useless.  I'm
> >>> willing to concede that I'm wrong, but not without some modicum of evidence that
> >>> this change will benefit existing applications.  If some other operating system
> >>> adheres to the model you expect it to, perhaps this has legs, but I don't know
> >>> of any that do.  The current model, even if counter intuitive, is well defined,
> >>> well understood, and documented.  I fail to see how adding an alternate,
> >>> undocumented model (that may itself be counterintuitive to all the developers
> >>> who have developed under the current model) adds anything significant.
> >>>
> >>> Neil
> >> Hi Neil
> >>
> >> This has been somewhat bugging me for a while, so I went digging.
> >>
> >> Here is a rather pertinent text that points out that we "might" have a bug.
> >> RFC 4607:
> >>
> >> 4.2.  Requirements on the Host IP Module
> >>
> >>    An incoming datagram destined to an SSM address MUST be delivered by
> >>    the IP module to all sockets that have indicated (via Subscribe) a
> >>    desire to receive data that matches the datagram's source address,
> >>    destination address, and arriving interface.  It MUST NOT be
> >>    delivered to other sockets.
> >>
> > I'll let David respond more fully, since I'm not familiar with this RFC, but a
> > quick read would suggest that (from the abstract), this only applies to a subset
> > of addresses, which are not being used in the application in question here.
> > From what I read, the RFC defines an extenstion to the sockets api which allows
> > you to subscribe to a multicast group from a specific source, using one of the
> > reserved muticast ranges provided in the abstract.  It appears that we support
> > this RFC via the IP_ADD_SOURCE_MEMBERSHIP socket option.  Now, if we allow
> > sockets that issue IP_ADD_SOURCE_MEMBERSHIP calls to receive datagrams from
> > multicast addresses within the range defined by the rfc from other sources that
> > they have not subscribed to, yes we have a bug, but thats not overly relevant I
> > think to Christophs problem, since he's using the any-source model, and its
> > corresponding addresses.  Switching to the specific-source model would solve his
> > immeidate problem here that we've been debating, but would likely introduce a
> > new set, in that he would then have to write his app to subscribe to the myrriad
> > of sources that are sending to that multicast group.
> 
> The problem is that if an application follows an IP_ADD_MEMBERSHIP call with
> an IP_BLOCK_SOURCE call, thus extending the exclude the list, we would still
> deliver packets that don't match the multicast destination.  That violates the
> above SSM requirement.  It appears to be an API bug that allows for a violation
> of the protocol specification.
> 
Ok, but IP_BLOCK_SOURCE is part of the any-source API, not the SSM API (although
SSM allows it).  I agree what you've described is a bug, but it's a bug against
the SSM RFC, not RFC 3678, which is what Christoph is using, given that the
multicast address he selected is outside the range defined by
SSM.

> > 
> >> Additionally, RFC 3678 describes IP_ADD_MEMBERSHIP as an 'any-source group'
> >> and is allowed by the SSM spec.  This is also how it is implemented in the kernel.
> >> However, we do not appear to perform the filtering required by the above quoted
> >> section 4.2.  
> > Very true, so we may have a bug in the SSM model, but again, thats not what
> > Christoph is using, its the any-source model, using group address unrelated to
> > the ssm RFC.
> > 
> 
> See above.  IP_ADD_MEMBERSHIP is also part of the ssm model since it can be
> followed with IP_BLOCK_SOURCE.  They have to work together, but the socket matching
> code is ignoring it if it can't find the multicast in the socket's list.
> 
Also see above, he's using multicast group 239.x.x.x.  SSM only encompasses
addresses in the 232.0.0.1 to 232.255.255.255 range.  He's using ASM not SSM,
regardless of what SSM allows.  I agree it sounds like we have a bug in SSM
behavior, but it's not overly relevant to this discussion (save for the fact
that his new feature would inadvertently fix the bug, in addition to altering
ASM behavior).  If we want to fix the SSM bug, that's great, let's fix it, but
let's not do it by introducing a new behavior to ASM.
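
To make the ASM/SSM distinction concrete, here is a minimal sketch that classifies a group address.  It assumes the IPv4 SSM block is 232.0.0.0/8, as designated by RFC 4607; the helper name is_ssm_group is hypothetical and not taken from the thread.

```python
import ipaddress

# Assumption: RFC 4607 designates 232.0.0.0/8 as the IPv4 SSM destination range.
SSM_RANGE_V4 = ipaddress.ip_network("232.0.0.0/8")


def is_ssm_group(addr: str) -> bool:
    """Return True if addr is an IPv4 address inside the SSM range."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 4 and ip in SSM_RANGE_V4
```

With this check, 239.0.0.50 and 239.0.0.51 from the original report fall outside the SSM range, so they are any-source (ASM) groups.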

And that's all moot anyway, because Christoph (unless I'm mistaken) is not
restricting, and does not want to restrict, source sending privileges.  He wants
to get data on multicast groups from all/any source, he just doesn't want to get
multicast data from groups he didn't explicitly join in the app doing the
receiving, which is exactly what you can currently use bind for.
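
A minimal sketch of the bind approach, assuming a Linux host and the standard Python socket module.  The helper names make_mreq and open_group_socket are hypothetical; the idea is one socket per group, each bound to the group address rather than to INADDR_ANY.

```python
import socket


def make_mreq(group: str, local_if: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq: group address, then local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(local_if)


def open_group_socket(group: str, port: int) -> socket.socket:
    """Open a UDP socket that is bound to, and joined to, exactly one group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Several receivers on one host share the port, as in the thread's example.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Binding to the group address (not INADDR_ANY) means only datagrams whose
    # destination address is this group match the socket.
    sock.bind((group, port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, make_mreq(group))
    return sock
```

An application wanting both groups would call open_group_socket("239.0.0.50", 4711) and open_group_socket("239.0.0.51", 4711) and wait on both with select or poll, which is the multi-socket design debated above.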

Neil

> -vlad
> 
> > In particular, if we fail to match the 'datagram's destination address',
> >> we deliver the packet, which I believe is in violation of the "MUST NOT" above.
> >>
> > I think only if the SSM model is used via the socket extensions the RFC
> > describes.  If Christophs app is subscribing via IP_ADD_SOURCE_MEMBERSHIP, then
> > yes, we have a problem.  But everything I've read says he uses the standard, any-source
> > IP_ADD_MEMBERSHIP option which I think makes assertions from RFC 4607 void.
> > Christoph, are you using IP_ADD_SOURCE_MEMBERSHIP?
> > 
> > Neil
> > 
> >> I've CC'd Dave Stevens, since I'd like to hear his opinion regarding this text.
> >>
> >> Thanks
> >> -vlad
> >>
> > 
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-15 16:38                         ` Neil Horman
@ 2009-04-15 17:19                           ` Vlad Yasevich
  2009-04-15 17:53                             ` Neil Horman
  2009-04-15 19:17                             ` Christoph Lameter
  0 siblings, 2 replies; 28+ messages in thread
From: Vlad Yasevich @ 2009-04-15 17:19 UTC (permalink / raw)
  To: Neil Horman; +Cc: Christoph Lameter, netdev, David Miller, David Stevens

Neil Horman wrote:
find the multicast in the socket's list.
>>
> Also see above, hes using multicast group 239.x.x.x.  SSM only encompases
> addresses in the 232.0.0.1 to 232.255.255.255 range.  He's using ASM not SSM,
> regardless of what SSM allows.  I agree it sounds like we have a bug in SSM
> behavior, but its not overly relevant to this discussion (saving for the fact
> that his new feature would inadvertantly fix the bug, in addition to altering
> ASM behavior).  If we want to fix the SSM bug, thats great, lets fix it, but
> lets not do it by introducing a new behavior to ASM.

Sorry, but I don't buy it.  What we have is essentially "backward-brokenness".

Looking at BSD, which was the root of the original brokenness, they have it fixed.
The code will skip sockets that are not members of a particular group.  So, we
are trying really hard to stay bug-for-bug compatible with old implementations.

> 
> And thats all moot anyway, becaues Christoph (unless I'm mistaken) is not, and
> does not want to restrict source sending privlidges.  He wants to get data on
> multicast groups from all/any source, he just doesn't want to get multicast data
> from groups he didn't explicitly join in the app doing the receiving, which is
> exactly what you can currently use bind for.

Let's look at it the other way.  What is broken if we actually filter based on the
socket group membership?  The only applications that will be impacted are ones that
do not join groups themselves and expect to get multicast traffic.  Such applications
are broken to start with.

We already do a group membership check for the socket. We simply incorrectly
treat any socket that doesn't list the group as a match.

What's worse is that if you have a socket that doesn't care about any multicast
destinations (never did an ADD_MEMBERSHIP), it will still get multicast traffic if
it is bound to that port.

We need to take into account the socket's multicast group list.
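
A toy model of the two delivery rules under discussion may help.  This is not kernel code: UdpSocket, delivers_legacy and delivers_filtered are hypothetical names, and the sketch assumes the host has already joined the destination group on the receiving interface.

```python
from dataclasses import dataclass, field


@dataclass
class UdpSocket:
    """Toy stand-in for a bound UDP socket (illustrative only)."""
    bound_addr: str                            # "0.0.0.0" models INADDR_ANY
    port: int
    joined: set = field(default_factory=set)   # groups joined on this socket


def address_matches(sock: UdpSocket, dst: str, port: int) -> bool:
    """Port must match; the bound address must be the wildcard or the destination."""
    return sock.port == port and sock.bound_addr in ("0.0.0.0", dst)


def delivers_legacy(sock: UdpSocket, dst: str, port: int) -> bool:
    """Behaviour described in the thread: this socket's group list is not consulted."""
    return address_matches(sock, dst, port)


def delivers_filtered(sock: UdpSocket, dst: str, port: int) -> bool:
    """Proposed behaviour: the socket must also have joined the destination group."""
    return address_matches(sock, dst, port) and dst in sock.joined
```

With two wildcard-bound sockets on port 4711, one joined to 239.0.0.50 and one to 239.0.0.51, the legacy rule delivers a datagram for 239.0.0.50 to both, while the filtered rule delivers it only to the first.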

-vlad

> 
> Neil
> 
>> -vlad
>>
>>> In particular, if we fail to match the 'datagram's destination address',
>>>> we deliver the packet, which I believe is in violation of the "MUST NOT" above.
>>>>
>>> I think only if the SSM model is used via the socket extensions the RFC
>>> describes.  If Christophs app is subscribing via IP_ADD_SOURCE_MEMBERSHIP, then
>>> yes, we have a problem.  But everything I've read says he uses the standard, any-source
>>> IP_ADD_MEMBERSHIP option which I think makes assertions from RFC 4607 void.
>>> Christoph, are you using IP_ADD_SOURCE_MEMBERSHIP?
>>>
>>> Neil
>>>
>>>> I've CC'd Dave Stevens, since I'd like to hear his opinion regarding this text.
>>>>
>>>> Thanks
>>>> -vlad
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-15 17:19                           ` Vlad Yasevich
@ 2009-04-15 17:53                             ` Neil Horman
  2009-04-15 19:21                               ` Christoph Lameter
  2009-04-15 19:17                             ` Christoph Lameter
  1 sibling, 1 reply; 28+ messages in thread
From: Neil Horman @ 2009-04-15 17:53 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: Christoph Lameter, netdev, David Miller, David Stevens

On Wed, Apr 15, 2009 at 01:19:41PM -0400, Vlad Yasevich wrote:
> Neil Horman wrote:
> find the multicast in the socket's list.
> >>
> > Also see above, hes using multicast group 239.x.x.x.  SSM only encompases
> > addresses in the 232.0.0.1 to 232.255.255.255 range.  He's using ASM not SSM,
> > regardless of what SSM allows.  I agree it sounds like we have a bug in SSM
> > behavior, but its not overly relevant to this discussion (saving for the fact
> > that his new feature would inadvertantly fix the bug, in addition to altering
> > ASM behavior).  If we want to fix the SSM bug, thats great, lets fix it, but
> > lets not do it by introducing a new behavior to ASM.
> 
> Sorry, but I don't buy it.  What we have is essentially "backward-brokeness".
> 
> Looking at BSD, which was the root of the original brokeness, they have it fixed.
> The code will skip sockets that are not members of a particular group.  So, we
> are trying really hard to stay bug-for-bug compatible with old implementations.
> 

Despite your assertions, it's not broken just because you call it such.  It's
working as it's been documented.  If BSD has changed, I'll go look, and as I've
said several times in this thread, if this makes porting from other OSes
easier, then this has legs.  You're the first to have pointed one out, so thank
you.  Regardless, however, that doesn't make the current behavior broken.

> > 
> > And thats all moot anyway, becaues Christoph (unless I'm mistaken) is not, and
> > does not want to restrict source sending privlidges.  He wants to get data on
> > multicast groups from all/any source, he just doesn't want to get multicast data
> > from groups he didn't explicitly join in the app doing the receiving, which is
> > exactly what you can currently use bind for.
> 
> Let's look at it the other way.  What is broken if we actually filter based on the
> socket group membership?  The only applications that will be impacted are ones that
> do not join groups themselves and expect to get multicast traffic.  Such applications
> are broken to start with.
> 
I can easily envision an application which expects to get multicast traffic but
doesn't join a group within the context of its own process, specifically relying
on the behavior as it's documented today.  Consider a data processing
application whose group management is segmented into a different utility.  This
is really the problem here though, isn't it?  A proposal to change the 20 year
old behavior of multicast reception with no way to know how strongly
applications rely on this behavior and no documentation to support the assertion
that the current behavior is broken.

> We already do group membership check for the socket. We simply incorrectly determine
> that any socket that doesn't list a group.
> 
> What's worse is that if you have a socket that doesn't care about any mulicast
> destinations (never did an ADD_MEMBERSHIP), it will still get multicast traffic if
> it bound to that port.
> 
You assume that's true, but you really have no way of knowing that's the case.  I
can imagine plenty of uses for applications that anonymously receive multicast
datagrams.


I'll refer you again to this exact conversation months ago, when I was on the
opposite end of this, and was shown to be wrong:
http://kerneltrap.org/mailarchive/linux-netdev/2008/7/11/2430904

Neil

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-15 17:53                             ` Neil Horman
@ 2009-04-15 19:21                               ` Christoph Lameter
  2009-04-15 19:43                                 ` Neil Horman
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Lameter @ 2009-04-15 19:21 UTC (permalink / raw)
  To: Neil Horman; +Cc: Vlad Yasevich, netdev, David Miller, David Stevens

On Wed, 15 Apr 2009, Neil Horman wrote:

> I can easily envision on application which expects to get multicast traffic that
> doesn't join a group within the context of its own process, specifically relying
> on the behavior as its documented today.  Consider a data processing
> application whos group management is segmented into a different utility.  This
> is really the problem here though isn't it?  A proposal to change the 20 year
> old behavior of multicast reception with no way to know how strongly
> applications rely on this behavior and no documentation to support the assertion
> that the current behavior is broken.

The "utility" must be a daemon that keeps the socket open. You are
talking about a shepherding process that first opens a socket and
then joins the multicast groups. It then keeps the socket open
(otherwise the groups would be unsubscribed) and starts other processes that then
open their own sockets and expect the subscriptions to work.

That does not look convincing. Can you cite a case of an
application actually depending on this behavior?

> I'll refer you again to this exact conversation months ago, when I was on the
> opposite end of this, and shown to be wrong:
> http://kerneltrap.org/mailarchive/linux-netdev/2008/7/11/2430904

Just you backing down does not mean that this is wrong. We have many
more factoids here now.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-15 19:21                               ` Christoph Lameter
@ 2009-04-15 19:43                                 ` Neil Horman
  0 siblings, 0 replies; 28+ messages in thread
From: Neil Horman @ 2009-04-15 19:43 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Vlad Yasevich, netdev, David Miller, David Stevens

On Wed, Apr 15, 2009 at 03:21:27PM -0400, Christoph Lameter wrote:
> On Wed, 15 Apr 2009, Neil Horman wrote:
> 
> > I can easily envision on application which expects to get multicast traffic that
> > doesn't join a group within the context of its own process, specifically relying
> > on the behavior as its documented today.  Consider a data processing
> > application whos group management is segmented into a different utility.  This
> > is really the problem here though isn't it?  A proposal to change the 20 year
> > old behavior of multicast reception with no way to know how strongly
> > applications rely on this behavior and no documentation to support the assertion
> > that the current behavior is broken.
> 
> The "utility" must be a daemon that keeps the socket open. You are
> talking about a sheperding process that first opens a socket and
> then performs multicast groups. It then keeps the socket open
> (otherwise would be unsubscribed) and starts other processes that then
> open their own sockets and expect the subscriptions to work.
> 
> That does not look convincing. Can you cite a case of an
> application actually depending on this behavior?
> 
No, of course not, since I'm just hypothesizing.  Of course that doesn't mean
they don't exist.  And by that token you can't predict what will happen to
applications that do rely (either explicitly or inadvertently) on the current
behavior.

None of which _really_ matters, anyway; they're applications, and they can be fixed
to work with either.  The question really is, do we need to, and I think the
answer is no.

> > I'll refer you again to this exact conversation months ago, when I was on the
> > opposite end of this, and shown to be wrong:
> > http://kerneltrap.org/mailarchive/linux-netdev/2008/7/11/2430904
> 
> Just you backing down does not mean that this is wrong. We have many
> more factiods here now.

No, it doesn't mean this is wrong, but it does mean David convinced me what we
have now is right.  I'm obviously not going to be able to pass that on to you,
so I'm done.  Perhaps he will pick this up; I've said my piece.

Neil


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-15 17:19                           ` Vlad Yasevich
  2009-04-15 17:53                             ` Neil Horman
@ 2009-04-15 19:17                             ` Christoph Lameter
  2009-04-15 21:06                               ` Vlad Yasevich
  1 sibling, 1 reply; 28+ messages in thread
From: Christoph Lameter @ 2009-04-15 19:17 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: Neil Horman, netdev, David Miller, David Stevens

On Wed, 15 Apr 2009, Vlad Yasevich wrote:

> Looking at BSD, which was the root of the original brokeness, they have it fixed.
> The code will skip sockets that are not members of a particular group.  So, we
> are trying really hard to stay bug-for-bug compatible with old implementations.

Ahh interesting. David: Could you say something on this?

> What's worse is that if you have a socket that doesn't care about any mulicast
> destinations (never did an ADD_MEMBERSHIP), it will still get multicast traffic if
> it bound to that port.
>
> We need to take into account the socket's multicast group list.

Right. The fix is pretty simple too since the infrastructure has been
there since the IGMPv3 updates.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-15 19:17                             ` Christoph Lameter
@ 2009-04-15 21:06                               ` Vlad Yasevich
  2009-04-15 23:45                                 ` David Miller
  0 siblings, 1 reply; 28+ messages in thread
From: Vlad Yasevich @ 2009-04-15 21:06 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Neil Horman, netdev, David Miller, David Stevens

Christoph Lameter wrote:
> On Wed, 15 Apr 2009, Vlad Yasevich wrote:
> 
>> Looking at BSD, which was the root of the original brokeness, they have it fixed.
>> The code will skip sockets that are not members of a particular group.  So, we
>> are trying really hard to stay bug-for-bug compatible with old implementations.
> 
> Ahh interesting. David: Could you say something on this?

Just digging around some more, it appears that OpenSolaris also filters out
non-joined groups at the socket:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/inet/ip/ip_multi.c#ilg_lookup_ill_withsrc

In that code, connp is essentially a socket and ilg is the membership list.
That function is called from conn_wantpacket(), which is in turn called
for every socket that matches the packet.

> 
>> What's worse is that if you have a socket that doesn't care about any mulicast
>> destinations (never did an ADD_MEMBERSHIP), it will still get multicast traffic if
>> it bound to that port.
>>
>> We need to take into account the socket's multicast group list.
> 
> Right. The fix is pretty simple too since the infrastructure has been
> there since the IGMPv3 updates.
> 

Right, since IGMPv3 introduced the concept of filtering.  It even states this
in RFC 3376:

     Filtering of packets based upon a socket's multicast reception
     state is a new feature of this service interface.  The previous
     service interface [RFC1112] described no filtering based upon
     multicast join state; rather, a join on a socket simply caused the
     host to join a group on the given interface, and packets destined
     for that group could be delivered to all sockets whether they had
     joined or not.
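
The per-socket reception state that this passage refers to can be modelled roughly as a filter mode plus a source list for each joined group.  The sketch below is a simplified illustration under that assumption (it ignores the interface dimension); GroupFilter and socket_wants are hypothetical names, not kernel or RFC identifiers.

```python
from dataclasses import dataclass, field

INCLUDE, EXCLUDE = "INCLUDE", "EXCLUDE"


@dataclass
class GroupFilter:
    """Reception state for one group on one socket."""
    mode: str = EXCLUDE                         # plain any-source join: exclude nothing
    sources: set = field(default_factory=set)

    def allows(self, src: str) -> bool:
        if self.mode == INCLUDE:
            return src in self.sources          # only listed sources pass
        return src not in self.sources          # listed sources are blocked


def socket_wants(filters: dict, group: str, src: str) -> bool:
    """filters maps group -> GroupFilter; a missing entry means 'not joined'."""
    entry = filters.get(group)
    return entry is not None and entry.allows(src)
```

In this model an IP_ADD_MEMBERSHIP corresponds to an EXCLUDE filter with an empty list, IP_BLOCK_SOURCE adds an address to that list, and IP_ADD_SOURCE_MEMBERSHIP corresponds to an INCLUDE filter.  A socket with no entry for the group receives nothing for it.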


-vlad

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-15 21:06                               ` Vlad Yasevich
@ 2009-04-15 23:45                                 ` David Miller
  2009-04-16 12:44                                   ` Vlad Yasevich
  0 siblings, 1 reply; 28+ messages in thread
From: David Miller @ 2009-04-15 23:45 UTC (permalink / raw)
  To: vladislav.yasevich; +Cc: cl, nhorman, netdev, dlstevens


Please don't post references to OpenSolaris code as I've been advised
in the past that even just looking at the opensolaris tree might taint
us as Linux developers.

Thank you.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-15 23:45                                 ` David Miller
@ 2009-04-16 12:44                                   ` Vlad Yasevich
  0 siblings, 0 replies; 28+ messages in thread
From: Vlad Yasevich @ 2009-04-16 12:44 UTC (permalink / raw)
  To: David Miller; +Cc: cl, nhorman, netdev, dlstevens

David Miller wrote:
> Please don't post references to OpenSolaris code as I've been advised
> in the past that even just looking at the opensolaris tree might taint
> us as Linux developers.
> 
> Thank you.
> 


Sorry, didn't realize it.

-vlad

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-15 14:41                   ` Vlad Yasevich
  2009-04-15 15:57                     ` Neil Horman
@ 2009-04-15 21:42                     ` David Stevens
  1 sibling, 0 replies; 28+ messages in thread
From: David Stevens @ 2009-04-15 21:42 UTC (permalink / raw)
  To: Vlad Yasevich
  Cc: Christoph Lameter, David Miller, netdev, netdev-owner,
	Neil Horman

Vlad, Neil,
        [Sorry if I'm behind -- just saw this thread...]

> Here is a rather pertinent text that points out that we "might" have a bug.
> RFC 4607:
> 
> 4.2.  Requirements on the Host IP Module
> 
>    An incoming datagram destined to an SSM address MUST be delivered by
>    the IP module to all sockets that have indicated (via Subscribe) a
>    desire to receive data that matches the datagram's source address,
>    destination address, and arriving interface.  It MUST NOT be
>    delivered to other sockets.
> 
> 
> Additionally, RFC 3678 describes IP_ADD_MEMBERSHIP as an 'any-source group'
> and is allowed by the SSM spec.  This is also how it is implemented in the kernel.
> However, we do not appear to perform the filtering required by the above quoted
> section 4.2.  In particular, if we fail to match the 'datagram's destination address',
> we deliver the packet, which I believe is in violation of the "MUST NOT" above.

        "SSM" stands for "Source-Specific Multicast". "Any source" is 
explicitly *not* source-specific. Nothing in RFC 4607 is intended to change legacy
behavior where there is no source filtering. If you do SSM on Linux, you will have
the group-join check.
        The legacy behavior you don't like was not a bug in BSD. It was intentional
and authored by Steve Deering, who is also the author of the original multicast
RFC. The intent was that multicast address socket behavior would be just like
unicast address behavior, with the exception that multicast addresses are
interface-specific. If you add a new unicast address to the machine, and
the binding on the socket doesn't restrict it, you will receive packets
sent to any of the addresses on the machine. That's what INADDR_ANY means.
Your multicast application may also receive unicast traffic on that port too.
        It does seem to cause a lot of confusion for people, though I'm not
sure why. *Any* multicast application can receive traffic not intended for
it because there is no exclusion on multicast senders. So, applications must
handle packets not meant for them, period. Unless you have an IANA-allocated
multicast address and port, someone else may use it for something else on the same
network, while still following the rules. And if you do have assigned addresses
and ports, someone may send you garbage, anyway.
        I'll read the rest of the thread and see if there's any more to respond
to... :-)

 +-DLS



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-14 18:48       ` [PATCH] Multicast: Avoid useless duplication of multicast messages Christoph Lameter
  2009-04-14 20:44         ` Neil Horman
@ 2009-04-15 22:16         ` David Stevens
  2009-04-16 14:45           ` Christoph Lameter
  1 sibling, 1 reply; 28+ messages in thread
From: David Stevens @ 2009-04-15 22:16 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: David Miller, netdev, netdev-owner, Neil Horman

Actually, I think having an option for Solaris
compatibility might be a good idea, but I think
it should be per-socket.

I know of at least one application (a JVM) in
the wild that relies on the current behavior--
joins are done in one process for sockets in
another. It isn't necessarily obvious what
will break if you turn it off globally.

I tend to agree with Neil that it really shouldn't
be necessary, but there is no doubt it causes
great confusion to people.

Also, for the record, Linux doesn't support SSM
per RFC 4607. The filtering it requires applies
only to its address range and it explicitly states
the current Linux model as part of the reasoning
for it in the SSM address range only. That indicates
to me it is incorrect to filter all multicasts that
way, as Solaris does.

Doing something per-socket to express what you want
easily is fine with me and, as long as it defaults to
standard behavior, not a standards issue. Changing
all sockets on the machine from existing, correct
behavior I think is not appropriate.
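
For reference, the Linux ip(7) man page documents a per-socket switch of this kind, IP_MULTICAST_ALL (since Linux 2.6.31, default 1).  The sketch below is a hedged illustration of using it, not something proposed in this message: the fallback value 49 is the Linux constant from <linux/in.h>, and restrict_to_joined_groups is a hypothetical helper name.

```python
import socket

# Linux-only option; 49 is the value in <linux/in.h>, used when this Python
# build does not export the name.
IP_MULTICAST_ALL = getattr(socket, "IP_MULTICAST_ALL", 49)


def restrict_to_joined_groups(sock: socket.socket) -> bool:
    """Ask the kernel to deliver only groups joined on this socket.

    Returns False if the option is not supported on this system.
    """
    try:
        sock.setsockopt(socket.IPPROTO_IP, IP_MULTICAST_ALL, 0)
    except OSError:
        return False
    return True
```

Because the default stays at 1, sockets that never set the option keep the legacy behavior, which matches the per-socket, opt-in shape described above.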

                                                +-DLS

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Multicast: Avoid useless duplication of multicast messages
  2009-04-15 22:16         ` David Stevens
@ 2009-04-16 14:45           ` Christoph Lameter
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Lameter @ 2009-04-16 14:45 UTC (permalink / raw)
  To: David Stevens; +Cc: David Miller, netdev, netdev-owner, Neil Horman

On Wed, 15 Apr 2009, David Stevens wrote:

> I know of at least one application (a JVM) in
> the wild that relies on the current behavior--
> joins are done in one process for sockets in
> another. It isn't necessarily obvious what

That is sick. Relying on operations on one socket affecting a
socket in another process.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2009-04-16 14:52 UTC | newest]

Thread overview: 28+ messages
2009-04-13 21:06 Kernel sends multicast groups to sockets that did not subscribe to the MC group Christoph Lameter
2009-04-14 13:25 ` Neil Horman
2009-04-14 13:53   ` Christoph Lameter
2009-04-14 18:27     ` Neil Horman
2009-04-14 18:33       ` Christoph Lameter
2009-04-14 20:01         ` Neil Horman
2009-04-14 20:16           ` Christoph Lameter
2009-04-14 18:48       ` [PATCH] Multicast: Avoid useless duplication of multicast messages Christoph Lameter
2009-04-14 20:44         ` Neil Horman
2009-04-14 21:45           ` Christoph Lameter
2009-04-15 11:07             ` Neil Horman
2009-04-15 12:51               ` Christoph Lameter
2009-04-15 14:22                 ` Neil Horman
2009-04-15 14:41                   ` Vlad Yasevich
2009-04-15 15:57                     ` Neil Horman
2009-04-15 16:07                       ` Vlad Yasevich
2009-04-15 16:38                         ` Neil Horman
2009-04-15 17:19                           ` Vlad Yasevich
2009-04-15 17:53                             ` Neil Horman
2009-04-15 19:21                               ` Christoph Lameter
2009-04-15 19:43                                 ` Neil Horman
2009-04-15 19:17                             ` Christoph Lameter
2009-04-15 21:06                               ` Vlad Yasevich
2009-04-15 23:45                                 ` David Miller
2009-04-16 12:44                                   ` Vlad Yasevich
2009-04-15 21:42                     ` David Stevens
2009-04-15 22:16         ` David Stevens
2009-04-16 14:45           ` Christoph Lameter
