* (unknown)
@ 2010-09-27 20:05 Jason Gunthorpe
[not found] ` <20100927200500.GB25879-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2010-09-27 20:05 UTC (permalink / raw)
To: David Stevens
Cc: Christoph Lameter, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA, Bob Arendt
Bcc:
Subject: Re: igmp: Staggered igmp report intervals for unsolicited igmp
reports
Reply-To:
In-Reply-To: <OF871D4733.876C9DA0-ON882577AB.006AB200-882577AB.006B6101-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
On Mon, Sep 27, 2010 at 12:32:45PM -0700, David Stevens wrote:
> You can, of course, add a querier (or configure it, assuming an
> attached switch supports it) and set the query interval and
> robustness count as appropriate for that network.
Presumably the IPoIB multicast router should already be the querier..
How does this help handling joins to new groups?
> As would be having those networks queue packets for hardware
> addresses they know require a delay before a transmit can
> complete. But that approach can't adversely affect already-working
> solutions for typical networks, or depart unnecessarily from
> established standard protocols.
There is no way to know when a hardware address is 'ready' in an IGMPv2
sense.. The problem with IGMPv2 and any network that doesn't flood
multicast to all nodes is that there is no way to know when all IGMPv2
listeners are listening on the group you just created.
For IGMPv2 there is a special hack in the IPoIB routers that causes
them to automatically join the IP multicast groups as they are created
so they can get the per-group IGMP messages, and this process takes
time and is completely opaque to the end nodes.
IB could emulate something like ethernet flooding by sending packets
to the permanent 'broadcast' (all-IP-endpoints) multicast group - but
it has no way to know when that is necessary and when it is not.
Sending IGMPv2 packets to the group address that is being managed
(rather than an IGMP specific group like in v3) is a design choice
that probably only works well on ethernet :(
Jason
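
To make the last point concrete, here is a minimal sketch of where a host addresses its membership reports under each IGMP version. It assumes only the destinations given in RFC 2236 and RFC 3376; the function name `report_destination` is hypothetical and is not taken from any kernel code.

```python
import ipaddress

# Well-known destination for IGMPv3 membership reports (RFC 3376).
ALL_IGMPV3_ROUTERS = ipaddress.IPv4Address("224.0.0.22")


def report_destination(version: int, group: str) -> ipaddress.IPv4Address:
    """Return the IP destination of a membership report for `group`.

    Hypothetical helper for illustration only.
    """
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    if version in (1, 2):
        # v1/v2 reports go to the group being reported, so a router only
        # hears them if it is already receiving that group's traffic.
        return addr
    if version == 3:
        # v3 reports go to one fixed group that routers join once.
        return ALL_IGMPV3_ROUTERS
    raise ValueError(f"unknown IGMP version {version}")
```

On a fabric that does not flood multicast, the v1/v2 branch is the one that needs the router to join each new group before it can see the report.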
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re:
[not found] ` <20100927200500.GB25879-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2010-09-27 20:14 ` David Stevens
[not found] ` <OF056C7E7C.A9A5EFC7-ON882577AB.006E6B89-882577AB.006F2C1E-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: David Stevens @ 2010-09-27 20:14 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Christoph Lameter, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA,
netdev-owner-u79uwXL29TY76Z2rM5mHXA, Bob Arendt
Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote on 09/27/2010
01:05:00 PM:
> On Mon, Sep 27, 2010 at 12:32:45PM -0700, David Stevens wrote:
>
> > You can, of course, add a querier (or configure it, assuming an
> > attached switch supports it) and set the query interval and
> > robustness count as appropriate for that network.
>
> Presumably the IPoIB multicast router should already be the querier..
> How does this help handling joins to new groups?
Because a querier can set the robustness value and
query interval to anything you want. In the original report,
he's not running a querier. The fact that it's a new group
doesn't matter -- these are per-interface.
+-DLS
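
As a rough illustration of how a querier advertises these values, the sketch below pulls the robustness variable (QRV) and query interval (QQIC) out of an IGMPv3 query, following the field layout in RFC 3376 section 4.1. IGMPv2 queries are only 8 octets and carry neither field. The function names are hypothetical, and this says nothing about what the Linux host side does with the values.

```python
def decode_qqic(code: int) -> int:
    """Decode the Querier's Query Interval Code into seconds (RFC 3376 4.1.7)."""
    if code < 128:
        return code
    exp = (code >> 4) & 0x7
    mant = code & 0xF
    return (mant | 0x10) << (exp + 3)


def parse_v3_query_tunables(payload: bytes) -> tuple[int, int]:
    """Return (robustness, query_interval_seconds) from an IGMPv3 query.

    `payload` is the IGMP message without the IP header. A QRV of 0 means
    the querier's value did not fit in 3 bits and the default applies.
    """
    if len(payload) < 12:
        raise ValueError("IGMPv3 queries are at least 12 octets long")
    if payload[0] != 0x11:
        raise ValueError("not a membership query")
    qrv = payload[8] & 0x07          # low 3 bits of the Resv/S/QRV octet
    return qrv, decode_qqic(payload[9])
```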
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re:
[not found] ` <OF056C7E7C.A9A5EFC7-ON882577AB.006E6B89-882577AB.006F2C1E-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2010-09-27 20:23 ` Christoph Lameter
[not found] ` <alpine.DEB.2.00.1009271521510.14117-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Christoph Lameter @ 2010-09-27 20:23 UTC (permalink / raw)
To: David Stevens
Cc: Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA,
netdev-owner-u79uwXL29TY76Z2rM5mHXA, Bob Arendt
On Mon, 27 Sep 2010, David Stevens wrote:
> Because a querier can set the robustness value and
> query interval to anything you want. In the original report,
> he's not running a querier. The fact that it's a new group
> doesn't matter -- these are per-interface.
The per interface settings are used to force an IGMP version overriding
any information by the queriers. You would not want to enable that because
it disables support for other IGMP versions. Without the override
different versions of IGMP can be handled per MC group.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re:
[not found] ` <alpine.DEB.2.00.1009271521510.14117-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
@ 2010-09-27 20:54 ` Bob Arendt
2010-09-27 22:01 ` Re: David Stevens
2010-09-27 21:50 ` David Stevens
1 sibling, 1 reply; 9+ messages in thread
From: Bob Arendt @ 2010-09-27 20:54 UTC (permalink / raw)
To: Christoph Lameter
Cc: David Stevens, Jason Gunthorpe,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
netdev-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On 09/27/10 13:23, Christoph Lameter wrote:
> On Mon, 27 Sep 2010, David Stevens wrote:
>
>> Because a querier can set the robustness value and
>> query interval to anything you want. In the original report,
>> he's not running a querier. The fact that it's a new group
>> doesn't matter -- these are per-interface.
>
> The per interface settings are used to force an IGMP version overriding
> any information by the queriers. You would not want to enable that because
> it disables support for other IGMP versions. Without the override
> different version of IGMP can be handled per MC group.
>
If a network vlan has IGMPv3 capability, then it should be able
to support both v2 and v3 Joins (clients). But if the vlan is
IGMPv2 only, then an initial Join from a Linux client might go out
as v3 (if it hasn't seen a query yet) and be ignored. I believe
this is the case that force_igmp_version really addresses.
And it turns out that force_igmp_version=2 doesn't fully work.
If the host sees an IGMPv3 query, it still responds with a v3 Join
despite the flag. Bug report and candidate patch here:
https://bugzilla.kernel.org/show_bug.cgi?id=18212
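
For reference, a small sketch of inspecting the tunable discussed above. It assumes the usual per-interface sysctl location under /proc/sys/net/ipv4/conf/; the interface name `eth0` in the usage note and both helper names are hypothetical. Per the kernel's ip-sysctl documentation, 0 means no version is enforced.

```python
from pathlib import Path


def force_igmp_version_path(iface: str) -> Path:
    """Build the procfs path of the per-interface force_igmp_version sysctl."""
    if not iface or "/" in iface:
        raise ValueError(f"invalid interface name: {iface!r}")
    return Path("/proc/sys/net/ipv4/conf") / iface / "force_igmp_version"


def read_force_igmp_version(iface: str) -> int:
    """Read the current setting (requires a Linux host with that interface)."""
    return int(force_igmp_version_path(iface).read_text().strip())
```

Calling `read_force_igmp_version("eth0")` on a Linux host returns the current value for that interface; writing the file needs root and is left out here.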
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re:
[not found] ` <alpine.DEB.2.00.1009271521510.14117-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
2010-09-27 20:54 ` Re: Bob Arendt
@ 2010-09-27 21:50 ` David Stevens
2010-09-28 15:49 ` Re: Christoph Lameter
1 sibling, 1 reply; 9+ messages in thread
From: David Stevens @ 2010-09-27 21:50 UTC (permalink / raw)
To: Christoph Lameter
Cc: Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA,
netdev-owner-u79uwXL29TY76Z2rM5mHXA, Bob Arendt
Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org> wrote on 09/27/2010 01:23:00 PM:
>
> On Mon, 27 Sep 2010, David Stevens wrote:
>
> > Because a querier can set the robustness value and
> > query interval to anything you want. In the original report,
> > he's not running a querier. The fact that it's a new group
> > doesn't matter -- these are per-interface.
>
> The per interface settings are used to force an IGMP version overriding
> any information by the queriers.
No. I'm not talking about the force_igmp_version tunable here, I'm talking
about the per-interface robustness and interval settings which come from
the querier (whatever version you are using).
> You would not want to enable that because
> it disables support for other IGMP versions. Without the override
> different version of IGMP can be handled per MC group.
No. IGMPv3 includes backward compatibility for both IGMPv2 and
IGMPv1. If queries for an earlier version are present, that is the
IGMP version all use, and the features of the later version are not
available to anyone.
+-DLS
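
The fallback rule described above can be sketched as a per-interface state machine, loosely following the Host Compatibility Mode rules of RFC 3376 section 7.2.1. The class name and its methods are hypothetical. The default timeout of 260 seconds assumes the RFC default values (robustness 2, query interval 125 s, query response interval 10 s).

```python
class InterfaceCompat:
    """Track which IGMP version a host should speak on one interface."""

    def __init__(self, older_querier_timeout: float = 2 * 125 + 10):
        self.timeout = older_querier_timeout
        self.v1_until = 0.0  # IGMPv1 Querier Present timer expiry
        self.v2_until = 0.0  # IGMPv2 Querier Present timer expiry

    def on_query(self, version: int, now: float) -> None:
        """Restart the matching timer when an older-version query is heard."""
        if version == 1:
            self.v1_until = now + self.timeout
        elif version == 2:
            self.v2_until = now + self.timeout

    def mode(self, now: float) -> int:
        """Return the version used for every group on this interface."""
        if now < self.v1_until:
            return 1
        if now < self.v2_until:
            return 2
        return 3
```

The mode is a property of the interface, not of a group: one v2 query moves all reports on that interface to v2 until the timer runs out.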
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re:
2010-09-27 20:54 ` Re: Bob Arendt
@ 2010-09-27 22:01 ` David Stevens
2010-09-27 23:51 ` IGMP v3 response Bob Arendt
0 siblings, 1 reply; 9+ messages in thread
From: David Stevens @ 2010-09-27 22:01 UTC (permalink / raw)
To: Bob Arendt
Cc: Christoph Lameter, Jason Gunthorpe, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, netdev-owner@vger.kernel.org
Bob Arendt <rda@rincon.com> wrote on 09/27/2010 01:54:36 PM:
> On 09/27/10 13:23, Christoph Lameter wrote:
> > On Mon, 27 Sep 2010, David Stevens wrote:
> >
> >> Because a querier can set the robustness value and
> >> query interval to anything you want. In the original report,
> >> he's not running a querier. The fact that it's a new group
> >> doesn't matter -- these are per-interface.
> >
> > The per interface settings are used to force an IGMP version overriding
> > any information by the queriers. You would not want to enable that because
> > it disables support for other IGMP versions. Without the override
> > different version of IGMP can be handled per MC group.
> >
> If a network vlan has IGMPv3 capability, then it should be able
> to support both v2 and v3 Joins (clients). But if the vlan is
> IGMPv2 only, then an initial Join from a Linux client might go out
> as v3 (if it hasn't seen a query yet) and be ignored. I believe
> this is the case that force_igmp_version really addresses.
Not really. It's for the case where there is no querier at all,
but a snooping switch that only supports IGMPv2. After any query has
put an interface in IGMPv2 mode (or IGMPv1), the initial report for
all joins will use the earlier protocol. It isn't per-group, it's
per interface, and you cannot mix versions of IGMP on the same network.
>
> And it turns out that force_igmp_version=2 doesn't fully work.
> If the host sees a IGMPv3 query, it still responds with a v3 Join
> despite the flag. Bug report and candidate patch here:
> https://bugzilla.kernel.org/show_bug.cgi?id=18212
This is a special case. The "correct" alternative is to drop
the query and not send any report at all. Sending an answer in the
originating protocol doesn't hurt anything here, because MC routers
are required to use the earlier version too; there should be no such
thing as an "IGMPv3-only querier" as in that report. IGMPv3 compliance
*requires* falling back to IGMPv2 if there is a v2 query by another
router.
By answering instead of dropping, it allows fuller filter
information from a manual query to be returned even if the network
is using v2 MC routers, but dropping and ignoring the query as
required by RFC does not fix the bug & patch submitter's problem.
Which is why I also NACKed that patch.
+-DLS
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: IGMP v3 response
2010-09-27 22:01 ` Re: David Stevens
@ 2010-09-27 23:51 ` Bob Arendt
[not found] ` <4CA12DF3.6050608-x0S3BwdUo6DQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Bob Arendt @ 2010-09-27 23:51 UTC (permalink / raw)
To: David Stevens
Cc: Christoph Lameter, Jason Gunthorpe, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, netdev-owner@vger.kernel.org
On 09/27/10 15:01, David Stevens wrote:
> Bob Arendt<rda@rincon.com> wrote on 09/27/2010 01:54:36 PM:
>> And it turns out that force_igmp_version=2 doesn't fully work.
>> If the host sees a IGMPv3 query, it still responds with a v3 Join
>> despite the flag. Bug report and candidate patch here:
>> https://bugzilla.kernel.org/show_bug.cgi?id=18212
>
> This is a special case. The "correct" alternative is to drop
> the query and not send any report at all. Sending an answer in the
> originating protocol doesn't hurt anything here, because MC routers
> are required to use the earlier version too; there should be no such
> thing as an "IGMPv3-only querier" as in that report. IGMPv3 compliance
> *requires* falling back to IGMPv2 if there is a v2 query by another
> router.
> By answering instead of dropping, it allows fuller filter
> information from a manual query to be returned even if the network
> is using v2 MC routers, but dropping and ignoring the query as
> required by RFC does not fix the bug& patch submitter's problem.
> Which is why I also NACKed that patch.
>
> +-DLS
>
>
Per rfc 2236, the v2 client *can't* drop the IGMPv3 query. From para 2.5:
2.5. Other fields
Note that IGMP messages may be longer than 8 octets, especially
future backwards-compatible versions of IGMP. As long as the Type is
one that is recognized, an IGMPv2 implementation MUST ignore anything
past the first 8 octets while processing the packet. However, the
IGMP checksum is always computed over the whole IP payload, not just
over the first 8 octets.
The IGMPv3 query *is* a valid v2 query with extra crap at the end (it's
backward compatible). Per rfc 2236 p2.5, the v3 query has to be regarded
as a valid v2 query if you're correctly implementing IGMPv2. Some switch
vendors (Cisco for one) only generate v3 queries when operating as an
IGMP snooping switch, since a proper v2 client will respond with IGMPv2
packets and it handles this properly. However, we're seeing some switches
get confused when a client initially joins with v2, then responds to a query
with v3. It ends up creating 2 entries, and only one is cleared by the
leave message. This is also an issue when the primary (querier) switch
only generates v3 queries, and some intermediate switches only support v2.
We set all the Linux clients to v2 .. but they respond to the v2 query with
v3 protocols, which could be missed by the intermediate switch.
I believe the intent of the bug 18212 patch is correct.
-Bob Arendt
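
A minimal sketch of the RFC 2236 section 2.5 rule quoted above: verify the checksum over the whole IGMP payload, then interpret only the first 8 octets. The function names are hypothetical and this is not the Linux implementation or the patch from bug 18212.

```python
import ipaddress


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum; 0 when run over a valid message."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse_query_as_v2(payload: bytes) -> tuple[int, ipaddress.IPv4Address]:
    """Read a membership query the way an IGMPv2-only host would.

    Returns (max_resp_time_field, group). Anything past octet 8 is ignored,
    but the checksum still covers the full payload.
    """
    if len(payload) < 8:
        raise ValueError("IGMP message shorter than 8 octets")
    if internet_checksum(payload) != 0:
        raise ValueError("bad IGMP checksum")
    if payload[0] != 0x11:
        raise ValueError("not a membership query")
    return payload[1], ipaddress.IPv4Address(payload[4:8])
```

A 12-octet IGMPv3 general query passes through `parse_query_as_v2` unchanged, which is the backward compatibility the RFC text describes.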
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: IGMP v3 response
[not found] ` <4CA12DF3.6050608-x0S3BwdUo6DQT0dZR+AlfA@public.gmane.org>
@ 2010-09-28 0:41 ` David Stevens
0 siblings, 0 replies; 9+ messages in thread
From: David Stevens @ 2010-09-28 0:41 UTC (permalink / raw)
To: Bob Arendt
Cc: Christoph Lameter, Jason Gunthorpe,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
netdev-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Bob Arendt <rda-x0S3BwdUo6DQT0dZR+AlfA@public.gmane.org> wrote on 09/27/2010 04:51:15 PM:
> Per rfc 2236, the v2 client *can't* drop the IGMPv3 query. From para 2.5:
> 2.5. Other fields
> Note that IGMP messages may be longer than 8 octets, especially
> future backwards-compatible versions of IGMP. As long as the Type is
> one that is recognized, an IGMPv2 implementation MUST ignore anything
> past the first 8 octets while processing the packet. However, the
> IGMP checksum is always computed over the whole IP payload, not just
> over the first 8 octets.
>
> The IGMPv3 query *is* a valid v2 query with extra crap at the end (it's
> backward compatible). Per rfc 2236 p2.5, the v3 query has to be regarded
> as a valid v2 query if you're correctly implementing IGMPv2. Some switch
> vendors (Cisco for one) only generate v3 queries when operating as an
> IGMP snooping switch, since a proper v2 client will respond with IGMPv2
> packets and it handles this properly. However, we're seeing some switches
> get confused when a client initial joins with v2, then responds to a query
> with v3. It ends up creating 2 entries, and only one is cleared by the
> leave message. This is also an issue when the primary (querier) switch
> only generates v3 queries, and some intermediate switches only support v2.
> We set all the Linux clients to v2 .. but they respond the v2 query with
> v3 protocols, which could be missed by the intermediate switch.
>
> I believe the intent of the bug 18212 patch is correct.
Bob,
That would've been a nice quote for the other discussion; I didn't
do the v2 implementation for Linux, but it appears I broke this. I'll
take another look (and thanks for pointing that out!).
+-DLS
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re:
2010-09-27 21:50 ` David Stevens
@ 2010-09-28 15:49 ` Christoph Lameter
0 siblings, 0 replies; 9+ messages in thread
From: Christoph Lameter @ 2010-09-28 15:49 UTC (permalink / raw)
To: David Stevens
Cc: Jason Gunthorpe, linux-rdma, netdev, David Miller, Bob Arendt
On Mon, 27 Sep 2010, David Stevens wrote:
> No. I'm not talking about the force_igmp_tunable here, I'm talking
> about the per-interface robustness and interval settings which come from
> the querier (whatever version you are using).
The igmp subsystem currently does not keep state on the interface layer
about robustness etc. An interval setting is only kept for IGMPv3 and is
used only for general query timeouts with IGMPv3. That interval is a
different one from the one used for the host membership reports.
Looking at the spec I get the impression that these variables seem to be
mainly of interest for router-to-router communication?
^ permalink raw reply [flat|nested] 9+ messages in thread