* Multicast and receive filtering in TUN/TAP
@ 2008-07-09 22:58 Max Krasnyansky
2008-07-10 8:29 ` Christian Borntraeger
2008-07-10 21:38 ` Shaun Jackman
0 siblings, 2 replies; 11+ messages in thread
From: Max Krasnyansky @ 2008-07-09 22:58 UTC (permalink / raw)
To: Brian Braunstein, Christian Borntraeger, Shaun Jackman, netdev,
virtualization
Yesterday, while fixing a stuck-xoff issue in the TUN/TAP driver, I got
a chance to look into the multicast filtering code in there, and
immediately realized how terribly broken and confusing it is. The patch
was originally done by Shaun (CC'ed) and went in without any proper ACK
from me, Dave or Jeff.
Here is the original ref
http://marc.info/?l=linux-netdev&m=110490502102308&w=2
I'm not going to dive into too much detail on what's wrong with the
current code. The main issues are that it mixes RX and TX filtering,
which are orthogonal, and that it reuses ioctl names and stuff for
manipulating TX filter state as if it were normal RX multicast state.
Later on Brian's patch added insult to injury:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=36226a8ded46b89a94f9de5976f554bb5e02d84c
Brian missed the point of the original patch (not his fault; as I said,
the original patch was not the best): the separate address
introduced by the MC patch was used for filtering _TX_ packets. It had
nothing to do with the HW addr of the local network interface.
The problem is that the MC stuff is now even more broken, and the ioctls
that were used originally now mean something different. So my first thought
was to just rip the MC stuff out, because it's broken and probably nobody
uses it (given that we got no complaints after Brian's patch broke it
completely). But then I realized that, if done properly, it might be very
useful for virtualization.
---
So the first question is: are there any users out there who ever used
the original patch? Shaun, any insight? How did you intend to use it?
---
The second question is: do you guys think that QEMU/KVM/LGUEST/etc. would
benefit if receive filtering was done by the host OS? Here is a specific
example of what I'm talking about.
We can do what qemu/hw/e1000.c:receive_filter() does in the _host_
context (that function currently runs in the guest context). Judging by
libvirt, a typical QEMU-based setup has a single bridge with all the TAPs
from different VMs hooked up to it. What that means is that if one VM is
getting MC traffic, or when the bridge sees a MACADDR that is not in its
tables, the packets get delivered to all the VMs, i.e. we have to wake
all of them up only so that they can drop that packet. Instead, we could
set up filters on the host's side of the TAP device.
Does that sound like something useful for QEMU/KVM?
If yes, we can talk about the API. If not, then I'll just nuke it.
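To make the wakeup argument concrete, here is a minimal sketch, assuming each TAP carries a host-side filter with the guest's unicast address and an exact list of multicast groups. It is not the e1000 or TUN/TAP code; struct tap_rx_filter, tap_rx_accept() and flood_wakeups() are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN  6
#define MAX_MCAST 8

/* Hypothetical per-TAP receive filter kept on the host side. */
struct tap_rx_filter {
    uint8_t hw_addr[ETH_ALEN];          /* guest NIC unicast address */
    uint8_t mcast[MAX_MCAST][ETH_ALEN]; /* multicast groups the guest joined */
    size_t  n_mcast;
    bool    promisc;                    /* accept everything */
};

static bool is_broadcast(const uint8_t *a)
{
    static const uint8_t bcast[ETH_ALEN] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
    return memcmp(a, bcast, ETH_ALEN) == 0;
}

/* Would the guest behind this TAP want a frame addressed to dst? */
static bool tap_rx_accept(const struct tap_rx_filter *f, const uint8_t *dst)
{
    assert(f != NULL && dst != NULL);
    if (f->promisc || is_broadcast(dst))
        return true;
    if (dst[0] & 0x01) {                /* group bit set: multicast */
        for (size_t i = 0; i < f->n_mcast; i++)
            if (memcmp(f->mcast[i], dst, ETH_ALEN) == 0)
                return true;
        return false;
    }
    return memcmp(f->hw_addr, dst, ETH_ALEN) == 0;
}

/* Count how many VMs get woken when the bridge floods one frame. */
static size_t flood_wakeups(const struct tap_rx_filter *taps, size_t n,
                            const uint8_t *dst)
{
    size_t woken = 0;
    for (size_t i = 0; i < n; i++)
        if (tap_rx_accept(&taps[i], dst))
            woken++;
    return woken;
}
```

With three TAPs on the bridge and only one guest subscribed to a group, flood_wakeups() reports one wakeup for that group instead of three; broadcast frames still reach everybody.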
Thanx
Max
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Multicast and receive filtering in TUN/TAP
2008-07-09 22:58 Multicast and receive filtering in TUN/TAP Max Krasnyansky
@ 2008-07-10 8:29 ` Christian Borntraeger
2008-07-10 16:57 ` Max Krasnyansky
2008-07-10 21:38 ` Shaun Jackman
1 sibling, 1 reply; 11+ messages in thread
From: Christian Borntraeger @ 2008-07-10 8:29 UTC (permalink / raw)
To: virtualization
Cc: Max Krasnyansky, Brian Braunstein, Shaun Jackman, netdev,
Rusty Russell
On Thursday, 10 July 2008, Max Krasnyansky wrote:
[...]
> The second question is do you guys think that QEMU/KVM/LGUEST/etc would
> benefit if receive filtering was done by the host OS. Here is a specific
> example of what I'm talking about.
> We can do what qemu/hw/e1000.c:receive_filter() does in the _host_
> context (that function currently runs in the guest context). By looking
> at libvirt, typical QEMU based setup is that you have a single bridge
> and all the TAPs from different VMs are hooked up to that bridge. What
> that means is that if one VM is getting MC traffic or when the bridge
> sees MACADDR that is not in its tables the packets get delivered to all
> the VMs, i.e. we have to wake all of them up only so that they can
> drop that packet. Instead, we could set up filters on the host's side of
> the TAP device.
> Does that sound like something useful for QEMU/KVM ?
> If yes we can talk about the API. If not then I'll just nuke it.
Max,
I know that on s390 the shared OSA network cards have multicast filter
capabilities. So I guess it is worthwhile for virtualization environments
with lots of guests. I also think that this kind of filtering should be
straightforward to implement with the qemu e1000 code. Qemu already knows the
multicast addresses.
Thing is, we are heading towards virtio. Unfortunately, virtio_net currently
does not offer a method to register multicast addresses.
Rusty, do you think it's worthwhile to notify the host about registered
multicast addresses?
Christian
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Multicast and receive filtering in TUN/TAP
2008-07-10 8:29 ` Christian Borntraeger
@ 2008-07-10 16:57 ` Max Krasnyansky
2008-07-10 20:23 ` Christian Borntraeger
0 siblings, 1 reply; 11+ messages in thread
From: Max Krasnyansky @ 2008-07-10 16:57 UTC (permalink / raw)
To: Christian Borntraeger
Cc: virtualization, Brian Braunstein, Shaun Jackman, netdev,
Rusty Russell
Christian Borntraeger wrote:
> On Thursday, 10 July 2008, Max Krasnyansky wrote:
> [...]
>> The second question is do you guys think that QEMU/KVM/LGUEST/etc would
>> benefit if receive filtering was done by the host OS. Here is a specific
>> example of what I'm talking about.
>> We can do what qemu/hw/e1000.c:receive_filter() does in the _host_
>> context (that function currently runs in the guest context). By looking
>> at libvirt, typical QEMU based setup is that you have a single bridge
>> and all the TAPs from different VMs are hooked up to that bridge. What
>> that means is that if one VM is getting MC traffic or when the bridge
>> sees MACADDR that is not in its tables the packets get delivered to all
>> the VMs, i.e. we have to wake all of them up only so that they can
>> drop that packet. Instead, we could set up filters on the host's side of
>> the TAP device.
>> Does that sound like something useful for QEMU/KVM ?
>> If yes we can talk about the API. If not then I'll just nuke it.
>
> Max,
>
> I know that on s390 the shared OSA network cards have multicast filter
> capabilities. So I guess it is worthwhile for virtualization environments
> with lots of guests. I also think that this kind of filtering should be
> straightforward to implement with the qemu e1000 code. Qemu already knows the
> multicast addresses.
Sure. It's straightforward to do inside QEMU, and it's already doing it.
The question is whether we should do it in the host context instead and
avoid some wakeups.
> Thing is, we are heading towards virtio.
Even for Windows ?
> Unfortunately, virtio_net currently does not offer a method to register multicast addresses.
I haven't looked at the virtio stuff much, I was assuming that the host side
of it is still the TUN driver. Is it not ?
Max
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Multicast and receive filtering in TUN/TAP
2008-07-10 16:57 ` Max Krasnyansky
@ 2008-07-10 20:23 ` Christian Borntraeger
2008-07-11 2:20 ` Max Krasnyansky
0 siblings, 1 reply; 11+ messages in thread
From: Christian Borntraeger @ 2008-07-10 20:23 UTC (permalink / raw)
To: Max Krasnyansky
Cc: virtualization, Brian Braunstein, Shaun Jackman, netdev,
Rusty Russell
On Thursday, 10 July 2008, Max Krasnyansky wrote:
> > Thing is, we are heading towards virtio.
> Even for Windows ?
It's possible:
http://marc.info/?l=kvm&m=121075389300722&w=2
>
> > Unfortunately, virtio_net currently does not offer a method to register
> > multicast addresses.
> I haven't looked at the virtio stuff much, I was assuming that the host side
> of it is still the TUN driver. Is it not ?
Yes, the host side is still tun/tap. The problem is that qemu doesn't know
which multicast addresses are used inside the guest.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Multicast and receive filtering in TUN/TAP
2008-07-09 22:58 Multicast and receive filtering in TUN/TAP Max Krasnyansky
2008-07-10 8:29 ` Christian Borntraeger
@ 2008-07-10 21:38 ` Shaun Jackman
2008-07-11 2:32 ` Brian Braunstein
2008-07-11 3:01 ` Max Krasnyansky
1 sibling, 2 replies; 11+ messages in thread
From: Shaun Jackman @ 2008-07-10 21:38 UTC (permalink / raw)
To: Max Krasnyansky
Cc: Brian Braunstein, Christian Borntraeger, netdev, virtualization
Hi Max,
The original patch implemented receive multicast filtering by
emulating the implementation used by many physical Ethernet
interfaces: hashing the multicast address. TUN emulates two network
cards (and communication via the virtual link between them), the guest
and the host, or the character device and the network device, so there
are two receive filters: chr_filter and net_filter. I implemented the
filtering at the character device using chr_filter in tun_chr_readv,
and left filtering at the network device for someone else to
implement.
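For readers unfamiliar with the hashing scheme mentioned above, here is a generic sketch of a hash-based multicast filter: a CRC of the address selects one of 64 bits in a small table. It is not the code from the original patch; the CRC variant, the choice of the top six bits, and the names crc32_bitwise(), mc_bucket() and mc_filter_*() are assumptions, and real NICs differ in these details.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN     6
#define FILTER_WORDS 2   /* 2 x 32 bits = 64 hash buckets */

/* Plain bitwise CRC-32 (reflected form, polynomial 0xEDB88320). */
static uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Map a MAC address to one of 64 buckets using the top 6 CRC bits. */
static unsigned mc_bucket(const uint8_t *addr)
{
    assert(addr != NULL);
    return crc32_bitwise(addr, ETH_ALEN) >> 26;
}

static void mc_filter_clear(uint32_t *filter)
{
    memset(filter, 0, FILTER_WORDS * sizeof filter[0]);
}

static void mc_filter_add(uint32_t *filter, const uint8_t *addr)
{
    unsigned n = mc_bucket(addr);
    filter[n >> 5] |= UINT32_C(1) << (n & 31);
}

/*
 * May return true for an address that was never added (hash collision),
 * but never returns false for an address that was added.
 */
static bool mc_filter_match(const uint32_t *filter, const uint8_t *addr)
{
    unsigned n = mc_bucket(addr);
    return (filter[n >> 5] >> (n & 31)) & 1u;
}
```

Because the filter is imperfect, the receiver still has to do an exact check on whatever passes; the hash only cuts down the number of frames that get that far.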
I'm not sure what you mean by TX filtering. Multicast filtering is
implemented uniquely at the receiver. There are, however, two
receivers: the character device and the network device.
I believe Brian's patch was mistaken. Two entirely distinct Ethernet
addresses are required: one for the character device and one for the
network device, or put another way, one for the virtual Ethernet
interface at the guest and one for the virtual Ethernet interface at
the host. For the same reason, there are two distinct multicast
filters.
Looking over the original patch, I believe I see a bug in tun_net_mclist:
    memset(tun->chr_filter, 0, sizeof tun->chr_filter);
should be
    memset(tun->net_filter, 0, sizeof tun->net_filter);
Cheers,
Shaun
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Multicast and receive filtering in TUN/TAP
2008-07-10 20:23 ` Christian Borntraeger
@ 2008-07-11 2:20 ` Max Krasnyansky
2008-07-11 7:01 ` Rusty Russell
0 siblings, 1 reply; 11+ messages in thread
From: Max Krasnyansky @ 2008-07-11 2:20 UTC (permalink / raw)
To: Christian Borntraeger
Cc: virtualization, Brian Braunstein, Shaun Jackman, netdev,
Rusty Russell
Christian Borntraeger wrote:
> On Thursday, 10 July 2008, Max Krasnyansky wrote:
>>> Thing is, we are heading towards virtio.
>> Even for Windows ?
>
> It's possible:
> http://marc.info/?l=kvm&m=121075389300722&w=2
Nice.
btw Is there something similar for the display driver ? vmware has one.
>>> Unfortunately, virtio_net currently does not offer a method to register
>>> multicast addresses.
>> I haven't looked at the virtio stuff much, I was assuming that the host side
>> of it is still the TUN driver. Is it not ?
>
> Yes, the host side is still tun/tap. The problem is that qemu doesn't know
> which multicast addresses are used inside the guest.
Ah, now I see what you meant by virtio_net does not do multicast. I guess it
should be trivial to add. Rusty will clarify it, I guess.
Max
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Multicast and receive filtering in TUN/TAP
2008-07-10 21:38 ` Shaun Jackman
@ 2008-07-11 2:32 ` Brian Braunstein
2008-07-11 3:05 ` Max Krasnyansky
2008-07-11 3:01 ` Max Krasnyansky
1 sibling, 1 reply; 11+ messages in thread
From: Brian Braunstein @ 2008-07-11 2:32 UTC (permalink / raw)
To: Shaun Jackman
Cc: Christian Borntraeger, Brian Braunstein, virtualization,
Max Krasnyansky, netdev
Sorry that I was confused here, and it seems I am still confused.
I was thinking that for any one instance of a TAP interface there should be
only one MAC address, since there is only one network interface: the
character device is not a network interface, but rather the interface through
which the application sends and receives on that virtual network interface.
For the MC stuff, I have to admit I haven't looked into it much, but it
seems like the basic operation of setting the MAC address of the network
interface should be supported, and it seems like an ioctl called
SIOCSIFHWADDR should Set the InterFace HardWare ADDRess. Sorry if I was
wrong about this. It might be good to add a comment to SIOCSIFHWADDR that
says "This does not actually set the network interface hardware address,
this is for multicast filtering" or whatever it actually is supposed to do.
Or perhaps create a new ioctl that has something about multicast filtering
in the name, and leave SIOCSIFHWADDR doing what it is doing now.
brian
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Multicast and receive filtering in TUN/TAP
2008-07-10 21:38 ` Shaun Jackman
2008-07-11 2:32 ` Brian Braunstein
@ 2008-07-11 3:01 ` Max Krasnyansky
1 sibling, 0 replies; 11+ messages in thread
From: Max Krasnyansky @ 2008-07-11 3:01 UTC (permalink / raw)
To: Shaun Jackman
Cc: Brian Braunstein, Christian Borntraeger, netdev, virtualization
Shaun Jackman wrote:
> Hi Max,
>
> The original patch implemented receive multicast filtering by
> emulating the implementation used by many physical Ethernet
> interfaces: hashing the multicast address. TUN emulates two network
> cards (and communication via the virtual link between them),
Nope. It does not. That's where the confusion is coming from.
TUN/TAP emulates a _single_ network interface.
A better analogy for TUN/TAP is a "disconnected Ethernet port". It's
up to the application to simulate wire|loopback|another nic|etc.
The character device is simply a mechanism for reading and writing frames.
An application that uses TUN/TAP _may_ choose to treat the other end of it as
another network interface, but the TUN/TAP driver itself does not. Consider a
tunneling application like OpenVPN, for example.
What you're describing is more like a TAP (host) <-> TAP (guest) kind of thing,
i.e. a separate TAP instance for the guest.
> the guest
> and the host, or the character device and the network device, so there
> are two receive filters: chr_filter and net_filter. I implemented the
> filtering at the character device using chr_filter in tun_chr_readv,
> and left filtering at the network device for someone else to
> implement.
>
> I'm not sure what you mean by TX filtering. Multicast filtering is
> implemented uniquely at the receiver. There are, however, two
> receivers: the character device and the network device.
See my explanation above. You came up with a slightly different interpretation
of what TUN/TAP is, which is not correct. With the proper TUN/TAP model in mind,
your patch implemented filtering of the packets _transmitted_ on the TAP.
That's what I meant by TX filtering.
> I believe Brian's patch was mistaken. Two entirely distinct Ethernet
> addresses are required: one for the character device and one for the
> network device, or put another way, one for the virtual Ethernet
> interface at the guest and one for the virtual Ethernet interface at
> the host. For the same reason, there are two distinct multicast
> filters.
If you look at my first email in this thread, I mentioned that the wrong set of
ioctls was used in the original MC patch. Since most people think of
TUN/TAP as a single device, Brian rightfully assumed that SIOCSIFHWADDR
sets the HW address of that single interface, i.e. it looks as if it's just a
shortcut: instead of having to open a socket, look up the TAP device and set the
HW addr via the socket, the application can just set it via the TUN/TAP fd directly.
Also look at one of the comments in your original patch:
    case SIOCGIFHWADDR:
        /* Note: the actual net device's address may be different */
In your TUN model the addresses _must_ be different. How can they be the same
if you're saying it's two different network cards ? That comment was incorrect
and confusing; it makes people think that it's kind of the same thing.
In fact it took me a while to understand what it was that you were trying to do :).
> Looking over the original patch, I believe I see a bug in tun_net_mclist:
> memset(tun->chr_filter, 0, sizeof tun->chr_filter);
> should be
> memset(tun->net_filter, 0, sizeof tun->net_filter);
Yep that too. It made me think that you're mixing TX and RX filtering,
especially because net_filter was not used anywhere else.
So the correct way to do this is to clearly state that the ioctl modifies TX
filters, i.e. instead of using the generic SIOCGIFHWADDR it should've been
TUNSETTXFILTER or something. And the tun->dev_addr thing looks awfully similar
to net->dev_addr. Again, it should be tun->tx_filter_addr or something like that.
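The separation being argued for can be sketched as follows. This is an illustration only: the thread proposes nothing beyond the names TUNSETTXFILTER and tx_filter_addr, so struct tap_state and the helpers tap_set_hwaddr(), tap_set_tx_filter() and tap_tx_pass() are hypothetical. The point is that the interface address and the TX filter are separate state with separate setters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN      6
#define TX_FILTER_MAX 8

/* Hypothetical driver-side state: one interface address, plus a
 * separately named filter for frames transmitted on the TAP. */
struct tap_state {
    uint8_t dev_addr[ETH_ALEN];                 /* the interface's HW address */
    uint8_t tx_filter[TX_FILTER_MAX][ETH_ALEN]; /* destinations passed to the app */
    size_t  tx_filter_count;
    bool    tx_filter_on;
};

/* What SIOCSIFHWADDR is expected to do: set the HW address, nothing else. */
static void tap_set_hwaddr(struct tap_state *t, const uint8_t *addr)
{
    assert(t != NULL && addr != NULL);
    memcpy(t->dev_addr, addr, ETH_ALEN);
}

/*
 * What a dedicated TX-filter ioctl would do. addrs holds count * ETH_ALEN
 * bytes. Returns 0 on success, -1 if the list is too long. A count of 0
 * switches filtering off.
 */
static int tap_set_tx_filter(struct tap_state *t, const uint8_t *addrs,
                             size_t count)
{
    assert(t != NULL);
    if (count > TX_FILTER_MAX)
        return -1;
    if (count > 0)
        memcpy(t->tx_filter, addrs, count * ETH_ALEN);
    t->tx_filter_count = count;
    t->tx_filter_on = count > 0;
    return 0;
}

/* Should a frame transmitted on the TAP be queued to the character device? */
static bool tap_tx_pass(const struct tap_state *t, const uint8_t *dst)
{
    if (!t->tx_filter_on)
        return true;
    for (size_t i = 0; i < t->tx_filter_count; i++)
        if (memcmp(t->tx_filter[i], dst, ETH_ALEN) == 0)
            return true;
    return false;
}
```

With this split, setting the TX filter never touches dev_addr, so an application that sets the MAC through the fd gets exactly what the ioctl name promises.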
Anyway, do you have a use case for this stuff ?
I have a patch that cleans it all up. Looks like it might be useful for the
virt folks. Let me know what your use case was so that I could make sure it'd
still work. Unfortunately I'm going to have to replace existing SIOC* ioctls.
Max
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Multicast and receive filtering in TUN/TAP
2008-07-11 2:32 ` Brian Braunstein
@ 2008-07-11 3:05 ` Max Krasnyansky
0 siblings, 0 replies; 11+ messages in thread
From: Max Krasnyansky @ 2008-07-11 3:05 UTC (permalink / raw)
To: Brian Braunstein
Cc: Shaun Jackman, Brian Braunstein, Christian Borntraeger, netdev,
virtualization
Brian Braunstein wrote:
> Sorry that I was confused here and it seems I am still confused.
>
> I was thinking that for any one instance of a TAP interface, there
> should be only 1 MAC address, since there is only 1 network interface,
> since the character device is not a network interface but rather the
> interface for the application to send and receive on that virtual
> network interface.
>
Exactly. Your understanding is perfectly correct.
See my previous reply. It should clear up all the confusion.
> For the MC stuff, I have to admit I haven't looked into it much, but it
> seems like the basic operation of setting the MAC address of the network
> interface should be supported, and it seems like an ioctl called
> SIOCSIFHWADDR should Set the InterFace HardWare ADDRess. Sorry if I was
> wrong about this. It might be good to add a comment to SIOCSIFHWADDR
> that says "This does not actually set the network interface hardware
> address, this is for multicast filtering" or whatever it actually is
> supposed to do. Or perhaps create a new ioctl that has something about
> multicast filtering in the name, and leave SIOCSIFHWADDR doing what it
> is doing now.
Yep. That's what I'm going to do (ie a different ioctl). Again see my prev
email. We're totally on the same page :).
Max
* Re: Multicast and receive filtering in TUN/TAP
2008-07-11 2:20 ` Max Krasnyansky
@ 2008-07-11 7:01 ` Rusty Russell
2008-07-11 8:01 ` Max Krasnyansky
0 siblings, 1 reply; 11+ messages in thread
From: Rusty Russell @ 2008-07-11 7:01 UTC (permalink / raw)
To: Max Krasnyansky
Cc: Christian Borntraeger, Brian Braunstein, Shaun Jackman, netdev,
virtualization
On Friday 11 July 2008 12:20:07 Max Krasnyansky wrote:
> >> I haven't looked at the virtio stuff much, I was assuming that the host
> >> side of it is still the TUN driver. Is it not ?
> >
> > Yes, the host side is still tun/tap. The problem is that qemu doesn't know
> > which multicast addresses are used inside the guest.
>
> Ah, now I see what you meant by virtio_net does not do multicast. I guess
> it should be trivial to add. Rusty will clarify it I guess.
Yes, it could certainly be added; that's what feature bits are for :)
Cheers,
Rusty.
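
For readers unfamiliar with feature bits, the idea can be sketched roughly as
follows: the host offers a set of feature bits, the guest driver acknowledges
the ones it understands, and only the intersection is used. This is a minimal
illustration, not virtio code. EXAMPLE_NET_F_MC_FILTER, its bit number, and
both helper functions are hypothetical names made up for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit for guest-programmed multicast filtering.
 * The name and the bit number are made up for illustration only. */
#define EXAMPLE_NET_F_MC_FILTER 20

/* Only features that both sides know about end up enabled. */
static uint32_t negotiate_features(uint32_t host_offered,
                                   uint32_t guest_supported)
{
    return host_offered & guest_supported;
}

/* Test a single bit in the negotiated feature word. */
static bool has_feature(uint32_t negotiated, unsigned int bit)
{
    return (negotiated >> bit) & 1u;
}
```

An old guest that does not know the new bit never acknowledges it, so the host
keeps delivering unfiltered traffic to that guest and nothing breaks.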
* Re: Multicast and receive filtering in TUN/TAP
2008-07-11 7:01 ` Rusty Russell
@ 2008-07-11 8:01 ` Max Krasnyansky
0 siblings, 0 replies; 11+ messages in thread
From: Max Krasnyansky @ 2008-07-11 8:01 UTC (permalink / raw)
To: Rusty Russell
Cc: Christian Borntraeger, virtualization, Brian Braunstein,
Shaun Jackman, netdev
Rusty Russell wrote:
> On Friday 11 July 2008 12:20:07 Max Krasnyansky wrote:
>>>> I haven't looked at the virtio stuff much, I was assuming that the host
>>>> side of it is still the TUN driver. Is it not ?
>>> Yes, the host side is still tun/tap. The problem is that qemu doesn't know
>>> which multicast addresses are used inside the guest.
>> Ah, now I see what you meant by virtio_net does not do multicast. I guess
>> it should be trivial to add. Rusty will clarify it I guess.
>
> Yes, it could certainly be added; that's what feature bits are for :)
Sounds good.
I'll send a patch that lets you guys set up tx filters on the TAP devices.
Hypervisors will then need to translate the rx filters set by the guest OS into
TAP tx filters. I'm thinking of doing it just like E1000, for example: 14 exact
filters, with the rest hashed.
Max
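
A rough sketch of what an "exact filters plus hash" scheme like the one
described above could look like is given below. It is an illustration under
stated assumptions, not the patch itself: struct tap_filter, filter_add(),
filter_pass() and addr_hash() are hypothetical names, and the 64-bucket bitmap
and the CRC-32 based hash are choices made for this sketch. Only the figure of
14 exact slots comes from the message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN        6
#define FLT_EXACT_COUNT 14  /* exact-match slots, as proposed in the message */

/* Hypothetical filter state; all names here are illustrative. */
struct tap_filter {
    unsigned int count;                        /* exact slots in use */
    uint8_t  addr[FLT_EXACT_COUNT][ETH_ALEN];  /* exact-match addresses */
    uint64_t hash_mask;                        /* 64-bucket hash bitmap */
};

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320) over the address,
 * reduced to a 6-bit bucket index (assumed hash, chosen for the sketch). */
static unsigned int addr_hash(const uint8_t *addr)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < ETH_ALEN; i++) {
        crc ^= addr[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc >> 26;
}

/* Use an exact slot while one is free; otherwise set the hash bucket. */
static void filter_add(struct tap_filter *f, const uint8_t *addr)
{
    if (f->count < FLT_EXACT_COUNT)
        memcpy(f->addr[f->count++], addr, ETH_ALEN);
    else
        f->hash_mask |= UINT64_C(1) << addr_hash(addr);
}

/* Return true if a frame with destination 'dst' should be delivered. */
static bool filter_pass(const struct tap_filter *f, const uint8_t *dst)
{
    for (unsigned int i = 0; i < f->count; i++)
        if (memcmp(f->addr[i], dst, ETH_ALEN) == 0)
            return true;
    return (f->hash_mask >> addr_hash(dst)) & 1u;
}
```

The hashed part can produce false positives (two addresses sharing a bucket),
so the guest still has to do its own final filtering. The host-side filter
only cuts down on needless wakeups.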
Thread overview: 11+ messages
2008-07-09 22:58 Multicast and receive filtering in TUN/TAP Max Krasnyansky
2008-07-10 8:29 ` Christian Borntraeger
2008-07-10 16:57 ` Max Krasnyansky
2008-07-10 20:23 ` Christian Borntraeger
2008-07-11 2:20 ` Max Krasnyansky
2008-07-11 7:01 ` Rusty Russell
2008-07-11 8:01 ` Max Krasnyansky
2008-07-10 21:38 ` Shaun Jackman
2008-07-11 2:32 ` Brian Braunstein
2008-07-11 3:05 ` Max Krasnyansky
2008-07-11 3:01 ` Max Krasnyansky