netdev.vger.kernel.org archive mirror
* [PATCH] ipv6: alternative version of S/390 shared NIC support
@ 2005-01-16 11:54 Christoph Hellwig
  2005-01-16 14:30 ` jamal
  2005-01-17 21:42 ` David S. Miller
  0 siblings, 2 replies; 16+ messages in thread
From: Christoph Hellwig @ 2005-01-16 11:54 UTC (permalink / raw)
  To: davem, pavlic; +Cc: waldi, netdev

Let's try again solving the EUI64 generation for S/390.  I looked over
the IBM patch and I think it can be done a lot simpler:

 - put a dev_id field in struct net_device, so that it uses space that
   would be wasted by padding otherwise.
 - if this field is non-zero, let ipv6_generate_eui64 use the algorithm
   from the QETH code to generate an EUI that's different for each
   OS instance.  See code comments for details.


--- 1.23/drivers/s390/net/qeth_main.c	2005-01-04 00:49:39 +01:00
+++ edited/drivers/s390/net/qeth_main.c	2005-01-16 12:33:52 +01:00
@@ -5033,27 +5033,6 @@
 	return 0;
 }
 
-#ifdef CONFIG_QETH_IPV6
-int
-qeth_ipv6_generate_eui64(u8 * eui, struct net_device *dev)
-{
-	switch (dev->type) {
-	case ARPHRD_ETHER:
-	case ARPHRD_FDDI:
-	case ARPHRD_IEEE802_TR:
-		if (dev->addr_len != ETH_ALEN)
-			return -1;
-		memcpy(eui, dev->dev_addr, 3);
-		memcpy(eui + 5, dev->dev_addr + 3, 3);
-		eui[3] = (dev->dev_id >> 8) & 0xff;
-		eui[4] = dev->dev_id & 0xff;
-		return 0;
-	}
-	return -1;
-
-}
-#endif
-
 static void
 qeth_get_mac_for_ipm(__u32 ipm, char *mac, struct net_device *dev)
 {
@@ -5587,11 +5566,8 @@
 	}
 #ifdef CONFIG_QETH_IPV6
 	/*IPv6 address autoconfiguration stuff*/
-	card->dev->dev_id = card->info.unique_id & 0xffff;
 	if (!(card->info.unique_id & UNIQUE_ID_NOT_BY_CARD))
-		card->dev->generate_eui64 = qeth_ipv6_generate_eui64;
-
-
+		card->dev->dev_id = card->info.unique_id & 0xffff;
 #endif
 	dev->hard_header_parse = NULL;
 	dev->set_mac_address = qeth_layer2_set_mac_address;
--- 1.95/include/linux/netdevice.h	2005-01-10 21:23:55 +01:00
+++ edited/include/linux/netdevice.h	2005-01-16 12:32:07 +01:00
@@ -345,6 +345,7 @@
 	unsigned char		broadcast[MAX_ADDR_LEN];	/* hw bcast add	*/
 	unsigned char		dev_addr[MAX_ADDR_LEN];	/* hw address	*/
 	unsigned char		addr_len;	/* hardware address length	*/
+	unsigned short          dev_id;		/* for shared network cards */
 
 	struct dev_mc_list	*mc_list;	/* Multicast mac addresses	*/
 	int			mc_count;	/* Number of installed mcasts	*/
--- 1.128/net/ipv6/addrconf.c	2005-01-14 22:30:07 +01:00
+++ edited/net/ipv6/addrconf.c	2005-01-16 12:29:51 +01:00
@@ -1079,10 +1079,29 @@
 		if (dev->addr_len != ETH_ALEN)
 			return -1;
 		memcpy(eui, dev->dev_addr, 3);
-		memcpy(eui + 5, dev->dev_addr+3, 3);
-		eui[3] = 0xFF;
-		eui[4] = 0xFE;
-		eui[0] ^= 2;
+		memcpy(eui + 5, dev->dev_addr + 3, 3);
+
+		/*
+		 * The zSeries OSA network cards can be shared among various
+		 * OS instances, but the OSA cards have only one MAC address.
+		 * This leads to duplicate address conflicts in conjunction
+		 * with IPv6 if more than one instance uses the same card.
+		 * 
+		 * The driver for these cards can deliver a unique 16-bit
+		 * identifier for each instance sharing the same card.  It is
+		 * placed instead of 0xFFFE in the interface identifier.  The
+		 * "u" bit of the interface identifier is not inverted in this
+		 * case.  Hence the resulting interface identifier has local
+		 * scope according to RFC2373.
+		 */
+		if (dev->dev_id) {
+			eui[3] = (dev->dev_id >> 8) & 0xFF;
+			eui[4] = dev->dev_id & 0xFF;
+		} else {
+			eui[3] = 0xFF;
+			eui[4] = 0xFE;
+			eui[0] ^= 2;
+		}
 		return 0;
 	case ARPHRD_ARCNET:
 		/* XXX: inherit EUI-64 from other interface -- yoshfuji */
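
For illustration only (this is not part of the patch): a minimal userspace
sketch of the branch the addrconf.c hunk adds for 6-byte MAC addresses. The
function name make_eui64 and its standalone signature are hypothetical; the
byte layout follows the hunk above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/*
 * Hypothetical stand-in for the patched Ethernet case of
 * ipv6_generate_eui64(): build an 8-byte interface identifier
 * from a 6-byte MAC address and an optional 16-bit dev_id.
 */
static int make_eui64(uint8_t eui[8], const uint8_t mac[MAC_LEN],
		      uint16_t dev_id)
{
	assert(eui != NULL && mac != NULL);

	memcpy(eui, mac, 3);		/* first three MAC bytes */
	memcpy(eui + 5, mac + 3, 3);	/* last three MAC bytes */

	if (dev_id) {
		/* shared card: dev_id replaces 0xFFFE, "u" bit untouched */
		eui[3] = (dev_id >> 8) & 0xFF;
		eui[4] = dev_id & 0xFF;
	} else {
		/* ordinary card: insert 0xFFFE and invert the "u" bit */
		eui[3] = 0xFF;
		eui[4] = 0xFE;
		eui[0] ^= 2;
	}
	return 0;
}
```

Two instances sharing one MAC address but holding different non-zero dev_id
values then differ in bytes 3 and 4 of the identifier, which is what avoids
the duplicate address conflict described in the comment.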

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] ipv6: alternative version of S/390 shared NIC support
  2005-01-16 11:54 [PATCH] ipv6: alternative version of S/390 shared NIC support Christoph Hellwig
@ 2005-01-16 14:30 ` jamal
  2005-01-17 22:59   ` Christoph Hellwig
  2005-01-17 21:42 ` David S. Miller
  1 sibling, 1 reply; 16+ messages in thread
From: jamal @ 2005-01-16 14:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: David S. Miller, pavlic, waldi, netdev

I could be missing something:
It's almost like the "cards" in these drivers deserve to be their own
netdevices to begin with. In other words the qeth approach itself seems
to be a hack to begin with. 
Should be along the lines:
--->Physical netdevice
  -->some demux code to select instance
    --> Virtual netdevice specific to OS instance
        ---> standard linux path

This approach would be no different from say any tunnel driver code.
If someone fixes them to be separate netdevices, then there would be no
need to special-case code in the stack just for them.
Routing, VLANs etc. would all work fine.

I know it's a lot of work to make those changes - but it is probably
cleaner to just keep it as it was before your patch until someone
converts these into netdevices. Either that or I missed something that
forces the cards to be modeled the way they were.

cheers,
jamal


* Re: [PATCH] ipv6: alternative version of S/390 shared NIC support
  2005-01-16 11:54 [PATCH] ipv6: alternative version of S/390 shared NIC support Christoph Hellwig
  2005-01-16 14:30 ` jamal
@ 2005-01-17 21:42 ` David S. Miller
  2005-01-17 22:28   ` jamal
  1 sibling, 1 reply; 16+ messages in thread
From: David S. Miller @ 2005-01-17 21:42 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: pavlic, waldi, netdev

On Sun, 16 Jan 2005 12:54:31 +0100
Christoph Hellwig <hch@lst.de> wrote:

> Let's try again solving the EUI64 generation for S/390.  I looked over
> the IBM patch and I think it can be done a lot simpler:
> 
>  - put a dev_id field in struct net_device, so that it uses space that
>    would be wasted by padding otherwise.
>  - if this field is non-zero, let ipv6_generate_eui64 use the algorithm
>    from the QETH code to generate an EUI that's different for each
>    OS instance.  See code comments for details.

I'll apply this patch.

It might be nice to genericize that huge comment.  This will likely
not be the last time we see devices which have this characteristic.

* Re: [PATCH] ipv6: alternative version of S/390 shared NIC support
  2005-01-17 21:42 ` David S. Miller
@ 2005-01-17 22:28   ` jamal
  2005-01-17 22:42     ` David S. Miller
  0 siblings, 1 reply; 16+ messages in thread
From: jamal @ 2005-01-17 22:28 UTC (permalink / raw)
  To: David S. Miller; +Cc: Christoph Hellwig, pavlic, waldi, netdev

Dave,

Refer to my comments:
any device that ends up having to do that is a hack in my opinion ;->
The dev_id may be useful for other things. The qeth driver needs fixing
so it can stack the per-OS-instance interfaces.

cheers,
jamal

* Re: [PATCH] ipv6: alternative version of S/390 shared NIC support
  2005-01-17 22:28   ` jamal
@ 2005-01-17 22:42     ` David S. Miller
  2005-01-17 22:54       ` jamal
  0 siblings, 1 reply; 16+ messages in thread
From: David S. Miller @ 2005-01-17 22:42 UTC (permalink / raw)
  To: hadi; +Cc: hch, pavlic, waldi, netdev

On 17 Jan 2005 17:28:34 -0500
jamal <hadi@cyberus.ca> wrote:

> Refer to my comments:
> any device that ends up having to do that is a hack in my opinion ;->
> The dev_id may be useful for other things. The qeth driver needs fixing
> so it can stack the per-OS-instance interfaces.

I agree but...

At least Christoph's patch is 100 times cleaner than what IBM
has proposed.  People have been "talking" endlessly about providing
a clean version of the infrastructure necessary to support what
qeth needs for EUI64 generation, but everyone did nothing but
talk and suggest.

Christoph backed up his "talk" with "code", and since his changes
aren't that unreasonable they go in :-)

* Re: [PATCH] ipv6: alternative version of S/390 shared NIC support
  2005-01-17 22:42     ` David S. Miller
@ 2005-01-17 22:54       ` jamal
  0 siblings, 0 replies; 16+ messages in thread
From: jamal @ 2005-01-17 22:54 UTC (permalink / raw)
  To: David S. Miller; +Cc: hch, pavlic, waldi, netdev

On Mon, 2005-01-17 at 17:42, David S. Miller wrote:

> At least Christoph's patch is 100 times cleaner than what IBM
> has proposed.  People have been "talking" endlessly about providing
> a clean version of the infrastructure necessary to support what
> qeth needs for EUI64 generation, but everyone did nothing but
> talk and suggest.

I think they'd shoot me if I rewrote the driver ;-> And I probably won't
have the cycles.
But I'd be more than happy to work with anyone making the changes.

cheers,
jamal

* Re: [PATCH] ipv6: alternative version of S/390 shared NIC support
  2005-01-16 14:30 ` jamal
@ 2005-01-17 22:59   ` Christoph Hellwig
  2005-01-17 23:11     ` jamal
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2005-01-17 22:59 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, pavlic, waldi, netdev

On Sun, Jan 16, 2005 at 09:30:40AM -0500, jamal wrote:
> I could be missing something:
> It's almost like the "cards" in these drivers deserve to be their own
> netdevices to begin with. In other words the qeth approach itself seems
> to be a hack to begin with. 
> Should be along the lines:
> --->Physical netdevice
>   -->some demux code to select instance
>     --> Virtual netdevice specific to OS instance
>         ---> standard linux path

The problem is again that the OS instances don't talk to each other
(well, they can, but not in the driver), so your demux code would
have to move into the device firmware, and that'd probably change
the device <-> OS interface completely.

* Re: [PATCH] ipv6: alternative version of S/390 shared NIC support
  2005-01-17 22:59   ` Christoph Hellwig
@ 2005-01-17 23:11     ` jamal
  2005-01-17 23:37       ` Christian Bornträger
  0 siblings, 1 reply; 16+ messages in thread
From: jamal @ 2005-01-17 23:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: David S. Miller, pavlic, waldi, netdev

On Mon, 2005-01-17 at 17:59, Christoph Hellwig wrote:

> The problem is again that the OS instances don't talk to each other
> (well they can, but not in the driver), so your demux code would
> have to move into the device firmware, and that'd probably change
> the device <-> OS interface completely.

qeth_qdio_input_handler() seems to know which "card" the packet came in on,
no?
Could you not use this information to select the proper netdevice?

Actually, how does that work? Each OS instance would get the same driver
invoked, but you will never see any other instance's "cards"?

cheers,
jamal

* Re: [PATCH] ipv6: alternative version of S/390 shared NIC support
  2005-01-17 23:11     ` jamal
@ 2005-01-17 23:37       ` Christian Bornträger
  2005-01-18  0:49         ` jamal
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Bornträger @ 2005-01-17 23:37 UTC (permalink / raw)
  To: hadi; +Cc: Christoph Hellwig, David S. Miller, pavlic, waldi, netdev

jamal wrote:
> On Mon, 2005-01-17 at 17:59, Christoph Hellwig wrote:
> > The problem is again that the OS instances don't talk to each other
> > (well they can, but not in the driver), so your demux code would
> > have to move into the device firmware, and that'd probably change
> > the device <-> OS interface completely.
>
> qeth_qdio_input_handler() seems to know what "card" the packet came on,
> no? Could you not use this information to select the proper netdevice?
> Actually, how does that work? Each OS instance would get the same driver
> invoked, but you will never see any other instance's "cards"?

I am trying a small simplification here:
Each physical network adapter offers hundreds of device addresses. You need
3 of them to have one logical network adapter (read, write, data). S/390 has
hardware-supported virtualization. Therefore you can use the hypervisor
(LPAR or z/VM) to give specific LPARs or VM guests exactly 3 device
addresses out of these hundreds.
The qeth driver has to register the IP address at the logical network card
(using the 3 device addresses). Afterwards the physical network card knows
which packet belongs to which device numbers.

cheers Christian

* Re: [PATCH] ipv6: alternative version of S/390 shared NIC support
  2005-01-17 23:37       ` Christian Bornträger
@ 2005-01-18  0:49         ` jamal
  2005-01-18 15:53           ` Christian Bornträger
  0 siblings, 1 reply; 16+ messages in thread
From: jamal @ 2005-01-18  0:49 UTC (permalink / raw)
  To: Christian Bornträger
  Cc: Christoph Hellwig, David S. Miller, pavlic, waldi, netdev

On Mon, 2005-01-17 at 18:37, Christian Bornträger wrote:

> I am trying a small simplification here:
> Each physical network adapter offers hundreds of device addresses. You need 
> 3 of them to have one logical network adapter(read,write,data).

the "card" concept is what you call network adapter, correct?
I take it that read and write are control channels and data is where the
skb comes through?
 
> S/390 has
> hardware-supported virtualization. Therefore you can use the hypervisor
> (LPAR or z/VM) to give specific LPARs or VM guests exactly 3 device
> addresses out of these hundreds.

Can you provision multiple of these cards per VM? If yes, is there some
ID that will break it down to OSInstance:cardid?

> The qeth driver has to register the IP address at the logical network card 
> (using 3 device addresses) Afterwards the physical network card knows which 
> packet belongs to which device numbers.

I think I understood, but I'm confused: before you attach an IP address, though,
you can't receive packets? Is there a concept of a MAC address which you
can pass to the hypervisor, or can you run in promiscuous mode?

Another question: When that driver runs for the physical card - it runs
in the context of a specific VM, correct? In other words it would be
impossible to see the "card" of another instance? 

cheers,
jamal

* Re: [PATCH] ipv6: alternative version of S/390 shared NIC support
  2005-01-18  0:49         ` jamal
@ 2005-01-18 15:53           ` Christian Bornträger
  2005-01-18 18:25             ` Frank Pavlic
  2005-01-19 13:49             ` jamal
  0 siblings, 2 replies; 16+ messages in thread
From: Christian Bornträger @ 2005-01-18 15:53 UTC (permalink / raw)
  To: hadi; +Cc: Christoph Hellwig, David S. Miller, pavlic, waldi, netdev

Frank, please correct me, if I am wrong....

jamal wrote:
> On Mon, 2005-01-17 at 18:37, Christian Bornträger wrote:
> > I am trying a small simplification here:
> > Each physical network adapter offers hundreds of device addresses. You
> > need 3 of them to have one logical network adapter(read,write,data).
>
> the "card" concept is what you call network adapter, correct?
> I take it that read and write are control channels and data is where the
> skb comes through?

don't ask me about naming....

> >  S/390 has
> > hardware supported virtualization. Therefore can then use the
> > hypervisor (LPAR or z/VM) to give specific LPARs or VM guests exactly 3
> > device addresses out of these hundreds.
>
> Can you provision multiple of these cards per VM? if yes, is there some
> ID that will break it down to OSInstance:cardid?
> > The qeth driver has to register the IP address at the logical network
> > card (using 3 device addresses) Afterwards the physical network card
> > knows which packet belongs to which device numbers.
>
> I think I understood, but I'm confused: before you attach an IP address, though,
> you can't receive packets? Is there a concept of a MAC address which you
> can pass to the hypervisor, or can you run in promiscuous mode?

Right, without registering the IP address, you cannot receive any packets.
As the logical network interface has no MAC address of its own, you actually
speak IP to the card. That also means that, without some additional effort,
tools like tcpdump fail and you need some patches in the dhcp tools.
You can define options for routers to get more than your own packets, but
IIRC you can only define a primary and a secondary router per port.

>
> Another question: When that driver runs for the physical card - it runs

The driver never ever runs for the physical card. It runs for 3 device
addresses, which are already behind a layer of virtualization and represent
a logical IP stack in the physical card. To Linux it looks like a physical
card.

> in the context of a specific VM, correct? In other words it would be
> impossible to see the "card" of another instance?

By VM you mean virtual machine? Right. We are talking about real hardware
emulation. Think about it as VMware in hardware or with hardware support.
So you have no access to the logical cards of other Linuxes.
One difference from real hardware is that multiple Linuxes have the same MAC
address. The physical OSA network card and z/VM ensure that incoming
packets are delivered to the right Linux.


cheers 
Christian

* Re: [PATCH] ipv6: alternative version of S/390 shared NIC support
  2005-01-18 15:53           ` Christian Bornträger
@ 2005-01-18 18:25             ` Frank Pavlic
  2005-01-19 13:49             ` jamal
  1 sibling, 0 replies; 16+ messages in thread
From: Frank Pavlic @ 2005-01-18 18:25 UTC (permalink / raw)
  To: Christian Bornträger
  Cc: David S. Miller, hadi, Christoph Hellwig, netdev, waldi


Christian Bornträger <christian@borntraeger.net> wrote on 18.01.2005 16:53:58:

> Frank, please correct me, if I am wrong....

It's all correct ....
sorry Christian, but I didn't get around to answering :-(


* Re: [PATCH] ipv6: alternative version of S/390 shared NIC support
  2005-01-18 15:53           ` Christian Bornträger
  2005-01-18 18:25             ` Frank Pavlic
@ 2005-01-19 13:49             ` jamal
  2005-01-19 20:52               ` Christian Borntraeger
  2005-01-19 21:32               ` Frank Pavlic
  1 sibling, 2 replies; 16+ messages in thread
From: jamal @ 2005-01-19 13:49 UTC (permalink / raw)
  To: Christian Bornträger
  Cc: Christoph Hellwig, David S. Miller, pavlic, waldi, netdev


On Tue, 2005-01-18 at 10:53, Christian Bornträger wrote:
> Frank, please correct me, if I am wrong....
> 
> jamal wrote:
> > On Mon, 2005-01-17 at 18:37, Christian Bornträger wrote:
> > > I am trying a small simplification here:
> > > Each physical network adapter offers hundreds of device addresses. You
> > > need 3 of them to have one logical network adapter(read,write,data).
> >
> > the "card" concept is what you call network adapter, correct?
> > I take it that read and write are control channels and data is where the
> > skb comes through?
> 
> don't ask me about naming....

That's fine.
I think it doesn't matter what they are used for; the important part is
that you need all 3 addresses to have a "card"; so, got it.

> 
> > >  S/390 has
> > > hardware supported virtualization. Therefore can then use the
> > > hypervisor (LPAR or z/VM) to give specific LPARs or VM guests exactly 3
> > > device addresses out of these hundreds.
> >
> > Can you provision multiple of these cards per VM? if yes, is there some
> > ID that will break it down to OSInstance:cardid?

You did not answer this question.
Let me draw a diagram to show what I think the hierarchy is:

  Physical Card: MAC address X
     |
     |
     +--- OSInstance A
     |        |
     |        +-- "CARD" with IP A
     |        +-- "CARD" with IP B
     |        +-- "CARD" with IP C
     |        +-- "CARD" with IP D
     .         
     .
     .
     |
     +--- OSInstance N
              |
              +-- "CARD" with IP Z


Is the above reflective of what happens?
In other words, a packet comes from the wire (with MAC address X); somehow
the hypervisor(?) or firmware figures out, based on IP address A (assuming no
other instance has that IP), that it has to send the packet to OSInstanceA.
OSInstanceA then further selects the CARD based on something, probably in
a descriptor?

Let me get to the point:
I think it would make sense for the "CARD" to be just another netdevice
(call it the "card" netdevice for this discussion).
The representation of the physical card in the OSInstance is also a
netdevice (call it the physical netdevice for this discussion), as it is now
(except it never has an IP address).
The "card" netdevices are stacked on top of the physical netdevice. This
would be like an upside-down bridge stacking relationship of
netdevices....
It actually is no different from a few tunnel netdevices that sit on top
of, say, eth0, or multiple PPP devices on top of ethX in a PPPoE
relationship.
The demuxing for incoming packets is done at the physical card netdevice
to select the "card" netdevice whose receive method is then called.
Reverse direction for transmit (we could go into details later, I just
wanna make sure this is sensible to begin with).
Does this sound reasonable? If yes, then if you do this you won't need to
hack anything like IPv6 etc. in your driver - they become merely
netdevices. It should also allow all standard features like ifconfig
up/down etc. of the "card" and setting IP addresses, VLANs etc. to work as
is. And you won't need to put any specialized code in the driver.
If it's off on a tangent, then I just wasted 1/2 a cup of coffee energy typing
away ;->
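
To make the stacking idea concrete, here is a rough userspace sketch of the
receive-side demux, under the assumption that each incoming packet carries
some key that identifies its "card". The struct and function names
(card_dev, phys_dev, phys_demux) and the key itself are hypothetical; none
of this is existing qeth or netdevice code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "card" netdevice stacked on the physical one. */
struct card_dev {
	uint32_t key;			/* assumed demux key, e.g. a card id */
	unsigned long rx_packets;	/* packets handed to this card */
};

/* Hypothetical physical netdevice owning the stacked cards. */
struct phys_dev {
	struct card_dev *cards;
	size_t ncards;
	unsigned long rx_dropped;	/* packets that matched no card */
};

/*
 * Select the "card" netdevice for an incoming packet.  A real driver
 * would call the selected device's receive method here; this sketch
 * only counts the packet.  Returns NULL if no card matches.
 */
static struct card_dev *phys_demux(struct phys_dev *phys, uint32_t key)
{
	size_t i;

	assert(phys != NULL);
	for (i = 0; i < phys->ncards; i++) {
		if (phys->cards[i].key == key) {
			phys->cards[i].rx_packets++;
			return &phys->cards[i];
		}
	}
	phys->rx_dropped++;
	return NULL;
}
```

Transmit would walk the same relationship in the other direction, from the
"card" netdevice down to the physical one.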

> Right, without registering the IP address, you can not receive any packet.

If this is firmware issue, it would be wise to fix it. You should be
able to register multiple MAC addresses hidden in the firmware (not at
the Linux level) and have your "cards" netdevice use them. i.e the
"card" netdevices would own those.

> As the logical network interface has no own MAC address you actually speak 
> IP to the card. That also means, that without some additional effort, tools 
> like tcpdump fail and you need some patches in the dhcp tools.

Refer to above. If you actually have your virtual/"card" netdevice on
top of the physical netdevice have a MAC address, then all tools should
work as is with zero changes IMO.

>  
> You can define options for routers to get more than your own packets, but
> IIRC you can only define a primary and secondary router per port. 
> 

Ok, this is new information - what does a "router" mean in relation to a
port?

cheers,
jamal

* Re: [PATCH] ipv6: alternative version of S/390 shared NIC support
  2005-01-19 13:49             ` jamal
@ 2005-01-19 20:52               ` Christian Borntraeger
  2005-01-19 21:32               ` Frank Pavlic
  1 sibling, 0 replies; 16+ messages in thread
From: Christian Borntraeger @ 2005-01-19 20:52 UTC (permalink / raw)
  To: hadi; +Cc: Christoph Hellwig, David S. Miller, pavlic, waldi, netdev

jamal wrote:
[...]
> > > Can you provision multiple of these cards per VM? if yes, is there
> > > some ID that will break it down to OSInstance:cardid?
>
> You did not answer this question.
> Let me draw a diagram to show what I think the hierarchy is:
>
>   Physical Card: MAC address X
>      |
>      |
>      +--- OSInstance A
>      |        |
>      |        +-- "CARD" with IP A
>      |        +-- "CARD" with IP B
>      |        +-- "CARD" with IP C
>      |        +-- "CARD" with IP D
>      .
>      .
>      .
>      |
>      +--- OSInstance N
>               |
>               +-- "CARD" with IP Z
>
>
> Is the above reflective of what happens?

No. In the simplest case it is a one-level hierarchy, but always outside
Linux.
Just for demonstration purposes, IMAGINE that a card has only one
device address (devno):

-----net-------\
               ||
            |hardware          |
            \------------------/
             |  | | | | |     | 
  /--devno---/  | | | | |     |
  |             | . . . .     |
  |             |             |
Linux1       Linux2  ....... Linuxn

> In other words, packet comes from the wire (with MAC address X); somehow
> the hypervisor(?) or firmware figures based on IP address A (assuming no
> other instance has that IP) it has to send packet to OSInstanceA.

Right. Basically, the card has a table:
DEVNO  | IP address | ROUTER?
0-ffff | ....       | primary, secondary 

If then a packet arrives it has an IP address.
The card looks this address up in this table:

   |
   |
   V
is it registered             ----yes----> forward to the device number
   |
   |
   V
is there a primary router?   ----yes----> forward to device number of the
   |                                      primary router
   |
   |
   V
is there a secondary router? ----yes----> forward to device number of the
   |                                      secondary router
   |
   V
drop packet. 
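
The same decision, written out as a small C sketch. This is only a model of
the flow above, not the card's firmware: the table layout, the names
(osa_entry, osa_lookup, DROP_PACKET) and the use of a 32-bit IPv4 address
are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum router_role { ROUTER_NONE, ROUTER_PRIMARY, ROUTER_SECONDARY };

/* One row of the (hypothetical) per-card table: DEVNO | IP | ROUTER? */
struct osa_entry {
	uint16_t devno;
	uint32_t ip;
	enum router_role role;
};

#define DROP_PACKET (-1)

/*
 * Return the device number a packet for dst_ip is forwarded to:
 * the registered owner of the address if there is one, otherwise the
 * primary router, otherwise the secondary router, otherwise drop.
 */
static int osa_lookup(const struct osa_entry *tbl, size_t n, uint32_t dst_ip)
{
	int primary = DROP_PACKET;
	int secondary = DROP_PACKET;
	size_t i;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tbl[i].ip == dst_ip)
			return tbl[i].devno;	/* registered address wins */
		if (tbl[i].role == ROUTER_PRIMARY && primary == DROP_PACKET)
			primary = tbl[i].devno;
		else if (tbl[i].role == ROUTER_SECONDARY &&
			 secondary == DROP_PACKET)
			secondary = tbl[i].devno;
	}
	if (primary != DROP_PACKET)
		return primary;
	return secondary;	/* still DROP_PACKET if no router is defined */
}
```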

> OSInstanceA then selects further the CARD based on something probably in
> a descriptor?

No. The packet arrives on one "CARD", and Linux gets an "interrupt" and
fetches the packet from that card.  You can of course have more cards per
Linux by providing more device numbers in the profile of the virtual
machine. But that makes no sense unless you do some testing or the guest
operating system is a hypervisor itself and can dispatch the device
addresses to its virtual machines.

>
> Let me get to the point:
> I think it would make sense for the "CARD" to be just another netdevice
> (call it "card" netdevice for this discussion).
> The representation of the physical card in the OSInstance is also a
> netdevice(call it physical netdevice for this discussion) as it is now
> (excpet it has no IP address ever).
> The "card netdevices" are stacked on top of the physical netdevice. This
> would be like an upside down bridge stacking relationship of
> netdevices....
> It actually is no different from a few tunnel netdevices that sit on top
> of say eth0 or multiple PPP devices on top of ethx in a PPPOE
> relationship.
> The demuxing for incoming packets is done at physical card netdevice
> to select the "card" netdevice whose receive method is then called.
> Reverse direction for transmit (we could go into details later, just
> wanna make sure this is sensible to begin with).

If I understood you right, you think that Linux has some control over the
real hardware device and that the card itself is created in the qeth Linux
driver as some obscure thing.
No, for Linux the device backed by the 3 device addresses is as real as you
and me. The qeth driver calls alloc_etherdev for the device represented by
the 3 device addresses and thereby creates a netdevice for this card.
That's all we have: a card that only speaks IPv4 and IPv6 with us and that
magically handles ARP for us. To make it worse, the card only cooperates if
we tell it which IP addresses we want to use. (By the way, the card
recognizes duplicates.)


> Does this sound reasonable? If yes, then if you do this you wont need to
> hack anything like IPV6 etc in your driver - they become merely
> netdevices. It should also allow for all standard features like ifconfig
> up/down etc of the "card" and setting IP addresses, VLANS etc to work as
> is. And you wont need to put any speacilized code in the driver.
> If its off tangent, then i just wasted 1/2 a cup of coffee energy typing
> away ;->

Well, I think you haven't been fully aware of the way the card works. I hope
my explanation above makes it clearer.
In one sentence: our Linux qeth "CARDS" _are_ already netdevices. (We are
speaking about struct net_device, right?)

Furthermore, as long as there is only one MAC address we need one hack for
IPv6, and Christoph's version looks very nice and elegant.

>
> > Right, without registering the IP address, you can not receive any
> > packet.
>
> If this is firmware issue, it would be wise to fix it. You should be
> able to register multiple MAC addresses hidden in the firmware (not at
> the Linux level) and have your "cards" netdevice use them. i.e the
> "card" netdevices would own those.

Well, newer cards support something like this; it's called Layer 2 support
(see the hardware announcement:
http://www-306.ibm.com/common/ssi/fcgi-bin/ssialias?infotype=an&subtype=ca&supplier=897&appname=IBMLinkRedirect&letternum=ENUS104-346
). Unfortunately this feature is only available on a subset of cards and
newer zSeries machines.

Just for explanation, there is a good reason that you only see the packets
for your IP address: scalability of virtualization. Imagine you have a
gigabit card shared among 80 Linuxes (quite realistic numbers). If you don't
filter early, you have to forward all traffic to every guest system (which
can then discard unneeded packets).  Then you have to provide an internal
bandwidth of 80 Gbit/sec. Not a good idea. Therefore you have to multiplex
very, very early and cannot put every Linux in promiscuous mode, for
performance reasons.

Furthermore, it is not that easy to change the way the card works. There are
other operating systems like z/OS, z/VSE or z/VM running on the same
hardware. You don't hastily change the behaviour of hardware, firmware and
operating systems that run in lots of banks and insurance companies with
99.999% availability without making sure that everything works flawlessly.


> > As the logical network interface has no own MAC address you actually
> > speak IP to the card. That also means, that without some additional
> > effort, tools like tcpdump fail and you need some patches in the dhcp
> > tools.
>
> Refer to above. If you actually have your virtual/"card" netdevice on
> top of the physical netdevice have a MAC address, then all tools should
> work as is with zero changes IMO.

Right. But that's just not the way the hardware works.

Pick almost any random Linux driver to see ugly code that circumvents a
hardware limitation. Have a look at the number of traps in the kernel to fix
CPU "errata". There is no perfect hardware.

> > You can define options for routers to get more than your own packages,
> > but IIRC you can only define a primary and secondary router per port.
>
> Ok, this is new information - what does a "router" mean in relation to a
> port?

See above. You can have one primary and one secondary router.

I hope that helps to understand the problem. If not, please don't hesitate
to ask.

Christian


* Re: [PATCH] ipv6: alternative version of S/390 shared NIC support
  2005-01-19 13:49             ` jamal
  2005-01-19 20:52               ` Christian Borntraeger
@ 2005-01-19 21:32               ` Frank Pavlic
  2005-01-20  4:47                 ` jamal
  1 sibling, 1 reply; 16+ messages in thread
From: Frank Pavlic @ 2005-01-19 21:32 UTC (permalink / raw)
  To: hadi
  Cc: Christian Bornträger, David S. Miller, Christoph Hellwig,
	netdev, waldi

Mit freundlichen Grüssen / Best regards
Frank Pavlic

Linux for eServer Development
Schoenaicher Str. 220, 71032 Boeblingen
Phone:  ext. +49-(0)7031/16-2463, int. *120-2463
mailto:   pavlic@de.ibm.com

jamal <hadi@cyberus.ca> wrote on 19.01.2005 14:49:28:

>
> On Tue, 2005-01-18 at 10:53, Christian Bornträger wrote:
> > Frank, please correct me, if I am wrong....
> >
> > jamal wrote:
> > > On Mon, 2005-01-17 at 18:37, Christian Bornträger wrote:
> > > > I am trying a small simplification here:
> > > > Each physical network adapter offers hundreds of device addresses. You
> > > > need 3 of them to have one logical network adapter(read,write,data).
> > >
> > > the "card" concept is what you call network adapter, correct?
> > > I take it that read and write are control channels and data is where the
> > > skb comes through?
> >
> > don't ask me about naming....
>
> thats fine,
> I think it doesnt matter what they are used for; important part is
> you need all 3 addresses to have a "card"; so got it.
>
> >
> > > >  S/390 has
> > > > hardware supported virtualization. Therefore can then use the
> > > > hypervisor (LPAR or z/VM) to give specific LPARs or VM guests exactly 3
> > > > device addresses out of these hundreds.
> > >
> > > Can you provision multiple of these cards per VM? if yes, is there some
> > > ID that will break it down to OSInstance:cardid?
>
> You did not answer this question.
> Let me draw a diagram to show what i think the hierachy is:
>
>   Physical Card: MAC address X
>      |
>      |
>      +--- OSInstance A
>      |        |
>      |        +-- "CARD" with IP A
>      |        +-- "CARD" with IP B
>      |        +-- "CARD" with IP C
>      |        +-- "CARD" with IP D
>      .
>      .
>      .
>      |
>      +--- OSInstance N
>               |
>               +-- "CARD" with IP Z
>
>
> Is the above reflective of what happens?

> In other words, packet comes from the wire (with MAC address X); somehow
> the hypervisor(?) or firmware figures based on IP address A (assuming no
> other instance has that IP) it has to send packet to OSInstanceA.
> OSInstanceA then selects further the CARD based on something probably in
> a descriptor?




>
> Let me get to the point:
> I think it would make sense for the "CARD" to be just another netdevice
> (call it "card" netdevice for this discussion).
> The representation of the physical card in the OSInstance is also a
> netdevice(call it physical netdevice for this discussion) as it is now
> (excpet it has no IP address ever).
> The "card netdevices" are stacked on top of the physical netdevice. This
> would be like an upside down bridge stacking relationship of
> netdevices....
Let me interrupt your description here. The following two lines are output
from /proc/qeth.
Running cat /proc/qeth lists all devices attached to and configured on this
Linux system, frank1:
0.0.f504/0.0.f505/0.0.f503   xB5   eth0       OSD_1000       0    sw always_q_2 no   no   64k   16
0.0.f506/0.0.f507/0.0.f508  xB5   eth1       OSD_1000       0    sw always_q_0 no   no   64k   16

This output already shows that for every device triple we initialize and
register one struct net_device with the Linux network stack.

But let me use these two lines to explain how this works today, that is,
how packets make their way from the OSA to the network stack.
As you mentioned above, an Ethernet frame comes in from the wire.
Reading the IP address, the OSA (firmware or not, I don't care :-) ) checks
its local address table for an entry for this IP address. Since we have to
register the IP addresses for eth0 and eth1, the OSA knows which
read/write/data channel belongs to which IP address.
So if an entry exists in the OSA's table, the firmware puts the packet,
without its Ethernet header, on the corresponding data device (it's too
complicated to explain briefly how it really works, but for this
description it's OK) and of course raises an interrupt on the same data
device. Now the interrupt arrives at the qeth driver.
From the interrupt information passed to the qeth interrupt handler we can
determine on which data device the packet arrived, and thus we know the
struct net_device.
qeth then processes the interrupt: it does some error checks, allocates an
skb, fills in the appropriate members like protocol, pkt_type and so on,
and passes the IP packet to the stack!
From the point of view of Linux frank1, it has two physical devices, eth0
and eth1, running. But the truth is that both send packets out through the
same physical OSA card; packets going out through eth0 just take a
different way through the huge machine, meaning a different data channel,
than packets going out through eth1.
What do you have to do on an x86 to be able to send packets out to the
world, and of course receive them, on different ways?
Well, you have to put two network cards on your PCI bus or motherboard ...
Maybe I can answer your questions from above now:
"CARD" with IP address A is nothing other than eth0 (OK, to be clear:
struct qeth_card is more correct, but struct qeth_card has a struct
net_device member) with IP address A, "CARD" with IP address B is eth1,
and so on ...

So packets for eth0 go directly to the "CARD", and thus of course to
OSInstanceA, but OSInstanceA does not make any decisions at all; the
interrupt information passed to the qeth driver is clear enough.
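
To make the receive path a bit more concrete, here is a small sketch in
plain C. It is not the qeth code: struct qeth_triple and
ifname_for_data_dev() are invented for this mail, the interface name stands
in for the struct net_device member, and I assume the three addresses in
/proc/qeth are listed in read/write/data order:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct qeth_card: one device triple. */
struct qeth_triple {
	uint16_t read_dev;
	uint16_t write_dev;
	uint16_t data_dev;
	const char *ifname;     /* stands in for the struct net_device */
};

/* The two triples from the /proc/qeth output of frank1. */
static const struct qeth_triple frank1_cards[] = {
	{ 0xf504, 0xf505, 0xf503, "eth0" },
	{ 0xf506, 0xf507, 0xf508, "eth1" },
};

/*
 * The interrupt tells us which data device has a packet; that alone
 * selects the interface. No IP lookup happens inside Linux.
 * Returns NULL if the data device belongs to none of our triples.
 */
const char *ifname_for_data_dev(const struct qeth_triple *cards, size_t n,
				uint16_t data_dev)
{
	size_t i;

	assert(cards != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (cards[i].data_dev == data_dev)
			return cards[i].ifname;
	return NULL;
}
```

With the table above, an interrupt on data device f503 selects eth0 and one
on f508 selects eth1; the IP-based decision was already made by the OSA.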

> It actually is no different from a few tunnel netdevices that sit on top
> of say eth0 or multiple PPP devices on top of ethx in a PPPOE
> relationship.

Yes, it is different, since every device triple has its own struct
net_device registered with the network stack.

The problem with IPv6 is that other Linux systems like frank2, frank3, ...
also have device triples configured (different from those on frank1, e.g.
f508,f509,f510 on frank2, f512,f513,f514 on frank3, ...) that come from the
same OSA card frank1 already uses, so all of their struct net_devices get
the same MAC address (one MAC per physical network card). The EUI64, which
is the base for an automatically generated IPv6 address on Ethernet
hardware, is generated from the MAC address, so all ethX on all the
different frank systems get the same IPv6 address, and the result is that
IPv6 traffic stalls after a few seconds ....
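
A small userspace sketch of the collision and of the dev_id idea from the
patch at the top of this thread. make_eui64() is a name made up for this
mail and is not the kernel function; the dev_id branch mirrors the
algorithm of the removed qeth_ipv6_generate_eui64(), and the other branch
is the usual MAC-to-EUI64 expansion (ff:fe in the middle, universal/local
bit flipped):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/*
 * Build an 8-byte interface identifier from a 6-byte MAC address.
 * dev_id == 0: standard expansion, identical for every guest that
 *              shares the MAC.
 * dev_id != 0: the 16-bit per-instance id replaces ff:fe, so guests
 *              sharing one MAC get different identifiers.
 */
void make_eui64(uint8_t eui[8], const uint8_t mac[MAC_LEN], uint16_t dev_id)
{
	assert(eui != NULL && mac != NULL);

	memcpy(eui, mac, 3);
	memcpy(eui + 5, mac + 3, 3);

	if (dev_id) {
		eui[3] = (dev_id >> 8) & 0xff;
		eui[4] = dev_id & 0xff;
	} else {
		eui[3] = 0xff;
		eui[4] = 0xfe;
		eui[0] ^= 2;    /* flip the universal/local bit */
	}
}
```

Called twice with the same MAC and dev_id 0 this yields the same 8 bytes,
which is exactly the frank1/frank2/frank3 situation; with two different
non-zero dev_id values bytes 3 and 4 differ, so the autoconfigured
addresses no longer clash.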


> The demuxing for incoming packets is done at physical card netdevice
> to select the "card" netdevice whose receive method is then called.
> Reverse direction for transmit (we could go into details later, just
> wanna make sure this is sensible to begin with).
> Does this sound reasonable? If yes, then if you do this you wont need to
> hack anything like IPV6 etc in your driver - they become merely
> netdevices. It should also allow for all standard features like ifconfig
> up/down etc of the "card" and setting IP addresses, VLANS etc to work as
> is. And you wont need to put any speacilized code in the driver.
> If its off tangent, then i just wasted 1/2 a cup of coffee energy typing
> away ;->

I'm sorry to hear this, I can make you some if you want ;-)

>
> > Right, without registering the IP address, you can not receive any packet.
>
> If this is firmware issue, it would be wise to fix it. You should be
> able to register multiple MAC addresses hidden in the firmware (not at
> the Linux level) and have your "cards" netdevice use them. i.e the
> "card" netdevices would own those.
>
> > As the logical network interface has no own MAC address you actually speak
> > IP to the card. That also means, that without some additional effort, tools
> > like tcpdump fail and you need some patches in the dhcp tools.

CORRECT, tcpdump and dhcp have to be patched.

I hope my description gives a better understanding of the IPv6 problem and
of how this works; if not, just say no, to save another 1/2 cup of
coffee ;-)
If you are interested in more detailed information about OSA running under
Linux, I can give you a link to documentation that describes this stuff
pretty well!


* Re: [PATCH] ipv6: alternative version of S/390 shared NIC support
  2005-01-19 21:32               ` Frank Pavlic
@ 2005-01-20  4:47                 ` jamal
  0 siblings, 0 replies; 16+ messages in thread
From: jamal @ 2005-01-20  4:47 UTC (permalink / raw)
  To: Frank Pavlic
  Cc: Christian Bornträger, David S. Miller, Christoph Hellwig,
	netdev, waldi

On Wed, 2005-01-19 at 16:32, Frank Pavlic wrote:

> Mit  freundlichen Grüssen / Best regards
> Frank Pavlic

Ok, thanks to both you and Christian for the explanations.
I think i understand better now. So essentially the hypervisor
assumes the OSInstance speaks L3 (but not exactly L2);->
Yep, that would be a little challenging .. I have other suggestions
on resolving some of this but i sympathize with the process you have to
go through so i won't burden you with it;-> I think the patch that was in
is good nuf a solution.

BTW, How did this work out in other OSes you mentioned?
For something as expensive as these beasts i have heard are, I
have to say i am a little surprised by this "deficiency".

cheers,
jamal


end of thread, other threads:[~2005-01-20  4:47 UTC | newest]

Thread overview: 16+ messages
2005-01-16 11:54 [PATCH] ipv6: alternative version of S/390 shared NIC support Christoph Hellwig
2005-01-16 14:30 ` jamal
2005-01-17 22:59   ` Christoph Hellwig
2005-01-17 23:11     ` jamal
2005-01-17 23:37       ` Christian Bornträger
2005-01-18  0:49         ` jamal
2005-01-18 15:53           ` Christian Bornträger
2005-01-18 18:25             ` Frank Pavlic
2005-01-19 13:49             ` jamal
2005-01-19 20:52               ` Christian Borntraeger
2005-01-19 21:32               ` Frank Pavlic
2005-01-20  4:47                 ` jamal
2005-01-17 21:42 ` David S. Miller
2005-01-17 22:28   ` jamal
2005-01-17 22:42     ` David S. Miller
2005-01-17 22:54       ` jamal
