Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 3/3 net-next] tg3: Add sysfs file to export sensor data
From: Ben Hutchings @ 2012-06-25 21:25 UTC (permalink / raw)
  To: Michael Chan; +Cc: David Miller, netdev, nsujir, mcarlson
In-Reply-To: <1340658258.4344.43.camel@LTIRV-MCHAN1.corp.ad.broadcom.com>

On Mon, 2012-06-25 at 14:04 -0700, Michael Chan wrote:
> On Sat, 2012-06-23 at 16:02 +0100, Ben Hutchings wrote: 
> > Temperature and voltage can be exposed through an hwmon device (which
> > practically means you use multiple attributes with conventional names).
> > Other diagnostics might possible be suitable for ethtool stats,
> > depending on what they are.
> 
> I think we can extract some common and more useful attributes such as
> temperature and voltage, and use the standard hwmon attributes to expose
> them.
> 
> > 
> > If the driver can't easily parse the information (e.g. it varies greatly
> > between the different chips and firmware versions) then a binary
> > attribute or private ioctl might be appropriate.  But generic interfaces
> > really should be considered first.
> > 
> 
> The rest of the bulk data requires too much parsing in the kernel and
> will have to be exposed as binary data.  What do you mean by binary
> attribute?  A new binary sysfs attribute under hwmon?  Or outside of
> hwmon?

A binary sysfs attribute under your PCI device.  (In fact, for wider
userland compatibility, hwmon sysfs attributes should also be under the
PCI device rather than the hwmon device.  Yes, this *is* a weird
convention.)

> And please elaborate on the private ioctl.

Every driver gets to handle SIOCDEVPRIVATE .. SIOCDEVPRIVATE+15.  But
avoid the first 3, as userland may blindly try to use them for MDIO.
David may of course tell you that you should under no circumstances
actually do this.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH 3/3 net-next] tg3: Add sysfs file to export sensor data
From: Michael Chan @ 2012-06-25 21:04 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev, nsujir, mcarlson
In-Reply-To: <1340463729.6345.25.camel@deadeye.wl.decadent.org.uk>

On Sat, 2012-06-23 at 16:02 +0100, Ben Hutchings wrote: 
> Temperature and voltage can be exposed through an hwmon device (which
> practically means you use multiple attributes with conventional names).
> Other diagnostics might possible be suitable for ethtool stats,
> depending on what they are.

I think we can extract some common and more useful attributes such as
temperature and voltage, and use the standard hwmon attributes to expose
them.

> 
> If the driver can't easily parse the information (e.g. it varies greatly
> between the different chips and firmware versions) then a binary
> attribute or private ioctl might be appropriate.  But generic interfaces
> really should be considered first.
> 

The rest of the bulk data requires too much parsing in the kernel and
will have to be exposed as binary data.  What do you mean by binary
attribute?  A new binary sysfs attribute under hwmon?  Or outside of
hwmon?  And please elaborate on the private ioctl.

Thanks.

^ permalink raw reply

* Re: [RFC net-next (v2) 12/14] ixgbe: set maximal number of default RSS queues
From: Alexander Duyck @ 2012-06-25 21:03 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: eilong, Yuval Mintz, davem, netdev, Jeff Kirsher, John Fastabend
In-Reply-To: <1340649847.2572.15.camel@bwh-desktop.uk.solarflarecom.com>

On 06/25/2012 11:44 AM, Ben Hutchings wrote:
> On Mon, 2012-06-25 at 11:38 -0700, Alexander Duyck wrote:
>> On 06/25/2012 10:53 AM, Eilon Greenstein wrote:
>>> On Mon, 2012-06-25 at 08:44 -0700, Alexander Duyck wrote:
>>>> This doesn't limit queues, only interrupt vectors.  As I told you before
>>>> you should look at the ixgbe_set_rss_queues function if you actually
>>>> want to limit the number of RSS queues.
>>> How about this:
>>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
>>> index af1a531..23a8609 100644
>>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
>>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
>>> @@ -277,6 +277,8 @@ static inline bool ixgbe_set_rss_queues(struct ixgbe_adapter *adapter)
>>>  	bool ret = false;
>>>  	struct ixgbe_ring_feature *f = &adapter->ring_feature[RING_F_RSS];
>>>  
>>> +	f->indices = min_t(int, netif_get_num_default_rss_queues(), f->indices);
>>> +
>>>  	if (adapter->flags & IXGBE_FLAG_RSS_ENABLED) {
>>>  		f->mask = 0xF;
>>>  		adapter->num_rx_queues = f->indices;
>>> @@ -302,7 +304,9 @@ static inline bool ixgbe_set_fdir_queues(struct ixgbe_adapter *adapter)
>>>  	bool ret = false;
>>>  	struct ixgbe_ring_feature *f_fdir = &adapter->ring_feature[RING_F_FDIR];
>>>  
>>> -	f_fdir->indices = min_t(int, num_online_cpus(), f_fdir->indices);
>>> +	f_fdir->indices = min_t(int, netif_get_num_default_rss_queues(),
>>> +				f_fdir->indices);
>>> +
>>>  	f_fdir->mask = 0;
>>>  
>>>  	/*
>>> @@ -339,8 +343,7 @@ static inline bool ixgbe_set_fcoe_queues(struct ixgbe_adapter *adapter)
>>>  	if (!(adapter->flags & IXGBE_FLAG_FCOE_ENABLED))
>>>  		return false;
>>>  
>>> -	f->indices = min_t(int, num_online_cpus(), f->indices);
>>> -
>>> +	f->indices = min_t(int, f->indices, netif_get_num_default_rss_queues());
>>>  	adapter->num_rx_queues = 1;
>>>  	adapter->num_tx_queues = 1;
>>>
>> This makes much more sense, but still needs a few minor changes.  The
>> first change I would recommend is to not alter ixgbe_set_fdir_queues
>> since that controls flow director queues, not RSS queues.  The second
>> would be to update ixgbe_set_dcb_queues since that does RSS per DCB
>> traffic class and will also need to be updated.
> There is a difference between the stated aim (reduce memory allocated
> for RX buffers) and this implementation (limit number of RSS queues
> only).  If we agree on that aim then we should be limiting the total
> number of RX queues.
>
> Ben
The problem is there are certain features that require a certain number
of Tx/Rx queues.  In addition, certain features will behave differently
from RSS in terms of how well they scale based on the number of queues.

For example if we enable DCB we require at least one queue per traffic
class.  In the case of ATR we should have 1 queue per CPU since the Tx
queue selection is based on the number of CPUs.  If we don't have that
1:1 mapping we should be disabling ATR since the feature will not
function correctly in that case.

Thanks,

Alex

^ permalink raw reply

* Re: 3.4-rc: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
From: Francois Romieu @ 2012-06-25 20:25 UTC (permalink / raw)
  To: Tomas Papan; +Cc: netdev
In-Reply-To: <CAMGsXDSSCnOF+CykudBoEjTacSYL4LYG_ZnRTcS+Xzt54LW=Og@mail.gmail.com>

Tomas Papan <tomas.papan@gmail.com> :
[...]
> Is there anything what can I provide or try? I do not want to be stuck
> on 3.2.x kernel.

You can try to revert 036dafa28da1e2565a8529de2ae663c37b7a0060

-- 
Ueimor

^ permalink raw reply

* Re: [PATCH RFC 0/8] LLDP implementation for Linux
From: Eldad Zack @ 2012-06-25 20:21 UTC (permalink / raw)
  To: John Fastabend; +Cc: netdev
In-Reply-To: <4FE8B535.70900@intel.com>


On Mon, 25 Jun 2012, John Fastabend wrote:
> On 6/25/2012 11:28 AM, Eldad Zack wrote:
> > Hi all,
> > 
> > This series of patches provides a partial LLDP (IEEE Std 802.1ab)
> > implementation.
> 
> Why do you want to implement this in the kernel? We've been doing this
> in user space for sometime without any issues and has some noted
> advantages. Reduces kernel bloat, easier to add extend, etc.

You have a point there. I don't have any real, I just wrote it for fun.

> What are we missing? Can we address any deficiencies here rather than
> embedded this into the kernel.

I will take a look, though it seems too feature complete :)

> No, netlink is much nicer for these types of things. User space
> can register for events and stay in sync and also all the other
> relevant events are using netlink UP/DOWN/DORMANT events for
> example.

I had a hunch that it might be the case. Thanks!

> > 
> > Thanks in advance if you're taking the time to review it or test it!
> > 
> 
> I'll scan the patches shortly.

Thanks for that (if you'd still like to) and all your comments, much 
appreciated!

Eldad

^ permalink raw reply

* Re: [PATCH RFC 0/8] LLDP implementation for Linux
From: Eldad Zack @ 2012-06-25 20:05 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20120625115416.53655264@nehalam.linuxnetplumber.net>

On Mon, 25 Jun 2012, Stephen Hemminger wrote:

> Since LLDP is a control protocol what is wrong with the existing
> implementation in userspace?
>   https://github.com/vincentbernat/lldpd/wiki

Well, I don't have a business case here, since I start it on my free 
time out of a mix of interest and curiousity.

In any event, thanks for the feedback!

Eldad

^ permalink raw reply

* Re: [PATCH RFC 0/8] LLDP implementation for Linux
From: John Fastabend @ 2012-06-25 19:00 UTC (permalink / raw)
  To: Eldad Zack; +Cc: netdev
In-Reply-To: <1340648900-6547-1-git-send-email-eldad@fogrefinery.com>

On 6/25/2012 11:28 AM, Eldad Zack wrote:
> Hi all,
>
> This series of patches provides a partial LLDP (IEEE Std 802.1ab)
> implementation.
>
> I'd really appreciate a review of it.
>
> LLDP is a simple discovery protocol which advertises the identification and
> other info (such as MTU or capabilities) of device on the link. It can also
> help debug misconfigurations on the link layer (wrong MTU, wrong VLAN).
>

Why do you want to implement this in the kernel? We've been doing this
in user space for sometime without any issues and has some noted
advantages. Reduces kernel bloat, easier to add extend, etc.

Take a look at

http://www.open-lldp.org/open-lldp

What are we missing? Can we address any deficiencies here rather than
embedded this into the kernel.

> Some notes/questions:
>
> * Applies against net-next and mainline.
>
> * Included in this series is only LLDP output code.
>    This is not an issue since input and output are decoupled in LLDP anyway.
>    I'm working on the input code as well and will post it at some point in the
>    future.
>
> * Sysctl is used to do (some) configuration. This is done globally right now.
>    Before I add per-device sysctls: is it at all appropriate to use sysctl
>    here?

No, netlink is much nicer for these types of things. User space
can register for events and stay in sync and also all the other
relevant events are using netlink UP/DOWN/DORMANT events for
example.

>
> * By default, transmission is suppressed. To arm it, set
> 	/proc/sys/net/lldp/lldp_op_mode
>    to 1.
>
> * I've tested it on x86_64 and (qemu'd) x86.
>
> * I've tested it on my machine and it works with Ethernet and WLAN.
>
> * The last patch ("8021q/vlan: process NETDEV_GOING_DOWN") is needed
>    to be able to send shutdown PDU on VLAN interfaces, but has otherwise
>    no effect.
>

Do this in user space and you can handle all these events without
changing existing infrastructure.

> * Is there a better way to deal with 16-bit endianness other than masking and
>    shifting, when one field is more than a byte long (in this case 7/9)?
>
> * I usually only work on it on weekends, so I might be slow in responding back.

Please consider user space implementations (you don't have to use
mine) or let us know why we _need_ this in the kernel.

>
> Thanks in advance if you're taking the time to review it or test it!
>

I'll scan the patches shortly.

Thanks,
John

^ permalink raw reply

* Re: [PATCH RFC 0/8] LLDP implementation for Linux
From: Stephen Hemminger @ 2012-06-25 18:54 UTC (permalink / raw)
  To: Eldad Zack; +Cc: netdev
In-Reply-To: <1340648900-6547-1-git-send-email-eldad@fogrefinery.com>

Since LLDP is a control protocol what is wrong with the existing
implementation in userspace?
  https://github.com/vincentbernat/lldpd/wiki

^ permalink raw reply

* Re: [PATCH 1/8] if_ether.h: Add LLDP ethertype
From: Eldad Zack @ 2012-06-25 18:48 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1340648900-6547-2-git-send-email-eldad@fogrefinery.com>

On Mon, 25 Jun 2012, Eldad Zack wrote:

> The LLDP ethertype is defined in IEEE Std 802.1ab (2005), clause 8.3.
> 
> Signed-off-by: Eldad Zack <eldad@fogrefinery.com>

I'll reiterate here, since the previous mail won't show up on patchwork:

I just noticed that I didn't change the subject on the following
patches to PATCH RFC -- Sorry for the inconvenience!

Eldad

^ permalink raw reply

* Re: [RFC net-next (v2) 12/14] ixgbe: set maximal number of default RSS queues
From: Ben Hutchings @ 2012-06-25 18:44 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: eilong, Yuval Mintz, davem, netdev, Jeff Kirsher, John Fastabend
In-Reply-To: <4FE8B019.4030807@intel.com>

On Mon, 2012-06-25 at 11:38 -0700, Alexander Duyck wrote:
> On 06/25/2012 10:53 AM, Eilon Greenstein wrote:
> > On Mon, 2012-06-25 at 08:44 -0700, Alexander Duyck wrote:
> >> This doesn't limit queues, only interrupt vectors.  As I told you before
> >> you should look at the ixgbe_set_rss_queues function if you actually
> >> want to limit the number of RSS queues.
> > How about this:
> > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> > index af1a531..23a8609 100644
> > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> > @@ -277,6 +277,8 @@ static inline bool ixgbe_set_rss_queues(struct ixgbe_adapter *adapter)
> >  	bool ret = false;
> >  	struct ixgbe_ring_feature *f = &adapter->ring_feature[RING_F_RSS];
> >  
> > +	f->indices = min_t(int, netif_get_num_default_rss_queues(), f->indices);
> > +
> >  	if (adapter->flags & IXGBE_FLAG_RSS_ENABLED) {
> >  		f->mask = 0xF;
> >  		adapter->num_rx_queues = f->indices;
> > @@ -302,7 +304,9 @@ static inline bool ixgbe_set_fdir_queues(struct ixgbe_adapter *adapter)
> >  	bool ret = false;
> >  	struct ixgbe_ring_feature *f_fdir = &adapter->ring_feature[RING_F_FDIR];
> >  
> > -	f_fdir->indices = min_t(int, num_online_cpus(), f_fdir->indices);
> > +	f_fdir->indices = min_t(int, netif_get_num_default_rss_queues(),
> > +				f_fdir->indices);
> > +
> >  	f_fdir->mask = 0;
> >  
> >  	/*
> > @@ -339,8 +343,7 @@ static inline bool ixgbe_set_fcoe_queues(struct ixgbe_adapter *adapter)
> >  	if (!(adapter->flags & IXGBE_FLAG_FCOE_ENABLED))
> >  		return false;
> >  
> > -	f->indices = min_t(int, num_online_cpus(), f->indices);
> > -
> > +	f->indices = min_t(int, f->indices, netif_get_num_default_rss_queues());
> >  	adapter->num_rx_queues = 1;
> >  	adapter->num_tx_queues = 1;
> >
> This makes much more sense, but still needs a few minor changes.  The
> first change I would recommend is to not alter ixgbe_set_fdir_queues
> since that controls flow director queues, not RSS queues.  The second
> would be to update ixgbe_set_dcb_queues since that does RSS per DCB
> traffic class and will also need to be updated.

There is a difference between the stated aim (reduce memory allocated
for RX buffers) and this implementation (limit number of RSS queues
only).  If we agree on that aim then we should be limiting the total
number of RX queues.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* RE: [RFC net-next (v2) 11/14] igb: set maximal number of default RSS queues
From: Ben Hutchings @ 2012-06-25 18:38 UTC (permalink / raw)
  To: Vick, Matthew
  Cc: Yuval Mintz, davem@davemloft.net, netdev@vger.kernel.org,
	eilong@broadcom.com, Kirsher, Jeffrey T
In-Reply-To: <06DFBC1E25D8024DB214DC7F41A3CD3448870CE4@ORSMSX101.amr.corp.intel.com>

On Mon, 2012-06-25 at 17:31 +0000, Vick, Matthew wrote:
> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org [mailto:netdev-
> > owner@vger.kernel.org] On Behalf Of Yuval Mintz
> > Sent: Monday, June 25, 2012 4:46 AM
> > To: davem@davemloft.net; netdev@vger.kernel.org
> > Cc: eilong@broadcom.com; Yuval Mintz; Kirsher, Jeffrey T
> > Subject: [RFC net-next (v2) 11/14] igb: set maximal number of default
> > RSS queues
> > 
> > Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
> > Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
> > 
> > Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > ---
> >  drivers/net/ethernet/intel/igb/igb_main.c |    3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> > b/drivers/net/ethernet/intel/igb/igb_main.c
> > index 01ced68..b11ee60 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > @@ -2465,7 +2465,8 @@ static int __devinit igb_sw_init(struct
> > igb_adapter *adapter)
> >  		break;
> >  	}
> > 
> > -	adapter->rss_queues = min_t(u32, max_rss_queues,
> > num_online_cpus());
> > +	adapter->rss_queues = min_t(u32, max_rss_queues,
> > +				    netif_get_num_default_rss_queues());
> > 
> >  	/* Determine if we need to pair queues. */
> >  	switch (hw->mac.type) {
> > --
> > 1.7.9.rc2
> 
> Since igb supports a maximum of 8 RSS queues anyway, it's generally
> unaffected by this change.

I'm interested in providing a global configuration mechanism for this,
which is why I asked Yuval to introduce this function.

> That being said, I'm confused about what you're trying to
> accomplish--is it to standardize the number of RSS queues or limit the
> number of interrupts by default? Or is it the former with a hope that
> it will trickle down to the latter?
>
> For example, if you take an igb device that supports 8 RSS queues
> (say, I350) and you change the default to 4 RSS queues, you will end
> up with the same number of interrupt vectors being requested as we
> will disable queue pairing and request separate vectors for Rx/Tx
> queues. I haven't looked at the other drivers affected by this RFC,
> but it's possible other drivers behave the same way.

sfc also supports those two modes, but under control of a module
parameter.  If it's a common enough feature then perhaps it should also
be subject to global configuration (though that might be confusing if
most drivers continue to support only one mode).

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [RFC net-next (v2) 12/14] ixgbe: set maximal number of default RSS queues
From: Alexander Duyck @ 2012-06-25 18:38 UTC (permalink / raw)
  To: eilong; +Cc: Yuval Mintz, davem, netdev, Jeff Kirsher, John Fastabend
In-Reply-To: <1340646818.2486.27.camel@lb-tlvb-eilong.il.broadcom.com>

On 06/25/2012 10:53 AM, Eilon Greenstein wrote:
> On Mon, 2012-06-25 at 08:44 -0700, Alexander Duyck wrote:
>> This doesn't limit queues, only interrupt vectors.  As I told you before
>> you should look at the ixgbe_set_rss_queues function if you actually
>> want to limit the number of RSS queues.
> How about this:
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> index af1a531..23a8609 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> @@ -277,6 +277,8 @@ static inline bool ixgbe_set_rss_queues(struct ixgbe_adapter *adapter)
>  	bool ret = false;
>  	struct ixgbe_ring_feature *f = &adapter->ring_feature[RING_F_RSS];
>  
> +	f->indices = min_t(int, netif_get_num_default_rss_queues(), f->indices);
> +
>  	if (adapter->flags & IXGBE_FLAG_RSS_ENABLED) {
>  		f->mask = 0xF;
>  		adapter->num_rx_queues = f->indices;
> @@ -302,7 +304,9 @@ static inline bool ixgbe_set_fdir_queues(struct ixgbe_adapter *adapter)
>  	bool ret = false;
>  	struct ixgbe_ring_feature *f_fdir = &adapter->ring_feature[RING_F_FDIR];
>  
> -	f_fdir->indices = min_t(int, num_online_cpus(), f_fdir->indices);
> +	f_fdir->indices = min_t(int, netif_get_num_default_rss_queues(),
> +				f_fdir->indices);
> +
>  	f_fdir->mask = 0;
>  
>  	/*
> @@ -339,8 +343,7 @@ static inline bool ixgbe_set_fcoe_queues(struct ixgbe_adapter *adapter)
>  	if (!(adapter->flags & IXGBE_FLAG_FCOE_ENABLED))
>  		return false;
>  
> -	f->indices = min_t(int, num_online_cpus(), f->indices);
> -
> +	f->indices = min_t(int, f->indices, netif_get_num_default_rss_queues());
>  	adapter->num_rx_queues = 1;
>  	adapter->num_tx_queues = 1;
>
This makes much more sense, but still needs a few minor changes.  The
first change I would recommend is to not alter ixgbe_set_fdir_queues
since that controls flow director queues, not RSS queues.  The second
would be to update ixgbe_set_dcb_queues since that does RSS per DCB
traffic class and will also need to be updated.

Thanks,

Alex

^ permalink raw reply

* Re: [PATCH RFC 0/8] LLDP implementation for Linux
From: Eldad Zack @ 2012-06-25 18:33 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1340648900-6547-1-git-send-email-eldad@fogrefinery.com>

On Mon, 25 Jun 2012, Eldad Zack wrote:

> Hi all,
> 
> This series of patches provides a partial LLDP (IEEE Std 802.1ab)
> implementation.
> 

I just noticed that I didn't change the subject on the following
patches to PATCH RFC -- Sorry for the inconvenience!

Eldad

^ permalink raw reply

* [PATCH 8/8] 8021q/vlan: process NETDEV_GOING_DOWN
From: Eldad Zack @ 2012-06-25 18:28 UTC (permalink / raw)
  To: netdev; +Cc: Eldad Zack
In-Reply-To: <1340648900-6547-1-git-send-email-eldad@fogrefinery.com>

In the current flow, when you take down a physical device that has
VLANs configured on it, the NETDEV_GOING_DOWN notification will be
sent too late, i.e., no data can be sent to the wire anymore.

static int __dev_close_many(struct list_head *head)
{
...
	list_for_each_entry(dev, head, unreg_list) {
		call_netdevice_notifiers(NETDEV_GOING_DOWN, dev);
...
	}
...
	list_for_each_entry(dev, head, unreg_list) {
		if (ops->ndo_stop)
			ops->ndo_stop(dev);
	}
...
}

static int dev_close_many(struct list_head *head)
{
	...
	__dev_close_many(head);

	list_for_each_entry(dev, head, unreg_list) {
		rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING);
		call_netdevice_notifiers(NETDEV_DOWN, dev);
	}
}

In a setup like this: eth0 with VLANs 2, 3 the flow would be:

eth0 - NETDEV_GOING_DOWN
ndo_stop is called on the device
eth0.2 - NETDEV_GOING_DOWN
eth0.2 - NETDEV_DOWN
eth0.3 - NETDEV_GOING_DOWN
eth0.3 - NETDEV_DOWN
eth0 - NETDEV_DOWN

If instead NETDEV_GOING_DOWN is processed, the flow would be:
eth0.2 - NETDEV_GOING_DOWN
eth0.2 - NETDEV_DOWN
eth0.3 - NETDEV_GOING_DOWN
eth0.3 - NETDEV_DOWN
eth0 - NETDEV_GOING_DOWN
eth0 - NETDEV_DOWN
ndo_stop is called on the device

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
---
 net/8021q/vlan.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 6089f0c..fd87ecc 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -402,6 +402,12 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event,
 
 		break;
 
+	case NETDEV_GOING_DOWN: /* NETDEV_DOWN */
+	/* If the parent device is going down it will call ndo_stop after
+	 * it sends out NETDEV_GOING_DOWN but before sending out NETDEV_DOWN,
+	 * The effect of which is, that no data can be sent anymore
+	 * by the time the VLAN device sends out its NETDEV_GOING_DOWN.
+	 */
 	case NETDEV_DOWN:
 		/* Put all VLANs for this dev in the down state too.  */
 		for (i = 0; i < VLAN_N_VID; i++) {
-- 
1.7.10

^ permalink raw reply related

* [PATCH 7/8] LLDP: Kconfig and Makefile
From: Eldad Zack @ 2012-06-25 18:28 UTC (permalink / raw)
  To: netdev; +Cc: Eldad Zack
In-Reply-To: <1340648900-6547-1-git-send-email-eldad@fogrefinery.com>

Adds Kconfig and Makefile for LLDP, and sources these in
net/Makefile and net/Kconfig.

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
---
 net/Kconfig       |    1 +
 net/Makefile      |    1 +
 net/lldp/Kconfig  |   21 +++++++++++++++++++++
 net/lldp/Makefile |    7 +++++++
 4 files changed, 30 insertions(+)
 create mode 100644 net/lldp/Kconfig
 create mode 100644 net/lldp/Makefile

diff --git a/net/Kconfig b/net/Kconfig
index 245831b..cb724f2 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -216,6 +216,7 @@ source "net/dcb/Kconfig"
 source "net/dns_resolver/Kconfig"
 source "net/batman-adv/Kconfig"
 source "net/openvswitch/Kconfig"
+source "net/lldp/Kconfig"
 
 config RPS
 	boolean
diff --git a/net/Makefile b/net/Makefile
index 4f4ee08..39fbb8b 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -70,3 +70,4 @@ obj-$(CONFIG_CEPH_LIB)		+= ceph/
 obj-$(CONFIG_BATMAN_ADV)	+= batman-adv/
 obj-$(CONFIG_NFC)		+= nfc/
 obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
+obj-$(CONFIG_LLDP)		+= lldp/
diff --git a/net/lldp/Kconfig b/net/lldp/Kconfig
new file mode 100644
index 0000000..0c59fbf
--- /dev/null
+++ b/net/lldp/Kconfig
@@ -0,0 +1,21 @@
+#
+# Link Layer Discovery Protocol (LLDP)
+# IEEE Std 802.1ab
+#
+
+config LLDP
+	tristate "Link Layer Discovery Protocol (LLDP) Support"
+	depends on EXPERIMENTAL
+	default n
+	---help---
+	  Enables experimental support for the LLDP protocol described in
+	  IEEE Std 802.1ab.
+
+	  The LLDP protocol allows nodes to advertise their identification,
+	  configuration and capabilities to neighbors on the same link.
+	  Currently, only advertisement is implemented.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called lldp.
+
+	  If unsure say N.
diff --git a/net/lldp/Makefile b/net/lldp/Makefile
new file mode 100644
index 0000000..e3c609f
--- /dev/null
+++ b/net/lldp/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the Linux Link Layer Discovery Protocol (LLDP) implementation.
+#
+
+obj-$(CONFIG_LLDP) += lldp.o
+
+lldp-objs :=	lldp_core.o lldp_output.o lldpdu.o sysctl_net_lldp.o
-- 
1.7.10

^ permalink raw reply related

* [PATCH 6/8] LLDP: Core routines
From: Eldad Zack @ 2012-06-25 18:28 UTC (permalink / raw)
  To: netdev; +Cc: Eldad Zack
In-Reply-To: <1340648900-6547-1-git-send-email-eldad@fogrefinery.com>

* Core LLDP routines: handles modules registration,
  netdevice notifications and timers.

* Adds specific data pointer to netdevice.

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
---
 include/linux/netdevice.h |    4 +
 net/lldp/lldp_core.c      |  214 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 218 insertions(+)
 create mode 100644 net/lldp/lldp_core.c

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d94cb14..813475a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -61,6 +61,7 @@ struct device;
 struct phy_device;
 /* 802.11 specific */
 struct wireless_dev;
+struct lldp_ptr;
 					/* source back-compat hooks */
 #define SET_ETHTOOL_OPS(netdev,ops) \
 	( (netdev)->ethtool_ops = (ops) )
@@ -1158,6 +1159,9 @@ struct net_device {
 	void			*ax25_ptr;	/* AX.25 specific data */
 	struct wireless_dev	*ieee80211_ptr;	/* IEEE 802.11 specific data,
 						   assign before registering */
+#if IS_ENABLED(CONFIG_LLDP)
+	struct lldp_dev __rcu	*lldp_ptr;	/* LLDP specific data */
+#endif
 
 /*
  * Cache lines mostly used on receive path (including eth_type_trans())
diff --git a/net/lldp/lldp_core.c b/net/lldp/lldp_core.c
new file mode 100644
index 0000000..3d4a1f1
--- /dev/null
+++ b/net/lldp/lldp_core.c
@@ -0,0 +1,214 @@
+/* LLDP		Link Layer Discovery Protocol impementation for Linux
+ *		IEEE Std 802.1ab
+ *
+ * Author:	Eldad Zack <eldad@fogrefinery.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "LLDP " fmt
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/timer.h>
+#include <linux/if_ether.h>
+#include <linux/etherdevice.h>
+#include <linux/if_arp.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include "lldp.h"
+
+int sysctl_lldp_operational_mode = LLDP_SYSCTL_OP_SUPPRESS;
+int sysctl_lldp_transmit_interval = LLDP_DEFAULT_MSG_TX_INTERVAL;
+int sysctl_lldp_hold_multiplier = LLDP_DEFAULT_MSG_TX_HOLD_MULT;
+
+bool is_valid_lldp_dev(struct net_device *dev)
+{
+	bool ret;
+
+	if (!is_valid_ether_addr(dev->dev_addr))
+		return false;
+
+	ret = (!(dev->flags & IFF_LOOPBACK) &&
+		(dev->type == ARPHRD_ETHER) &&
+		(dev->addr_len == ETH_ALEN));
+
+	return ret;
+}
+
+int init_lldp_dev_info(struct net_device *dev)
+{
+	struct lldp_dev *ldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	if (dev->lldp_ptr != NULL)
+		return 0;
+
+	ldev = kzalloc(sizeof(struct lldp_dev), GFP_ATOMIC);
+	if (ldev == NULL)
+		return -ENOMEM;
+
+	rcu_read_lock();
+	dev->lldp_ptr = ldev;
+	rcu_read_unlock();
+
+	return 0;
+}
+
+void lldp_timer_handler(unsigned long data)
+{
+	struct net_device *dev = (struct net_device *) data;
+	struct lldp_dev *ldev;
+	struct timer_list *tx_timer;
+
+	BUG_ON(dev == NULL);
+
+	rcu_read_lock();
+	if (dev->lldp_ptr == NULL) { /* device went down */
+		rcu_read_unlock();
+		return;
+	}
+
+	ldev = dev->lldp_ptr;
+	tx_timer = ldev->tx_timer;
+
+	if (sysctl_lldp_operational_mode & LLDP_SYSCTL_OP_TX)
+		lldp_send(dev);
+
+	mod_timer(tx_timer, jiffies +
+		msecs_to_jiffies(1000 * sysctl_lldp_transmit_interval));
+	rcu_read_unlock();
+}
+
+void lldp_add_dev(struct net_device *dev)
+{
+	int err;
+	struct lldp_dev *ldev;
+	struct timer_list *tx_timer;
+
+	err = init_lldp_dev_info(dev);
+	if (err < 0) {
+		pr_err("could not start on device %s (%d)\n", dev->name, err);
+		return;
+	}
+
+	rcu_read_lock();
+	ldev = dev->lldp_ptr;
+	rcu_read_unlock();
+
+	if (ldev->tx_timer != NULL)
+		return;
+
+	tx_timer = kzalloc(sizeof(struct timer_list), GFP_ATOMIC);
+	if (tx_timer == NULL) {
+		pr_err("no memory available (device %s)\n", dev->name);
+		return;
+	}
+
+	setup_timer(tx_timer, lldp_timer_handler, (unsigned long) dev);
+
+	rcu_read_lock();
+	ldev->tx_timer = tx_timer;
+	rcu_read_unlock();
+
+	/* Send the first frame without waiting. The timer handler
+	 * will schedule it.
+	 */
+	lldp_timer_handler((unsigned long) dev);
+
+	pr_info("started on device %s\n", dev->name);
+}
+
+static void lldp_del_dev(struct net_device *dev)
+{
+	struct lldp_dev *ldev;
+	struct timer_list *tx_timer;
+
+	if (dev == NULL)
+		return;
+
+	rcu_read_lock();
+	ldev = (struct lldp_dev *)dev->lldp_ptr;
+	if (ldev == NULL) {
+		rcu_read_unlock();
+		return;
+	}
+
+	tx_timer = ldev->tx_timer;
+	if (tx_timer != NULL) {
+		ldev->tx_timer = NULL;
+		del_timer(tx_timer);
+		kfree(tx_timer);
+	}
+
+	dev->lldp_ptr = NULL;
+	kfree(ldev);
+	rcu_read_unlock();
+
+	pr_info("stopped on device %s\n", dev->name);
+}
+
+static int lldp_notify(struct notifier_block *nb, unsigned long event,
+		void *data)
+{
+	struct net_device *dev = (struct net_device *) data;
+
+	if (dev == NULL)
+		return NOTIFY_DONE;
+
+	if (!is_valid_lldp_dev(dev))
+		return NOTIFY_DONE;
+
+	switch (event) {
+	case NETDEV_UP: /* NETDEV_CHANGE */
+	case NETDEV_CHANGE:
+		lldp_add_dev(dev);
+		break;
+	case NETDEV_GOING_DOWN:
+		/* Shutdown PDU, Clause 10.2.1.2 */
+		lldp_send_shutdown(dev);
+		break;
+	case NETDEV_UNREGISTER: /* NETDEV_DOWN */
+	case NETDEV_DOWN:
+		lldp_del_dev(dev);
+		break;
+	}
+
+	return NOTIFY_OK;
+};
+
+static struct notifier_block lldp_dev_notify = {
+	.notifier_call = lldp_notify,
+};
+
+static int __init lldp_init(void)
+{
+	pr_info("module initializing\n");
+
+	register_netdevice_notifier(&lldp_dev_notify);
+	lldp_register_sysctl();
+	return 0;
+}
+
+static void __exit lldp_fini(void)
+{
+	struct net_device *dev;
+
+	lldp_unregister_sysctl();
+	unregister_netdevice_notifier(&lldp_dev_notify);
+
+	for_each_netdev(&init_net, dev)
+		lldp_del_dev(dev);
+
+	pr_info("module stopped\n");
+}
+
+module_init(lldp_init);
+module_exit(lldp_fini);
+
+MODULE_LICENSE("GPL");
-- 
1.7.10

^ permalink raw reply related

* [PATCH 5/8] LLDP: Output routines
From: Eldad Zack @ 2012-06-25 18:28 UTC (permalink / raw)
  To: netdev; +Cc: Eldad Zack
In-Reply-To: <1340648900-6547-1-git-send-email-eldad@fogrefinery.com>

Handles the transmission of LLDPDUs.

Gets the TLV list, allocates the sk_buff according to the
space requirements of the TLV list, writes it to the sk_buff and
if all goes well, sends it to the wire.

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
---
 net/lldp/lldp_output.c |   67 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)
 create mode 100644 net/lldp/lldp_output.c

diff --git a/net/lldp/lldp_output.c b/net/lldp/lldp_output.c
new file mode 100644
index 0000000..45e6293
--- /dev/null
+++ b/net/lldp/lldp_output.c
@@ -0,0 +1,67 @@
+/* LLDP		Link Layer Discovery Protocol impementation for Linux
+ *		IEEE Std 802.1ab
+ *
+ * Author:	Eldad Zack <eldad@fogrefinery.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/if_ether.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include "lldp.h"
+
+static u8 lldp_dst_address[] = LLDP_MULTICAST_ADDR;
+
+inline struct sk_buff *lldp_create(struct net_device *dev, bool is_shutdown)
+{
+	struct sk_buff *skb;
+	struct list_head head;
+	int head_len = LL_RESERVED_SPACE(dev);
+	int tail_len = dev->needed_tailroom;
+	int lldp_len;
+
+	INIT_LIST_HEAD(&head);
+
+	lldp_tlv_construct_list(&head, dev, is_shutdown);
+	lldp_len = lldp_tlv_list_len(&head);
+
+	skb = alloc_skb(head_len + lldp_len + tail_len, GFP_ATOMIC);
+	if (skb == NULL)
+		goto done;
+
+	skb_reserve(skb, head_len);
+	skb_reset_network_header(skb);
+
+	lldp_tlv_put_skb_list(skb, &head);
+
+	skb->dev = dev;
+	skb->protocol = htons(ETH_P_LLDP);
+
+	if (dev_hard_header(skb, dev, ETH_P_LLDP, lldp_dst_address,
+		dev->dev_addr, skb->len) < 0) {
+		kfree_skb(skb);
+		skb = NULL;
+	}
+
+done:
+	lldp_tlv_destruct_list(&head);
+	return skb;
+}
+
+void __lldp_send(struct net_device *dev, bool is_shutdown)
+{
+	struct sk_buff *skb;
+
+	skb = lldp_create(dev, is_shutdown);
+
+	if (skb == NULL)
+		return;
+
+	dev_queue_xmit(skb);
+}
-- 
1.7.10

^ permalink raw reply related

* [PATCH 4/8] LLDP: PDU-handling routines
From: Eldad Zack @ 2012-06-25 18:28 UTC (permalink / raw)
  To: netdev; +Cc: Eldad Zack
In-Reply-To: <1340648900-6547-1-git-send-email-eldad@fogrefinery.com>

Routines for creation and parsing of LLPDUs.

Highlights:

* lldp_construct_pdu_list() constructs a list of TLVs
  according to device and configuration.

* lldp_tlv_list_len() calculates the space needed to
  send the above TLV list to the wire. Used when an sk_buff
  is allocated.

* lldp_put_tlv_skb_list() writes the TLV list to the sk_buff.

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
---
 net/lldp/lldpdu.c |  307 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 307 insertions(+)
 create mode 100644 net/lldp/lldpdu.c

diff --git a/net/lldp/lldpdu.c b/net/lldp/lldpdu.c
new file mode 100644
index 0000000..e71ddd7
--- /dev/null
+++ b/net/lldp/lldpdu.c
@@ -0,0 +1,307 @@
+/* LLDP		Link Layer Discovery Protocol impementation for Linux
+ *		IEEE Std 802.1ab
+ *
+ * Author:	Eldad Zack <eldad@fogrefinery.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/if_ether.h>
+#include <linux/if_vlan.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <linux/utsname.h>
+#include "lldp.h"
+
+const unsigned char oui_802_1[LLDP_OUI_LEN] = LLDP_OUI_802_1;
+const unsigned char oui_802_3[LLDP_OUI_LEN] = LLDP_OUI_802_3;
+
+int lldp_tlv_list_len(struct list_head *head)
+{
+	struct lldp_tlv *tlv;
+	int len = 0;
+
+	list_for_each_entry(tlv, head, lh)
+		len += tlv->entry_len;
+
+	return len;
+}
+
+void lldp_tlv_calc_entry_len(struct lldp_tlv *tlv)
+{
+	uint16_t elen;
+
+	BUG_ON(tlv == NULL);
+	BUG_ON(tlv->len > LLDP_LEN_MAX);
+
+	elen = 2;		/* TLV Header */
+
+	if (tlv->subtype > 0)
+		elen += 1;
+
+	BUG_ON((tlv->oui != NULL) &&
+		(tlv->type != LLDP_TLV_ORGANIZATIONAL));
+	if (tlv->oui != NULL)
+		elen += LLDP_OUI_LEN;
+
+	elen += tlv->len;	/* Actual data length */
+
+	tlv->entry_len = elen;
+}
+
+void __lldp_add_tlv_list(struct list_head *head, unsigned short type,
+			unsigned short len, unsigned char *data,
+			unsigned char subtype, const unsigned char *oui)
+{
+	struct lldp_tlv *tlv;
+
+	BUG_ON(len > LLDP_LEN_MAX);
+	BUG_ON(head == NULL);
+
+	tlv = kzalloc(sizeof(struct lldp_tlv), GFP_ATOMIC);
+
+	tlv->type = type;
+	tlv->len = len;
+
+	tlv->val = kmalloc(len, GFP_ATOMIC);
+	memcpy(tlv->val, data, len);
+
+	tlv->subtype = subtype;
+
+	if (type == LLDP_TLV_ORGANIZATIONAL) {
+		BUG_ON(oui == NULL);
+
+		tlv->oui = kmalloc(LLDP_OUI_LEN, GFP_ATOMIC);
+		memcpy(tlv->oui, oui, LLDP_OUI_LEN);
+	}
+
+	lldp_tlv_calc_entry_len(tlv);
+
+	list_add_tail(&(tlv->lh), head);
+}
+
+inline void lldp_add_tlv_list(struct list_head *head, unsigned short type,
+			unsigned short len, unsigned char *data)
+{
+	__lldp_add_tlv_list(head, type, len, data, 0, NULL);
+}
+
+inline void lldp_add_tlv_subtype_list(struct list_head *head,
+				unsigned short type,
+				unsigned short len, unsigned char *data,
+				unsigned char subtype)
+{
+	__lldp_add_tlv_list(head, type, len, data, subtype, NULL);
+}
+
+inline void lldp_add_oui_tlv_list(struct list_head *head,
+				const unsigned char *oui,
+				unsigned short len, unsigned char *data,
+				unsigned char subtype)
+{
+	__lldp_add_tlv_list(head, LLDP_TLV_ORGANIZATIONAL, len, data,
+				subtype, oui);
+}
+
+inline void lldp_add_tlv_chassis_id(struct list_head *head,
+				struct net_device *dev)
+{
+	lldp_add_tlv_subtype_list(head, LLDP_TLV_CHASSIS_ID, ETH_ALEN,
+		dev->dev_addr, LLDP_ST_CHID_MAC_ADDR);
+}
+
+inline void lldp_add_tlv_port_id(struct list_head *head, struct net_device *dev)
+{
+	lldp_add_tlv_subtype_list(head, LLDP_TLV_PORT_ID, strlen(dev->name),
+		dev->name, LLDP_ST_PORTID_IFNAME);
+}
+
+inline void lldp_add_tlv_ttl(struct list_head *head, struct net_device *dev,
+				bool is_shutdown)
+{
+	uint16_t ttl;
+
+	if (is_shutdown) {
+		ttl = 0;
+	} else {
+		ttl = htons(sysctl_lldp_transmit_interval *
+				sysctl_lldp_hold_multiplier);
+	}
+
+	lldp_add_tlv_list(head, LLDP_TLV_TIME_TO_LIVE, sizeof(ttl),
+		(unsigned char *) &ttl);
+}
+
+inline void lldp_add_tlv_vlan(struct list_head *head, struct net_device *dev)
+{
+#if IS_ENABLED(CONFIG_VLAN_8021Q)
+	if (dev->flags & IFF_802_1Q_VLAN) {
+		uint16_t vlan = htons(vlan_dev_vlan_id(dev));
+		/* Clause F.2.1: Value of 0 signifies that the system
+		 * does not know the VLAN ID or doesn"t support it.
+		 */
+		if (vlan != 0) {
+			lldp_add_oui_tlv_list(head, oui_802_1, sizeof(vlan),
+				(unsigned char *) &vlan,
+				LLDP_802_1_PORT_VLANID);
+		}
+	}
+#endif
+}
+
+inline void lldp_add_tlv_mtu(struct list_head *head, struct net_device *dev)
+{
+	uint16_t mtu;
+
+	mtu = htons(dev->mtu);
+	lldp_add_oui_tlv_list(head, oui_802_3, sizeof(mtu),
+		(unsigned char *) &mtu, LLDP_802_3_MTU);
+}
+
+inline void lldp_add_tlv_port_description(struct list_head *head,
+					struct net_device *dev)
+{
+	char *desc = dev->ifalias;
+	if (desc == NULL)
+		return;
+
+	lldp_add_tlv_list(head, LLDP_TLV_PORT_DESCRIPTION, strlen(desc), desc);
+}
+
+inline void lldp_add_tlv_system_name(struct list_head *head,
+					struct net_device *dev)
+{
+	char *name = utsname()->nodename;
+	if (name == NULL)
+		return;
+
+	lldp_add_tlv_list(head, LLDP_TLV_SYSTEM_NAME, strlen(name), name);
+}
+
+inline void lldp_add_tlv_system_description(struct list_head *head,
+					struct net_device *dev)
+{
+	char *desc = utsname()->sysname;
+	if (desc == NULL)
+		return;
+
+	lldp_add_tlv_list(head, LLDP_TLV_SYSTEM_DESCRIPTION,
+		strlen(desc), desc);
+}
+
+inline void lldp_add_tlv_system_capabilities(struct list_head *head,
+					struct net_device *dev)
+{
+	struct lldp_caps caps;
+
+	caps.sys = htons(LLDP_CAP_BRIDGE | LLDP_CAP_ROUTER);
+	caps.enabled = htons(LLDP_CAP_ROUTER);
+
+	lldp_add_tlv_list(head, LLDP_TLV_SYSTEM_CAPABILITIES,
+		sizeof(struct lldp_caps), (unsigned char *) &caps);
+}
+
+void lldp_tlv_construct_list(struct list_head *head, struct net_device *dev,
+				bool is_shutdown)
+{
+	BUG_ON(head == NULL);
+
+	/* Mandatory TLVs with strict order */
+	lldp_add_tlv_chassis_id(head, dev);
+	lldp_add_tlv_port_id(head, dev);
+	lldp_add_tlv_ttl(head, dev, is_shutdown);
+
+	/* Shutdown PDU is limited to mandatory TLVs only */
+	if (is_shutdown)
+		goto end;
+
+	/* Optional TLVs */
+	lldp_add_tlv_port_description(head, dev);
+	lldp_add_tlv_system_name(head, dev);
+	lldp_add_tlv_system_description(head, dev);
+	lldp_add_tlv_system_capabilities(head, dev);
+
+	/* Additional 802.1 and 802.3 private TLVs */
+	lldp_add_tlv_vlan(head, dev);	/* Annex F.1 */
+	lldp_add_tlv_mtu(head, dev);	/* Annex G.5 */
+
+end:
+	/* End TLV */
+	lldp_add_tlv_list(head, LLDP_TLV_END, 0, NULL);
+}
+
+void lldp_tlv_destruct(struct lldp_tlv *tlv)
+{
+	if (tlv->oui != NULL)
+		kfree(tlv->oui);
+
+	if (tlv->val != NULL)
+		kfree(tlv->val);
+}
+
+void lldp_tlv_destruct_list(struct list_head *head)
+{
+	struct lldp_tlv *tlv, *tmp;
+
+	if (head == NULL)
+		return;
+
+	if (list_empty(head))
+		return;
+
+	list_for_each_entry_safe(tlv, tmp, head, lh) {
+		list_del(&tlv->lh);
+		lldp_tlv_destruct(tlv);
+		kfree(tlv);
+	}
+}
+
+inline uint16_t tlv_hdr(struct lldp_tlv *tlv)
+{
+	int len = tlv->entry_len - LLDP_TLV_HDR_LEN;
+
+	BUG_ON(tlv == NULL);
+	BUG_ON(len < 0);
+
+	return htons(((tlv->type << LLDP_TYPE_SHIFT) & LLDP_TYPE_MASK) |
+			(len & LLDP_LEN_MASK));
+}
+
+void lldp_tlv_put_skb_list(struct sk_buff *skb, struct list_head *head)
+{
+	struct lldp_tlv *tlv;
+	struct list_head *p;
+	unsigned char *buf;
+	u16 hdr;
+
+	list_for_each(p, head) {
+		tlv = list_entry(p, struct lldp_tlv, lh);
+
+		hdr = tlv_hdr(tlv);
+		buf = skb_put(skb, sizeof(hdr));
+		memcpy(buf, &hdr, sizeof(hdr));
+
+		BUG_ON((tlv->oui != NULL) &&
+			(tlv->type != LLDP_TLV_ORGANIZATIONAL));
+		if (tlv->oui != NULL) {
+			buf = skb_put(skb, LLDP_OUI_LEN);
+			memcpy(buf, tlv->oui, LLDP_OUI_LEN);
+		}
+
+		if (tlv->subtype > 0) {
+			buf = skb_put(skb, sizeof(unsigned char));
+			*buf = tlv->subtype;
+		}
+
+		if (tlv->len > 0) {
+			BUG_ON(tlv->val == NULL);
+			buf = skb_put(skb, tlv->len);
+			memcpy(buf, tlv->val, tlv->len);
+		}
+	}
+}
-- 
1.7.10

^ permalink raw reply related

* [PATCH 3/8] LLDP: Sysctl interface
From: Eldad Zack @ 2012-06-25 18:28 UTC (permalink / raw)
  To: netdev; +Cc: Eldad Zack
In-Reply-To: <1340648900-6547-1-git-send-email-eldad@fogrefinery.com>

Add 3 config items to LLDP via sysctl:
* Operational mode
* Message transmission interval
* Hold multiplier

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
---
 net/lldp/sysctl_net_lldp.c |   94 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)
 create mode 100644 net/lldp/sysctl_net_lldp.c

diff --git a/net/lldp/sysctl_net_lldp.c b/net/lldp/sysctl_net_lldp.c
new file mode 100644
index 0000000..dd5f5e3
--- /dev/null
+++ b/net/lldp/sysctl_net_lldp.c
@@ -0,0 +1,94 @@
+/* LLDP		Link Layer Discovery Protocol impementation for Linux
+ *		IEEE Std 802.1ab
+ *
+ * Author:	Eldad Zack <eldad@fogrefinery.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/sysctl.h>
+#include <linux/skbuff.h>
+#include <linux/socket.h>
+#include <linux/netdevice.h>
+#include <linux/init.h>
+#include "lldp.h"
+
+static int lldp_ttl_min = 1;
+static int lldp_ttl_max = 65535;
+
+/* Validate: (10.5.3, 10.5.4.1)
+ * 0 < transmit_interval * hold_multiplier <= 65535
+ */
+static int proc_dointvec_lldp_validate_ttl(ctl_table *table, int write,
+					void __user *buffer,
+					size_t *lenp, loff_t *ppos)
+{
+	int val, val2, ttl, ret;
+
+	ctl_table tmp = {
+		.data =		&val,
+		.maxlen =	sizeof(int),
+		.mode =		table->mode,
+		.extra1 =	&lldp_ttl_min,
+		.extra2 =	&lldp_ttl_max,
+	};
+
+	if (!write)
+		return proc_dointvec(table, write, buffer, lenp, ppos);
+
+	ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
+	if (ret == 0) {
+		val2 = *((int *)table->extra1);
+		ttl = val * val2;
+
+		if ((ttl >= lldp_ttl_min) && (ttl <= lldp_ttl_max))
+			*((int *)table->data) = val;
+		else
+			return -EINVAL;
+	}
+
+	return ret;
+}
+
+static struct ctl_table_header *lldp_table_header;
+
+static struct ctl_table lldp_table[] = {
+	{
+		.procname =	"lldp_op_mode",
+		.data =		&sysctl_lldp_operational_mode,
+		.maxlen =	sizeof(int),
+		.mode =		0644,
+		.proc_handler =	proc_dointvec,
+	},
+	{
+		.procname =	"transmit_interval",
+		.data =		&sysctl_lldp_transmit_interval,
+		.maxlen =	sizeof(int),
+		.mode =		0644,
+		.proc_handler =	proc_dointvec_lldp_validate_ttl,
+		.extra1 =	&sysctl_lldp_hold_multiplier,
+	},
+	{
+		.procname =	"hold_multiplier",
+		.data =		&sysctl_lldp_hold_multiplier,
+		.maxlen =	sizeof(int),
+		.mode =		0644,
+		.proc_handler =	proc_dointvec_lldp_validate_ttl,
+		.extra1 =	&sysctl_lldp_transmit_interval,
+	},
+	{ 0, },
+};
+
+void __init lldp_register_sysctl(void)
+{
+	lldp_table_header = register_net_sysctl(&init_net, "net/lldp",
+		lldp_table);
+}
+
+void __exit lldp_unregister_sysctl(void)
+{
+	unregister_net_sysctl_table(lldp_table_header);
+}
-- 
1.7.10

^ permalink raw reply related

* [PATCH 2/8] LLDP: Header
From: Eldad Zack @ 2012-06-25 18:28 UTC (permalink / raw)
  To: netdev; +Cc: Eldad Zack
In-Reply-To: <1340648900-6547-1-git-send-email-eldad@fogrefinery.com>

Constants and declarations for the Linux LLDP implementation.
The struct to be added netdevice is defined here.

Highlights:
* TLV struct
* TLV types
* TLV subtypes for Chassis ID and Port ID TLVs
* Capabilities bits and struct
* IEEE private OUIs and their subtypes (802.1, 802.3)
* Multicast address for LLDP

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
---
 net/lldp/lldp.h |  155 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 155 insertions(+)
 create mode 100644 net/lldp/lldp.h

diff --git a/net/lldp/lldp.h b/net/lldp/lldp.h
new file mode 100644
index 0000000..d88feea
--- /dev/null
+++ b/net/lldp/lldp.h
@@ -0,0 +1,155 @@
+/* LLDP		Link Layer Discovery Protocol impementation for Linux
+ *		IEEE Std 802.1ab
+ *
+ * Author:	Eldad Zack <eldad@fogrefinery.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _LINUX_LLDP_H
+#define _LINUX_LLDP_H
+#include <linux/types.h>
+
+/* LLDP Multicast Address (Clause 8.1) */
+#define LLDP_MULTICAST_ADDR	{ 0x01, 0x80, 0xc2, 0x00, 0x00, 0x0e }
+
+/* Basic TLV format (Clause 9.4)
+ * TLV header:			16 bits
+ *
+ * LSB
+ *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * :      Length       :   Type    :
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * Information string		0-511 bytes
+ */
+#define LLDP_TLV_HDR_LEN		2
+#define LLDP_TYPE_MASK			0xfe00
+#define LLDP_TYPE_SHIFT			9
+#define LLDP_LEN_MASK			0x01ff
+#define LLDP_LEN_MAX			511
+
+/* LLDP TLV types (Clause 9.4.1)
+ * TLVs number 0 to 3 are mandatory, the rest are optional.
+ */
+#define LLDP_TLV_END			0
+#define LLDP_TLV_CHASSIS_ID		1
+#define LLDP_TLV_PORT_ID		2
+#define LLDP_TLV_TIME_TO_LIVE		3
+#define LLDP_TLV_PORT_DESCRIPTION	4
+#define LLDP_TLV_SYSTEM_NAME		5
+#define LLDP_TLV_SYSTEM_DESCRIPTION	6
+#define LLDP_TLV_SYSTEM_CAPABILITIES	7
+#define LLDP_TLV_MANAGEMENT_ADDRESS	8
+/* Reserved: 9-126 */
+#define LLDP_TLV_ORGANIZATIONAL		127	/* Private TLVs */
+
+/* Chassis ID Subtypes, Reserved: 0,8-255. (Clause 9.5.2) */
+#define LLDP_ST_CHID_COMPONENT		1	/* Chassis component */
+#define LLDP_ST_CHID_IFALIAS		2	/* ifAlias (RFC 2863) */
+#define LLDP_ST_CHID_PORT_COMPONENT	3
+#define LLDP_ST_CHID_MAC_ADDR		4
+#define LLDP_ST_CHID_NET_ADDR		5
+#define LLDP_ST_CHID_IFNAME		6
+#define LLDP_ST_CHID_LOCAL		7
+
+/* Port ID Subtypes, Reserved: 0,8-255. (Clause 9.5.3) */
+#define LLDP_ST_PORTID_IFALIAS		1	/* ifAlias (RFC 2863) */
+#define LLDP_ST_PORTID_PORT_COMPONENT	2
+#define LLDP_ST_PORTID_MAC_ADDR		3
+#define LLDP_ST_PORTID_NET_ADDR		4
+#define LLDP_ST_PORTID_IFNAME		5	/* ifName (RFC 2863) */
+#define LLDP_ST_PORTID_AGENT_CIR_ID	6	/* Agent Circuit ID */
+#define LLDP_ST_PORTID_LOCAL		7
+
+/* System Capabilities (Clause 9.5.8.1)
+ * "Station Only" (bit 7) is mutually exclusive with
+ * all the other options.
+ */
+#define LLDP_CAP_OTHER			0x0001
+#define LLDP_CAP_REPEATER		0x0002
+#define LLDP_CAP_BRIDGE			0x0004
+#define LLDP_CAP_WLAN_AP		0x0008
+#define LLDP_CAP_ROUTER			0x0010
+#define LLDP_CAP_TELEPHONE		0x0020
+#define LLDP_CAP_DOCSIS			0x0040
+#define LLDP_CAP_STATION_ONLY		0x0080
+
+/* Organizationally Specific TLVs (Clause 9.6)
+ */
+#define LLDP_OUI_LEN		3
+#define LLDP_OUI_802_1		{ 0x00, 0x80, 0xc2 }
+#define LLDP_OUI_802_3		{ 0x00, 0x12, 0x0f }
+
+/* Annex F: 802.1 OUI Subtypes */
+#define LLDP_802_1_PORT_VLANID		1
+#define LLDP_802_1_PORT_PROT_VLAID	2
+#define LLDP_802_1_VLAN_NAME		3
+#define LLDP_802_1_PROTOCOL_ID		4
+
+/* Annex G: 802.3 OUI Subtypes */
+#define LLDP_802_3_MAC_PHY		1
+#define LLDP_802_3_PMD			2
+#define LLDP_802_3_LAG			3
+#define LLDP_802_3_MTU			4
+
+/* Default transmission parameters (Clause 10.5.3.3) */
+#define LLDP_DEFAULT_MSG_TX_INTERVAL	30
+#define	LLDP_DEFAULT_MSG_TX_HOLD_MULT	4
+
+struct lldp_tlv {
+	struct list_head lh;
+	uint16_t type;
+	uint16_t len;
+	unsigned char *val;
+	unsigned char subtype;
+	unsigned char *oui;
+	uint16_t entry_len;
+};
+
+struct lldp_caps {
+	u16 sys;
+	u16 enabled;
+} __packed;
+
+/* lldp_tx */
+void __lldp_send(struct net_device *dev, bool is_shutdown);
+
+static inline void lldp_send(struct net_device *dev)
+{
+	__lldp_send(dev, false);
+}
+
+static inline void lldp_send_shutdown(struct net_device *dev)
+{
+	__lldp_send(dev, true);
+}
+
+/* lldpdu */
+void lldp_tlv_construct_list(struct list_head *head, struct net_device *dev,
+				bool is_shutdown);
+void lldp_tlv_destruct_list(struct list_head *head);
+void lldp_tlv_put_skb_list(struct sk_buff *skb, struct list_head *head);
+int lldp_tlv_list_len(struct list_head *head);
+
+/* netdevice */
+struct __rcu lldp_dev {
+	struct timer_list *tx_timer;
+};
+
+/* sysctl */
+void __init lldp_register_sysctl(void);
+void __exit lldp_unregister_sysctl(void);
+
+#define LLDP_SYSCTL_OP_SUPPRESS		0x00
+#define LLDP_SYSCTL_OP_TX		0x01
+
+extern int sysctl_lldp_operational_mode;
+extern int sysctl_lldp_transmit_interval;
+extern int sysctl_lldp_hold_multiplier;
+
+#endif /* _LINUX_LLDP_H */
-- 
1.7.10

^ permalink raw reply related

* [PATCH 1/8] if_ether.h: Add LLDP ethertype
From: Eldad Zack @ 2012-06-25 18:28 UTC (permalink / raw)
  To: netdev; +Cc: Eldad Zack
In-Reply-To: <1340648900-6547-1-git-send-email-eldad@fogrefinery.com>

The LLDP ethertype is defined in IEEE Std 802.1ab (2005), clause 8.3.

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
---
 include/linux/if_ether.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/if_ether.h b/include/linux/if_ether.h
index 56d907a..a5f0612 100644
--- a/include/linux/if_ether.h
+++ b/include/linux/if_ether.h
@@ -81,6 +81,7 @@
 #define ETH_P_8021AD	0x88A8          /* 802.1ad Service VLAN		*/
 #define ETH_P_802_EX1	0x88B5		/* 802.1 Local Experimental 1.  */
 #define ETH_P_TIPC	0x88CA		/* TIPC 			*/
+#define ETH_P_LLDP	0x88CC		/* 802.1ab Link Layer Discovery	*/
 #define ETH_P_8021AH	0x88E7          /* 802.1ah Backbone Service Tag */
 #define ETH_P_1588	0x88F7		/* IEEE 1588 Timesync */
 #define ETH_P_FCOE	0x8906		/* Fibre Channel over Ethernet  */
-- 
1.7.10

^ permalink raw reply related

* [PATCH RFC 0/8] LLDP implementation for Linux
From: Eldad Zack @ 2012-06-25 18:28 UTC (permalink / raw)
  To: netdev; +Cc: Eldad Zack

Hi all,

This series of patches provides a partial LLDP (IEEE Std 802.1ab)
implementation.

I'd really appreciate a review of it.

LLDP is a simple discovery protocol which advertises the identification and
other info (such as MTU or capabilities) of device on the link. It can also
help debug misconfigurations on the link layer (wrong MTU, wrong VLAN).

Some notes/questions:

* Applies against net-next and mainline.

* Included in this series is only LLDP output code.
  This is not an issue since input and output are decoupled in LLDP anyway.
  I'm working on the input code as well and will post it at some point in the
  future.

* Sysctl is used to do (some) configuration. This is done globally right now.
  Before I add per-device sysctls: is it at all appropriate to use sysctl
  here?

* By default, transmission is suppressed. To arm it, set
	/proc/sys/net/lldp/lldp_op_mode
  to 1.

* I've tested it on x86_64 and (qemu'd) x86.

* I've tested it on my machine and it works with Ethernet and WLAN.

* The last patch ("8021q/vlan: process NETDEV_GOING_DOWN") is needed
  to be able to send shutdown PDU on VLAN interfaces, but has otherwise
  no effect.

* Is there a better way to deal with 16-bit endianness other than masking and
  shifting, when one field is more than a byte long (in this case 7/9)?

* I usually only work on it on weekends, so I might be slow in responding back.

Thanks in advance if you're taking the time to review it or test it!

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>

^ permalink raw reply

* Re: [RFC net-next (v2) 14/14] ixgbevf: set maximal number of default RSS queues
From: Eilon Greenstein @ 2012-06-25 18:28 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Yuval Mintz, davem, netdev, Jeff Kirsher, Greg Rose
In-Reply-To: <4FE892CD.1020804@intel.com>

On Mon, 2012-06-25 at 09:33 -0700, Alexander Duyck wrote:
> I suggest you locate where we set "adapter->num_rx_queues" and attempt
> to address the issue there.

I see what you mean. On one hand, you use the num_rx_queues as if it can
get higher numbers and the code seems to be ready for that with all the
for loops in place, but on the other hand, it is hard coded to 1 at
ixgbevf_set_num_queues and it seems weird to add min function with
explicit '1'. Do you suggest to leave this driver alone hoping that if
in the future the '1' will change the author will remember to use the
default value?

Thanks,
Eilon

^ permalink raw reply

* RE: [RFC net-next (v2) 11/14] igb: set maximal number of default RSS queues
From: Eilon Greenstein @ 2012-06-25 18:20 UTC (permalink / raw)
  To: Vick, Matthew
  Cc: Yuval Mintz, davem@davemloft.net, netdev@vger.kernel.org,
	Kirsher, Jeffrey T
In-Reply-To: <06DFBC1E25D8024DB214DC7F41A3CD3448870CE4@ORSMSX101.amr.corp.intel.com>

On Mon, 2012-06-25 at 17:31 +0000, Vick, Matthew wrote:
> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org [mailto:netdev-
> > owner@vger.kernel.org] On Behalf Of Yuval Mintz
> > Sent: Monday, June 25, 2012 4:46 AM
> > To: davem@davemloft.net; netdev@vger.kernel.org
> > Cc: eilong@broadcom.com; Yuval Mintz; Kirsher, Jeffrey T
> > Subject: [RFC net-next (v2) 11/14] igb: set maximal number of default
> > RSS queues
> > 
> > Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
> > Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
> > 
> > Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > ---
> >  drivers/net/ethernet/intel/igb/igb_main.c |    3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> > b/drivers/net/ethernet/intel/igb/igb_main.c
> > index 01ced68..b11ee60 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > @@ -2465,7 +2465,8 @@ static int __devinit igb_sw_init(struct
> > igb_adapter *adapter)
> >  		break;
> >  	}
> > 
> > -	adapter->rss_queues = min_t(u32, max_rss_queues,
> > num_online_cpus());
> > +	adapter->rss_queues = min_t(u32, max_rss_queues,
> > +				    netif_get_num_default_rss_queues());
> > 
> >  	/* Determine if we need to pair queues. */
> >  	switch (hw->mac.type) {
> > --
> > 1.7.9.rc2
> 
> Since igb supports a maximum of 8 RSS queues anyway, it's generally unaffected by this change. That being said, I'm confused about what you're trying to accomplish--is it to standardize the number of RSS queues or limit the number of interrupts by default? Or is it the former with a hope that it will trickle down to the latter?
> 
> For example, if you take an igb device that supports 8 RSS queues (say, I350) and you change the default to 4 RSS queues, you will end up with the same number of interrupt vectors being requested as we will disable queue pairing and request separate vectors for Rx/Tx queues. I haven't looked at the other drivers affected by this RFC, but it's possible other drivers behave the same way.

The main motivation is the Rx rings that consume significant amount of
memory. The MSI-X vectors are a nice byproduct, but the logic you
describe still sounds good for this kind of low numbers.

^ permalink raw reply

* Re: [net-next RFC V4 PATCH 0/4] Multiqueue virtio-net
From: Shirley Ma @ 2012-06-25 18:01 UTC (permalink / raw)
  To: Jason Wang
  Cc: krkumar2, habanero, kvm, mst, netdev, linux-kernel,
	virtualization, edumazet, tahm, jwhan, davem
In-Reply-To: <20120625090829.7263.65026.stgit@amd-6168-8-1.englab.nay.redhat.com>

Hello Jason,

Good work. Do you have local guest to guest results?

Thanks
Shirley

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox