Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 1/4] netdev/phy: Handle IEEE802.3 clause 45 Ethernet PHYs
From: David Miller @ 2012-06-25 22:34 UTC (permalink / raw)
  To: ddaney.cavm
  Cc: grant.likely, rob.herring, devicetree-discuss, netdev,
	linux-kernel, linux-mips, afleming, david.daney
In-Reply-To: <1340411056-18988-2-git-send-email-ddaney.cavm@gmail.com>

From: David Daney <ddaney.cavm@gmail.com>
Date: Fri, 22 Jun 2012 17:24:13 -0700

> From: David Daney <david.daney@cavium.com>
> 
> The IEEE802.3 clause 45 MDIO bus protocol allows for directly
> addressing PHY registers using a 21 bit address, and is used by many
> 10G Ethernet PHYS.  Already existing is the ability of MDIO bus
> drivers to use clause 45, with the MII_ADDR_C45 flag.  Here we add
> struct phy_c45_device_ids to hold the device identifier registers
> present in clause 45. struct phy_device gets a couple of new fields:
> c45_ids to hold the identifiers and is_c45 to signal that it is clause
> 45.
> 
> Normally the MII_ADDR_C45 flag is ORed with the register address to
> indicate a clause 45 transaction.  Here we also use this flag in the
> *device* address passed to get_phy_device() to indicate that probing
> should be done with clause 45 transactions.
> 
> EXPORT phy_device_create() so that the follow-on patch to of_mdio.c
> can use it to create phy devices for PHYs, that have non-standard
> device identifier registers, based on the device tree bindings.
> 
> Signed-off-by: David Daney <david.daney@cavium.com>

I see no value in having two ways to say that clause-45 transactions
should be used.

Either make it a PHY device attribute, or specify it in the address
in the register accesses, but not both.

Also your patch is full of coding style errors, I simply couldn't
stomache applying this even if I agreed with the substance of the
changes:

> +	     i < ARRAY_SIZE(c45_ids->device_ids) &&
> +		     c45_ids->devices_in_package == 0;

c45_ids on the second line should line up with the initial 'i'
on the first line.

> +		c45_ids->devices_in_package = (phy_reg & 0xffff) << 16;
> +
> +
> +		reg_addr = MII_ADDR_C45 | i << 16 | 5;

There is not reason in the world to have two empty lines there, it
looks awful.

> +		/*
> +		 * If mostly Fs, there is no device there,
> +		 * let's get out of here.
> +		 */

Format comments:

	/* Like
	 * this.
	 */

Not.

	/*
	 * Like
	 * this.
	 */

> +		c45_ids->device_ids[i] = (phy_reg & 0xffff) << 16;
> +
> +
> +		reg_addr = MII_ADDR_C45 | i << 16 | MII_PHYSID2;

Two empty lines.  This is extremely irritating, it looks like you
had some kind of debugging code here and then were very lazy about
removing it.

> +/*
> + * Or MII_ADDR_C45 into regnum for read/write on mii_bus to enable the
> + * 21 bit IEEE 802.3ae clause 45 addressing mode used by 10GIGE phy
> + * chips.  Also may be ORed into the device address in
> + * get_phy_device().
> + */

Comment formatting.

> +/*
> + * phy_c45_device_ids: 802.3-c45 Device Identifiers
> + *
> + * devices_in_package: Bit vector of devices present.
> + * device_ids: The device identifer for each present device.
> + */

If you're going to list the struct members use the correct kerneldoc
format to do so.

^ permalink raw reply

* Re: [PATCH] [XFRM] Fix unexpected SA hard expiration after changing date
From: David Miller @ 2012-06-25 22:29 UTC (permalink / raw)
  To: fdu; +Cc: herbert, netdev
In-Reply-To: <1340259961-30354-2-git-send-email-fdu@windriver.com>

From: Fan Du <fdu@windriver.com>
Date: Thu, 21 Jun 2012 14:26:01 +0800

> After SA is setup, one timer is armed to detect soft/hard expiration,
> however the timer handler uses xtime to do the math. This makes hard
> expiration occurs first before soft expiration after setting new date
> with big interval. As a result new child SA is deleted before rekeying
> the new one.
> 
> Signed-off-by: Fan Du <fdu@windriver.com>
> ---
>  include/net/xfrm.h    |  3 +++
>  net/xfrm/xfrm_state.c | 21 +++++++++++++++++----
>  2 files changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/include/net/xfrm.h b/include/net/xfrm.h
> index 2933d74..8d16777 100644
> --- a/include/net/xfrm.h
> +++ b/include/net/xfrm.h
> @@ -197,6 +197,8 @@ struct xfrm_state
>  
>  	struct xfrm_lifetime_cur curlft;
>  	struct timer_list	timer;
> +	/* used to fix curlft->add_time when changing date */
> +	long		saved_tmo;

'saved_tmo' is not properly indented to line up with the
struct member names above it.

^ permalink raw reply

* Re: [PATCH] netxen : Error return off by one for XG port.
From: David Miller @ 2012-06-25 22:27 UTC (permalink / raw)
  To: rajesh.borundia; +Cc: santoshprasadnayak, sony.chacko, netdev, kernel-janitors
In-Reply-To: <13A253B3F9BEFE43B93C09CF75F63CAA81A886EF1A@MNEXMB1.qlogic.org>

From: Rajesh Borundia <rajesh.borundia@qlogic.com>
Date: Wed, 20 Jun 2012 08:06:04 -0500

> ______________________________________
> From: santosh nayak [santoshprasadnayak@gmail.com]
> Sent: Wednesday, June 20, 2012 4:22 PM
> To: Sony Chacko; Rajesh Borundia
> Cc: netdev; kernel-janitors@vger.kernel.org; Santosh Nayak
> Subject: [PATCH] netxen : Error return off by one for XG port.

This is not the correct way to submit patches written by other
people.

^ permalink raw reply

* Re: [PATCH] netxen : Error return off by one for XG port.
From: David Miller @ 2012-06-25 22:27 UTC (permalink / raw)
  To: santoshprasadnayak; +Cc: sony.chacko, rajesh.borundia, netdev, kernel-janitors
In-Reply-To: <1340189578-18308-1-git-send-email-santoshprasadnayak@gmail.com>

From: santosh nayak <santoshprasadnayak@gmail.com>
Date: Wed, 20 Jun 2012 16:22:58 +0530

> From: Santosh Nayak <santoshprasadnayak@gmail.com>
> 
> There are  NETXEN_NIU_MAX_XG_PORTS ports.
> Port indexing starts from zero.
> Hence we should also return error for  'port == NETXEN_NIU_MAX_XG_PORTS'.
> 
> Signed-off-by: Santosh Nayak <santoshprasadnayak@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] netxen: Error return off by one in 'netxen_nic_set_pauseparam()'.
From: David Miller @ 2012-06-25 22:27 UTC (permalink / raw)
  To: santoshprasadnayak; +Cc: sony.chacko, rajesh.borundia, netdev, kernel-janitors
In-Reply-To: <1340177259-14083-1-git-send-email-santoshprasadnayak@gmail.com>

From: santosh nayak <santoshprasadnayak@gmail.com>
Date: Wed, 20 Jun 2012 12:57:39 +0530

> From: Santosh Nayak <santoshprasadnayak@gmail.com>
> 
> There are 'NETXEN_NIU_MAX_GBE_PORTS'  GBE ports. Port indexing starts
> from zero. 
> Hence we should also return error for "port == NETXEN_NIU_MAX_GBE_PORTS"
> 
> Signed-off-by: Santosh Nayak <santoshprasadnayak@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 3/3 net-next] tg3: Add sysfs file to export sensor data
From: David Miller @ 2012-06-25 22:24 UTC (permalink / raw)
  To: bhutchings; +Cc: mchan, netdev, nsujir, mcarlson
In-Reply-To: <1340659554.2572.52.camel@bwh-desktop.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 25 Jun 2012 22:25:54 +0100

> On Mon, 2012-06-25 at 14:04 -0700, Michael Chan wrote:
>> And please elaborate on the private ioctl.
> 
> Every driver gets to handle SIOCDEVPRIVATE .. SIOCDEVPRIVATE+15.  But
> avoid the first 3, as userland may blindly try to use them for MDIO.
> David may of course tell you that you should under no circumstances
> actually do this.

Indeed. :-)

^ permalink raw reply

* Re: [PATCH 3/3 net-next] tg3: Add sysfs file to export sensor data
From: Michael Chan @ 2012-06-25 21:50 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev, nsujir, mcarlson
In-Reply-To: <1340659554.2572.52.camel@bwh-desktop.uk.solarflarecom.com>

On Mon, 2012-06-25 at 22:25 +0100, Ben Hutchings wrote: 
> > The rest of the bulk data requires too much parsing in the kernel and
> > will have to be exposed as binary data.  What do you mean by binary
> > attribute?  A new binary sysfs attribute under hwmon?  Or outside of
> > hwmon?
> 
> A binary sysfs attribute under your PCI device.  (In fact, for wider
> userland compatibility, hwmon sysfs attributes should also be under the
> PCI device rather than the hwmon device.  Yes, this *is* a weird
> convention.)
> 
> > And please elaborate on the private ioctl.
> 
> Every driver gets to handle SIOCDEVPRIVATE .. SIOCDEVPRIVATE+15.  But
> avoid the first 3, as userland may blindly try to use them for MDIO.
> David may of course tell you that you should under no circumstances
> actually do this. 

Yeah, we won't touch SIOCDEVPRIVATE with a 10-foot pole.

^ permalink raw reply

* Re: [PATCH 3/3 net-next] tg3: Add sysfs file to export sensor data
From: Ben Hutchings @ 2012-06-25 21:25 UTC (permalink / raw)
  To: Michael Chan; +Cc: David Miller, netdev, nsujir, mcarlson
In-Reply-To: <1340658258.4344.43.camel@LTIRV-MCHAN1.corp.ad.broadcom.com>

On Mon, 2012-06-25 at 14:04 -0700, Michael Chan wrote:
> On Sat, 2012-06-23 at 16:02 +0100, Ben Hutchings wrote: 
> > Temperature and voltage can be exposed through an hwmon device (which
> > practically means you use multiple attributes with conventional names).
> > Other diagnostics might possible be suitable for ethtool stats,
> > depending on what they are.
> 
> I think we can extract some common and more useful attributes such as
> temperature and voltage, and use the standard hwmon attributes to expose
> them.
> 
> > 
> > If the driver can't easily parse the information (e.g. it varies greatly
> > between the different chips and firmware versions) then a binary
> > attribute or private ioctl might be appropriate.  But generic interfaces
> > really should be considered first.
> > 
> 
> The rest of the bulk data requires too much parsing in the kernel and
> will have to be exposed as binary data.  What do you mean by binary
> attribute?  A new binary sysfs attribute under hwmon?  Or outside of
> hwmon?

A binary sysfs attribute under your PCI device.  (In fact, for wider
userland compatibility, hwmon sysfs attributes should also be under the
PCI device rather than the hwmon device.  Yes, this *is* a weird
convention.)

> And please elaborate on the private ioctl.

Every driver gets to handle SIOCDEVPRIVATE .. SIOCDEVPRIVATE+15.  But
avoid the first 3, as userland may blindly try to use them for MDIO.
David may of course tell you that you should under no circumstances
actually do this.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH 3/3 net-next] tg3: Add sysfs file to export sensor data
From: Michael Chan @ 2012-06-25 21:04 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev, nsujir, mcarlson
In-Reply-To: <1340463729.6345.25.camel@deadeye.wl.decadent.org.uk>

On Sat, 2012-06-23 at 16:02 +0100, Ben Hutchings wrote: 
> Temperature and voltage can be exposed through an hwmon device (which
> practically means you use multiple attributes with conventional names).
> Other diagnostics might possible be suitable for ethtool stats,
> depending on what they are.

I think we can extract some common and more useful attributes such as
temperature and voltage, and use the standard hwmon attributes to expose
them.

> 
> If the driver can't easily parse the information (e.g. it varies greatly
> between the different chips and firmware versions) then a binary
> attribute or private ioctl might be appropriate.  But generic interfaces
> really should be considered first.
> 

The rest of the bulk data requires too much parsing in the kernel and
will have to be exposed as binary data.  What do you mean by binary
attribute?  A new binary sysfs attribute under hwmon?  Or outside of
hwmon?  And please elaborate on the private ioctl.

Thanks.

^ permalink raw reply

* Re: [RFC net-next (v2) 12/14] ixgbe: set maximal number of default RSS queues
From: Alexander Duyck @ 2012-06-25 21:03 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: eilong, Yuval Mintz, davem, netdev, Jeff Kirsher, John Fastabend
In-Reply-To: <1340649847.2572.15.camel@bwh-desktop.uk.solarflarecom.com>

On 06/25/2012 11:44 AM, Ben Hutchings wrote:
> On Mon, 2012-06-25 at 11:38 -0700, Alexander Duyck wrote:
>> On 06/25/2012 10:53 AM, Eilon Greenstein wrote:
>>> On Mon, 2012-06-25 at 08:44 -0700, Alexander Duyck wrote:
>>>> This doesn't limit queues, only interrupt vectors.  As I told you before
>>>> you should look at the ixgbe_set_rss_queues function if you actually
>>>> want to limit the number of RSS queues.
>>> How about this:
>>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
>>> index af1a531..23a8609 100644
>>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
>>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
>>> @@ -277,6 +277,8 @@ static inline bool ixgbe_set_rss_queues(struct ixgbe_adapter *adapter)
>>>  	bool ret = false;
>>>  	struct ixgbe_ring_feature *f = &adapter->ring_feature[RING_F_RSS];
>>>  
>>> +	f->indices = min_t(int, netif_get_num_default_rss_queues(), f->indices);
>>> +
>>>  	if (adapter->flags & IXGBE_FLAG_RSS_ENABLED) {
>>>  		f->mask = 0xF;
>>>  		adapter->num_rx_queues = f->indices;
>>> @@ -302,7 +304,9 @@ static inline bool ixgbe_set_fdir_queues(struct ixgbe_adapter *adapter)
>>>  	bool ret = false;
>>>  	struct ixgbe_ring_feature *f_fdir = &adapter->ring_feature[RING_F_FDIR];
>>>  
>>> -	f_fdir->indices = min_t(int, num_online_cpus(), f_fdir->indices);
>>> +	f_fdir->indices = min_t(int, netif_get_num_default_rss_queues(),
>>> +				f_fdir->indices);
>>> +
>>>  	f_fdir->mask = 0;
>>>  
>>>  	/*
>>> @@ -339,8 +343,7 @@ static inline bool ixgbe_set_fcoe_queues(struct ixgbe_adapter *adapter)
>>>  	if (!(adapter->flags & IXGBE_FLAG_FCOE_ENABLED))
>>>  		return false;
>>>  
>>> -	f->indices = min_t(int, num_online_cpus(), f->indices);
>>> -
>>> +	f->indices = min_t(int, f->indices, netif_get_num_default_rss_queues());
>>>  	adapter->num_rx_queues = 1;
>>>  	adapter->num_tx_queues = 1;
>>>
>> This makes much more sense, but still needs a few minor changes.  The
>> first change I would recommend is to not alter ixgbe_set_fdir_queues
>> since that controls flow director queues, not RSS queues.  The second
>> would be to update ixgbe_set_dcb_queues since that does RSS per DCB
>> traffic class and will also need to be updated.
> There is a difference between the stated aim (reduce memory allocated
> for RX buffers) and this implementation (limit number of RSS queues
> only).  If we agree on that aim then we should be limiting the total
> number of RX queues.
>
> Ben
The problem is there are certain features that require a certain number
of Tx/Rx queues.  In addition, certain features will behave differently
from RSS in terms of how well they scale based on the number of queues.

For example if we enable DCB we require at least one queue per traffic
class.  In the case of ATR we should have 1 queue per CPU since the Tx
queue selection is based on the number of CPUs.  If we don't have that
1:1 mapping we should be disabling ATR since the feature will not
function correctly in that case.

Thanks,

Alex

^ permalink raw reply

* Re: 3.4-rc: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
From: Francois Romieu @ 2012-06-25 20:25 UTC (permalink / raw)
  To: Tomas Papan; +Cc: netdev
In-Reply-To: <CAMGsXDSSCnOF+CykudBoEjTacSYL4LYG_ZnRTcS+Xzt54LW=Og@mail.gmail.com>

Tomas Papan <tomas.papan@gmail.com> :
[...]
> Is there anything what can I provide or try? I do not want to be stuck
> on 3.2.x kernel.

You can try to revert 036dafa28da1e2565a8529de2ae663c37b7a0060

-- 
Ueimor

^ permalink raw reply

* Re: [PATCH RFC 0/8] LLDP implementation for Linux
From: Eldad Zack @ 2012-06-25 20:21 UTC (permalink / raw)
  To: John Fastabend; +Cc: netdev
In-Reply-To: <4FE8B535.70900@intel.com>


On Mon, 25 Jun 2012, John Fastabend wrote:
> On 6/25/2012 11:28 AM, Eldad Zack wrote:
> > Hi all,
> > 
> > This series of patches provides a partial LLDP (IEEE Std 802.1ab)
> > implementation.
> 
> Why do you want to implement this in the kernel? We've been doing this
> in user space for sometime without any issues and has some noted
> advantages. Reduces kernel bloat, easier to add extend, etc.

You have a point there. I don't have any real, I just wrote it for fun.

> What are we missing? Can we address any deficiencies here rather than
> embedded this into the kernel.

I will take a look, though it seems too feature complete :)

> No, netlink is much nicer for these types of things. User space
> can register for events and stay in sync and also all the other
> relevant events are using netlink UP/DOWN/DORMANT events for
> example.

I had a hunch that it might be the case. Thanks!

> > 
> > Thanks in advance if you're taking the time to review it or test it!
> > 
> 
> I'll scan the patches shortly.

Thanks for that (if you'd still like to) and all your comments, much 
appreciated!

Eldad

^ permalink raw reply

* Re: [PATCH RFC 0/8] LLDP implementation for Linux
From: Eldad Zack @ 2012-06-25 20:05 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20120625115416.53655264@nehalam.linuxnetplumber.net>



On Mon, 25 Jun 2012, Stephen Hemminger wrote:

> Since LLDP is a control protocol what is wrong with the existing
> implementation in userspace?
>   https://github.com/vincentbernat/lldpd/wiki

Well, I don't have a business case here, since I start it on my free 
time out of a mix of interest and curiousity.

In any event, thanks for the feedback!

Eldad

^ permalink raw reply

* Re: [PATCH RFC 0/8] LLDP implementation for Linux
From: John Fastabend @ 2012-06-25 19:00 UTC (permalink / raw)
  To: Eldad Zack; +Cc: netdev
In-Reply-To: <1340648900-6547-1-git-send-email-eldad@fogrefinery.com>

On 6/25/2012 11:28 AM, Eldad Zack wrote:
> Hi all,
>
> This series of patches provides a partial LLDP (IEEE Std 802.1ab)
> implementation.
>
> I'd really appreciate a review of it.
>
> LLDP is a simple discovery protocol which advertises the identification and
> other info (such as MTU or capabilities) of device on the link. It can also
> help debug misconfigurations on the link layer (wrong MTU, wrong VLAN).
>

Why do you want to implement this in the kernel? We've been doing this
in user space for sometime without any issues and has some noted
advantages. Reduces kernel bloat, easier to add extend, etc.

Take a look at

http://www.open-lldp.org/open-lldp

What are we missing? Can we address any deficiencies here rather than
embedded this into the kernel.

> Some notes/questions:
>
> * Applies against net-next and mainline.
>
> * Included in this series is only LLDP output code.
>    This is not an issue since input and output are decoupled in LLDP anyway.
>    I'm working on the input code as well and will post it at some point in the
>    future.
>
> * Sysctl is used to do (some) configuration. This is done globally right now.
>    Before I add per-device sysctls: is it at all appropriate to use sysctl
>    here?

No, netlink is much nicer for these types of things. User space
can register for events and stay in sync and also all the other
relevant events are using netlink UP/DOWN/DORMANT events for
example.

>
> * By default, transmission is suppressed. To arm it, set
> 	/proc/sys/net/lldp/lldp_op_mode
>    to 1.
>
> * I've tested it on x86_64 and (qemu'd) x86.
>
> * I've tested it on my machine and it works with Ethernet and WLAN.
>
> * The last patch ("8021q/vlan: process NETDEV_GOING_DOWN") is needed
>    to be able to send shutdown PDU on VLAN interfaces, but has otherwise
>    no effect.
>

Do this in user space and you can handle all these events without
changing existing infrastructure.

> * Is there a better way to deal with 16-bit endianness other than masking and
>    shifting, when one field is more than a byte long (in this case 7/9)?
>
> * I usually only work on it on weekends, so I might be slow in responding back.

Please consider user space implementations (you don't have to use
mine) or let us know why we _need_ this in the kernel.

>
> Thanks in advance if you're taking the time to review it or test it!
>

I'll scan the patches shortly.

Thanks,
John

^ permalink raw reply

* Re: [PATCH RFC 0/8] LLDP implementation for Linux
From: Stephen Hemminger @ 2012-06-25 18:54 UTC (permalink / raw)
  To: Eldad Zack; +Cc: netdev
In-Reply-To: <1340648900-6547-1-git-send-email-eldad@fogrefinery.com>

Since LLDP is a control protocol what is wrong with the existing
implementation in userspace?
  https://github.com/vincentbernat/lldpd/wiki

^ permalink raw reply

* Re: [PATCH 1/8] if_ether.h: Add LLDP ethertype
From: Eldad Zack @ 2012-06-25 18:48 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1340648900-6547-2-git-send-email-eldad@fogrefinery.com>



On Mon, 25 Jun 2012, Eldad Zack wrote:

> The LLDP ethertype is defined in IEEE Std 802.1ab (2005), clause 8.3.
> 
> Signed-off-by: Eldad Zack <eldad@fogrefinery.com>

I'll reiterate here, since the previous mail won't show up on patchwork:

I just noticed that I didn't change the subject on the following
patches to PATCH RFC -- Sorry for the inconvenience!

Eldad

^ permalink raw reply

* Re: [RFC net-next (v2) 12/14] ixgbe: set maximal number of default RSS queues
From: Ben Hutchings @ 2012-06-25 18:44 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: eilong, Yuval Mintz, davem, netdev, Jeff Kirsher, John Fastabend
In-Reply-To: <4FE8B019.4030807@intel.com>

On Mon, 2012-06-25 at 11:38 -0700, Alexander Duyck wrote:
> On 06/25/2012 10:53 AM, Eilon Greenstein wrote:
> > On Mon, 2012-06-25 at 08:44 -0700, Alexander Duyck wrote:
> >> This doesn't limit queues, only interrupt vectors.  As I told you before
> >> you should look at the ixgbe_set_rss_queues function if you actually
> >> want to limit the number of RSS queues.
> > How about this:
> > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> > index af1a531..23a8609 100644
> > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> > @@ -277,6 +277,8 @@ static inline bool ixgbe_set_rss_queues(struct ixgbe_adapter *adapter)
> >  	bool ret = false;
> >  	struct ixgbe_ring_feature *f = &adapter->ring_feature[RING_F_RSS];
> >  
> > +	f->indices = min_t(int, netif_get_num_default_rss_queues(), f->indices);
> > +
> >  	if (adapter->flags & IXGBE_FLAG_RSS_ENABLED) {
> >  		f->mask = 0xF;
> >  		adapter->num_rx_queues = f->indices;
> > @@ -302,7 +304,9 @@ static inline bool ixgbe_set_fdir_queues(struct ixgbe_adapter *adapter)
> >  	bool ret = false;
> >  	struct ixgbe_ring_feature *f_fdir = &adapter->ring_feature[RING_F_FDIR];
> >  
> > -	f_fdir->indices = min_t(int, num_online_cpus(), f_fdir->indices);
> > +	f_fdir->indices = min_t(int, netif_get_num_default_rss_queues(),
> > +				f_fdir->indices);
> > +
> >  	f_fdir->mask = 0;
> >  
> >  	/*
> > @@ -339,8 +343,7 @@ static inline bool ixgbe_set_fcoe_queues(struct ixgbe_adapter *adapter)
> >  	if (!(adapter->flags & IXGBE_FLAG_FCOE_ENABLED))
> >  		return false;
> >  
> > -	f->indices = min_t(int, num_online_cpus(), f->indices);
> > -
> > +	f->indices = min_t(int, f->indices, netif_get_num_default_rss_queues());
> >  	adapter->num_rx_queues = 1;
> >  	adapter->num_tx_queues = 1;
> >
> This makes much more sense, but still needs a few minor changes.  The
> first change I would recommend is to not alter ixgbe_set_fdir_queues
> since that controls flow director queues, not RSS queues.  The second
> would be to update ixgbe_set_dcb_queues since that does RSS per DCB
> traffic class and will also need to be updated.

There is a difference between the stated aim (reduce memory allocated
for RX buffers) and this implementation (limit number of RSS queues
only).  If we agree on that aim then we should be limiting the total
number of RX queues.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* RE: [RFC net-next (v2) 11/14] igb: set maximal number of default RSS queues
From: Ben Hutchings @ 2012-06-25 18:38 UTC (permalink / raw)
  To: Vick, Matthew
  Cc: Yuval Mintz, davem@davemloft.net, netdev@vger.kernel.org,
	eilong@broadcom.com, Kirsher, Jeffrey T
In-Reply-To: <06DFBC1E25D8024DB214DC7F41A3CD3448870CE4@ORSMSX101.amr.corp.intel.com>

On Mon, 2012-06-25 at 17:31 +0000, Vick, Matthew wrote:
> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org [mailto:netdev-
> > owner@vger.kernel.org] On Behalf Of Yuval Mintz
> > Sent: Monday, June 25, 2012 4:46 AM
> > To: davem@davemloft.net; netdev@vger.kernel.org
> > Cc: eilong@broadcom.com; Yuval Mintz; Kirsher, Jeffrey T
> > Subject: [RFC net-next (v2) 11/14] igb: set maximal number of default
> > RSS queues
> > 
> > Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
> > Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
> > 
> > Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > ---
> >  drivers/net/ethernet/intel/igb/igb_main.c |    3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> > b/drivers/net/ethernet/intel/igb/igb_main.c
> > index 01ced68..b11ee60 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > @@ -2465,7 +2465,8 @@ static int __devinit igb_sw_init(struct
> > igb_adapter *adapter)
> >  		break;
> >  	}
> > 
> > -	adapter->rss_queues = min_t(u32, max_rss_queues,
> > num_online_cpus());
> > +	adapter->rss_queues = min_t(u32, max_rss_queues,
> > +				    netif_get_num_default_rss_queues());
> > 
> >  	/* Determine if we need to pair queues. */
> >  	switch (hw->mac.type) {
> > --
> > 1.7.9.rc2
> 
> Since igb supports a maximum of 8 RSS queues anyway, it's generally
> unaffected by this change.

I'm interested in providing a global configuration mechanism for this,
which is why I asked Yuval to introduce this function.

> That being said, I'm confused about what you're trying to
> accomplish--is it to standardize the number of RSS queues or limit the
> number of interrupts by default? Or is it the former with a hope that
> it will trickle down to the latter?
>
> For example, if you take an igb device that supports 8 RSS queues
> (say, I350) and you change the default to 4 RSS queues, you will end
> up with the same number of interrupt vectors being requested as we
> will disable queue pairing and request separate vectors for Rx/Tx
> queues. I haven't looked at the other drivers affected by this RFC,
> but it's possible other drivers behave the same way.

sfc also supports those two modes, but under control of a module
parameter.  If it's a common enough feature then perhaps it should also
be subject to global configuration (though that might be confusing if
most drivers continue to support only one mode).

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [RFC net-next (v2) 12/14] ixgbe: set maximal number of default RSS queues
From: Alexander Duyck @ 2012-06-25 18:38 UTC (permalink / raw)
  To: eilong; +Cc: Yuval Mintz, davem, netdev, Jeff Kirsher, John Fastabend
In-Reply-To: <1340646818.2486.27.camel@lb-tlvb-eilong.il.broadcom.com>

On 06/25/2012 10:53 AM, Eilon Greenstein wrote:
> On Mon, 2012-06-25 at 08:44 -0700, Alexander Duyck wrote:
>> This doesn't limit queues, only interrupt vectors.  As I told you before
>> you should look at the ixgbe_set_rss_queues function if you actually
>> want to limit the number of RSS queues.
> How about this:
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> index af1a531..23a8609 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> @@ -277,6 +277,8 @@ static inline bool ixgbe_set_rss_queues(struct ixgbe_adapter *adapter)
>  	bool ret = false;
>  	struct ixgbe_ring_feature *f = &adapter->ring_feature[RING_F_RSS];
>  
> +	f->indices = min_t(int, netif_get_num_default_rss_queues(), f->indices);
> +
>  	if (adapter->flags & IXGBE_FLAG_RSS_ENABLED) {
>  		f->mask = 0xF;
>  		adapter->num_rx_queues = f->indices;
> @@ -302,7 +304,9 @@ static inline bool ixgbe_set_fdir_queues(struct ixgbe_adapter *adapter)
>  	bool ret = false;
>  	struct ixgbe_ring_feature *f_fdir = &adapter->ring_feature[RING_F_FDIR];
>  
> -	f_fdir->indices = min_t(int, num_online_cpus(), f_fdir->indices);
> +	f_fdir->indices = min_t(int, netif_get_num_default_rss_queues(),
> +				f_fdir->indices);
> +
>  	f_fdir->mask = 0;
>  
>  	/*
> @@ -339,8 +343,7 @@ static inline bool ixgbe_set_fcoe_queues(struct ixgbe_adapter *adapter)
>  	if (!(adapter->flags & IXGBE_FLAG_FCOE_ENABLED))
>  		return false;
>  
> -	f->indices = min_t(int, num_online_cpus(), f->indices);
> -
> +	f->indices = min_t(int, f->indices, netif_get_num_default_rss_queues());
>  	adapter->num_rx_queues = 1;
>  	adapter->num_tx_queues = 1;
>
This makes much more sense, but still needs a few minor changes.  The
first change I would recommend is to not alter ixgbe_set_fdir_queues
since that controls flow director queues, not RSS queues.  The second
would be to update ixgbe_set_dcb_queues since that does RSS per DCB
traffic class and will also need to be updated.

Thanks,

Alex

^ permalink raw reply

* Re: [PATCH RFC 0/8] LLDP implementation for Linux
From: Eldad Zack @ 2012-06-25 18:33 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1340648900-6547-1-git-send-email-eldad@fogrefinery.com>


On Mon, 25 Jun 2012, Eldad Zack wrote:

> Hi all,
> 
> This series of patches provides a partial LLDP (IEEE Std 802.1ab)
> implementation.
> 

I just noticed that I didn't change the subject on the following
patches to PATCH RFC -- Sorry for the inconvenience!

Eldad

^ permalink raw reply

* [PATCH 8/8] 8021q/vlan: process NETDEV_GOING_DOWN
From: Eldad Zack @ 2012-06-25 18:28 UTC (permalink / raw)
  To: netdev; +Cc: Eldad Zack
In-Reply-To: <1340648900-6547-1-git-send-email-eldad@fogrefinery.com>

In the current flow, when you take down a physical device that has
VLANs configured on it, the NETDEV_GOING_DOWN notification will be
sent too late, i.e., no data can be sent to the wire anymore.

static int __dev_close_many(struct list_head *head)
{
...
	list_for_each_entry(dev, head, unreg_list) {
		call_netdevice_notifiers(NETDEV_GOING_DOWN, dev);
...
	}
...
	list_for_each_entry(dev, head, unreg_list) {
		if (ops->ndo_stop)
			ops->ndo_stop(dev);
	}
...
}

static int dev_close_many(struct list_head *head)
{
	...
	__dev_close_many(head);

	list_for_each_entry(dev, head, unreg_list) {
		rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING);
		call_netdevice_notifiers(NETDEV_DOWN, dev);
	}
}

In a setup like this: eth0 with VLANs 2, 3 the flow would be:

eth0 - NETDEV_GOING_DOWN
ndo_stop is called on the device
eth0.2 - NETDEV_GOING_DOWN
eth0.2 - NETDEV_DOWN
eth0.3 - NETDEV_GOING_DOWN
eth0.3 - NETDEV_DOWN
eth0 - NETDEV_DOWN

If instead NETDEV_GOING_DOWN is processed, the flow would be:
eth0.2 - NETDEV_GOING_DOWN
eth0.2 - NETDEV_DOWN
eth0.3 - NETDEV_GOING_DOWN
eth0.3 - NETDEV_DOWN
eth0 - NETDEV_GOING_DOWN
eth0 - NETDEV_DOWN
ndo_stop is called on the device

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
---
 net/8021q/vlan.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 6089f0c..fd87ecc 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -402,6 +402,12 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event,
 
 		break;
 
+	case NETDEV_GOING_DOWN: /* NETDEV_DOWN */
+	/* If the parent device is going down it will call ndo_stop after
+	 * it sends out NETDEV_GOING_DOWN but before sending out NETDEV_DOWN,
+	 * The effect of which is, that no data can be sent anymore
+	 * by the time the VLAN device sends out its NETDEV_GOING_DOWN.
+	 */
 	case NETDEV_DOWN:
 		/* Put all VLANs for this dev in the down state too.  */
 		for (i = 0; i < VLAN_N_VID; i++) {
-- 
1.7.10

^ permalink raw reply related

* [PATCH 7/8] LLDP: Kconfig and Makefile
From: Eldad Zack @ 2012-06-25 18:28 UTC (permalink / raw)
  To: netdev; +Cc: Eldad Zack
In-Reply-To: <1340648900-6547-1-git-send-email-eldad@fogrefinery.com>

Adds Kconfig and Makefile for LLDP, and sources these in
net/Makefile and net/Kconfig.

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
---
 net/Kconfig       |    1 +
 net/Makefile      |    1 +
 net/lldp/Kconfig  |   21 +++++++++++++++++++++
 net/lldp/Makefile |    7 +++++++
 4 files changed, 30 insertions(+)
 create mode 100644 net/lldp/Kconfig
 create mode 100644 net/lldp/Makefile

diff --git a/net/Kconfig b/net/Kconfig
index 245831b..cb724f2 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -216,6 +216,7 @@ source "net/dcb/Kconfig"
 source "net/dns_resolver/Kconfig"
 source "net/batman-adv/Kconfig"
 source "net/openvswitch/Kconfig"
+source "net/lldp/Kconfig"
 
 config RPS
 	boolean
diff --git a/net/Makefile b/net/Makefile
index 4f4ee08..39fbb8b 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -70,3 +70,4 @@ obj-$(CONFIG_CEPH_LIB)		+= ceph/
 obj-$(CONFIG_BATMAN_ADV)	+= batman-adv/
 obj-$(CONFIG_NFC)		+= nfc/
 obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
+obj-$(CONFIG_LLDP)		+= lldp/
diff --git a/net/lldp/Kconfig b/net/lldp/Kconfig
new file mode 100644
index 0000000..0c59fbf
--- /dev/null
+++ b/net/lldp/Kconfig
@@ -0,0 +1,21 @@
+#
+# Link Layer Discovery Protocol (LLDP)
+# IEEE Std 802.1ab
+#
+
+config LLDP
+	tristate "Link Layer Discovery Protocol (LLDP) Support"
+	depends on EXPERIMENTAL
+	default n
+	---help---
+	  Enables experimental support for the LLDP protocol described in
+	  IEEE Std 802.1ab.
+
+	  The LLDP protocol allows nodes to advertise their identification,
+	  configuration and capabilities to neighbors on the same link.
+	  Currently, only advertisement is implemented.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called lldp.
+
+	  If unsure say N.
diff --git a/net/lldp/Makefile b/net/lldp/Makefile
new file mode 100644
index 0000000..e3c609f
--- /dev/null
+++ b/net/lldp/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the Linux Link Layer Discovery Protocol (LLDP) implementation.
+#
+
+obj-$(CONFIG_LLDP) += lldp.o
+
+lldp-objs :=	lldp_core.o lldp_output.o lldpdu.o sysctl_net_lldp.o
-- 
1.7.10

^ permalink raw reply related

* [PATCH 6/8] LLDP: Core routines
From: Eldad Zack @ 2012-06-25 18:28 UTC (permalink / raw)
  To: netdev; +Cc: Eldad Zack
In-Reply-To: <1340648900-6547-1-git-send-email-eldad@fogrefinery.com>

* Core LLDP routines: handles modules registration,
  netdevice notifications and timers.

* Adds specific data pointer to netdevice.

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
---
 include/linux/netdevice.h |    4 +
 net/lldp/lldp_core.c      |  214 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 218 insertions(+)
 create mode 100644 net/lldp/lldp_core.c

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d94cb14..813475a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -61,6 +61,7 @@ struct device;
 struct phy_device;
 /* 802.11 specific */
 struct wireless_dev;
+struct lldp_ptr;
 					/* source back-compat hooks */
 #define SET_ETHTOOL_OPS(netdev,ops) \
 	( (netdev)->ethtool_ops = (ops) )
@@ -1158,6 +1159,9 @@ struct net_device {
 	void			*ax25_ptr;	/* AX.25 specific data */
 	struct wireless_dev	*ieee80211_ptr;	/* IEEE 802.11 specific data,
 						   assign before registering */
+#if IS_ENABLED(CONFIG_LLDP)
+	struct lldp_dev __rcu	*lldp_ptr;	/* LLDP specific data */
+#endif
 
 /*
  * Cache lines mostly used on receive path (including eth_type_trans())
diff --git a/net/lldp/lldp_core.c b/net/lldp/lldp_core.c
new file mode 100644
index 0000000..3d4a1f1
--- /dev/null
+++ b/net/lldp/lldp_core.c
@@ -0,0 +1,214 @@
+/* LLDP		Link Layer Discovery Protocol impementation for Linux
+ *		IEEE Std 802.1ab
+ *
+ * Author:	Eldad Zack <eldad@fogrefinery.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "LLDP " fmt
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/timer.h>
+#include <linux/if_ether.h>
+#include <linux/etherdevice.h>
+#include <linux/if_arp.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include "lldp.h"
+
+int sysctl_lldp_operational_mode = LLDP_SYSCTL_OP_SUPPRESS;
+int sysctl_lldp_transmit_interval = LLDP_DEFAULT_MSG_TX_INTERVAL;
+int sysctl_lldp_hold_multiplier = LLDP_DEFAULT_MSG_TX_HOLD_MULT;
+
+bool is_valid_lldp_dev(struct net_device *dev)
+{
+	bool ret;
+
+	if (!is_valid_ether_addr(dev->dev_addr))
+		return false;
+
+	ret = (!(dev->flags & IFF_LOOPBACK) &&
+		(dev->type == ARPHRD_ETHER) &&
+		(dev->addr_len == ETH_ALEN));
+
+	return ret;
+}
+
+int init_lldp_dev_info(struct net_device *dev)
+{
+	struct lldp_dev *ldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	if (dev->lldp_ptr != NULL)
+		return 0;
+
+	ldev = kzalloc(sizeof(struct lldp_dev), GFP_ATOMIC);
+	if (ldev == NULL)
+		return -ENOMEM;
+
+	rcu_read_lock();
+	dev->lldp_ptr = ldev;
+	rcu_read_unlock();
+
+	return 0;
+}
+
+void lldp_timer_handler(unsigned long data)
+{
+	struct net_device *dev = (struct net_device *) data;
+	struct lldp_dev *ldev;
+	struct timer_list *tx_timer;
+
+	BUG_ON(dev == NULL);
+
+	rcu_read_lock();
+	if (dev->lldp_ptr == NULL) { /* device went down */
+		rcu_read_unlock();
+		return;
+	}
+
+	ldev = dev->lldp_ptr;
+	tx_timer = ldev->tx_timer;
+
+	if (sysctl_lldp_operational_mode & LLDP_SYSCTL_OP_TX)
+		lldp_send(dev);
+
+	mod_timer(tx_timer, jiffies +
+		msecs_to_jiffies(1000 * sysctl_lldp_transmit_interval));
+	rcu_read_unlock();
+}
+
+void lldp_add_dev(struct net_device *dev)
+{
+	int err;
+	struct lldp_dev *ldev;
+	struct timer_list *tx_timer;
+
+	err = init_lldp_dev_info(dev);
+	if (err < 0) {
+		pr_err("could not start on device %s (%d)\n", dev->name, err);
+		return;
+	}
+
+	rcu_read_lock();
+	ldev = dev->lldp_ptr;
+	rcu_read_unlock();
+
+	if (ldev->tx_timer != NULL)
+		return;
+
+	tx_timer = kzalloc(sizeof(struct timer_list), GFP_ATOMIC);
+	if (tx_timer == NULL) {
+		pr_err("no memory available (device %s)\n", dev->name);
+		return;
+	}
+
+	setup_timer(tx_timer, lldp_timer_handler, (unsigned long) dev);
+
+	rcu_read_lock();
+	ldev->tx_timer = tx_timer;
+	rcu_read_unlock();
+
+	/* Send the first frame without waiting. The timer handler
+	 * will schedule it.
+	 */
+	lldp_timer_handler((unsigned long) dev);
+
+	pr_info("started on device %s\n", dev->name);
+}
+
+static void lldp_del_dev(struct net_device *dev)
+{
+	struct lldp_dev *ldev;
+	struct timer_list *tx_timer;
+
+	if (dev == NULL)
+		return;
+
+	rcu_read_lock();
+	ldev = (struct lldp_dev *)dev->lldp_ptr;
+	if (ldev == NULL) {
+		rcu_read_unlock();
+		return;
+	}
+
+	tx_timer = ldev->tx_timer;
+	if (tx_timer != NULL) {
+		ldev->tx_timer = NULL;
+		del_timer(tx_timer);
+		kfree(tx_timer);
+	}
+
+	dev->lldp_ptr = NULL;
+	kfree(ldev);
+	rcu_read_unlock();
+
+	pr_info("stopped on device %s\n", dev->name);
+}
+
+static int lldp_notify(struct notifier_block *nb, unsigned long event,
+		void *data)
+{
+	struct net_device *dev = (struct net_device *) data;
+
+	if (dev == NULL)
+		return NOTIFY_DONE;
+
+	if (!is_valid_lldp_dev(dev))
+		return NOTIFY_DONE;
+
+	switch (event) {
+	case NETDEV_UP: /* NETDEV_CHANGE */
+	case NETDEV_CHANGE:
+		lldp_add_dev(dev);
+		break;
+	case NETDEV_GOING_DOWN:
+		/* Shutdown PDU, Clause 10.2.1.2 */
+		lldp_send_shutdown(dev);
+		break;
+	case NETDEV_UNREGISTER: /* NETDEV_DOWN */
+	case NETDEV_DOWN:
+		lldp_del_dev(dev);
+		break;
+	}
+
+	return NOTIFY_OK;
+};
+
+static struct notifier_block lldp_dev_notify = {
+	.notifier_call = lldp_notify,
+};
+
+static int __init lldp_init(void)
+{
+	pr_info("module initializing\n");
+
+	register_netdevice_notifier(&lldp_dev_notify);
+	lldp_register_sysctl();
+	return 0;
+}
+
+static void __exit lldp_fini(void)
+{
+	struct net_device *dev;
+
+	lldp_unregister_sysctl();
+	unregister_netdevice_notifier(&lldp_dev_notify);
+
+	for_each_netdev(&init_net, dev)
+		lldp_del_dev(dev);
+
+	pr_info("module stopped\n");
+}
+
+module_init(lldp_init);
+module_exit(lldp_fini);
+
+MODULE_LICENSE("GPL");
-- 
1.7.10

^ permalink raw reply related

* [PATCH 5/8] LLDP: Output routines
From: Eldad Zack @ 2012-06-25 18:28 UTC (permalink / raw)
  To: netdev; +Cc: Eldad Zack
In-Reply-To: <1340648900-6547-1-git-send-email-eldad@fogrefinery.com>

Handles the transmission of LLDPDUs.

Gets the TLV list, allocates the sk_buff according to the
space requirements of the TLV list, writes it to the sk_buff and
if all goes well, sends it to the wire.

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
---
 net/lldp/lldp_output.c |   67 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)
 create mode 100644 net/lldp/lldp_output.c

diff --git a/net/lldp/lldp_output.c b/net/lldp/lldp_output.c
new file mode 100644
index 0000000..45e6293
--- /dev/null
+++ b/net/lldp/lldp_output.c
@@ -0,0 +1,67 @@
+/* LLDP		Link Layer Discovery Protocol impementation for Linux
+ *		IEEE Std 802.1ab
+ *
+ * Author:	Eldad Zack <eldad@fogrefinery.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/if_ether.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include "lldp.h"
+
+static u8 lldp_dst_address[] = LLDP_MULTICAST_ADDR;
+
+inline struct sk_buff *lldp_create(struct net_device *dev, bool is_shutdown)
+{
+	struct sk_buff *skb;
+	struct list_head head;
+	int head_len = LL_RESERVED_SPACE(dev);
+	int tail_len = dev->needed_tailroom;
+	int lldp_len;
+
+	INIT_LIST_HEAD(&head);
+
+	lldp_tlv_construct_list(&head, dev, is_shutdown);
+	lldp_len = lldp_tlv_list_len(&head);
+
+	skb = alloc_skb(head_len + lldp_len + tail_len, GFP_ATOMIC);
+	if (skb == NULL)
+		goto done;
+
+	skb_reserve(skb, head_len);
+	skb_reset_network_header(skb);
+
+	lldp_tlv_put_skb_list(skb, &head);
+
+	skb->dev = dev;
+	skb->protocol = htons(ETH_P_LLDP);
+
+	if (dev_hard_header(skb, dev, ETH_P_LLDP, lldp_dst_address,
+		dev->dev_addr, skb->len) < 0) {
+		kfree_skb(skb);
+		skb = NULL;
+	}
+
+done:
+	lldp_tlv_destruct_list(&head);
+	return skb;
+}
+
+void __lldp_send(struct net_device *dev, bool is_shutdown)
+{
+	struct sk_buff *skb;
+
+	skb = lldp_create(dev, is_shutdown);
+
+	if (skb == NULL)
+		return;
+
+	dev_queue_xmit(skb);
+}
-- 
1.7.10

^ permalink raw reply related

* [PATCH 4/8] LLDP: PDU-handling routines
From: Eldad Zack @ 2012-06-25 18:28 UTC (permalink / raw)
  To: netdev; +Cc: Eldad Zack
In-Reply-To: <1340648900-6547-1-git-send-email-eldad@fogrefinery.com>

Routines for creation and parsing of LLPDUs.

Highlights:

* lldp_construct_pdu_list() constructs a list of TLVs
  according to device and configuration.

* lldp_tlv_list_len() calculates the space needed to
  send the above TLV list to the wire. Used when an sk_buff
  is allocated.

* lldp_put_tlv_skb_list() writes the TLV list to the sk_buff.

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
---
 net/lldp/lldpdu.c |  307 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 307 insertions(+)
 create mode 100644 net/lldp/lldpdu.c

diff --git a/net/lldp/lldpdu.c b/net/lldp/lldpdu.c
new file mode 100644
index 0000000..e71ddd7
--- /dev/null
+++ b/net/lldp/lldpdu.c
@@ -0,0 +1,307 @@
+/* LLDP		Link Layer Discovery Protocol impementation for Linux
+ *		IEEE Std 802.1ab
+ *
+ * Author:	Eldad Zack <eldad@fogrefinery.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/if_ether.h>
+#include <linux/if_vlan.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <linux/utsname.h>
+#include "lldp.h"
+
+const unsigned char oui_802_1[LLDP_OUI_LEN] = LLDP_OUI_802_1;
+const unsigned char oui_802_3[LLDP_OUI_LEN] = LLDP_OUI_802_3;
+
+int lldp_tlv_list_len(struct list_head *head)
+{
+	struct lldp_tlv *tlv;
+	int len = 0;
+
+	list_for_each_entry(tlv, head, lh)
+		len += tlv->entry_len;
+
+	return len;
+}
+
+void lldp_tlv_calc_entry_len(struct lldp_tlv *tlv)
+{
+	uint16_t elen;
+
+	BUG_ON(tlv == NULL);
+	BUG_ON(tlv->len > LLDP_LEN_MAX);
+
+	elen = 2;		/* TLV Header */
+
+	if (tlv->subtype > 0)
+		elen += 1;
+
+	BUG_ON((tlv->oui != NULL) &&
+		(tlv->type != LLDP_TLV_ORGANIZATIONAL));
+	if (tlv->oui != NULL)
+		elen += LLDP_OUI_LEN;
+
+	elen += tlv->len;	/* Actual data length */
+
+	tlv->entry_len = elen;
+}
+
+void __lldp_add_tlv_list(struct list_head *head, unsigned short type,
+			unsigned short len, unsigned char *data,
+			unsigned char subtype, const unsigned char *oui)
+{
+	struct lldp_tlv *tlv;
+
+	BUG_ON(len > LLDP_LEN_MAX);
+	BUG_ON(head == NULL);
+
+	tlv = kzalloc(sizeof(struct lldp_tlv), GFP_ATOMIC);
+
+	tlv->type = type;
+	tlv->len = len;
+
+	tlv->val = kmalloc(len, GFP_ATOMIC);
+	memcpy(tlv->val, data, len);
+
+	tlv->subtype = subtype;
+
+	if (type == LLDP_TLV_ORGANIZATIONAL) {
+		BUG_ON(oui == NULL);
+
+		tlv->oui = kmalloc(LLDP_OUI_LEN, GFP_ATOMIC);
+		memcpy(tlv->oui, oui, LLDP_OUI_LEN);
+	}
+
+	lldp_tlv_calc_entry_len(tlv);
+
+	list_add_tail(&(tlv->lh), head);
+}
+
+inline void lldp_add_tlv_list(struct list_head *head, unsigned short type,
+			unsigned short len, unsigned char *data)
+{
+	__lldp_add_tlv_list(head, type, len, data, 0, NULL);
+}
+
+inline void lldp_add_tlv_subtype_list(struct list_head *head,
+				unsigned short type,
+				unsigned short len, unsigned char *data,
+				unsigned char subtype)
+{
+	__lldp_add_tlv_list(head, type, len, data, subtype, NULL);
+}
+
+inline void lldp_add_oui_tlv_list(struct list_head *head,
+				const unsigned char *oui,
+				unsigned short len, unsigned char *data,
+				unsigned char subtype)
+{
+	__lldp_add_tlv_list(head, LLDP_TLV_ORGANIZATIONAL, len, data,
+				subtype, oui);
+}
+
+inline void lldp_add_tlv_chassis_id(struct list_head *head,
+				struct net_device *dev)
+{
+	lldp_add_tlv_subtype_list(head, LLDP_TLV_CHASSIS_ID, ETH_ALEN,
+		dev->dev_addr, LLDP_ST_CHID_MAC_ADDR);
+}
+
+inline void lldp_add_tlv_port_id(struct list_head *head, struct net_device *dev)
+{
+	lldp_add_tlv_subtype_list(head, LLDP_TLV_PORT_ID, strlen(dev->name),
+		dev->name, LLDP_ST_PORTID_IFNAME);
+}
+
+inline void lldp_add_tlv_ttl(struct list_head *head, struct net_device *dev,
+				bool is_shutdown)
+{
+	uint16_t ttl;
+
+	if (is_shutdown) {
+		ttl = 0;
+	} else {
+		ttl = htons(sysctl_lldp_transmit_interval *
+				sysctl_lldp_hold_multiplier);
+	}
+
+	lldp_add_tlv_list(head, LLDP_TLV_TIME_TO_LIVE, sizeof(ttl),
+		(unsigned char *) &ttl);
+}
+
+inline void lldp_add_tlv_vlan(struct list_head *head, struct net_device *dev)
+{
+#if IS_ENABLED(CONFIG_VLAN_8021Q)
+	if (dev->flags & IFF_802_1Q_VLAN) {
+		uint16_t vlan = htons(vlan_dev_vlan_id(dev));
+		/* Clause F.2.1: Value of 0 signifies that the system
+		 * does not know the VLAN ID or doesn"t support it.
+		 */
+		if (vlan != 0) {
+			lldp_add_oui_tlv_list(head, oui_802_1, sizeof(vlan),
+				(unsigned char *) &vlan,
+				LLDP_802_1_PORT_VLANID);
+		}
+	}
+#endif
+}
+
+inline void lldp_add_tlv_mtu(struct list_head *head, struct net_device *dev)
+{
+	uint16_t mtu;
+
+	mtu = htons(dev->mtu);
+	lldp_add_oui_tlv_list(head, oui_802_3, sizeof(mtu),
+		(unsigned char *) &mtu, LLDP_802_3_MTU);
+}
+
+inline void lldp_add_tlv_port_description(struct list_head *head,
+					struct net_device *dev)
+{
+	char *desc = dev->ifalias;
+	if (desc == NULL)
+		return;
+
+	lldp_add_tlv_list(head, LLDP_TLV_PORT_DESCRIPTION, strlen(desc), desc);
+}
+
+inline void lldp_add_tlv_system_name(struct list_head *head,
+					struct net_device *dev)
+{
+	char *name = utsname()->nodename;
+	if (name == NULL)
+		return;
+
+	lldp_add_tlv_list(head, LLDP_TLV_SYSTEM_NAME, strlen(name), name);
+}
+
+inline void lldp_add_tlv_system_description(struct list_head *head,
+					struct net_device *dev)
+{
+	char *desc = utsname()->sysname;
+	if (desc == NULL)
+		return;
+
+	lldp_add_tlv_list(head, LLDP_TLV_SYSTEM_DESCRIPTION,
+		strlen(desc), desc);
+}
+
+inline void lldp_add_tlv_system_capabilities(struct list_head *head,
+					struct net_device *dev)
+{
+	struct lldp_caps caps;
+
+	caps.sys = htons(LLDP_CAP_BRIDGE | LLDP_CAP_ROUTER);
+	caps.enabled = htons(LLDP_CAP_ROUTER);
+
+	lldp_add_tlv_list(head, LLDP_TLV_SYSTEM_CAPABILITIES,
+		sizeof(struct lldp_caps), (unsigned char *) &caps);
+}
+
+void lldp_tlv_construct_list(struct list_head *head, struct net_device *dev,
+				bool is_shutdown)
+{
+	BUG_ON(head == NULL);
+
+	/* Mandatory TLVs with strict order */
+	lldp_add_tlv_chassis_id(head, dev);
+	lldp_add_tlv_port_id(head, dev);
+	lldp_add_tlv_ttl(head, dev, is_shutdown);
+
+	/* Shutdown PDU is limited to mandatory TLVs only */
+	if (is_shutdown)
+		goto end;
+
+	/* Optional TLVs */
+	lldp_add_tlv_port_description(head, dev);
+	lldp_add_tlv_system_name(head, dev);
+	lldp_add_tlv_system_description(head, dev);
+	lldp_add_tlv_system_capabilities(head, dev);
+
+	/* Additional 802.1 and 802.3 private TLVs */
+	lldp_add_tlv_vlan(head, dev);	/* Annex F.1 */
+	lldp_add_tlv_mtu(head, dev);	/* Annex G.5 */
+
+end:
+	/* End TLV */
+	lldp_add_tlv_list(head, LLDP_TLV_END, 0, NULL);
+}
+
+void lldp_tlv_destruct(struct lldp_tlv *tlv)
+{
+	if (tlv->oui != NULL)
+		kfree(tlv->oui);
+
+	if (tlv->val != NULL)
+		kfree(tlv->val);
+}
+
+void lldp_tlv_destruct_list(struct list_head *head)
+{
+	struct lldp_tlv *tlv, *tmp;
+
+	if (head == NULL)
+		return;
+
+	if (list_empty(head))
+		return;
+
+	list_for_each_entry_safe(tlv, tmp, head, lh) {
+		list_del(&tlv->lh);
+		lldp_tlv_destruct(tlv);
+		kfree(tlv);
+	}
+}
+
+inline uint16_t tlv_hdr(struct lldp_tlv *tlv)
+{
+	int len = tlv->entry_len - LLDP_TLV_HDR_LEN;
+
+	BUG_ON(tlv == NULL);
+	BUG_ON(len < 0);
+
+	return htons(((tlv->type << LLDP_TYPE_SHIFT) & LLDP_TYPE_MASK) |
+			(len & LLDP_LEN_MASK));
+}
+
+void lldp_tlv_put_skb_list(struct sk_buff *skb, struct list_head *head)
+{
+	struct lldp_tlv *tlv;
+	struct list_head *p;
+	unsigned char *buf;
+	u16 hdr;
+
+	list_for_each(p, head) {
+		tlv = list_entry(p, struct lldp_tlv, lh);
+
+		hdr = tlv_hdr(tlv);
+		buf = skb_put(skb, sizeof(hdr));
+		memcpy(buf, &hdr, sizeof(hdr));
+
+		BUG_ON((tlv->oui != NULL) &&
+			(tlv->type != LLDP_TLV_ORGANIZATIONAL));
+		if (tlv->oui != NULL) {
+			buf = skb_put(skb, LLDP_OUI_LEN);
+			memcpy(buf, tlv->oui, LLDP_OUI_LEN);
+		}
+
+		if (tlv->subtype > 0) {
+			buf = skb_put(skb, sizeof(unsigned char));
+			*buf = tlv->subtype;
+		}
+
+		if (tlv->len > 0) {
+			BUG_ON(tlv->val == NULL);
+			buf = skb_put(skb, tlv->len);
+			memcpy(buf, tlv->val, tlv->len);
+		}
+	}
+}
-- 
1.7.10

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox