Netdev List

* [PATCH v2 00/27] HFI: minimal device driver/ip driver
From: dykmanj @ 2011-04-18  3:21 UTC (permalink / raw)
  To: netdev; +Cc: Jim Dykman

From: Jim Dykman <dykmanj@linux.vnet.ibm.com>

The HFI ("Host Fabric Interface") network interface is the internal cluster
fabric of IBM's PERCS supercomputer. The hardware design is under US export
control, so we cannot release hardware specs. A writeup of the publicly
available information about the system is here:
http://sourceforge.net/projects/hfidevicedriver/files/docs/hfi_general_desc_v2.1.txt

hfi_core contains the resource management to set up communications paths for
network traffic. Calls are provided for kernel drivers, and also for setting
up direct user-space access to HFI windows.
hfi_ip contains the kernel network driver.

The driver has been running in the lab for several months. The full patch is
around 22000 lines, so we've split out a minimal device/network driver that
can send and receive through the simplest path.  Once that much gets accepted
we'll start adding on to it.

Patches are against net-next-2.6.

Jim Dykman

Changelog:
----------
v2:
	Remove return; at the end of void funcs
	hfidd_free_adapter: p_acs = NULL unnecessary, remove
	remove net_stats, and use netdev->stats, remove hf_get_stats
	rename network driver to hfi_ip
	hf_inet_event: NETDEV_UP needs to check event is for us, check 
		netdev->netdev_ops == ours
	change printk()s to netdev_err() and friends
	hf_net_close: remove redundant CLOSE check
	hf_change_mtu: minimum mtu should be 68
	remove NETIF_F_SG flag
	hf_init_netdev: Use ERR_PTR
	hf_init_module: %ld / formatting
			pass up return code from failed call 
	use unsigned int instead of u32 for bit fields <32 bits
	use struct ethhdr instead of hf_hwhdr 
	hf_get_sset_count: default return -EINVAL not -EOPNOTSUPP
	Remove "hfidd_callback_event: enter" message that printed on every 
		recv interrupt
	hfidd_destroy_devices: hfidd_rmdev() after hfidd_free_adapter() so 
		dev_printk doesn't oops on rmmod

^ permalink raw reply

* Re: [PATCH 24/27] HFI: hf network driver
From: Jim Dykman @ 2011-04-18  3:21 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: netdev, Piyush Chaudhary, Fu-Chung Chang, William S. Cadden,
	Wen C. Chen, Scot Sakolish, Jian Xiao, Carol L. Soto,
	Sarah J. Sheppard
In-Reply-To: <1299105612.4277.34.camel@localhost>

On 3/2/2011 5:40 PM, Ben Hutchings wrote:
> On Wed, 2011-03-02 at 16:10 -0500, dykmanj@linux.vnet.ibm.com wrote:
>> From: Jim Dykman <dykmanj@linux.vnet.ibm.com>
>>
>> It is a separate binary because it is not strictly necessary to use the HFI.
>> This patch includes module load/unload and the window open/setup with the
>> hfi device driver.
> [...]
>> diff --git a/drivers/net/hfi/ip/Kconfig b/drivers/net/hfi/ip/Kconfig
>> new file mode 100644
>> index 0000000..1a2c21d
>> --- /dev/null
>> +++ b/drivers/net/hfi/ip/Kconfig
>> @@ -0,0 +1,9 @@
>> +config HFI_IP
>> +	tristate "IP-over-HFI"
>> +	depends on NETDEVICES && INET && HFI
>> +	---help---
>> +	Support for the IP over HFI. It transports IP
>> +	packets over HFI.
>> +
>> +	To compile the driver as a module, choose M here. The module
>> +	will be called hf.
> 
> You actually call it hf_if!  But why it is not called hfi_ip?
> 
That IS a good name. Ok, we'll call it hfi_ip.

>> diff --git a/drivers/net/hfi/ip/Makefile b/drivers/net/hfi/ip/Makefile
>> new file mode 100644
>> index 0000000..59eff9b
>> --- /dev/null
>> +++ b/drivers/net/hfi/ip/Makefile
>> @@ -0,0 +1,6 @@
>> +#
>> +# Makefile for the HF IP interface for IBM eServer System p
>> +#
>> +obj-$(CONFIG_HFI_IP) += hf_if.o
>> +
>> +hf_if-objs :=	hf_if_main.o
>> diff --git a/drivers/net/hfi/ip/hf_if_main.c b/drivers/net/hfi/ip/hf_if_main.c
>> new file mode 100644
>> index 0000000..329baa1
>> --- /dev/null
>> +++ b/drivers/net/hfi/ip/hf_if_main.c
> [...]
>> +static int hf_inet_event(struct notifier_block *this,
>> +			 unsigned long event,
>> +			 void *ifa)
>> +{
>> +	struct in_device	*in_dev;
>> +	struct net_device	*netdev;
>> +
>> +	in_dev = ((struct in_ifaddr *)ifa)->ifa_dev;
>> +
>> +	netdev = in_dev->dev;
>> +
>> +	if (!net_eq(dev_net(netdev), &init_net))
>> +		return NOTIFY_DONE;
>> +
>> +	if (event == NETDEV_UP) {
>> +		struct hf_if	*net_if;
>> +
>> +		net_if = &(((struct hf_net *)(netdev_priv(netdev)))->hfif);
> 
> Try running:
> 
> # ifconfig lo down
> # ifconfig lo up
> 
> and watch the explosion.
> 
> You need to check that this is actually one of your devices.  I've done
> this by comparing netdev->netdev_ops pointer.
> 
Check added to v2.
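
A minimal userspace sketch of the ownership check Ben describes. The struct definitions here are simplified, hypothetical stand-ins for the kernel's struct net_device and struct net_device_ops, and hf_is_our_device() is an illustrative name, not a function from the patch. The idea is that a notifier sees events for every interface (including lo), so the private data may only be touched after confirming the ops pointer is the driver's own.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
struct net_device_ops {
	int (*ndo_open)(void *dev);
};

struct net_device {
	const struct net_device_ops *netdev_ops;
	void *priv;
};

/* The ops table this driver installs on the devices it creates. */
static const struct net_device_ops hf_netdev_ops = { NULL };

/* Some other driver's ops table, e.g. the loopback device. */
static const struct net_device_ops lo_netdev_ops = { NULL };

/*
 * Returns 1 only when the device was set up by this driver.
 * Comparing the ops pointer avoids casting another driver's
 * private area to our own structure.
 */
static int hf_is_our_device(const struct net_device *netdev)
{
	return netdev->netdev_ops == &hf_netdev_ops;
}
```

In a notifier, a check like this would come before any use of the private data, returning NOTIFY_DONE for devices that belong to someone else.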

> [...]
>> +static int hf_alloc_tx_resource(struct hf_if *net_if)
>> +{
> [...]
>> +	if (net_if->tx_fifo.addr == 0) {
>> +		printk(KERN_ERR "%s: hf_alloc_tx_resource: "
>> +			"tx_fifo fail, size=0x%x\n",
>> +			net_if->name, net_if->tx_fifo.size);
> [...]
> 
> The netdev_err() and netif_err() (etc.) macros are the standard way to
> format messages relating to a net device.
> 
Fixed in v2

> [...]
>> +static int hf_set_mac_addr(struct net_device *netdev, void *p)
>> +{
>> +	struct hf_net		*net = netdev_priv(netdev);
>> +	struct hf_if		*net_if = &(net->hfif);
>> +
>> +	/* Mac address format: 02:ClusterID:ISR:ISR:HFI_WIN:WIN */
>> +
>> +	/* Locally administered MAC address */
>> +	netdev->dev_addr[0] = 0x2; /* bit6=1, bit7=0 */
>> +
>> +	netdev->dev_addr[1] = 0x0; /* cluster id */
>> +
>> +	*(u16 *)(&(netdev->dev_addr[2])) = (u16)(net_if->isr_id);
>> +
>> +	*(u16 *)(&(netdev->dev_addr[4])) = (u16)
>> +	(((net_if->ai) << HF_MAC_HFI_SHIFT) | (net_if->client.window));
> 
> These two assignments should perhaps include an explicit cpu_to_be16().
> 
The HFIs live in a chip on the motherboard of one specific Power7 server.
Power arch is big-endian. I'm going to leave this as is.
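
For reference, the endian-explicit form Ben is asking about can be sketched in portable userspace C as below. This is not what the patch does (the author keeps the direct u16 stores). put_be16() and example_fill_mac() are illustrative names, and EXAMPLE_MAC_HFI_SHIFT is an assumed value, since the thread does not show HF_MAC_HFI_SHIFT.

```c
#include <stdint.h>

/* Assumed value for illustration; the real HF_MAC_HFI_SHIFT is not shown. */
#define EXAMPLE_MAC_HFI_SHIFT 8

/* Store a 16-bit value in big-endian byte order on any host. */
static void put_be16(uint8_t *dst, uint16_t val)
{
	dst[0] = (uint8_t)(val >> 8);
	dst[1] = (uint8_t)(val & 0xff);
}

/* Build 02:ClusterID:ISR:ISR:HFI_WIN:WIN without relying on host endianness. */
static void example_fill_mac(uint8_t mac[6], uint16_t isr_id,
			     unsigned int ai, unsigned int window)
{
	mac[0] = 0x02;	/* locally administered address */
	mac[1] = 0x00;	/* cluster id */
	put_be16(&mac[2], isr_id);
	put_be16(&mac[4],
		 (uint16_t)((ai << EXAMPLE_MAC_HFI_SHIFT) | window));
}
```

On a big-endian host this produces the same bytes as the direct stores in the patch. The byte-wise stores also avoid any dependence on the alignment of dev_addr.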

> [...]
>> +static int hf_net_close(struct net_device *netdev)
>> +{
>> +	struct hf_net		*net = netdev_priv(netdev);
>> +	struct hf_if		*net_if = &(net->hfif);
>> +	struct hfidd_acs	*p_acs = HF_ACS(net_if);
>> +
>> +	if (net_if->state == HF_NET_CLOSE)
>> +		return 0;
> 
> I'm a bit puzzled by this.  Do you not trust the networking core to keep
> track of your device state?
> 
Removed in v2.
>> +	spin_lock(&(net_if->lock));
>> +	if (net_if->state == HF_NET_OPEN) {
>> +		hf_close_ip_window(net_if, p_acs);
>> +
>> +		hf_free_resource(net_if);
>> +	}
>> +
>> +	hf_register_hfi_ready_callback(netdev, p_acs,
>> +			HFIDD_REQ_EVENT_UNREGISTER);
>> +
>> +	net_if->state = HF_NET_CLOSE;
>> +	spin_unlock(&(net_if->lock));
>> +
>> +	return 0;
>> +}
>> +
>> +struct net_device_stats *hf_get_stats(struct net_device *netdev)
>> +{
>> +	struct hf_net	*net = netdev_priv(netdev);
>> +	struct hf_if	*net_if = &(net->hfif);
>> +
>> +	return &(net_if->net_stats);
>> +}
> 
> Please use the stats contained in struct net_device instead.
> 
Ok
>> +static int hf_change_mtu(struct net_device *netdev, int new_mtu)
>> +{
>> +	if ((new_mtu <= 0) || (new_mtu > HF_NET_MTU))
>> +		return -ERANGE;
> 
> Since this interface apparently only passes ARP and IPv4, the minimum
> MTU should be the minimum for IPv4, which is 68.  (The spec says 576 but
> the Linux IPv4 implementation uses this value.)
> 
Ok
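
A sketch of the range check with the lower bound Ben suggests. EXAMPLE_MAX_MTU is a hypothetical stand-in for HF_NET_MTU, whose value the thread does not show, and the -ERANGE return code is simply carried over from the original patch. The actual v2 code may differ.

```c
#include <errno.h>

#define EXAMPLE_MIN_MTU	68	/* minimum IPv4 MTU, per the review */
#define EXAMPLE_MAX_MTU	2048	/* hypothetical stand-in for HF_NET_MTU */

/* Returns 0 if new_mtu is acceptable, -ERANGE otherwise. */
static int example_check_mtu(int new_mtu)
{
	if (new_mtu < EXAMPLE_MIN_MTU || new_mtu > EXAMPLE_MAX_MTU)
		return -ERANGE;
	return 0;
}
```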
> [...]
>> +static void hf_if_setup(struct net_device *netdev)
>> +{
>> +	netdev->type		= ARPHRD_HFI;
>> +	netdev->mtu		= HF_NET_MTU;
>> +	netdev->tx_queue_len	= 1000;
>> +	netdev->flags		= IFF_BROADCAST;
>> +	netdev->hard_header_len	= HF_HLEN;
>> +	netdev->addr_len	= HF_ALEN;
>> +	netdev->needed_headroom	= 0;
>> +
>> +	netdev->header_ops	= &hf_header_ops;
>> +	netdev->netdev_ops	= &hf_netdev_ops;
>> +
>> +	netdev->features       |= NETIF_F_SG;
> 
> You can't provide NETIF_F_SG without checksum offload.
> 
Removed in v2.
>> +	memcpy(netdev->broadcast, hfi_bcast_addr, HF_ALEN);
>> +}
>> +
>> +static struct hf_net *hf_init_netdev(int idx, int ai)
>> +{
>> +	struct net_device	*netdev;
>> +	struct hf_net		*net;
>> +	int			ii;
>> +	int			rc;
>> +	char			ifname[HF_MAX_NAME_LEN];
>> +
>> +	ii = (idx * MAX_HFIS) + ai;
>> +	sprintf(ifname, "hf%d", ii);
>> +	netdev = alloc_netdev(sizeof(struct hf_net), ifname, hf_if_setup);
>> +	if (!netdev) {
>> +		printk(KERN_ERR "hf_init_netdev: "
>> +				"alloc_netdev for hfi%d:hf%d fail\n", ai, idx);
>> +		return (struct hf_net *) -ENODEV;
> 
> Use ERR_PTR() instead of writing this sort of cast yourself.
> 
Ok.
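
A userspace sketch of the error-pointer idiom, for readers unfamiliar with it. The ex_* helpers below only imitate the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), and they are not the kernel definitions. struct ex_net and ex_init() are hypothetical. The point is that a negative errno is encoded in the top of the pointer range, so the caller can both detect the failure and recover the original code to pass up.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define EX_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *ex_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long ex_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if ptr lies in the last EX_MAX_ERRNO values of the address range. */
static inline int ex_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-EX_MAX_ERRNO);
}

struct ex_net {
	int idx;
};

/* Hypothetical constructor: returns a valid object or an encoded errno. */
static struct ex_net *ex_init(int idx)
{
	struct ex_net *net;

	if (idx < 0)
		return ex_err_ptr(-ENODEV);

	net = malloc(sizeof(*net));
	if (!net)
		return ex_err_ptr(-ENOMEM);

	net->idx = idx;
	return net;
}
```

A caller tests the result with ex_is_err() and, on failure, returns ex_ptr_err() rather than substituting a fixed code, which is also the point of the later "return PTR_ERR(net);" comment on hf_init_module.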
> [...]
>> +static int __init hf_init_module(void)
>> +{
>> +	u32		idx, ai;
>> +	struct hf_net	*net;
>> +
>> +	memset(&hf_ginfo, 0, sizeof(struct hf_global_info));
>> +
>> +	for (idx = 0; idx < MAX_HF_PER_HFI; idx++) {
>> +		for (ai = 0; ai < MAX_HFIS; ai++) {
>> +			net = hf_init_netdev(idx, ai);
>> +			if (IS_ERR(net)) {
>> +				printk(KERN_ERR "hf_init_module: hf_init_netdev"
>> +						" for idx %d ai %d failed rc"
>> +						" 0x%016llx\n",
>> +						idx, ai, (u64)(PTR_ERR(net)));
> 
> Whyever are you formatting the error like this?  Use %ld and remove the
> (u64) cast.
> 
> 
Fixed in v2.
>> +
>> +				goto err_out;
>> +			}
>> +
>> +			hf_ginfo.net[idx][ai] = net;
>> +		}
>> +	}
>> +
>> +	register_inetaddr_notifier(&hf_inet_notifier);
>> +
>> +	printk(KERN_INFO "hf module loaded\n");
>> +	return 0;
>> +
>> +err_out:
>> +	for (idx = 0; idx < MAX_HF_PER_HFI; idx++) {
>> +		for (ai = 0; ai < MAX_HFIS; ai++) {
>> +			net = hf_ginfo.net[idx][ai];
>> +			if (net != NULL) {
>> +				hf_del_netdev(net);
>> +				hf_ginfo.net[idx][ai] = NULL;
>> +			}
>> +		}
>> +	}
>> +
>> +	return -EINVAL;
> 
> Use the error code you were given:
> 
> 	return PTR_ERR(net);
> 
Also fixed in v2.
>> +}
>> +
>> +static void __exit hf_cleanup_module(void)
>> +{
>> +	u32		idx, ai;
>> +	struct hf_net	*net;
>> +
>> +	unregister_inetaddr_notifier(&hf_inet_notifier);
>> +	for (idx = 0; idx < MAX_HF_PER_HFI; idx++) {
>> +		for (ai = 0; ai < MAX_HFIS; ai++) {
>> +			net = hf_ginfo.net[idx][ai];
>> +			if (net != NULL) {
>> +				hf_del_netdev(net);
>> +				hf_ginfo.net[idx][ai] = NULL;
>> +			}
>> +		}
>> +	}
>> +
>> +	return;
> 
> Redundant statement is redundant.
> 
These are all removed in v2.

>> +}
> [...]
>> --- /dev/null
>> +++ b/include/linux/hfi/hf_if.h
> [...]
>> +struct hfi_ip_extended_hdr {            /* 16B */
>> +	u32		immediate_len:7;/* In bytes */
>> +	u32		num_desc:3;     /* number of descriptors */
>> +					/* Logical Port ID: */
>> +	u32		lpid_valid:1;   /* set by sending HFI */
>> +	u32		lpid:4;         /* set by sending HFI */
>> +	/* Ethernet Service Header is 113 bits, which is 14 bytes + 1 bit */
>> +	u32		ethernet_svc_hdr_hi:1;    /* Not used by HFI */
>> +	char            ethernet_svc_hdr[12];     /* Not used by HFI */
>> +	__sum16         bcast_csum;
>> +} __packed;
> 
> It looks like you're relying on gcc to treat a set of bitfields with
> type u32 and only 16 bits assigned as having a size of 2 in a packed
> structure.  This might be true now, but I wouldn't want to rely on that
> being true for later versions.  Why not define the set of bitfields with
> type u16?
> 
They're not 16 bits long either, so I'm changing these to unsigned int

> Also the above appears to assume big-endian byte and bit order.
> 
Again, Power arch is big endian and HFI is Power-only.
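
For comparison, a layout-independent alternative to the bitfields is to pack the first 16 bits with explicit shifts and masks. The sketch below is not from the patch. It assumes the fields are allocated from the most significant bit downward in declaration order (7 + 3 + 1 + 4 + 1 = 16 bits), which is the big-endian convention the author relies on, and the ex_* names are illustrative.

```c
#include <stdint.h>

/*
 * Assumed bit positions within the 16-bit word (MSB first):
 *   immediate_len        bits 15..9
 *   num_desc             bits  8..6
 *   lpid_valid           bit   5
 *   lpid                 bits  4..1
 *   ethernet_svc_hdr_hi  bit   0
 */
static uint16_t ex_pack_hdr_bits(unsigned int immediate_len,
				 unsigned int num_desc,
				 unsigned int lpid_valid,
				 unsigned int lpid,
				 unsigned int svc_hdr_hi)
{
	return (uint16_t)(((immediate_len & 0x7f) << 9) |
			  ((num_desc & 0x7) << 6) |
			  ((lpid_valid & 0x1) << 5) |
			  ((lpid & 0xf) << 1) |
			  (svc_hdr_hi & 0x1));
}

/* Extract individual fields from the packed word. */
static unsigned int ex_hdr_immediate_len(uint16_t bits)
{
	return (bits >> 9) & 0x7f;
}

static unsigned int ex_hdr_lpid(uint16_t bits)
{
	return (bits >> 1) & 0xf;
}
```

The packed word would then be written to the header in big-endian byte order. Unlike bitfields, the result does not depend on how a particular compiler or ABI allocates bits within a storage unit.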

> [...]
>> +#define HF_ALEN				6
>> +struct hf_hwhdr {
>> +	u8				h_dest[HF_ALEN];
>> +	u8				h_source[HF_ALEN];
>> +	__be16				h_proto;
>> +};
>> +
>> +#define HF_HLEN				sizeof(struct hf_hwhdr)
> [...]
> 
> This looks familiar!  Maybe you should just use the existing struct
> ethhdr?
> 
Ok
> Ben.
> 

Jim


^ permalink raw reply

* Re: [PATCH 24/27] HFI: hf network driver
From: Jim Dykman @ 2011-04-18  3:21 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev, Piyush Chaudhary, Fu-Chung Chang, William S. Cadden,
	Wen C. Chen, Scot Sakolish, Jian Xiao, Carol L. Soto,
	Sarah J. Sheppard
In-Reply-To: <20110302142617.112bd9cb@nehalam>

On 3/2/2011 5:26 PM, Stephen Hemminger wrote:
> On Wed,  2 Mar 2011 16:10:10 -0500
> dykmanj@linux.vnet.ibm.com wrote:
> 
>> +struct hf_if {
>> +	u32			idx;			/* 0, 1, 2, 3 ...   */
>> +	u32			ai;			/* 0=hfi0, 1=hfi1   */
>> +	char			name[HF_MAX_NAME_LEN];
>> +	u32			isr_id;
>> +	u32			ip_addr;
>> +	u32			state;			/* CLOSE, OPEN */
>> +	spinlock_t		lock;			/* lock for state */
>> +	u32			sfifo_fv_polarity;
>> +	u32			sfifo_slots_per_blk;
>> +	u32			sfifo_packets;
>> +	void __iomem		*doorbell;		/* mapped mmio_regs */
>> +	struct hf_fifo		tx_fifo;
>> +	struct hf_fifo		rx_fifo;
>> +	struct hfi_client_info	client;
>> +	struct sk_buff		**tx_skb;		/* array to store tx
>> +							   2k skb */
>> +	void			*sfifo_finishvec;
>> +	struct net_device_stats	net_stats;
>> +};
> 
> You don't need net_stats in this structure if you use
> the standard netdev->stats structure instead.
> 
> You won't need hf_get_stats then..
> 
Ok, done in v2

Jim


^ permalink raw reply

* Re: [PATCH 02/27] HFI: Add HFI adapter control structure
From: Jim Dykman @ 2011-04-18  3:21 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Stephen Hemminger, netdev, Piyush Chaudhary, Fu-Chung Chang,
	William S. Cadden, Wen C. Chen, Scot Sakolish, Jian Xiao,
	Carol L. Soto, Sarah J. Sheppard
In-Reply-To: <1299105870.4277.37.camel@localhost>

On 3/2/2011 5:44 PM, Ben Hutchings wrote:
> On Wed, 2011-03-02 at 14:21 -0800, Stephen Hemminger wrote:
>> On Wed,  2 Mar 2011 16:09:48 -0500
>> dykmanj@linux.vnet.ibm.com wrote:
>>
>>> diff --git a/drivers/net/hfi/core/Makefile b/drivers/net/hfi/core/Makefile
>>> index 80790c6..6fe4e60 100644
>>> --- a/drivers/net/hfi/core/Makefile
>>> +++ b/drivers/net/hfi/core/Makefile
>>> @@ -1,5 +1,6 @@
>>>  #
>>>  # Makefile for the HFI device driver for IBM eServer System p
>>>  #
>>> -hfi_core-objs:=	hfidd_init.o
>>> +hfi_core-objs:=	hfidd_adpt.o \
>>> +		hfidd_init.o
>>>  obj-$(CONFIG_HFI) += hfi_core.o
>>> diff --git a/drivers/net/hfi/core/hfidd_adpt.c b/drivers/net/hfi/core/hfidd_adpt.c
>>> new file mode 100644
>>> index 0000000..d64fa38
>>> --- /dev/null
>>> +++ b/drivers/net/hfi/core/hfidd_adpt.c
> [...]
>>> +void hfidd_free_adapter(struct hfidd_acs *p_acs)
>>> +{
>>> +	kfree(p_acs);
>>> +	p_acs = NULL;
>>> +	return;
>>> +}
>>
>> If these were not in a separate file the could be marked as static.
> 
> I assume they're intending to add some more interesting code here in the
> next installment.
> 

Yes, there is more to come.

>> Doing a return; on last line of a void function is considered poor
>> style since it is unnecessary.
> 
> Assigning to a local variable just before returning is also silly.
> 
> Ben.
> 
Both removed in v2. 

Jim


^ permalink raw reply

* Re: [PATCH 0/42] Kill -Wunused-but-set warnings
From: Ben Hutchings @ 2011-04-18  2:19 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20110417.173233.193710905.davem@davemloft.net>

On Sun, 2011-04-17 at 17:32 -0700, David Miller wrote:
> The weather is real nice, so I decided to lock myself inside and
> fix compiler warnings.
> 
> I started using gcc-4.6.x on my primary sparc64 build test machine the
> other week and it now enables -Wunused-but-set with -Wall and it does
> catch a bunch of bogus stuff.
[...]

In most of these cases there is presumably no need for the variable at
all.  In others, the variable should have been used - which I believe is
the main reason for the warning - and by removing it you will be hiding
a bug.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH 6/42] decnet: Fix set-but-unused variable.
From: Ben Hutchings @ 2011-04-18  2:12 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20110417.173250.183050174.davem@davemloft.net>

On Sun, 2011-04-17 at 17:32 -0700, David Miller wrote:
> "next" in dn_rebuild_zone() is set but not actually used,
> kill it off.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>  net/decnet/dn_table.c |    3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
> 
> diff --git a/net/decnet/dn_table.c b/net/decnet/dn_table.c
> index 99d8d3a..d8ea583 100644
> --- a/net/decnet/dn_table.c
> +++ b/net/decnet/dn_table.c
> @@ -124,11 +124,10 @@ static inline void dn_rebuild_zone(struct dn_zone *dz,
>  				   int old_divisor)
>  {
>  	int i;
> -	struct dn_fib_node *f, **fp, *next;
> +	struct dn_fib_node *f, **fp;
>  
>  	for(i = 0; i < old_divisor; i++) {
>  		for(f = old_ht[i]; f; f = f->fn_next) {
> -			next = f->fn_next;
>  			for(fp = dn_chain_p(f->fn_key, dz);
>  				*fp && dn_key_leq((*fp)->fn_key, f->fn_key);
>  				fp = &(*fp)->fn_next)

This function is rebuilding a hash table after the number of buckets is
changed.  After moving each element into a new bucket, it needs to carry
on iterating over the old bucket.  Therefore the 'next' variable is
really needed and the second for-loop should use it: 'f = next', not
'f = f->fn_next'.

Currently this must just leak routes as the table grows, but I suppose
no-one really uses DECnet any more.
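
The iteration pattern Ben describes can be sketched in userspace C as follows. This is not the DECnet code. struct ex_node, ex_rebuild() and the "key % new_div" bucket choice are hypothetical stand-ins for dn_fib_node, dn_rebuild_zone() and dn_chain_p(). The part that matters is that the successor is saved before the node is relinked, and the loop advances with "f = next".

```c
#include <stddef.h>

struct ex_node {
	unsigned int key;
	struct ex_node *next;
};

/*
 * Move every node from old_ht (old_div buckets) into new_ht
 * (new_div buckets, which the caller must have zeroed).
 */
static void ex_rebuild(struct ex_node **old_ht, size_t old_div,
		       struct ex_node **new_ht, size_t new_div)
{
	size_t i;
	struct ex_node *f, *next, **fp;

	for (i = 0; i < old_div; i++) {
		for (f = old_ht[i]; f; f = next) {
			/* Save the old successor: f->next is overwritten below. */
			next = f->next;

			/* Find the insertion point, keeping the chain sorted. */
			for (fp = &new_ht[f->key % new_div];
			     *fp && (*fp)->key <= f->key;
			     fp = &(*fp)->next)
				;

			f->next = *fp;
			*fp = f;
		}
		old_ht[i] = NULL;
	}
}

/* Count all nodes in a table; handy for checking nothing was dropped. */
static size_t ex_count(struct ex_node **ht, size_t div)
{
	size_t i, n = 0;
	struct ex_node *f;

	for (i = 0; i < div; i++)
		for (f = ht[i]; f; f = f->next)
			n++;
	return n;
}
```

If the outer step were "f = f->next" instead, the loop would follow the node's new chain after relinking and skip the rest of the old bucket, which matches the leak Ben expects in the original.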

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: net: Automatic IRQ siloing for network devices
From: Neil Horman @ 2011-04-18  1:08 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Stephen Hemminger, netdev, davem, Thomas Gleixner
In-Reply-To: <1303065539.5282.938.camel@localhost>

On Sun, Apr 17, 2011 at 07:38:59PM +0100, Ben Hutchings wrote:
> On Sun, 2011-04-17 at 13:20 -0400, Neil Horman wrote:
> > On Sat, Apr 16, 2011 at 09:17:04AM -0700, Stephen Hemminger wrote:
> [...]
> > > My gut feeling is that:
> > >   * kernel should default to a simple static sane irq policy without user
> > >     space.  This is especially true for multi-queue devices where the default
> > >     puts all IRQ's on one cpu.
> > > 
> > Thats not how it currently works, AFAICS.  The default kernel policy is
> > currently that cpu affinity for any newly requested irq is all cpus.  Any
> > restriction beyond that is the purview and doing of userspace (irqbalance or
> > manual affinity setting).
> 
> Right.  Though it may be reasonable for the kernel to use the hint as
> the initial affinity for a newly allocated IRQ (not sure quite how we
> determine that).
> 
So I understand what you're saying here, but I'm having a hard time reconciling
the two notions.  As it currently stands, affinity_hint gets set by a single
function call in the kernel (irq_set_affinity_hint), which is called by drivers
wishing to guide irqbalance's behavior (currently only ixgbe does this).  The
behavior a driver is capable of guiding, however, is either overly simple (ixgbe
just tells irqbalance to place each irq on a separate cpu, which irqbalance
would do anyway) or overly complex (forcing policy into the kernel, which I
tried to do with this patch series, but based on the responses I've gotten here,
that seems undesirable).

I personally like the idea of using affinity_hint to export various guidelines
to drive irqbalance's behavior, but I think I'm in the minority on that.  Given
the responses here, it almost seems to me like we should do away with
affinity_hint altogether (or perhaps just continue to ignore it), and instead
export data relevant to balancing in various other locations, and just have
irqbalance use that info directly.


> [...]
> > >   * irqbalance should not do the hacks it does to try and guess at network traffic.
> > > 
> > Well, I can certainly agree with that, but I'm not sure what that looks like.
> > 
> > I could envision something like:
> > 
> > 1) Use irqbalance to do a one time placement of interrupts, keeping a simple
> > (possibly sub-optimal) policy, perhaps something like new irqs get assigned to
> > the least loaded cpu within the numa node of the device the irq is originating
> > from.
> > 
> > 2) Add a udev event on the addition of new interrupts, to rerun irqbalance
> 
> Yes, making irqbalance more (or entirely) event-driven seems like a good
> thing.
> 
Yeah, I can do that, it shouldn't be hard.  Do we need to worry about bursty
interfaces (i.e. irqs that have high volume counts for periods of time followed
by periods of low activity)?  A periodic irqbalance can rebalance behavior like
that, whereas a one-shot cannot.  Can we just assume that the one-shot rebalance
would give a 'good enough' placement in that situation?

> > 3) Add some exported information to identify processes that are high users of
> > network traffic, and correlate that usage to a rxq/irq that produces that
> > information (possibly some per-task proc file)
> > 
> > 4) Create/expand an additional user space daemon to monitor the highest users of
> > network traffic on various rxq/irqs (as identified in (3)) and restrict those
> > processes execution to those cpus which are on the same L2 cache as the irq
> > itself.  The cpuset cgroup could be usefull in doing this perhaps.
> 
> I just don't see that you're going to get processes associated with
> specific RX queues unless you make use of flow steering.
> 
> The 128-entry flow hash indirection table is part of Microsoft's
> requirements for RSS so most multiqueue hardware is going to let you do
> limited flow steering that way.
> 
Yes, Agreed.  I had presumed that any feature like this would assume RFS was
available and enabled. If it wasn't, we'd have to re-implement some metric
gathering code along with code to correlate sockets to rx queues and irqs.  That
would be 90% of the RFS code and this patch series :)

> > Actually, as I read back to myself, that acutally sounds kind of good to me.  It
> > keeps all the policy for this in user space, and minimizes what we have to add
> > to the kernel to make it happen (some process information in /proc and another
> > udev event).  I'd like to get some feedback before I start implementing this,
> > but I think this could be done.  What do you think?
> 
> I don't think it's a good idea to override the scheduler dynamically
> like this.
> 
Why not?  Not disagreeing here, but I'm curious as to why you think this is bad.
We already have several interfaces for doing this in user space (cgroups and
taskset come to mind).  Nominally they are used directly by sysadmins, and used
sparingly for specific configurations.  All I'm suggesting is that we create a
daemon to identify processes that would benefit from running closer to the nics
they are getting data from, and restricting them to cpus that fit that benefit.
If a sysadmin doesn't want that behavior, they can stop the daemon, or change
its configuration to avoid including processes they don't want to move/restrict.

Thanks & Regards
Neil
 
> Ben.
> 
> -- 
> Ben Hutchings, Senior Software Engineer, Solarflare
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
> 
> 

^ permalink raw reply

* Re: [PATCH 2/2] via-rhine: Assign random MAC address if necessary
From: David Miller @ 2011-04-18  0:57 UTC (permalink / raw)
  To: joe; +Cc: mr.nuke.me, rl, netdev, linux-kernel
In-Reply-To: <c7efaaf6a943f911162261edd7a2186086cda953.1302998994.git.joe@perches.com>

From: Joe Perches <joe@perches.com>
Date: Sat, 16 Apr 2011 17:15:26 -0700

> Roger Luethi has had several reports of Rhine NICs providing
> an invalid MAC address.  If so, assign a random MAC address so
> the hardware can still be used.
> 
> Tested as a standalone interface, as carrier for ppp, and as a
> bonding slave.
> 
> Original-patch-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> Signed-off-by: Joe Perches <joe@perches.com>

Applied.

^ permalink raw reply

* Re: [PATCH 1/2] via_rhine: Use netdev_<level> and pr_<level>
From: David Miller @ 2011-04-18  0:56 UTC (permalink / raw)
  To: joe; +Cc: mr.nuke.me, rl, netdev, linux-kernel
In-Reply-To: <15b0ca3fc5ba6a16b207fb32b0c2e3c809f376d1.1302998994.git.joe@perches.com>

From: Joe Perches <joe@perches.com>
Date: Sat, 16 Apr 2011 17:15:25 -0700

> Use the more current logging styles.
> 
> Add #define DEBUG to make netdev_dbg always active.
> 
> Signed-off-by: Joe Perches <joe@perches.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] bridge: fix accidental creation of sysfs directory
From: David Miller @ 2011-04-18  0:53 UTC (permalink / raw)
  To: shemminger; +Cc: netdev
In-Reply-To: <20110416090915.2594a78e@nehalam>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Sat, 16 Apr 2011 09:09:15 -0700

> Commit bb900b27a2f49b37bc38c08e656ea13048fee13b introduced a bug in net-next
> because of a typo in notifier. Every device would have the sysfs
> bridge directory (and files). 
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied, thanks Stephen.

^ permalink raw reply

* Re: [PATCH v3] net: cxgb4{,vf}: convert to hw_features
From: David Miller @ 2011-04-18  0:51 UTC (permalink / raw)
  To: dm; +Cc: mirq-linux, netdev, leedom
In-Reply-To: <4DAA3A76.3010508@chelsio.com>

From: Dimitris Michailidis <dm@chelsio.com>
Date: Sat, 16 Apr 2011 17:55:18 -0700

> Michał Mirosław wrote:
>> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> 
> Looks good for both drivers.
> 
> Acked-by: Dimitris Michailidis <dm@chelsio.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-2.6] bnx2x: Fix port identification problem
From: David Miller @ 2011-04-18  0:50 UTC (permalink / raw)
  To: yanivr; +Cc: netdev, eilong
In-Reply-To: <1303022132.18593.16.camel@lb-tlvb-dmitry>

From: "Yaniv Rosner" <yanivr@broadcom.com>
Date: Sun, 17 Apr 2011 09:35:32 +0300

> This patch fixes port identification on optic devices when there's no link on the port.
> 
> Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

Applied to net-2.6, thanks.

^ permalink raw reply

* Re: [PATCH 1/1] drivers/net/usb/usbnet.c: Use FIELD_SIZEOF macro in usbnet_init() function.
From: David Miller @ 2011-04-18  0:49 UTC (permalink / raw)
  To: tfransosi; +Cc: linux-kernel, dbrownell, gregkh, netdev, linux-usb
In-Reply-To: <3f3e63ecef46abc5d23053c051b3fef52104cbe7.1303080236.git.tfransosi@gmail.com>

From: Thiago Farina <tfransosi@gmail.com>
Date: Sun, 17 Apr 2011 19:46:04 -0300

> Signed-off-by: Thiago Farina <tfransosi@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] r8169: add Realtek as maintainer.
From: David Miller @ 2011-04-18  0:46 UTC (permalink / raw)
  To: romieu; +Cc: netdev, hayeswang, nic_swsd
In-Reply-To: <20110417141512.GA18236@electric-eye.fr.zoreil.com>

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Sun, 17 Apr 2011 16:15:12 +0200

> Per Hayes's request.
> 
> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

Applied to net-2.6, thanks.

^ permalink raw reply

* Re: [PATCH] r8169: Be verbose when unable to load fw patch
From: David Miller @ 2011-04-18  0:45 UTC (permalink / raw)
  To: romieu; +Cc: bp, linux-kernel, borislav.petkov, netdev
In-Reply-To: <20110417135936.GA18165@electric-eye.fr.zoreil.com>

From: François Romieu <romieu@metropolis.fr.zoreil.com>
Date: Sun, 17 Apr 2011 15:59:36 +0200

> On Sat, Apr 16, 2011 at 01:03:48PM +0200, François Romieu wrote:
>> [...]
>> > Ok.  So will you submit this yourself later or should I just
>> > apply this myself directly to net-2.6?
>> 
>> Please apply it directly.
> 
> Please don't.

Ok.

^ permalink raw reply

* Re: [PATCH] net: greth: convert to hw_features
From: David Miller @ 2011-04-18  0:44 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev, kristoffer
In-Reply-To: <20110417101547.D59E313A6F@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Sun, 17 Apr 2011 12:15:47 +0200 (CEST)

> Note: Driver modifies its struct net_device_ops. This will break if used for
> multiple devices that are not all the same (if that HW config is possible).
> 
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>

I guess for how this chip is used, this is OK for now.

Applied.

^ permalink raw reply

* Re: [PATCH] net: ibm_newemac: convert to hw_features
From: David Miller @ 2011-04-18  0:44 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev
In-Reply-To: <20110417101547.C207E13A6C@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Sun, 17 Apr 2011 12:15:47 +0200 (CEST)

> Side effect: allow toggling of TX offloads.
> 
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>

Applied.

^ permalink raw reply

* Re: [PATCH] net: typhoon: convert to hw_features
From: David Miller @ 2011-04-18  0:44 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev, dave
In-Reply-To: <20110417101547.7E01A13A6B@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Sun, 17 Apr 2011 12:15:47 +0200 (CEST)

> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>

Applied.

^ permalink raw reply

* Re: [PATCH] net: niu: convert to hw_features
From: David Miller @ 2011-04-18  0:44 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev
In-Reply-To: <20110417101547.A22AF13A6D@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Sun, 17 Apr 2011 12:15:47 +0200 (CEST)

> Side effect: allow toggling of TX offloads.
> 
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>

Applied.

^ permalink raw reply

* Re: [PATCH] net: chelsio: convert to hw_features
From: David Miller @ 2011-04-18  0:42 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev
In-Reply-To: <20110417101546.EE73713A67@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Sun, 17 Apr 2011 12:15:46 +0200 (CEST)

> @@ -1369,6 +1369,7 @@ static void sge_rx(struct sge *sge, struct freelQ *fl, unsigned int len)
>  	const struct cpl_rx_pkt *p;
>  	struct adapter *adapter = sge->adapter;
>  	struct sge_port_stats *st;
> +	struct net_device *dev = adapter->port[p->iff].dev;
>  
>  	skb = get_packet(adapter->pdev, fl, len - sge->rx_pkt_pad);
>  	if (unlikely(!skb)) {

You're now using 'p' before it gets initialized.

^ permalink raw reply

* Re: [PATCH] net: benet: convert to hw_features - fixup
From: David Miller @ 2011-04-18  0:41 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev, sathya.perla, subbu.seetharaman, ajit.khaparde
In-Reply-To: <20110417101547.5F75513A6A@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Sun, 17 Apr 2011 12:15:47 +0200 (CEST)

> Remove be_set_flags() as it's already covered by hw_features.
> 
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>

Applied.

^ permalink raw reply

* Re: [PATCH] net: ehea: convert to hw_features
From: David Miller @ 2011-04-18  0:41 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev, leitao
In-Reply-To: <20110417101547.11CF913A69@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Sun, 17 Apr 2011 12:15:47 +0200 (CEST)

> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>

Applied.

^ permalink raw reply

* Re: [PATCH] net: mv643xx: convert to hw_features
From: David Miller @ 2011-04-18  0:41 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev, buytenh
In-Reply-To: <20110417101546.AD95813A68@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Sun, 17 Apr 2011 12:15:46 +0200 (CEST)

> Side effect: don't reenable RXCSUM on every ifdown/ifup.
> 
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>

Applied.

^ permalink raw reply

* Re: [PATCH] net: cxgb3: convert to hw_features
From: David Miller @ 2011-04-18  0:41 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev, divy
In-Reply-To: <20110417101546.7EDF613A66@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Sun, 17 Apr 2011 12:15:46 +0200 (CEST)

> This removes some of the remnants of LRO -> GRO conversion.
> 
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>

Applied.

^ permalink raw reply

* Re: [PATCH] net: tehuti: convert to hw_features
From: David Miller @ 2011-04-18  0:41 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev, baum, andy
In-Reply-To: <20110417101546.8F6DD13A65@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Sun, 17 Apr 2011 12:15:46 +0200 (CEST)

> As a side effect, make TX offloads changeable.
> 
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>

Applied.

^ permalink raw reply
