Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] bonding: rejoin multicast groups on VLANs
From: Andy Gospodarek @ 2010-09-29 19:54 UTC (permalink / raw)
  To: Flavio Leitner; +Cc: netdev
In-Reply-To: <20100929193539.GC2864@redhat.com>

On Wed, Sep 29, 2010 at 04:35:39PM -0300, Flavio Leitner wrote:
> On Wed, Sep 29, 2010 at 02:44:13PM -0400, Andy Gospodarek wrote:
> > On Wed, Sep 29, 2010 at 04:12:24AM -0300, Flavio Leitner wrote:
> > > It fixes bonding to rejoin multicast groups added
> > > to VLAN devices on top of bonding when a failover
> > > happens.
> > > 
> > > The first packet may be discarded, so the timer
> > > assure that at least 3 Reports are sent.
> > > 
> > 
> > Good find, Flavio.  Clearly the fact that multicast membership is broken
> > needs to be fixed, but I would rather not see timers used at all.  We
> > worked hard in the past to eliminate timers for several reasons, so I
> > would rather see a workqueue used.
> 
> I noticed that the code is using workqueues now, just thought a
> simple thing which may run couple times would fit perfectly with
> a simple timer.
> 

Timers runs in softirq context, so I'd rather not add code that takes
locks and runs in softirq context.

> 
> > I also don't like retransmitting the membership report 3 times when it
> > may not be needed.  Though many switches can handle it, the cost of
> > receiving and processing what might be a large list of multicast
> > addresses every 200ms for 600ms doesn't seem ideal.  It also feels like
> > a hack. :)
> 
> Definitely a parameter is much better, but I wasn't sure about
> the patch approach so I was expecting a review like this and then
> do the refinements needed. Better to post early, right? :)
> 
> I see your point to change the default to one membership report,
> but we can't assure during a failover if everything has been
> received. Also, it isn't supposed to keep failing flooding the
> network, so I would rather have couple membership reports being
> send than watch an important multicast application failing.
> 
> Perhaps 3 is too much, but one sounds too few to me.
> 
> what you think?
> 

Adding a tunable parameter allows the administrator to decide how many
is enough.  I would rather keep the default at one and add the tunable
parameter (which needs to be added to bond_sysfs.c to be effective).

I have not heard loud complaints about only sending one since the code
to send retransmits of membership reports was added a few years ago, so
I'm inclined to think it is working well for most users (or no one is
using bonding).

Maybe it would be best to break this into 2 patches.  One that simply
fixes the failover code so it works with VLANs (that could be done
easily today) and another patch that can add the code to send multiple
retransmits.  Would you be willing to do that?

> Flavio
> 
> > 
> > If retransmission of the membership reports is a requirement for some
> > users, I would rather see it as a configuration option.
> > 
> > Maybe something like this?
> > 
> > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> > index 3b16f62..b7b4a74 100644
> > --- a/drivers/net/bonding/bond_main.c
> > +++ b/drivers/net/bonding/bond_main.c
> > @@ -109,6 +109,7 @@ static char *arp_validate;
> >  static char *fail_over_mac;
> >  static int all_slaves_active = 0;
> >  static struct bond_params bonding_defaults;
> > +static int resend_igmp = BOND_DEFAULT_RESEND_IGMP;
> >  
> >  module_param(max_bonds, int, 0);
> >  MODULE_PARM_DESC(max_bonds, "Max number of bonded devices");
> > @@ -163,6 +164,8 @@ module_param(all_slaves_active, int, 0);
> >  MODULE_PARM_DESC(all_slaves_active, "Keep all frames received on an interface"
> >  				     "by setting active flag for all slaves.  "
> >  				     "0 for never (default), 1 for always.");
> > +module_param(resend_igmp, int, 0);
> > +MODULE_PARM_DESC(resend_igmp, "Number of IGMP membership reports to send on link failure");
> >  
> >  /*----------------------------- Global variables ----------------------------*/
> >  
> > @@ -865,18 +868,14 @@ static void bond_mc_del(struct bonding *bond, void *addr)
> >  }
> >  
> >  
> > -/*
> > - * Retrieve the list of registered multicast addresses for the bonding
> > - * device and retransmit an IGMP JOIN request to the current active
> > - * slave.
> > - */
> > -static void bond_resend_igmp_join_requests(struct bonding *bond)
> > +static void __bond_resend_igmp_join_requests(struct net_device *dev)
> >  {
> >  	struct in_device *in_dev;
> >  	struct ip_mc_list *im;
> >  
> >  	rcu_read_lock();
> > -	in_dev = __in_dev_get_rcu(bond->dev);
> > +
> > +	in_dev = __in_dev_get_rcu(dev);
> >  	if (in_dev) {
> >  		for (im = in_dev->mc_list; im; im = im->next)
> >  			ip_mc_rejoin_group(im);
> > @@ -885,6 +884,48 @@ static void bond_resend_igmp_join_requests(struct bonding *bond)
> >  	rcu_read_unlock();
> >  }
> >  
> > +
> > +/*
> > + * Retrieve the list of registered multicast addresses for the bonding
> > + * device and retransmit an IGMP JOIN request to the current active
> > + * slave.
> > + */
> > +static void bond_resend_igmp_join_requests(struct bonding *bond)
> > +{
> > +	struct net_device *vlan_dev;
> > +	struct vlan_entry *vlan;
> > +
> > +	read_lock(&bond->lock);
> > +	if (bond->kill_timers)
> > +		goto out;
> > +
> > +	/* rejoin all groups on bond device */
> > +	__bond_resend_igmp_join_requests(bond->dev);
> > +
> > +	/* rejoin all groups on vlan devices */
> > +	if (bond->vlgrp) {
> > +		list_for_each_entry(vlan, &bond->vlan_list, vlan_list) {
> > +			vlan_dev = vlan_group_get_device(bond->vlgrp,
> > +							 vlan->vlan_id);
> > +			if (vlan_dev)
> > +				__bond_resend_igmp_join_requests(vlan_dev);
> > +		}
> > +	}
> > +
> > +	if (--bond->igmp_retrans > 0)
> > +		queue_delayed_work(bond->wq, &bond->mcast_work, HZ/5);
> > +
> > +out:
> > +	read_unlock(&bond->lock);
> > +}
> > +
> > +void bond_resend_igmp_join_requests_delayed(struct work_struct *work)
> > +{
> > +	struct bonding *bond = container_of(work, struct bonding,
> > +					    mcast_work.work);
> > +	bond_resend_igmp_join_requests(bond);
> > +}
> > +
> >  /*
> >   * flush all members of flush->mc_list from device dev->mc_list
> >   */
> > @@ -944,7 +985,10 @@ static void bond_mc_swap(struct bonding *bond, struct slave *new_active,
> >  
> >  		netdev_for_each_mc_addr(ha, bond->dev)
> >  			dev_mc_add(new_active->dev, ha->addr);
> > -		bond_resend_igmp_join_requests(bond);
> > +
> > +		/* rejoin multicast groups */
> > +		bond->igmp_retrans = bond->params.resend_igmp;
> > +		queue_delayed_work(bond->wq, &bond->mcast_work, 0);
> >  	}
> >  }
> >  
> > @@ -3744,6 +3788,9 @@ static int bond_open(struct net_device *bond_dev)
> >  
> >  	bond->kill_timers = 0;
> >  
> > +	/* multicast retrans */
> > +	INIT_DELAYED_WORK(&bond->mcast_work, bond_resend_igmp_join_requests_delayed);
> > +
> >  	if (bond_is_lb(bond)) {
> >  		/* bond_alb_initialize must be called before the timer
> >  		 * is started.
> > @@ -3828,6 +3875,8 @@ static int bond_close(struct net_device *bond_dev)
> >  		break;
> >  	}
> >  
> > +	if (delayed_work_pending(&bond->mcast_work))
> > +		cancel_delayed_work(&bond->ad_work);
> >  
> >  	if (bond_is_lb(bond)) {
> >  		/* Must be called only after all
> > @@ -4699,6 +4748,9 @@ static void bond_work_cancel_all(struct bonding *bond)
> >  	if (bond->params.mode == BOND_MODE_8023AD &&
> >  	    delayed_work_pending(&bond->ad_work))
> >  		cancel_delayed_work(&bond->ad_work);
> > +
> > +	if (delayed_work_pending(&bond->mcast_work))
> > +		cancel_delayed_work(&bond->ad_work);
> >  }
> >  
> >  /*
> > @@ -4891,6 +4943,12 @@ static int bond_check_params(struct bond_params *params)
> >  		all_slaves_active = 0;
> >  	}
> >  
> > +	if (resend_igmp < 0 || resend_igmp > 255) {
> > +		pr_warning("Warning: resend_igmp (%d) should be between "
> > +			   "0 and 255, resetting to %d\n",
> > +			   resend_igmp, BOND_DEFAULT_RESEND_IGMP);
> > +		resend_igmp = BOND_DEFAULT_RESEND_IGMP;
> > +	}
> >  	/* reset values for TLB/ALB */
> >  	if ((bond_mode == BOND_MODE_TLB) ||
> >  	    (bond_mode == BOND_MODE_ALB)) {
> > @@ -5063,6 +5121,7 @@ static int bond_check_params(struct bond_params *params)
> >  	params->fail_over_mac = fail_over_mac_value;
> >  	params->tx_queues = tx_queues;
> >  	params->all_slaves_active = all_slaves_active;
> > +	params->resend_igmp = resend_igmp;
> >  
> >  	if (primary) {
> >  		strncpy(params->primary, primary, IFNAMSIZ);
> > diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
> > index c6fdd85..c49bdb0 100644
> > --- a/drivers/net/bonding/bonding.h
> > +++ b/drivers/net/bonding/bonding.h
> > @@ -136,6 +136,7 @@ struct bond_params {
> >  	__be32 arp_targets[BOND_MAX_ARP_TARGETS];
> >  	int tx_queues;
> >  	int all_slaves_active;
> > +	int resend_igmp;
> >  };
> >  
> >  struct bond_parm_tbl {
> > @@ -198,6 +199,7 @@ struct bonding {
> >  	s32      slave_cnt; /* never change this value outside the attach/detach wrappers */
> >  	rwlock_t lock;
> >  	rwlock_t curr_slave_lock;
> > +	s8       igmp_retrans;
> >  	s8       kill_timers;
> >  	s8	 send_grat_arp;
> >  	s8	 send_unsol_na;
> > @@ -223,6 +225,7 @@ struct bonding {
> >  	struct   delayed_work arp_work;
> >  	struct   delayed_work alb_work;
> >  	struct   delayed_work ad_work;
> > +	struct   delayed_work mcast_work;
> >  #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
> >  	struct   in6_addr master_ipv6;
> >  #endif
> > diff --git a/include/linux/if_bonding.h b/include/linux/if_bonding.h
> > index 2c79943..d2f78b7 100644
> > --- a/include/linux/if_bonding.h
> > +++ b/include/linux/if_bonding.h
> > @@ -84,6 +84,9 @@
> >  #define BOND_DEFAULT_MAX_BONDS  1   /* Default maximum number of devices to support */
> >  
> >  #define BOND_DEFAULT_TX_QUEUES 16   /* Default number of tx queues per device */
> > +
> > +#define BOND_DEFAULT_RESEND_IGMP	1 /* Default number of IGMP membership reports
> > +					     to resend on link failure. */
> >  /* hashing types */
> >  #define BOND_XMIT_POLICY_LAYER2		0 /* layer 2 (MAC only), default */
> >  #define BOND_XMIT_POLICY_LAYER34	1 /* layer 3+4 (IP ^ (TCP || UDP)) */
> 
> -- 
> Flavio
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] bonding: rejoin multicast groups on VLANs
From: Flavio Leitner @ 2010-09-29 19:35 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: netdev
In-Reply-To: <20100929184413.GY7497@gospo.rdu.redhat.com>

On Wed, Sep 29, 2010 at 02:44:13PM -0400, Andy Gospodarek wrote:
> On Wed, Sep 29, 2010 at 04:12:24AM -0300, Flavio Leitner wrote:
> > It fixes bonding to rejoin multicast groups added
> > to VLAN devices on top of bonding when a failover
> > happens.
> > 
> > The first packet may be discarded, so the timer
> > assure that at least 3 Reports are sent.
> > 
> 
> Good find, Flavio.  Clearly the fact that multicast membership is broken
> needs to be fixed, but I would rather not see timers used at all.  We
> worked hard in the past to eliminate timers for several reasons, so I
> would rather see a workqueue used.

I noticed that the code is using workqueues now, just thought a
simple thing which may run couple times would fit perfectly with
a simple timer.


> I also don't like retransmitting the membership report 3 times when it
> may not be needed.  Though many switches can handle it, the cost of
> receiving and processing what might be a large list of multicast
> addresses every 200ms for 600ms doesn't seem ideal.  It also feels like
> a hack. :)

Definitely a parameter is much better, but I wasn't sure about
the patch approach so I was expecting a review like this and then
do the refinements needed. Better to post early, right? :)

I see your point to change the default to one membership report,
but we can't assure during a failover if everything has been
received. Also, it isn't supposed to keep failing flooding the
network, so I would rather have couple membership reports being
send than watch an important multicast application failing.

Perhaps 3 is too much, but one sounds too few to me.

what you think?

Flavio

> 
> If retransmission of the membership reports is a requirement for some
> users, I would rather see it as a configuration option.
> 
> Maybe something like this?
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 3b16f62..b7b4a74 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -109,6 +109,7 @@ static char *arp_validate;
>  static char *fail_over_mac;
>  static int all_slaves_active = 0;
>  static struct bond_params bonding_defaults;
> +static int resend_igmp = BOND_DEFAULT_RESEND_IGMP;
>  
>  module_param(max_bonds, int, 0);
>  MODULE_PARM_DESC(max_bonds, "Max number of bonded devices");
> @@ -163,6 +164,8 @@ module_param(all_slaves_active, int, 0);
>  MODULE_PARM_DESC(all_slaves_active, "Keep all frames received on an interface"
>  				     "by setting active flag for all slaves.  "
>  				     "0 for never (default), 1 for always.");
> +module_param(resend_igmp, int, 0);
> +MODULE_PARM_DESC(resend_igmp, "Number of IGMP membership reports to send on link failure");
>  
>  /*----------------------------- Global variables ----------------------------*/
>  
> @@ -865,18 +868,14 @@ static void bond_mc_del(struct bonding *bond, void *addr)
>  }
>  
>  
> -/*
> - * Retrieve the list of registered multicast addresses for the bonding
> - * device and retransmit an IGMP JOIN request to the current active
> - * slave.
> - */
> -static void bond_resend_igmp_join_requests(struct bonding *bond)
> +static void __bond_resend_igmp_join_requests(struct net_device *dev)
>  {
>  	struct in_device *in_dev;
>  	struct ip_mc_list *im;
>  
>  	rcu_read_lock();
> -	in_dev = __in_dev_get_rcu(bond->dev);
> +
> +	in_dev = __in_dev_get_rcu(dev);
>  	if (in_dev) {
>  		for (im = in_dev->mc_list; im; im = im->next)
>  			ip_mc_rejoin_group(im);
> @@ -885,6 +884,48 @@ static void bond_resend_igmp_join_requests(struct bonding *bond)
>  	rcu_read_unlock();
>  }
>  
> +
> +/*
> + * Retrieve the list of registered multicast addresses for the bonding
> + * device and retransmit an IGMP JOIN request to the current active
> + * slave.
> + */
> +static void bond_resend_igmp_join_requests(struct bonding *bond)
> +{
> +	struct net_device *vlan_dev;
> +	struct vlan_entry *vlan;
> +
> +	read_lock(&bond->lock);
> +	if (bond->kill_timers)
> +		goto out;
> +
> +	/* rejoin all groups on bond device */
> +	__bond_resend_igmp_join_requests(bond->dev);
> +
> +	/* rejoin all groups on vlan devices */
> +	if (bond->vlgrp) {
> +		list_for_each_entry(vlan, &bond->vlan_list, vlan_list) {
> +			vlan_dev = vlan_group_get_device(bond->vlgrp,
> +							 vlan->vlan_id);
> +			if (vlan_dev)
> +				__bond_resend_igmp_join_requests(vlan_dev);
> +		}
> +	}
> +
> +	if (--bond->igmp_retrans > 0)
> +		queue_delayed_work(bond->wq, &bond->mcast_work, HZ/5);
> +
> +out:
> +	read_unlock(&bond->lock);
> +}
> +
> +void bond_resend_igmp_join_requests_delayed(struct work_struct *work)
> +{
> +	struct bonding *bond = container_of(work, struct bonding,
> +					    mcast_work.work);
> +	bond_resend_igmp_join_requests(bond);
> +}
> +
>  /*
>   * flush all members of flush->mc_list from device dev->mc_list
>   */
> @@ -944,7 +985,10 @@ static void bond_mc_swap(struct bonding *bond, struct slave *new_active,
>  
>  		netdev_for_each_mc_addr(ha, bond->dev)
>  			dev_mc_add(new_active->dev, ha->addr);
> -		bond_resend_igmp_join_requests(bond);
> +
> +		/* rejoin multicast groups */
> +		bond->igmp_retrans = bond->params.resend_igmp;
> +		queue_delayed_work(bond->wq, &bond->mcast_work, 0);
>  	}
>  }
>  
> @@ -3744,6 +3788,9 @@ static int bond_open(struct net_device *bond_dev)
>  
>  	bond->kill_timers = 0;
>  
> +	/* multicast retrans */
> +	INIT_DELAYED_WORK(&bond->mcast_work, bond_resend_igmp_join_requests_delayed);
> +
>  	if (bond_is_lb(bond)) {
>  		/* bond_alb_initialize must be called before the timer
>  		 * is started.
> @@ -3828,6 +3875,8 @@ static int bond_close(struct net_device *bond_dev)
>  		break;
>  	}
>  
> +	if (delayed_work_pending(&bond->mcast_work))
> +		cancel_delayed_work(&bond->ad_work);
>  
>  	if (bond_is_lb(bond)) {
>  		/* Must be called only after all
> @@ -4699,6 +4748,9 @@ static void bond_work_cancel_all(struct bonding *bond)
>  	if (bond->params.mode == BOND_MODE_8023AD &&
>  	    delayed_work_pending(&bond->ad_work))
>  		cancel_delayed_work(&bond->ad_work);
> +
> +	if (delayed_work_pending(&bond->mcast_work))
> +		cancel_delayed_work(&bond->ad_work);
>  }
>  
>  /*
> @@ -4891,6 +4943,12 @@ static int bond_check_params(struct bond_params *params)
>  		all_slaves_active = 0;
>  	}
>  
> +	if (resend_igmp < 0 || resend_igmp > 255) {
> +		pr_warning("Warning: resend_igmp (%d) should be between "
> +			   "0 and 255, resetting to %d\n",
> +			   resend_igmp, BOND_DEFAULT_RESEND_IGMP);
> +		resend_igmp = BOND_DEFAULT_RESEND_IGMP;
> +	}
>  	/* reset values for TLB/ALB */
>  	if ((bond_mode == BOND_MODE_TLB) ||
>  	    (bond_mode == BOND_MODE_ALB)) {
> @@ -5063,6 +5121,7 @@ static int bond_check_params(struct bond_params *params)
>  	params->fail_over_mac = fail_over_mac_value;
>  	params->tx_queues = tx_queues;
>  	params->all_slaves_active = all_slaves_active;
> +	params->resend_igmp = resend_igmp;
>  
>  	if (primary) {
>  		strncpy(params->primary, primary, IFNAMSIZ);
> diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
> index c6fdd85..c49bdb0 100644
> --- a/drivers/net/bonding/bonding.h
> +++ b/drivers/net/bonding/bonding.h
> @@ -136,6 +136,7 @@ struct bond_params {
>  	__be32 arp_targets[BOND_MAX_ARP_TARGETS];
>  	int tx_queues;
>  	int all_slaves_active;
> +	int resend_igmp;
>  };
>  
>  struct bond_parm_tbl {
> @@ -198,6 +199,7 @@ struct bonding {
>  	s32      slave_cnt; /* never change this value outside the attach/detach wrappers */
>  	rwlock_t lock;
>  	rwlock_t curr_slave_lock;
> +	s8       igmp_retrans;
>  	s8       kill_timers;
>  	s8	 send_grat_arp;
>  	s8	 send_unsol_na;
> @@ -223,6 +225,7 @@ struct bonding {
>  	struct   delayed_work arp_work;
>  	struct   delayed_work alb_work;
>  	struct   delayed_work ad_work;
> +	struct   delayed_work mcast_work;
>  #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
>  	struct   in6_addr master_ipv6;
>  #endif
> diff --git a/include/linux/if_bonding.h b/include/linux/if_bonding.h
> index 2c79943..d2f78b7 100644
> --- a/include/linux/if_bonding.h
> +++ b/include/linux/if_bonding.h
> @@ -84,6 +84,9 @@
>  #define BOND_DEFAULT_MAX_BONDS  1   /* Default maximum number of devices to support */
>  
>  #define BOND_DEFAULT_TX_QUEUES 16   /* Default number of tx queues per device */
> +
> +#define BOND_DEFAULT_RESEND_IGMP	1 /* Default number of IGMP membership reports
> +					     to resend on link failure. */
>  /* hashing types */
>  #define BOND_XMIT_POLICY_LAYER2		0 /* layer 2 (MAC only), default */
>  #define BOND_XMIT_POLICY_LAYER34	1 /* layer 3+4 (IP ^ (TCP || UDP)) */

-- 
Flavio

^ permalink raw reply

* Re: [PATCH net-next-2.6] ip_gre: lockless xmit
From: Jesse Gross @ 2010-09-29 19:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1285785794.2813.292.camel@edumazet-laptop>

On Wed, Sep 29, 2010 at 11:43 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mercredi 29 septembre 2010 à 10:33 -0700, Jesse Gross a écrit :
>
>> The tx lock has another use here: to break local loops.  With this
>> change, a misconfigured tunnel can bring down the machine with a stack
>> overflow.  There are clearly other ways to fix this that don't require
>> a lock that restricts parallelism, such as a loop counter, but that's
>> the way it is now.
>
> Thats a very good point !
>
> We could use a loop counter in the skb, but this use a bit of ram,
> or percpu counters in tunnel drivers, to avoid a given level of
> recursion.

I agree, a percpu loop counter is the way to go.  I implemented
something nearly identical to deal with this problem in Open vSwitch.

There are actually a number of optimizations in the Open vSwitch
tunneling stack that you may be interested in taking a look at as well
(for example, it has had this sort of lockless transmit for a while):

http://openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob;f=datapath/tunnel.c;hb=HEAD

The plan is to upstream all of the kernel code (or at least offer it),
just trying to get the userspace interfaces settled down first.

^ permalink raw reply

* [PATCH net-next-2.6] net: add a recursion limit in xmit path
From: Eric Dumazet @ 2010-09-29 19:04 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, netdev
In-Reply-To: <1285785794.2813.292.camel@edumazet-laptop>

Le mercredi 29 septembre 2010 à 20:43 +0200, Eric Dumazet a écrit :
> Le mercredi 29 septembre 2010 à 10:33 -0700, Jesse Gross a écrit :
> 
> > The tx lock has another use here: to break local loops.  With this
> > change, a misconfigured tunnel can bring down the machine with a stack
> > overflow.  There are clearly other ways to fix this that don't require
> > a lock that restricts parallelism, such as a loop counter, but that's
> > the way it is now.
> 
> Thats a very good point !
> 
> We could use a loop counter in the skb, but this use a bit of ram,
> or percpu counters in tunnel drivers, to avoid a given level of
> recursion.
> 
> /* this should be shared by all tunnels */
> DEFINE_PER_CPU(tunnel_xmit_count);
> 
> 
> 
> tunnel_xmit()
> {
> if (__this_cpu_read(tunnel_xmit_count) >= LIMIT)
> 	goto tx_error;
> __this_cpu_inc(tunnel_xmit_count);
> 
> ....
> 
> __IPTUNNEL_XMIT(tstats, &dev->stats);
> 
> __this_cpu_dec(tunnel_xmit_count),
> return NETDEV_TX_OK;
> 
> }
> 
> 

This can be handled generically in net/core/dev.c

David, if you accept the lockless patches, please apply this one
before ;)

Thanks !

[PATCH net-next-2.6] net: add a recursion limit in xmit path

As tunnel devices are going to be lockless, we need to make sure a
misconfigured machine wont enter an infinite loop.

Add a percpu variable, and limit to three the number of stacked xmits.

Reported-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/dev.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 48ad47f..50dacca 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2177,6 +2177,9 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 	return rc;
 }
 
+static DEFINE_PER_CPU(int, xmit_recursion);
+#define RECURSION_LIMIT 3
+
 /**
  *	dev_queue_xmit - transmit a buffer
  *	@skb: buffer to transmit
@@ -2242,10 +2245,15 @@ int dev_queue_xmit(struct sk_buff *skb)
 
 		if (txq->xmit_lock_owner != cpu) {
 
+			if (__this_cpu_read(xmit_recursion) > RECURSION_LIMIT)
+				goto recursion_alert;
+
 			HARD_TX_LOCK(dev, txq, cpu);
 
 			if (!netif_tx_queue_stopped(txq)) {
+				__this_cpu_inc(xmit_recursion);
 				rc = dev_hard_start_xmit(skb, dev, txq);
+				__this_cpu_dec(xmit_recursion);
 				if (dev_xmit_complete(rc)) {
 					HARD_TX_UNLOCK(dev, txq);
 					goto out;
@@ -2257,7 +2265,9 @@ int dev_queue_xmit(struct sk_buff *skb)
 				       "queue packet!\n", dev->name);
 		} else {
 			/* Recursion is detected! It is possible,
-			 * unfortunately */
+			 * unfortunately
+			 */
+recursion_alert:
 			if (net_ratelimit())
 				printk(KERN_CRIT "Dead loop on virtual device "
 				       "%s, fix it urgently!\n", dev->name);



^ permalink raw reply related

* Re: [PATCH] bonding: rejoin multicast groups on VLANs
From: Andy Gospodarek @ 2010-09-29 18:44 UTC (permalink / raw)
  To: Flavio Leitner; +Cc: netdev
In-Reply-To: <1285744344-1231-1-git-send-email-fleitner@redhat.com>

On Wed, Sep 29, 2010 at 04:12:24AM -0300, Flavio Leitner wrote:
> It fixes bonding to rejoin multicast groups added
> to VLAN devices on top of bonding when a failover
> happens.
> 
> The first packet may be discarded, so the timer
> assure that at least 3 Reports are sent.
> 

Good find, Flavio.  Clearly the fact that multicast membership is broken
needs to be fixed, but I would rather not see timers used at all.  We
worked hard in the past to eliminate timers for several reasons, so I
would rather see a workqueue used.

I also don't like retransmitting the membership report 3 times when it
may not be needed.  Though many switches can handle it, the cost of
receiving and processing what might be a large list of multicast
addresses every 200ms for 600ms doesn't seem ideal.  It also feels like
a hack. :)

If retransmission of the membership reports is a requirement for some
users, I would rather see it as a configuration option.

Maybe something like this?

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 3b16f62..b7b4a74 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -109,6 +109,7 @@ static char *arp_validate;
 static char *fail_over_mac;
 static int all_slaves_active = 0;
 static struct bond_params bonding_defaults;
+static int resend_igmp = BOND_DEFAULT_RESEND_IGMP;
 
 module_param(max_bonds, int, 0);
 MODULE_PARM_DESC(max_bonds, "Max number of bonded devices");
@@ -163,6 +164,8 @@ module_param(all_slaves_active, int, 0);
 MODULE_PARM_DESC(all_slaves_active, "Keep all frames received on an interface"
 				     "by setting active flag for all slaves.  "
 				     "0 for never (default), 1 for always.");
+module_param(resend_igmp, int, 0);
+MODULE_PARM_DESC(resend_igmp, "Number of IGMP membership reports to send on link failure");
 
 /*----------------------------- Global variables ----------------------------*/
 
@@ -865,18 +868,14 @@ static void bond_mc_del(struct bonding *bond, void *addr)
 }
 
 
-/*
- * Retrieve the list of registered multicast addresses for the bonding
- * device and retransmit an IGMP JOIN request to the current active
- * slave.
- */
-static void bond_resend_igmp_join_requests(struct bonding *bond)
+static void __bond_resend_igmp_join_requests(struct net_device *dev)
 {
 	struct in_device *in_dev;
 	struct ip_mc_list *im;
 
 	rcu_read_lock();
-	in_dev = __in_dev_get_rcu(bond->dev);
+
+	in_dev = __in_dev_get_rcu(dev);
 	if (in_dev) {
 		for (im = in_dev->mc_list; im; im = im->next)
 			ip_mc_rejoin_group(im);
@@ -885,6 +884,48 @@ static void bond_resend_igmp_join_requests(struct bonding *bond)
 	rcu_read_unlock();
 }
 
+
+/*
+ * Retrieve the list of registered multicast addresses for the bonding
+ * device and retransmit an IGMP JOIN request to the current active
+ * slave.
+ */
+static void bond_resend_igmp_join_requests(struct bonding *bond)
+{
+	struct net_device *vlan_dev;
+	struct vlan_entry *vlan;
+
+	read_lock(&bond->lock);
+	if (bond->kill_timers)
+		goto out;
+
+	/* rejoin all groups on bond device */
+	__bond_resend_igmp_join_requests(bond->dev);
+
+	/* rejoin all groups on vlan devices */
+	if (bond->vlgrp) {
+		list_for_each_entry(vlan, &bond->vlan_list, vlan_list) {
+			vlan_dev = vlan_group_get_device(bond->vlgrp,
+							 vlan->vlan_id);
+			if (vlan_dev)
+				__bond_resend_igmp_join_requests(vlan_dev);
+		}
+	}
+
+	if (--bond->igmp_retrans > 0)
+		queue_delayed_work(bond->wq, &bond->mcast_work, HZ/5);
+
+out:
+	read_unlock(&bond->lock);
+}
+
+void bond_resend_igmp_join_requests_delayed(struct work_struct *work)
+{
+	struct bonding *bond = container_of(work, struct bonding,
+					    mcast_work.work);
+	bond_resend_igmp_join_requests(bond);
+}
+
 /*
  * flush all members of flush->mc_list from device dev->mc_list
  */
@@ -944,7 +985,10 @@ static void bond_mc_swap(struct bonding *bond, struct slave *new_active,
 
 		netdev_for_each_mc_addr(ha, bond->dev)
 			dev_mc_add(new_active->dev, ha->addr);
-		bond_resend_igmp_join_requests(bond);
+
+		/* rejoin multicast groups */
+		bond->igmp_retrans = bond->params.resend_igmp;
+		queue_delayed_work(bond->wq, &bond->mcast_work, 0);
 	}
 }
 
@@ -3744,6 +3788,9 @@ static int bond_open(struct net_device *bond_dev)
 
 	bond->kill_timers = 0;
 
+	/* multicast retrans */
+	INIT_DELAYED_WORK(&bond->mcast_work, bond_resend_igmp_join_requests_delayed);
+
 	if (bond_is_lb(bond)) {
 		/* bond_alb_initialize must be called before the timer
 		 * is started.
@@ -3828,6 +3875,8 @@ static int bond_close(struct net_device *bond_dev)
 		break;
 	}
 
+	if (delayed_work_pending(&bond->mcast_work))
+		cancel_delayed_work(&bond->ad_work);
 
 	if (bond_is_lb(bond)) {
 		/* Must be called only after all
@@ -4699,6 +4748,9 @@ static void bond_work_cancel_all(struct bonding *bond)
 	if (bond->params.mode == BOND_MODE_8023AD &&
 	    delayed_work_pending(&bond->ad_work))
 		cancel_delayed_work(&bond->ad_work);
+
+	if (delayed_work_pending(&bond->mcast_work))
+		cancel_delayed_work(&bond->ad_work);
 }
 
 /*
@@ -4891,6 +4943,12 @@ static int bond_check_params(struct bond_params *params)
 		all_slaves_active = 0;
 	}
 
+	if (resend_igmp < 0 || resend_igmp > 255) {
+		pr_warning("Warning: resend_igmp (%d) should be between "
+			   "0 and 255, resetting to %d\n",
+			   resend_igmp, BOND_DEFAULT_RESEND_IGMP);
+		resend_igmp = BOND_DEFAULT_RESEND_IGMP;
+	}
 	/* reset values for TLB/ALB */
 	if ((bond_mode == BOND_MODE_TLB) ||
 	    (bond_mode == BOND_MODE_ALB)) {
@@ -5063,6 +5121,7 @@ static int bond_check_params(struct bond_params *params)
 	params->fail_over_mac = fail_over_mac_value;
 	params->tx_queues = tx_queues;
 	params->all_slaves_active = all_slaves_active;
+	params->resend_igmp = resend_igmp;
 
 	if (primary) {
 		strncpy(params->primary, primary, IFNAMSIZ);
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index c6fdd85..c49bdb0 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -136,6 +136,7 @@ struct bond_params {
 	__be32 arp_targets[BOND_MAX_ARP_TARGETS];
 	int tx_queues;
 	int all_slaves_active;
+	int resend_igmp;
 };
 
 struct bond_parm_tbl {
@@ -198,6 +199,7 @@ struct bonding {
 	s32      slave_cnt; /* never change this value outside the attach/detach wrappers */
 	rwlock_t lock;
 	rwlock_t curr_slave_lock;
+	s8       igmp_retrans;
 	s8       kill_timers;
 	s8	 send_grat_arp;
 	s8	 send_unsol_na;
@@ -223,6 +225,7 @@ struct bonding {
 	struct   delayed_work arp_work;
 	struct   delayed_work alb_work;
 	struct   delayed_work ad_work;
+	struct   delayed_work mcast_work;
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 	struct   in6_addr master_ipv6;
 #endif
diff --git a/include/linux/if_bonding.h b/include/linux/if_bonding.h
index 2c79943..d2f78b7 100644
--- a/include/linux/if_bonding.h
+++ b/include/linux/if_bonding.h
@@ -84,6 +84,9 @@
 #define BOND_DEFAULT_MAX_BONDS  1   /* Default maximum number of devices to support */
 
 #define BOND_DEFAULT_TX_QUEUES 16   /* Default number of tx queues per device */
+
+#define BOND_DEFAULT_RESEND_IGMP	1 /* Default number of IGMP membership reports
+					     to resend on link failure. */
 /* hashing types */
 #define BOND_XMIT_POLICY_LAYER2		0 /* layer 2 (MAC only), default */
 #define BOND_XMIT_POLICY_LAYER34	1 /* layer 3+4 (IP ^ (TCP || UDP)) */

^ permalink raw reply related

* Re: [PATCH net-next-2.6] ip_gre: lockless xmit
From: Eric Dumazet @ 2010-09-29 18:43 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, netdev
In-Reply-To: <AANLkTinCz_0jHsMYW9_Wko78i9U20qefFbTYCQQQ-pz5@mail.gmail.com>

Le mercredi 29 septembre 2010 à 10:33 -0700, Jesse Gross a écrit :

> The tx lock has another use here: to break local loops.  With this
> change, a misconfigured tunnel can bring down the machine with a stack
> overflow.  There are clearly other ways to fix this that don't require
> a lock that restricts parallelism, such as a loop counter, but that's
> the way it is now.

Thats a very good point !

We could use a loop counter in the skb, but this use a bit of ram,
or percpu counters in tunnel drivers, to avoid a given level of
recursion.

/* this should be shared by all tunnels */
DEFINE_PER_CPU(tunnel_xmit_count);



tunnel_xmit()
{
if (__this_cpu_read(tunnel_xmit_count) >= LIMIT)
	goto tx_error;
__this_cpu_inc(tunnel_xmit_count);

....

__IPTUNNEL_XMIT(tstats, &dev->stats);

__this_cpu_dec(tunnel_xmit_count),
return NETDEV_TX_OK;

}




^ permalink raw reply

* Re: [PATCH v2 1/2] Phonet: Implement Pipe Controller to support Nokia Slim Modems
From: Rémi Denis-Courmont @ 2010-09-29 18:21 UTC (permalink / raw)
  To: ext Kumar SANGHVI, netdev@vger.kernel.org
  Cc: STEricsson_nomadik_linux, Sudeep DIVAKARAN, Gulshan KARMANI,
	Linus WALLEIJ
In-Reply-To: <20100929063219.GA18733@bnru01.bnr.st.com>


	Hello,

On Wednesday 29 September 2010 09:32:20 ext Kumar SANGHVI, you wrote:
> We can introduce corrections in Phonet stack for the PS path which
> would be minor. But then those changes look more like a hack. Further,
> this change would be required at several places where PS communication
> is happening with modem e.g. pep_reply, pipe_skb_send, pipe_snd_status,
> etc.
> 
> Further, if a new function tomorrow is introduced for PS path
> communication with modem then, that hack will have to be added to that
> function also.

It seems to me that you really want to implement the connect() socket call, so 
that one of the two endpoints will stand up for the missing controller. That's 
still much cleaner than CREATE and DESTROY ioctl()'s.

Besides, the pipe should be destroyed when the last socket descriptor is 
closed anyway. So ioctl(DESTROY) really seems like a bad idea to me.

Similarly, ENABLE and DISABLE should be revectored into a boolean socket 
option.

-- 
Rémi Denis-Courmont
Nokia Devices R&D, Maemo Software, Helsinki

^ permalink raw reply

* Re: VLAN packets silently dropped in promiscuous mode
From: Jesse Gross @ 2010-09-29 17:44 UTC (permalink / raw)
  To: Roger Luethi; +Cc: netdev, Patrick McHardy
In-Reply-To: <20100929113757.GA23755@core.hellgate.ch>

On Wed, Sep 29, 2010 at 4:37 AM, Roger Luethi <rl@hellgate.ch> wrote:
> I noticed packets for unknown VLANs getting silently dropped even in
> promiscuous mode (this is true only for the hardware accelerated path).
> netif_nit_deliver was introduced specifically to prevent that, but the
> function gets called only _after_ packets from unknown VLANs have been
> dropped.

Some drivers are fixing this on a case by case basis by disabling
hardware accelerated VLAN stripping when in promiscuous mode, i.e.:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5f6c01819979afbfec7e0b15fe52371b8eed87e8

However, at this point it is more or less random which drivers do
this.  It would obviously be much better if it were consistent.

^ permalink raw reply

* Re: [PATCH net-next-2.6] ip_gre: lockless xmit
From: Jesse Gross @ 2010-09-29 17:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1285664747.2607.48.camel@edumazet-laptop>

On Tue, Sep 28, 2010 at 2:05 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX
>
> Note: If tunnels are created with the "oseq" option, LLTX is not
> enabled :
>
> Even using an atomic_t o_seq, we would increase chance for packets being
> out of order at receiver.
>
> Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
> 10000000 UDP frames via one gre tunnel (size:200 bytes per frame)
>
> Before patch :
> real    3m0.094s
> user    0m9.365s
> sys     47m50.103s
>
> After patch:
> real    0m29.756s
> user    0m11.097s
> sys     7m33.012s
>
> Last problem to solve is the contention on dst :
>
>
> 38660.00 21.4% __ip_route_output_key          vmlinux
> 20786.00 11.5% dst_release                    vmlinux
> 14191.00  7.8% __xfrm_lookup                  vmlinux
> 12410.00  6.9% ip_finish_output               vmlinux
>  4540.00  2.5% ip_push_pending_frames         vmlinux
>  4427.00  2.4% ip_append_data                 vmlinux
>  4265.00  2.4% __alloc_skb                    vmlinux
>  4140.00  2.3% __ip_local_out                 vmlinux
>  3991.00  2.2% dev_queue_xmit                 vmlinux
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

The tx lock has another use here: to break local loops.  With this
change, a misconfigured tunnel can bring down the machine with a stack
overflow.  There are clearly other ways to fix this that don't require
a lock that restricts parallelism, such as a loop counter, but that's
the way it is now.

^ permalink raw reply

* Re: [PATCH net-next2.6] net: rename netdev rx_queue to ingress_queue
From: Jarek Poplawski @ 2010-09-29 16:54 UTC (permalink / raw)
  To: jamal; +Cc: Eric Dumazet, David Miller, netdev
In-Reply-To: <20100929131852.GA1891@del.dom.local>

On Wed, Sep 29, 2010 at 03:18:52PM +0200, Jarek Poplawski wrote:
> On Wed, Sep 29, 2010 at 06:56:57AM -0400, jamal wrote:
> > On Tue, 2010-09-28 at 20:04 +0200, Jarek Poplawski wrote:
> > 
> > > I hope Master of Ingress will forgive you this CC some day... ;-)
> > 
> > Forgiven-by: moi 
> > 
> > Which may or may not stand for "me" if you pardon my French
> 
> Which, may or may not stand for "my" if you pardon my Polish ;-)

Hmm...  I'm really slow!  FORGIVE "ME", "MOI" ;-)

Jarek P.

^ permalink raw reply

* Re: null dereference in ipip6_get_stats in linux-next
From: Eric Dumazet @ 2010-09-29 16:48 UTC (permalink / raw)
  To: Jesse Gross
  Cc: J. Bruce Fields, Stephen Rothwell, linux-next, linux-kernel,
	netdev
In-Reply-To: <AANLkTin2E2ACB5yiAUA7MbYqzYyMKAi+syFOCzXWqVrY@mail.gmail.com>

Le mercredi 29 septembre 2010 à 09:39 -0700, Jesse Gross a écrit :
> On Wed, Sep 29, 2010 at 9:26 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > Last night's linux-next fails to boot for me; apologies if this is already
> > known.
> 
> Eric sent out a patch for this:
> http://patchwork.ozlabs.org/patch/65961/

Also same kind of problem for ipip :

 http://patchwork.ozlabs.org/patch/65951/

Sorry guys :(

^ permalink raw reply

* how to use secure_tcp_sequence_number
From: Nicola Padovano @ 2010-09-29 16:46 UTC (permalink / raw)
  To: netfilter-devel, netdev

Hi all. How can I export the secure_tcp_sequence_number to use it in my modules?

-- 
Nicola Padovano
e-mail: nicola.padovano@gmail.com
web: http://npadovano.altervista.org

"My only ambition is not be anything at all; it seems the most
sensible thing" (C. Bukowski)

^ permalink raw reply

* Re: null dereference in ipip6_get_stats in linux-next
From: Jesse Gross @ 2010-09-29 16:39 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Stephen Rothwell, linux-next, linux-kernel, Eric Dumazet, netdev
In-Reply-To: <20100929162625.GB17087@fieldses.org>

On Wed, Sep 29, 2010 at 9:26 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
> Last night's linux-next fails to boot for me; apologies if this is already
> known.

Eric sent out a patch for this:
http://patchwork.ozlabs.org/patch/65961/

^ permalink raw reply

* null dereference in ipip6_get_stats in linux-next
From: J. Bruce Fields @ 2010-09-29 16:26 UTC (permalink / raw)
  To: Stephen Rothwell, linux-next, linux-kernel, Eric Dumazet, netdev

Last night's linux-next fails to boot for me; apologies if this is already
known.

--b.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff818f1fc0>] ipip6_get_stats+0x10/0x50
PGD 0 
Oops: 0000 [#1] PREEMPT 
last sysfs file: 
CPU 0 
Modules linked in:

Pid: 1, comm: swapper Not tainted 2.6.36-rc5-next-20100929-05834-g1063b82 #471 /Bochs
RIP: 0010:[<ffffffff818f1fc0>]  [<ffffffff818f1fc0>] ipip6_get_stats+0x10/0x50
RSP: 0018:ffff88001f4bdbe0  EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88001f4bdc30 RCX: 0000000000000000
RDX: ffff88001ceecc60 RSI: ffff88001f4bdc30 RDI: ffff88001ceecc60
RBP: ffff88001f4bdbe0 R08: ffffffff81b31b20 R09: ffff88001cef55e4
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88001f506c48
R13: ffff88001cef54f8 R14: ffff88001f4bdd1c R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff81e1c000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 000000001ce5f000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff88001f4bc000, task ffff88001f4ba050)
Stack:
 ffff88001f4bdc00 ffffffff81840f6e ffff88001f4bdc00 ffff88001ceecc60
<0> ffff88001f4bdd50 ffffffff8184fe9b ffff88001f4bdcb0 ffff88001cef5584
<0> ffff88001cef3ca8 ffff8800ffffffff ffff88001cef54f0 ffff88001f400e80
Call Trace:
 [<ffffffff81840f6e>] dev_get_stats+0x5e/0xa0
 [<ffffffff8184fe9b>] rtnl_fill_ifinfo+0x39b/0x870
 [<ffffffff810e247a>] ? cache_alloc_debugcheck_after+0xea/0x220
 [<ffffffff81851115>] ? rtmsg_ifinfo+0x45/0x100
 [<ffffffff810e3f70>] ? __kmalloc_track_caller+0x150/0x290
 [<ffffffff81851115>] ? rtmsg_ifinfo+0x45/0x100
 [<ffffffff8185113e>] rtmsg_ifinfo+0x6e/0x100
 [<ffffffff8105f5c6>] ? raw_notifier_call_chain+0x16/0x20
 [<ffffffff81845391>] register_netdevice+0x441/0x4f0
 [<ffffffff818460df>] register_netdev+0x3f/0x60
 [<ffffffff81f0e9e6>] sit_init_net+0x194/0x1c4
 [<ffffffff81f0e934>] ? sit_init_net+0xe2/0x1c4
 [<ffffffff8183e28a>] ? ops_init.clone.2+0x6a/0x120
 [<ffffffff8183e268>] ops_init.clone.2+0x48/0x120
 [<ffffffff8183e417>] ? register_pernet_device+0x27/0x80
 [<ffffffff81f0e7f2>] ? sit_init+0x0/0x60
 [<ffffffff8183e397>] register_pernet_operations+0x57/0xb0
 [<ffffffff81f0e7f2>] ? sit_init+0x0/0x60
 [<ffffffff8183e426>] register_pernet_device+0x36/0x80
 [<ffffffff81f0e815>] sit_init+0x23/0x60
 [<ffffffff810001d2>] do_one_initcall+0x42/0x170
 [<ffffffff81ed85f5>] kernel_init+0xa5/0x12a
 [<ffffffff8196ccb9>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81003274>] kernel_thread_helper+0x4/0x10
 [<ffffffff8196defe>] ? restore_args+0x0/0x30
 [<ffffffff81ed8550>] ? kernel_init+0x0/0x12a
 [<ffffffff81003270>] ? kernel_thread_helper+0x0/0x10
Code: 48 8b 93 b8 00 00 00 e9 fd fe ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 0f 1f 44 00 00 48 8b 87 58 04 00 00 <48> 8b 70 08 48 8b 48 10 48 8b 50 18 48 8b 00 48 89 b7 e0 00 00 
RIP  [<ffffffff818f1fc0>] ipip6_get_stats+0x10/0x50
 RSP <ffff88001f4bdbe0>
CR2: 0000000000000008
---[ end trace e2d6566c536d1627 ]---

^ permalink raw reply

* Re: [RFC PATCH 2/2] macvtap: TX zero copy between guest and host kernel
From: Shirley Ma @ 2010-09-29 15:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Arnd Bergmann, Avi Kivity, Xin, Xiaohui, David Miller, netdev,
	kvm, linux-kernel
In-Reply-To: <20100929151415.GA27041@redhat.com>

On Wed, 2010-09-29 at 17:14 +0200, Michael S. Tsirkin wrote:
> I guess I just don't understand what your patch does.
> If you send it, I can take a look.

Ok, I will collect more performance data and send the patch.

Thanks
Shirley

^ permalink raw reply

* Re: [RFC PATCH 2/2] macvtap: TX zero copy between guest and host kernel
From: Michael S. Tsirkin @ 2010-09-29 15:14 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Arnd Bergmann, Avi Kivity, Xin, Xiaohui, David Miller, netdev,
	kvm, linux-kernel
In-Reply-To: <1285730669.31343.7.camel@localhost.localdomain>

On Tue, Sep 28, 2010 at 08:24:29PM -0700, Shirley Ma wrote:
> Hello Michael,
> 
> On Wed, 2010-09-15 at 07:52 -0700, Shirley Ma wrote:
> > > >  Don't you think once I address vhost_add_used_and_signal update
> > > > issue, it is a simple and complete patch for macvtap TX zero copy?
> > > > 
> > > > Thanks
> > > > Shirley
> > > 
> > > I like the fact that the patch is simple. Unfortunately
> > > I suspect it'll stop being simple by the time it's complete :) 
> > 
> > I can make a try. :)
> 
> I compared several approaches for addressing the issue being raised here
> on how/when to update vhost_add_used_and_signal. The simple approach I
> have found is:
> 
> 1. Adding completion field in struct virtqueue;
> 2. when it is a zero copy packet, put vhost thread wait for completion
> to update vhost_add_used_and_signal;
> 3. passing vq from vhost to macvtap as skb destruct_arg;
> 4. when skb is freed for the last reference, signal vq completion
> 
> The test results show same performance as the original patch. How do you
> think? If it sounds good to you. I will resubmit this reversion patch.
> The patch still keeps as simple as it was before. :)
> 
> Thanks
> Shirley

I guess I just don't understand what your patch does.
If you send it, I can take a look.

-- 
MST

^ permalink raw reply

* Re: mmotm 2010-09-28-16-13 uploaded
From: Na Pohybel @ 2010-09-29 15:05 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: akpm, mm-commits, linux-kernel, netdev
In-Reply-To: <20100929080113.5581c19d.rdunlap@xenotime.net>

2010/9/29 Randy Dunlap <rdunlap@xenotime.net>:
> On Wed, 29 Sep 2010 15:13:41 +0200 N.P.S. N.P.S. wrote:
>
>> 2010/9/29  <akpm@linux-foundation.org>:
>> > The mm-of-the-moment snapshot 2010-09-28-16-13 has been uploaded to
>>
>> Another randconfig thing
>>
>> drivers/net/pch_gbe/pch_gbe_main.c: In function 'pch_gbe_tx_queue':
>> drivers/net/pch_gbe/pch_gbe_main.c:965: error: implicit declaration of
>> function 'tcp_hdr'
>> drivers/net/pch_gbe/pch_gbe_main.c:965: error: invalid type argument
>> of '->' (have 'int')
>> drivers/net/pch_gbe/pch_gbe_main.c:968: error: invalid type argument
>> of '->' (have 'int')
>> drivers/net/pch_gbe/pch_gbe_main.c:976: error: implicit declaration of
>> function 'udp_hdr'
>> drivers/net/pch_gbe/pch_gbe_main.c:976: error: invalid type argument
>> of '->' (have 'int')
>> drivers/net/pch_gbe/pch_gbe_main.c:980: error: invalid type argument
>> of '->' (have 'int')
>> make[3]: *** [drivers/net/pch_gbe/pch_gbe_main.o] Error 1
>> make[2]: *** [drivers/net/pch_gbe] Error 2
>> make[1]: *** [drivers/net] Error 2
>> make: *** [drivers] Error 2
>
> Patch has been sent to DaveM and he has merged it.

Thanks again!

>
>
> ---
> ~Randy
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
>



-- 
Slawa!
N.P.S.

^ permalink raw reply

* Re: mmotm 2010-09-28-16-13 uploaded
From: Randy Dunlap @ 2010-09-29 15:01 UTC (permalink / raw)
  To: N.P.S. N.P.S.; +Cc: akpm, mm-commits, linux-kernel, netdev
In-Reply-To: <AANLkTinYExzihm6BuP=amLDwMO-zXejeUq5xeHL4ETg_@mail.gmail.com>

On Wed, 29 Sep 2010 15:13:41 +0200 N.P.S. N.P.S. wrote:

> 2010/9/29  <akpm@linux-foundation.org>:
> > The mm-of-the-moment snapshot 2010-09-28-16-13 has been uploaded to
> 
> Another randconfig thing
> 
> drivers/net/pch_gbe/pch_gbe_main.c: In function 'pch_gbe_tx_queue':
> drivers/net/pch_gbe/pch_gbe_main.c:965: error: implicit declaration of
> function 'tcp_hdr'
> drivers/net/pch_gbe/pch_gbe_main.c:965: error: invalid type argument
> of '->' (have 'int')
> drivers/net/pch_gbe/pch_gbe_main.c:968: error: invalid type argument
> of '->' (have 'int')
> drivers/net/pch_gbe/pch_gbe_main.c:976: error: implicit declaration of
> function 'udp_hdr'
> drivers/net/pch_gbe/pch_gbe_main.c:976: error: invalid type argument
> of '->' (have 'int')
> drivers/net/pch_gbe/pch_gbe_main.c:980: error: invalid type argument
> of '->' (have 'int')
> make[3]: *** [drivers/net/pch_gbe/pch_gbe_main.o] Error 1
> make[2]: *** [drivers/net/pch_gbe] Error 2
> make[1]: *** [drivers/net] Error 2
> make: *** [drivers] Error 2

Patch has been sent to DaveM and he has merged it.


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply

* Re: [RFC PATCH 2/2] macvtap: TX zero copy between guest and host kernel
From: Shirley Ma @ 2010-09-29 14:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Arnd Bergmann, Avi Kivity, Xin, Xiaohui, David Miller, netdev,
	kvm, linux-kernel
In-Reply-To: <20100929082820.GC21195@redhat.com>

On Wed, 2010-09-29 at 10:28 +0200, Michael S. Tsirkin wrote:
> > > 1. Adding completion field in struct virtqueue;
> > > 2. when it is a zero copy packet, put vhost thread wait for
> completion
> > > to update vhost_add_used_and_signal;
> > > 3. passing vq from vhost to macvtap as skb destruct_arg;
> > > 4. when skb is freed for the last reference, signal vq completion
> > > The test results show same performance as the original patch. How
> do you
> > > think? If it sounds good to you. I will resubmit this reversion
> patch.
> > > The patch still keeps as simple as it was before. :)
> > > 
> > > Thanks
> > > Shirley
> > 
> > If you look at dev_hard_start_xmit you will see a call
> > to skb_orphan_try which often calls the skb destructor.
> > So I suspect this is almost equivalent to your original patch,
> > and has the same correctness issue.
> 
> So you could try doing skb_tx(skb)->prevent_sk_orphan = 1
> just to see what will happen. Might be interesting - just
> make sure the device doesn't orphan the skb first thing.
> I suspect lack of parallelism will result in bad throughput
> esp for small messages.
> 
> Note this still won't make it correct (this has module unloading
> issue, and devices might still orphan skb, clone it, or hang on to
> paged data in some other way) but at least closer. 

For message size smaller than 128, it still does copy. I tested some
small message size, I didn't see any regression. I will run more test to
focus on small message size between 128 - 4K.

I don't need prevent_sk_orphan since in skb_release_data for last
reference, I just need the ZEROCOPY flag from that sock to signal a
completion.

Thanks
Shirley


^ permalink raw reply

* Re: [PATCH linux-2.6 v2] IPv6: Create temporary address if none exists.
From: Brian Haley @ 2010-09-29 14:43 UTC (permalink / raw)
  To: David Miller
  Cc: gwurster, kuznet, pekkas, jmorris, yoshfuji, kaber, shemminger,
	eric.dumazet, herbert, ebiederm, netdev, linux-kernel
In-Reply-To: <20100928.222510.71109554.davem@davemloft.net>

On 09/29/2010 01:25 AM, David Miller wrote:
> From: Glenn Wurster <gwurster@scs.carleton.ca>
> Date: Mon, 27 Sep 2010 13:04:30 -0400
> 
>> If privacy extentions are enabled, but no current temporary address exists,
>> then create one when we get a router advertisement.
>>
>> Version 2, now with 100% fewer line wraps.  Thanks to David Miller for
>> pointing out the line wrapping issue.
>>
>> Signed-off-by: Glenn Wurster <gwurster@scs.carleton.ca>
> 
> The existing code is correct from what I can tell.
> 
> Variable "create" is true when "ifp == NULL" and "valid_lft != 0"
> 
> And RFC 3041 explicitly states in section 3.3:
> 
> 	When a new public address is created as described in [ADDRCONF]
> 	(because the prefix advertised does not match the prefix of any
> 	address already assigned to the interface, and Valid Lifetime
> 	in the option is not zero), also create a new temporary address.
> 
> Your patch is going to cause us to create a temporary address even
> when valid_lft is zero, which the RFC says we should not do.
> 
> That goes against what the RFC tells us to do, so I can only conclude
> that your patch is not correct.

I think this patch might actually be OK, I had to look at this more than
once to figure out the problem Glenn was trying to fix.  Maybe he can
confirm.

>From what I have found, this is fixing the case where we've changed
use_tempaddr to 1 on an interface that already has a "stable" IPv6
prefix.  In that case you'll never add a temporary address:

# ip -6 a s dev eth6
10: eth6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2620:0:a09:e000:21f:29ff:fe59:faca/64 scope global dynamic 
       valid_lft 2591820sec preferred_lft 604620sec
    inet6 fe80::21f:29ff:fe59:faca/64 scope link 
       valid_lft forever preferred_lft forever

07:47:52.119051 IP6 fe80::205:9aff:fe3a:1871 > ip6-allnodes: ICMP6, router advertisement

# ip -6 a s dev eth6
10: eth6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2620:0:a09:e000:21f:29ff:fe59:faca/64 scope global dynamic 
       valid_lft 2591996sec preferred_lft 604796sec
    inet6 fe80::21f:29ff:fe59:faca/64 scope link 
       valid_lft forever preferred_lft forever

No temp address :(

Since we're in the "if (ifp)" block, we can assume that at some point
in time we did get a valid advertisement to add this address, whether
is was right now or an hour ago doesn't matter.  Of course the RFCs
don't cover this case, they assume privacy settings were enabled at
boot time, if there's ever an update to 4941/3041 that should be
clarified.

Maybe the below (untested) patch is better?  Glenn, can you test this?

-Brian


---

If privacy extensions are enabled, but no current temporary address exists,
then create one when we get a router advertisement with a valid lifetime.

Signed-off-by: Brian Haley <brian.haley@hp.com>

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 8c88340..fb238d6 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1925,7 +1925,8 @@ ok:
 			update_lft = create = 1;
 			ifp->cstamp = jiffies;
 			addrconf_dad_start(ifp, RTF_ADDRCONF|RTF_PREFIX_RT);
-		}
+		} else if (list_empty(&in6_dev->tempaddr_list) && valid_lft)
+			create = 1;	/* use_tempaddr could have changed */
 
 		if (ifp) {
 			int flags;

^ permalink raw reply related

* Re: [RFC PATCH 2/2] macvtap: TX zero copy between guest and host kernel
From: Shirley Ma @ 2010-09-29 14:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Arnd Bergmann, Avi Kivity, Xin, Xiaohui, David Miller, netdev,
	kvm, linux-kernel
In-Reply-To: <20100929081645.GA21195@redhat.com>

On Wed, 2010-09-29 at 10:16 +0200, Michael S. Tsirkin wrote:
> > I compared several approaches for addressing the issue being raised
> here
> > on how/when to update vhost_add_used_and_signal. The simple approach
> I
> > have found is:
> > 
> > 1. Adding completion field in struct virtqueue;
> > 2. when it is a zero copy packet, put vhost thread wait for
> completion
> > to update vhost_add_used_and_signal;
> > 3. passing vq from vhost to macvtap as skb destruct_arg;
> > 4. when skb is freed for the last reference, signal vq completion
> > The test results show same performance as the original patch. How do
> you
> > think? If it sounds good to you. I will resubmit this reversion
> patch.
> > The patch still keeps as simple as it was before. :)
> > 
> > Thanks
> > Shirley
> 
> If you look at dev_hard_start_xmit you will see a call
> to skb_orphan_try which often calls the skb destructor.
> So I suspect this is almost equivalent to your original patch,
> and has the same correctness issue. 

If I didn't address this, then vhost net will wait for ever with wait
for completion :))


Thanks
Shirley

^ permalink raw reply

* Re: [RFC PATCH 2/2] macvtap: TX zero copy between guest and host kernel
From: Shirley Ma @ 2010-09-29 14:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Arnd Bergmann, Avi Kivity, Xin, Xiaohui, David Miller, netdev,
	kvm, linux-kernel
In-Reply-To: <20100929082820.GC21195@redhat.com>

On Wed, 2010-09-29 at 10:28 +0200, Michael S. Tsirkin wrote:
> I think you should try testing with guest to external communication,
> this will uncover some of these correctness issues for you.
> I think netperf also has some flag to check data, might
> be a good idea to use it for testing.

I always tested guest to remote host, remote host to guest. What other
external communication do you suggest here?

Thanks
Shirley


^ permalink raw reply

* Re: [RFC PATCH 2/2] macvtap: TX zero copy between guest and host kernel
From: Shirley Ma @ 2010-09-29 14:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Arnd Bergmann, Avi Kivity, Xin, Xiaohui, David Miller, netdev,
	kvm, linux-kernel
In-Reply-To: <20100929081645.GA21195@redhat.com>

On Wed, 2010-09-29 at 10:16 +0200, Michael S. Tsirkin wrote:
> If you look at dev_hard_start_xmit you will see a call
> to skb_orphan_try which often calls the skb destructor.
> So I suspect this is almost equivalent to your original patch,
> and has the same correctness issue.

I forgot to mention, in skb_orphan_try, I still kept the skb->sk
reference to sk, I assign skb->sk to NULL when the DMA done.

Thanks
Shirley


^ permalink raw reply

* Regression (ancient), bisected: TCP hangs with certain ESP6 SA.
From: Nick Bowler @ 2010-09-29 14:22 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: Herbert Xu, David S. Miller, Eric Dumazet

Since around 2.6.25, TCP connections hang when using IPv6 ESP, in
transport mode, with AES and a null MAC.  It really is that specific:
ICMP/UDP work fine with this cipher/mac combination, tunnel mode works
fine, and so do DES/triple-DES.

In this scenario, there are two hosts, host1 and host2, with IPv6
addresses fec0:42::2 and fec0:42::3, respectively.  Host2 is running
lighttpd serving a 300MB binary file called 'big'.

On host1, we run the following setkey script:

  add fec0:42::2 fec0:42::3 esp 0x6B8B4567  -f seq-pad
   -E rijndael-cbc 0x643C98696633487374B0DC5119495CFF
   -A null;
  
  add fec0:42::3 fec0:42::2 esp 0x327B23C6  -f seq-pad
   -E rijndael-cbc 0x3D1B58BA507ED7AB2EB141F241B71EFB
   -A null;
  
  spdadd fec0:42::2 fec0:42::3 any -P out ipsec
   esp/transport//require;
  spdadd fec0:42::3 fec0:42::2 any -P in ipsec
   esp/transport//require;

And on host2, the complementary script:

  add fec0:42::2 fec0:42::3 esp 0x6B8B4567  -f seq-pad
   -E rijndael-cbc 0x643C98696633487374B0DC5119495CFF
   -A null;
  
  add fec0:42::3 fec0:42::2 esp 0x327B23C6  -f seq-pad
   -E rijndael-cbc 0x3D1B58BA507ED7AB2EB141F241B71EFB
   -A null;
  
  spdadd fec0:42::3 fec0:42::2 any -P out ipsec
   esp/transport//require;
  spdadd fec0:42::2 fec0:42::3 any -P in ipsec
   esp/transport//require;

>From host1 we can now do ping6 fec0:42::3 and it will work.  However,
when we attempt to wget 'http://[fec0:42::3]/big',  we get the
following:

  Connecting to fec0:42::3:80... connected.
  HTTP request sent, awaiting response... Read error (connection timed
  out) in headers.
  Giving up.

A similar issue occurs with SSH: we can connect successfully, and even
run some commands on the remote host, but the connection stalls
permanently as soon as we try to do anything remotely interesting.

I managed to bisect the issue to the following commit.  Amazingly, the
commit can still be reverted with almost no hassle and solves this
particular issue.

b5c15fc004ac83b7ad280acbe0fd4bbed7e2c8d4 is the first bad commit
commit b5c15fc004ac83b7ad280acbe0fd4bbed7e2c8d4
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Thu Feb 14 23:49:37 2008 -0800

    [IPV6]: Fix reversed local_df test in ip6_fragment
    
    I managed to reverse the local_df test when forward-porting this
    patch so it actually makes things worse by never fragmenting at
    all.
    
    Thanks to David Stevens for testing and reporting this bug.
    
    Bill Fink pointed out that the local_df setting is also the wrong
    way around.
    
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>

:040000 040000 e8b31c5a4196df9292237f6c654758eebeadf00d a6be8522530892360965c38951b45520bc49160b M	net

git bisect start
# good: [bbf25010f1a6b761914430f5fca081ec8c7accd1] Linux 2.6.23
git bisect good bbf25010f1a6b761914430f5fca081ec8c7accd1
# bad: [8e0ee43bc2c3e19db56a4adaa9a9b04ce885cd84] Linux 2.6.29
git bisect bad 8e0ee43bc2c3e19db56a4adaa9a9b04ce885cd84
# bad: [6b648063eb51e2620774ddaebef4e07f2f6f4ae7] [S390] Cleanup lcs printk messages.
git bisect bad 6b648063eb51e2620774ddaebef4e07f2f6f4ae7
# good: [2c044a4803804708984931bcbd03314732e995d5] USB: fix codingstyle issues in drivers/usb/core/*.c
git bisect good 2c044a4803804708984931bcbd03314732e995d5
# bad: [334d094504c2fe1c44211ecb49146ae6bca8c321] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26
git bisect bad 334d094504c2fe1c44211ecb49146ae6bca8c321
# bad: [e51c683717e3ac21713444e9a517aa8e0ad0ee48] zd1211rw: Fixed incorrect constant name.
git bisect bad e51c683717e3ac21713444e9a517aa8e0ad0ee48
# good: [0afc2edfada50980bec999f94dcea26ebad3dda6] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
git bisect good 0afc2edfada50980bec999f94dcea26ebad3dda6
# good: [b791dd3ed7bef989f268365e85800862e8ac756f] Merge branch 'upstream-davem' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
git bisect good b791dd3ed7bef989f268365e85800862e8ac756f
# bad: [d5c67bac833c6c9cc713f6a27daa77dcba898dd8] Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
git bisect bad d5c67bac833c6c9cc713f6a27daa77dcba898dd8
# good: [0d63e4f9ea61df1d727bd52a174aba732e6e1853] Dont touch fs_struct in drivers
git bisect good 0d63e4f9ea61df1d727bd52a174aba732e6e1853
# good: [7d8330a563b00040326084f933f5bee06675ac54] KVM is not seen under X86 config with latest git (32 bit compile)
git bisect good 7d8330a563b00040326084f933f5bee06675ac54
# good: [30b3cfe1f67550bb6ec6868507a78060ef98269a] [ATYFB]: Kill 'prom_palette' sparc code.
git bisect good 30b3cfe1f67550bb6ec6868507a78060ef98269a
# bad: [b69409279c4c960fcd1575bcf80f2a0ca414ca93] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
git bisect bad b69409279c4c960fcd1575bcf80f2a0ca414ca93
# good: [45b503548210fe6f23e92b856421c2a3f05fd034] [RTNETLINK]: Send a single notification on device state changes.
git bisect good 45b503548210fe6f23e92b856421c2a3f05fd034
# bad: [c58310bf4933986513020fa90b4190c7492995ae] Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into for-linus
git bisect bad c58310bf4933986513020fa90b4190c7492995ae
# good: [f527cf405017e60ceb28f84e2d60ab16fc34f209] Merge branch 'slab-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm
git bisect good f527cf405017e60ceb28f84e2d60ab16fc34f209
# good: [96f2fc006c281cbd5702a409c57d1f1549cde1fe] sh: Clean up whitespace damage in Kconfig.debug.
git bisect good 96f2fc006c281cbd5702a409c57d1f1549cde1fe
# good: [5c8f82c64941594cdab53bf9f9a66c190781f4f6] maple: Fix up maple build failure.
git bisect good 5c8f82c64941594cdab53bf9f9a66c190781f4f6
# good: [e036eaa681a17f71b64f6d9040fe605555623919] sh: use ctrl_in/out for on chip pci access
git bisect good e036eaa681a17f71b64f6d9040fe605555623919
# good: [11d64be6a631236b3b3d21711c7d1a83d9f85904] Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
git bisect good 11d64be6a631236b3b3d21711c7d1a83d9f85904
# bad: [f6866fecd6fd8e44a6715da09844a4fd1b8484da] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
git bisect bad f6866fecd6fd8e44a6715da09844a4fd1b8484da
# good: [4ee29f6a52158cea526b16a44ae38643946103ec] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
git bisect good 4ee29f6a52158cea526b16a44ae38643946103ec
# good: [0f4bda005fd685f7cbb2ad47b7bab1b155df2b86] net: xfrm statistics depend on INET
git bisect good 0f4bda005fd685f7cbb2ad47b7bab1b155df2b86
# good: [073a371987f9a9806a85329eed51dca1fc52a7a0] [XFRM]: Avoid bogus BUG() when throwing new policy away.
git bisect good 073a371987f9a9806a85329eed51dca1fc52a7a0
# bad: [69c3683ca7fe066ecba9e8a0424c5abd258a5d58] netconsole: avoid null pointer dereference at show_local_mac()
git bisect bad 69c3683ca7fe066ecba9e8a0424c5abd258a5d58
# bad: [b5c15fc004ac83b7ad280acbe0fd4bbed7e2c8d4] [IPV6]: Fix reversed local_df test in ip6_fragment
git bisect bad b5c15fc004ac83b7ad280acbe0fd4bbed7e2c8d4

-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)

^ permalink raw reply

* Re: [RFC] Online firmware upgrade in non-embedded systems
From: Carl-Daniel Hailfinger @ 2010-09-29 13:43 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, linux-kernel, linux-mtd, sf-linux-drivers, flashrom
In-Reply-To: <1285765850.2283.15.camel@achroite.uk.solarflarecom.com>

On 29.09.2010 15:10, Ben Hutchings wrote:
> On Wed, 2010-09-29 at 14:45 +0200, Carl-Daniel Hailfinger wrote:
>   
>> On 29.09.2010 14:35, Ben Hutchings wrote:
>>     
>>> On Wed, 2010-09-29 at 00:41 +0200, Carl-Daniel Hailfinger wrote:
>>>   
>>>       
>>>> [adding flashrom@flashrom.org to CC, senders will be whitelisted after a
>>>> short delay]
>>>>
>>>> On 28.09.2010 19:59, Ben Hutchings wrote:
>>>>     
>>>>         
>>>>> Network and disk controllers normally have at least some firmware in
>>>>> flash to support their use as boot devices. [...]
>>>>>   
>>>>>       
>>>>>           
>>>> Given that the flashrom utility <http://www.flashrom.org/> (GPLv2)
>>>> supports flashing many network cards, SATA/PATA controllers, graphics
>>>> cards, and of course the main system firmware/BIOS/EFI, and it does that
>>>> from userspace without any kernel support,
>>>>     
>>>>         
>>> [...]
>>>
>>> I'm looking for a clean solution, not a hack.
>>>   
>>>       
>> What would qualify as a clean solution?
>>     
>
> One where hardware access is mediated by the kernel, and doesn't involve
> unloading or potentially conflicting with the driver for that hardware.
>   

flashrom can ask the kernel driver to "please stop accessing flash". No
unloading needed, no conflict in place.


>> And is cross-platform code one of your goals?
>>     
>
> Not at this level.  At the application level, yes, but we already have a
> working application so I'm not interested in using flashrom for that.
>   

I see. Just because I'm interested in how other flashing applications
solve this: Does that application work on *BSD as well? And could you
tell me the name of the app so I can take a look at it? Thanks.

Regards,
Carl-Daniel

-- 
http://www.hailfinger.org/


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox