Netdev List
 help / color / mirror / Atom feed
* Re: bisect results of MSI-X related panic (help!)
From: Tejun Heo @ 2009-09-14  9:40 UTC (permalink / raw)
  To: Frans Pop; +Cc: Jesse Brandeburg, linux-kernel, netdev, Ingo Molnar
In-Reply-To: <200909120623.49764.elendil@planet.nl>

Frans Pop wrote:
> Jesse Brandeburg wrote:
>> I've bisected, here is my bisect log, problem is that the commit
>> identified is a merge commit, and *I don't know what to revert to test*.
>> It appears the parent of the merge:
>> 6e15cf04860074ad032e88c306bea656bbdd0f22 is marked good, but looks to be
>> in a possibly related area to the panic.
> 
> That merge does contain quite a few merge fixups, so it's quite possible 
> one of them is the cause of the failure.
> Maybe the simplest way to verify that is to compile both parents of the 
> merge to doublecheck that they work OK. Then, if a compile of the merge 
> itself is bad, the problem really is in the merge commit itself.
> 
> That commit is the "percpu" merge, so I've added Tejun (author of most of 
> that branch) and Ingo (merger) in CC.

Sorry, the oops doesn't ring a bell, well, not yet at least.  It would
be great if the bisection can be narrowed down more.

-- 
tejun

^ permalink raw reply

* Re: [PATCH RESEND] bonding: remap muticast addresses without using dev_close() and dev_open()
From: Moni Shoua @ 2009-09-14  9:38 UTC (permalink / raw)
  To: David Miller; +Cc: ogerlitz, fubar, jgunthorpe, netdev, bonding-devel
In-Reply-To: <20090911.123652.92869438.davem@davemloft.net>


> But unfortunately this patch doesn't apply to net-next-2.6 and
> you'll need to respin it against current sources so I can apply
> it, thanks.
The patch was written against linux-2.6 but I just pulled net-next-2.6 (git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6.git) and found out that besides offsets it applies well.
Is the URL above correct? Anyway, I add here the one that I took from net-next-2.6. Please let me know if it still doesn't apply. Thanks
-------------------------------------------------------------------------------

This patch fixes commit e36b9d16c6a6d0f59803b3ef04ff3c22c3844c10. The approach
there is to call dev_close()/dev_open() whenever the device type is changed in
order to remap the device IP multicast addresses to HW multicast addresses.
This approach suffers from 2 drawbacks:

*. It assumes tha the device is UP when calling dev_close(), or otherwise
   dev_close() has no affect. It is worth to mention that initscripts (Redhat)
   and sysconfig (Suse) doesn't act the same in this matter. 
*. dev_close() has other side affects, like deleting entries from the routing
   table, which might be unnecessary.

The fix here is to directly remap the IP multicast addresses to HW multicast
addresses for a bonding device that changes its type, and nothing else.
   
Reported-by:   Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
--
 drivers/net/bonding/bond_main.c |    9 ++++++---
 include/linux/igmp.h            |    2 ++
 include/linux/netdevice.h       |    3 ++-
 include/linux/notifier.h        |    2 ++
 include/net/addrconf.h          |    2 ++
 net/core/dev.c                  |    4 ++--
 net/ipv4/devinet.c              |    6 ++++++
 net/ipv4/igmp.c                 |   22 ++++++++++++++++++++++
 net/ipv6/addrconf.c             |   19 +++++++++++++++++++
 net/ipv6/mcast.c                |   19 +++++++++++++++++++
 10 files changed, 82 insertions(+), 6 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a7e731f..6419cf9 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1211,7 +1211,7 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
 			write_unlock_bh(&bond->curr_slave_lock);
 			read_unlock(&bond->lock);
 
-			netdev_bonding_change(bond->dev);
+			netdev_bonding_change(bond->dev, NETDEV_BONDING_FAILOVER);
 
 			read_lock(&bond->lock);
 			write_lock_bh(&bond->curr_slave_lock);
@@ -1469,14 +1469,17 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	 */
 	if (bond->slave_cnt == 0) {
 		if (bond_dev->type != slave_dev->type) {
-			dev_close(bond_dev);
 			pr_debug("%s: change device type from %d to %d\n",
 				bond_dev->name, bond_dev->type, slave_dev->type);
+
+			netdev_bonding_change(bond_dev, NETDEV_BONDING_OLDTYPE);
+
 			if (slave_dev->type != ARPHRD_ETHER)
 				bond_setup_by_slave(bond_dev, slave_dev);
 			else
 				ether_setup(bond_dev);
-			dev_open(bond_dev);
+
+			netdev_bonding_change(bond_dev, NETDEV_BONDING_NEWTYPE);
 		}
 	} else if (bond_dev->type != slave_dev->type) {
 		pr_err(DRV_NAME ": %s ether type (%d) is different "
diff --git a/include/linux/igmp.h b/include/linux/igmp.h
index 92fbd8c..fe158e0 100644
--- a/include/linux/igmp.h
+++ b/include/linux/igmp.h
@@ -233,6 +233,8 @@ extern void ip_mc_init_dev(struct in_device *);
 extern void ip_mc_destroy_dev(struct in_device *);
 extern void ip_mc_up(struct in_device *);
 extern void ip_mc_down(struct in_device *);
+extern void ip_mc_unmap(struct in_device *);
+extern void ip_mc_remap(struct in_device *);
 extern void ip_mc_dec_group(struct in_device *in_dev, __be32 addr);
 extern void ip_mc_inc_group(struct in_device *in_dev, __be32 addr);
 extern void ip_mc_rejoin_group(struct ip_mc_list *im);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 65ee192..f46db6c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1873,7 +1873,8 @@ extern void		__dev_addr_unsync(struct dev_addr_list **to, int *to_count, struct
 extern int		dev_set_promiscuity(struct net_device *dev, int inc);
 extern int		dev_set_allmulti(struct net_device *dev, int inc);
 extern void		netdev_state_change(struct net_device *dev);
-extern void		netdev_bonding_change(struct net_device *dev);
+extern void		netdev_bonding_change(struct net_device *dev,
+					      unsigned long event);
 extern void		netdev_features_change(struct net_device *dev);
 /* Load a device via the kmod */
 extern void		dev_load(struct net *net, const char *name);
diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index 81bc252..44428d2 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -199,6 +199,8 @@ static inline int notifier_to_errno(int ret)
 #define NETDEV_FEAT_CHANGE	0x000B
 #define NETDEV_BONDING_FAILOVER 0x000C
 #define NETDEV_PRE_UP		0x000D
+#define NETDEV_BONDING_OLDTYPE  0x000E
+#define NETDEV_BONDING_NEWTYPE  0x000F
 
 #define SYS_DOWN	0x0001	/* Notify of system down */
 #define SYS_RESTART	SYS_DOWN
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 7b55ab2..0f7c378 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -143,6 +143,8 @@ extern int __ipv6_dev_mc_dec(struct inet6_dev *idev, const struct in6_addr *addr
 extern int ipv6_dev_mc_dec(struct net_device *dev, const struct in6_addr *addr);
 extern void ipv6_mc_up(struct inet6_dev *idev);
 extern void ipv6_mc_down(struct inet6_dev *idev);
+extern void ipv6_mc_unmap(struct inet6_dev *idev);
+extern void ipv6_mc_remap(struct inet6_dev *idev);
 extern void ipv6_mc_init_dev(struct inet6_dev *idev);
 extern void ipv6_mc_destroy_dev(struct inet6_dev *idev);
 extern void addrconf_dad_failure(struct inet6_ifaddr *ifp);
diff --git a/net/core/dev.c b/net/core/dev.c
index f843a0c..f6a4495 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1017,9 +1017,9 @@ void netdev_state_change(struct net_device *dev)
 }
 EXPORT_SYMBOL(netdev_state_change);
 
-void netdev_bonding_change(struct net_device *dev)
+void netdev_bonding_change(struct net_device *dev, unsigned long event)
 {
-	call_netdevice_notifiers(NETDEV_BONDING_FAILOVER, dev);
+	call_netdevice_notifiers(event, dev);
 }
 EXPORT_SYMBOL(netdev_bonding_change);
 
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 3863c3a..07336c6 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1087,6 +1087,12 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
 	case NETDEV_DOWN:
 		ip_mc_down(in_dev);
 		break;
+	case NETDEV_BONDING_OLDTYPE:
+		ip_mc_unmap(in_dev);
+		break;
+	case NETDEV_BONDING_NEWTYPE:
+		ip_mc_remap(in_dev);
+		break;
 	case NETDEV_CHANGEMTU:
 		if (inetdev_valid_mtu(dev->mtu))
 			break;
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 01b4284..d41e5de 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1298,6 +1298,28 @@ void ip_mc_dec_group(struct in_device *in_dev, __be32 addr)
 	}
 }
 
+/* Device changing type */
+
+void ip_mc_unmap(struct in_device *in_dev)
+{
+	struct ip_mc_list *i;
+
+	ASSERT_RTNL();
+
+	for (i = in_dev->mc_list; i; i = i->next)
+		igmp_group_dropped(i);
+}
+
+void ip_mc_remap(struct in_device *in_dev)
+{
+	struct ip_mc_list *i;
+
+	ASSERT_RTNL();
+
+	for (i = in_dev->mc_list; i; i = i->next)
+		igmp_group_added(i);
+}
+
 /* Device going down */
 
 void ip_mc_down(struct in_device *in_dev)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index c9b3690..f216a41 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -137,6 +137,8 @@ static DEFINE_SPINLOCK(addrconf_verify_lock);
 static void addrconf_join_anycast(struct inet6_ifaddr *ifp);
 static void addrconf_leave_anycast(struct inet6_ifaddr *ifp);
 
+static void addrconf_bonding_change(struct net_device *dev,
+				    unsigned long event);
 static int addrconf_ifdown(struct net_device *dev, int how);
 
 static void addrconf_dad_start(struct inet6_ifaddr *ifp, u32 flags);
@@ -2582,6 +2584,10 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 				return notifier_from_errno(err);
 		}
 		break;
+	case NETDEV_BONDING_OLDTYPE:
+	case NETDEV_BONDING_NEWTYPE:
+		addrconf_bonding_change(dev, event);
+		break;
 	}
 
 	return NOTIFY_OK;
@@ -2595,6 +2601,19 @@ static struct notifier_block ipv6_dev_notf = {
 	.priority = 0
 };
 
+static void addrconf_bonding_change(struct net_device *dev, unsigned long event)
+{
+	struct inet6_dev *idev;
+	ASSERT_RTNL();
+
+	idev = __in6_dev_get(dev);
+
+	if (event == NETDEV_BONDING_NEWTYPE)
+		ipv6_mc_remap(idev);
+	else if (event == NETDEV_BONDING_OLDTYPE)
+		ipv6_mc_unmap(idev);
+}
+
 static int addrconf_ifdown(struct net_device *dev, int how)
 {
 	struct inet6_dev *idev;
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 71c3dac..f9fcf69 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -2249,6 +2249,25 @@ static void igmp6_timer_handler(unsigned long data)
 	ma_put(ma);
 }
 
+/* Device changing type */
+
+void ipv6_mc_unmap(struct inet6_dev *idev)
+{
+	struct ifmcaddr6 *i;
+
+	/* Install multicast list, except for all-nodes (already installed) */
+
+	read_lock_bh(&idev->lock);
+	for (i = idev->mc_list; i; i = i->next)
+		igmp6_group_dropped(i);
+	read_unlock_bh(&idev->lock);
+}
+
+void ipv6_mc_remap(struct inet6_dev *idev)
+{
+	ipv6_mc_up(idev);
+}
+
 /* Device going down */
 
 void ipv6_mc_down(struct inet6_dev *idev)

^ permalink raw reply related

* Re: [PATCH RFC] tun: export underlying socket
From: Michael S. Tsirkin @ 2009-09-14  9:11 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: David Miller, netdev, herbert
In-Reply-To: <4AADFC0A.30305@voltaire.com>

On Mon, Sep 14, 2009 at 11:17:14AM +0300, Or Gerlitz wrote:
> Michael S. Tsirkin wrote:
>> On Mon, Sep 14, 2009 at 11:07:25AM +0300, Or Gerlitz wrote:
>>   
>>> vhost injects packets into physical device, what is the use case of vhost injecting packets into the host network stack?
>>>     
>> The case where the user requires bridging, typically.
>>   
> So you want to support a scheme where someone wants to attach vhost to a  
> bridge? why not just telling these users to just set their vhost on top  
> of veth couple, such that one veth device is added to a bridge as  
> interface and a vhost instance is tied to the other veth device?
>
>
> Or.

That's already possible. However virtualization users are familiar with
configuring the tun device, and tun has grown virtualization-specific
extensions, so I don't see a reason not to accomodate these uses.

-- 
MST

^ permalink raw reply

* Re: igb bandwidth allocation configuration
From: Or Gerlitz @ 2009-09-14  8:42 UTC (permalink / raw)
  To: Simon Horman
  Cc: e1000-devel, netdev, Alexander Duyck, Kirsher, Jeffrey T,
	Or Gerlitz
In-Reply-To: <20090910081844.GA5421@verge.net.au>

On 9/10/09, Simon Horman <horms@verge.net.au> wrote:
> I have been looking into adding support the 82586's per-PF/VF bandwidth allocation to
> the igb driver. It seems that the trickiest part is working out how to expose things to
> user-space.

Please note that there are bunch (not many, but more then 1-2) of
things to configure from user space in a PF/VF scheme, and I think it
would be best if we do that through one mechanism, which may be
netlink based or extension to ethtool as you suggested, we've started
to discuss this on the "L2 switching in igb" thread

The "82576 SR-IOV Driver Companion Guide" document, section 7.6
mentions "Transmit Bandwidth Allocation to VFs... define minimum
transmit bandwidth for individual VMs".

I'm not clear if one can program rate limiter (upper bound) per VF or
actually rate guarantee per VF, even  with these being  just details
of specific device, alex, I would be happy if you can clarify that.


Or.

^ permalink raw reply

* [PATCH] pkt_sched: Fix qdisc_graft WRT ingress qdisc
From: Jarek Poplawski @ 2009-09-14  8:35 UTC (permalink / raw)
  To: David Miller; +Cc: Patrick McHardy, netdev

After the recent mq change using ingress qdisc overwrites dev->qdisc;
there is also a wrong old qdisc pointer passed to notify_and_destroy.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---

 net/sched/sch_api.c |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 3af1061..c5a6d62 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -693,13 +693,18 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 			if (new && i > 0)
 				atomic_inc(&new->refcnt);
 
-			qdisc_destroy(old);
+			if (!ingress)
+				qdisc_destroy(old);
 		}
 
-		notify_and_destroy(skb, n, classid, dev->qdisc, new);
-		if (new && !new->ops->attach)
-			atomic_inc(&new->refcnt);
-		dev->qdisc = new ? : &noop_qdisc;
+		if (!ingress) {
+			notify_and_destroy(skb, n, classid, dev->qdisc, new);
+			if (new && !new->ops->attach)
+				atomic_inc(&new->refcnt);
+			dev->qdisc = new ? : &noop_qdisc;
+		} else {
+			notify_and_destroy(skb, n, classid, old, new);
+		}
 
 		if (dev->flags & IFF_UP)
 			dev_activate(dev);

^ permalink raw reply related

* Re: [PATCH RFC] tun: export underlying socket
From: Or Gerlitz @ 2009-09-14  8:17 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: David Miller, netdev, herbert
In-Reply-To: <20090914080923.GC14030@redhat.com>

Michael S. Tsirkin wrote:
> On Mon, Sep 14, 2009 at 11:07:25AM +0300, Or Gerlitz wrote:
>   
>> vhost injects packets into physical device, what is the use case of vhost injecting packets into the host network stack?
>>     
> The case where the user requires bridging, typically.
>   
So you want to support a scheme where someone wants to attach vhost to a 
bridge? why not just telling these users to just set their vhost on top 
of veth couple, such that one veth device is added to a bridge as 
interface and a vhost instance is tied to the other veth device?


Or.


^ permalink raw reply

* Re: [PATCH RFC] tun: export underlying socket
From: Michael S. Tsirkin @ 2009-09-14  8:09 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: David Miller, netdev, herbert, Or Gerlitz
In-Reply-To: <15ddcffd0909140107m4d94f5abh5405074b654bd15d@mail.gmail.com>

On Mon, Sep 14, 2009 at 11:07:25AM +0300, Or Gerlitz wrote:
> On 9/10/09, Michael S. Tsirkin <mst@redhat.com> wrote:
> >  This way, code using raw sockets to inject packets into a physical device,
> >  can support injecting packets into host network stack almost without modification.
> >  First user of this interface will be vhost virtualization accelerator.
> 
> vhost injects packet into physical device, what is the use case of
> vhost injecting packet into the host network stack?

The case where the user requires bridging, typically.

> Or.

^ permalink raw reply

* Re: [PATCH RFC] tun: export underlying socket
From: Or Gerlitz @ 2009-09-14  8:07 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: David Miller, m.s.tsirkin, netdev, herbert, Or Gerlitz
In-Reply-To: <20090910125929.GA32593@redhat.com>

On 9/10/09, Michael S. Tsirkin <mst@redhat.com> wrote:
>  This way, code using raw sockets to inject packets into a physical device,
>  can support injecting packets into host network stack almost without modification.
>  First user of this interface will be vhost virtualization accelerator.

vhost injects packet into physical device, what is the use case of
vhost injecting packet into the host network stack?

Or.

^ permalink raw reply

* Re: [PATCH RFC] tun: export underlying socket
From: Or Gerlitz @ 2009-09-14  8:01 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Paul Moore, David Miller, netdev, herbert, Or Gerlitz
In-Reply-To: <20090911053610.GA10324@redhat.com>

On 9/11/09, Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Fri, Sep 11, 2009 at 12:17:27AM -0400, Paul Moore wrote:
>>> No comments on the code at this point - I'm just trying to understand the
>>> intended user right now which I'm assuming is the vhost-net bits you sent previously

> More specifically, vhost would then be patched with:
>
>  diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>  index aeffb3a..b54f9d6 100644
>  --- a/drivers/vhost/net.c
>  +++ b/drivers/vhost/net.c
>  @@ -331,15 +331,26 @@ err:
>         return ERR_PTR(r);
>   }
>
>  +static struct socket *get_tun_socket(int fd)
>  +{
>  +       struct file *file = fget(fd);
>  +       if (!file)
>  +               return ERR_PTR(-EBADF);
>  +       return tun_get_socket(file);
>  +}
>  +
>   static struct socket *get_socket(int fd)

Michael,

your latest posting "[PATCHv5 3/3] vhost_net: a kernel-level virtio
server" doesn't have a function named get_socket, so I don't see how
one can really get what you are up with this snip from vhost/net.c

Or.

>   {
>         struct socket *sock;
>         sock = get_raw_socket(fd);
>
>         if (!IS_ERR(sock))
>                 return sock;
>
> +       sock = get_tun_socket(fd);
>  +       if (!IS_ERR(sock))
>  +               return sock;
>         return ERR_PTR(-ENOTSOCK);
>   }
>
>   static long vhost_net_set_socket(struct vhost_net *n, int fd)
>   {
>         struct socket *sock, *oldsock = NULL;
>

^ permalink raw reply

* Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
From: Michael S. Tsirkin @ 2009-09-14  7:05 UTC (permalink / raw)
  To: Xin, Xiaohui
  Cc: Ira W. Snyder, netdev@vger.kernel.org,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mingo@elte.hu, linux-mm@kvack.org,
	akpm@linux-foundation.org, hpa@zytor.com,
	gregory.haskins@gmail.com, Rusty Russell, s.hetze@linux-ag.com,
	avi@redhat.com
In-Reply-To: <C85CEDA13AB1CF4D9D597824A86D2B9006AECB9FE7@PDSMSX501.ccr.corp.intel.com>

On Mon, Sep 14, 2009 at 01:57:06PM +0800, Xin, Xiaohui wrote:
> >The irqfd/ioeventfd patches are part of Avi's kvm.git tree:
> >git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git
> >
> >I expect them to be merged by 2.6.32-rc1 - right, Avi?
> 
> Michael,
> 
> I think I have the kernel patch for kvm_irqfd and kvm_ioeventfd, but missed the qemu side patch for irqfd and ioeventfd.
> 
> I met the compile error when I compiled virtio-pci.c file in qemu-kvm like this:
> 
> /root/work/vmdq/vhost/qemu-kvm/hw/virtio-pci.c:384: error: `KVM_IRQFD` undeclared (first use in this function)
> /root/work/vmdq/vhost/qemu-kvm/hw/virtio-pci.c:400: error: `KVM_IOEVENTFD` undeclared (first use in this function)
> 
> Which qemu tree or patch do you use for kvm_irqfd and kvm_ioeventfd?

I'm using the headers from upstream kernel.
I'll send a patch for that.

> Thanks
> Xiaohui
> 
> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst@redhat.com] 
> Sent: Sunday, September 13, 2009 1:46 PM
> To: Xin, Xiaohui
> Cc: Ira W. Snyder; netdev@vger.kernel.org; virtualization@lists.linux-foundation.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; mingo@elte.hu; linux-mm@kvack.org; akpm@linux-foundation.org; hpa@zytor.com; gregory.haskins@gmail.com; Rusty Russell; s.hetze@linux-ag.com; avi@redhat.com
> Subject: Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
> 
> On Fri, Sep 11, 2009 at 11:17:33PM +0800, Xin, Xiaohui wrote:
> > Michael,
> > We are very interested in your patch and want to have a try with it.
> > I have collected your 3 patches in kernel side and 4 patches in queue side.
> > The patches are listed here:
> > 
> > PATCHv5-1-3-mm-export-use_mm-unuse_mm-to-modules.patch
> > PATCHv5-2-3-mm-reduce-atomic-use-on-use_mm-fast-path.patch
> > PATCHv5-3-3-vhost_net-a-kernel-level-virtio-server.patch
> > 
> > PATCHv3-1-4-qemu-kvm-move-virtio-pci[1].o-to-near-pci.o.patch
> > PATCHv3-2-4-virtio-move-features-to-an-inline-function.patch
> > PATCHv3-3-4-qemu-kvm-vhost-net-implementation.patch
> > PATCHv3-4-4-qemu-kvm-add-compat-eventfd.patch
> > 
> > I applied the kernel patches on v2.6.31-rc4 and the qemu patches on latest kvm qemu.
> > But seems there are some patches are needed at least irqfd and ioeventfd patches on
> > current qemu. I cannot create a kvm guest with "-net nic,model=virtio,vhost=vethX".
> > 
> > May you kindly advice us the patch lists all exactly to make it work?
> > Thanks a lot. :-)
> > 
> > Thanks
> > Xiaohui
> 
> 
> The irqfd/ioeventfd patches are part of Avi's kvm.git tree:
> git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git
> 
> I expect them to be merged by 2.6.32-rc1 - right, Avi?
> 
> -- 
> MST

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* RE: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
From: Xin, Xiaohui @ 2009-09-14  5:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Ira W. Snyder, netdev@vger.kernel.org,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mingo@elte.hu, linux-mm@kvack.org,
	akpm@linux-foundation.org, hpa@zytor.com,
	gregory.haskins@gmail.com, Rusty Russell, s.hetze@linux-ag.com,
	avi@redhat.com
In-Reply-To: <20090913054610.GA4446@redhat.com>

>The irqfd/ioeventfd patches are part of Avi's kvm.git tree:
>git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git
>
>I expect them to be merged by 2.6.32-rc1 - right, Avi?

Michael,

I think I have the kernel patch for kvm_irqfd and kvm_ioeventfd, but missed the qemu side patch for irqfd and ioeventfd.

I met the compile error when I compiled virtio-pci.c file in qemu-kvm like this:

/root/work/vmdq/vhost/qemu-kvm/hw/virtio-pci.c:384: error: `KVM_IRQFD` undeclared (first use in this function)
/root/work/vmdq/vhost/qemu-kvm/hw/virtio-pci.c:400: error: `KVM_IOEVENTFD` undeclared (first use in this function)

Which qemu tree or patch do you use for kvm_irqfd and kvm_ioeventfd?

Thanks
Xiaohui

-----Original Message-----
From: Michael S. Tsirkin [mailto:mst@redhat.com] 
Sent: Sunday, September 13, 2009 1:46 PM
To: Xin, Xiaohui
Cc: Ira W. Snyder; netdev@vger.kernel.org; virtualization@lists.linux-foundation.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; mingo@elte.hu; linux-mm@kvack.org; akpm@linux-foundation.org; hpa@zytor.com; gregory.haskins@gmail.com; Rusty Russell; s.hetze@linux-ag.com; avi@redhat.com
Subject: Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

On Fri, Sep 11, 2009 at 11:17:33PM +0800, Xin, Xiaohui wrote:
> Michael,
> We are very interested in your patch and want to have a try with it.
> I have collected your 3 patches in kernel side and 4 patches in queue side.
> The patches are listed here:
> 
> PATCHv5-1-3-mm-export-use_mm-unuse_mm-to-modules.patch
> PATCHv5-2-3-mm-reduce-atomic-use-on-use_mm-fast-path.patch
> PATCHv5-3-3-vhost_net-a-kernel-level-virtio-server.patch
> 
> PATCHv3-1-4-qemu-kvm-move-virtio-pci[1].o-to-near-pci.o.patch
> PATCHv3-2-4-virtio-move-features-to-an-inline-function.patch
> PATCHv3-3-4-qemu-kvm-vhost-net-implementation.patch
> PATCHv3-4-4-qemu-kvm-add-compat-eventfd.patch
> 
> I applied the kernel patches on v2.6.31-rc4 and the qemu patches on latest kvm qemu.
> But seems there are some patches are needed at least irqfd and ioeventfd patches on
> current qemu. I cannot create a kvm guest with "-net nic,model=virtio,vhost=vethX".
> 
> May you kindly advice us the patch lists all exactly to make it work?
> Thanks a lot. :-)
> 
> Thanks
> Xiaohui


The irqfd/ioeventfd patches are part of Avi's kvm.git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git

I expect them to be merged by 2.6.32-rc1 - right, Avi?

-- 
MST

^ permalink raw reply

* Re: [PATCH 1/8] networking/fanotify: declare fanotify socket numbers
From: Eric Paris @ 2009-09-14  1:26 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: jamal, David Miller, linux-kernel, linux-fsdevel, netdev, viro,
	alan, hch, balbir
In-Reply-To: <20090914000303.GA30621@shareable.org>

On Mon, 2009-09-14 at 01:03 +0100, Jamie Lokier wrote:
> jamal wrote:
> > On Fri, 2009-09-11 at 22:42 +0100, Jamie Lokier wrote:

First let me state the 2 main new things fanotify gives so neither is
lost.

#1 fanotify implements the same basic thing as inotify except rather
than an arbitrary number (inotify speak a watch descriptor) which
userspace has to somehow convert back into a file, fanotify gives
userspace an open file descriptor to that object.  (this is the part
that requires recv side processing)

#2 fanotify allows the userspace 'listener' or 'group' (I use group to
describe it's actions in kernel and listener to describe it's action in
userspace) to request that it be allowed to arbitrate open and access
(read) security decisions.

> > > Eric's explained that it would be normal for _every_ file operation on
> > > some systems to trigger a fanotify event and possibly wait on the
> > > response, or at least in major directory trees on the filesystem.
> > > Even if it's just for the fanotify app to say "oh I don't care about
> > > that file, carry on".
> > > 
> > 
> > That doesnt sound very scalable. Should it not be you get nothing unless
> > you register for interest in something?
> 
> You do get nothing unless you register interest.  The problem is
> there's no way to register interest on just a subtree, so the fanotify
> approach is let you register for events on the whole filesystem, and
> let the userspace daemon filter paths.  At least it's decisions can be
> cached, although I'm not sure how that works when multiple processes
> want to monitor overlapping parts of the filesystem.

fanotify provides 3 options to register.
1) this inode
2) this dir and it's children
3) all files on the whole fscking system

this patch only does #1 and #2.  After it's in I'm going to take a
serious look at #4 subtrees.

Responses to access decisions are cached and checked in kernel per
fanotify listener.  So listener 1 can ignore requests for a given inode
while listener 2 still gets notification and forces the original process
to block.

> It doesn't sound scalable to me, either, and that's why I don't like
> this part, and described a solution to monitoring subtrees - which
> would also solve the problem for inotify.  (Both use fsnotify under
> the hood, and that's where subtree notification would go).

Subtree checking hasn't seen and work from me but it is something I plan
to work on.  And one thing that makes me scared to tie myself to
syscalls when I already have something that works relatively cleanly and
easily.

> Eric's mentioned interest in a way to monitor subtrees, but that
> hasn't gone anywhere as far as I know.  He doesn't seem convinced by
> my solution - or even that scalability will be an issue.  I think
> there's a bit of vision lacking here, and I'll admit I'm more
> interested in the inotify uses of fsnotify (being able to detect
> changes) than the fanotify uses (being able to _block_ or _modify_
> changes).  I think both inotify and fanotify ought to benefit from the
> same improvements to file monitoring.

sort of agree with you here.  anything that gets added to support
subtrees would have to be in the generic code.  Although I question how
inotify could be used, as a wd is not (in my mind) a reasonable way to
tell userspace about files.  (and with subtrees it would be a wd and a
pathname....)    I think fanotify with notification only (what I'm
giving in this patch series is a much better fit for subtree
notification.

> > > File performance is one of those things which really needs to be fast
> > > for a good user experience - and it's not unusual to grep the odd
> > > 10,000 files here or there (just think of what a kernel developer
> > > does), or to replace a few thousand quickly (rpm/dpkg) and things like
> > > that.
> > > 
> > 
> > So grepping 10000 files would cause 10000 events? I am not sure how the
> > scheme works; filtering of what events get delivered sounds more
> > reasonable if it happens in the kernel.
> 
> I believe it would cause 10000 events, yes, even if they are files
> that userspace policy is not interested in.  Eric, is that right?

If fanotify wants it, yes, that's exactly what happens.

> However I believe after the first grep, subsequent greps' decisions
> would be cached by marking the inodes.  I'm not sure what happens if
> two fanotify monitors both try marking the inodes.

Each can mark individually.

> Arguably if a fanotify monitor is running before those files are in
> page cache anyway, then I/O may dominate, and when the files are
> cached, fanotify has already cached it's decisions in the kernel.
> However fanotify is synchronous: each new file access involves a round
> trip to the fanotify userspace and back before it can proceed, so
> there's quite a lot of IPC and scheduling too.  Without testing, it's
> hard to guess how it'll really perform.

As I recall my old old tests on a 32 way system showed around a 10%
performance penalty when building a kernel when making userspace
arbitrate decisions and the cache was blank.  So yes, there is a serious
performance hit to making a userspace application control access
decisions.  Then again, I'd rather not have those people who need this
system wide access controls to do it in the kernel (anti-malware
vendors)

I believe that people who chose to use this interface will have to
realize there is a severe up front performance penalty.  On a steady
state system like a web server you'd see near 0% performance (a new srcu
lock, inode->i_lock, and running a short list)  But yes, controling
access to every file on a system eats performance, that's the nature of
the beast.

> > Theres a difference between events which are abbreviated in the form
> > "hey some read happened on fd you are listening on" vs "hey a read
> > of file X for 16 bytes at offset 200 by process Y just occured while
> > at the same time process Z was writting at offset 2000". The later
> > (which netlink will give you) includes a lot more attribute details
> > which could be filtered or can be extended to include a lot
> > more. The former(what epoll will give you) is merely a signal.

> But this part is irrelevant to fanotify, because there's no plan or
> intention to provide that much detail about I/O.

We have ZERO plan to include ordering.  ZERO.  inotify sorta pretends it
deals with ordering by only dropping a notification if it is the same as
the last one in the queue.  fanotify will gladly merge events which
exist anywhere in the queue clearly throwing ordering to the wind.

We do plan to include the pid, uid, and gid of the process making the
original request.  We also plan to include the f_flags of the file in
the original process when possible.

^ permalink raw reply

* Re: [PATCH 1/8] networking/fanotify: declare fanotify socket numbers
From: Jamie Lokier @ 2009-09-14  0:17 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Eric Paris, David Miller, linux-kernel, linux-fsdevel, netdev,
	viro, alan, hch
In-Reply-To: <20090912094110.GB24709@ioremap.net>

Evgeniy Polyakov wrote:
> On Fri, Sep 11, 2009 at 05:51:42PM -0400, Eric Paris (eparis@redhat.com) wrote:
> > For some things yes, some things no.  I'd have to understand where loss
> > can happen to know if it's feasible.  If I know loss happens in the
> > sender context that's great.  If it's somewhere in the middle and the
> > sender doesn't immediately know it'll never be delivered, yes, I don't
> > think it can solve all my needs.  How many places can and skb get lost
> > between the sender and the receiver?
> 
> When queue is full or you do not have enough RAM. Both are reported at
> 'sending' time.

Can you ->poll() and wait reliably until the queue will accept an skb?
(A few spurious EAGAINs/ENOBUFs is ok, as long as it's not the norm).

-- Jamie

^ permalink raw reply

* Re: [PATCH 1/8] networking/fanotify: declare fanotify socket numbers
From: Jamie Lokier @ 2009-09-14  0:03 UTC (permalink / raw)
  To: jamal
  Cc: Eric Paris, David Miller, linux-kernel, linux-fsdevel, netdev,
	viro, alan, hch, balbir
In-Reply-To: <1252709567.25158.61.camel@dogo.mojatatu.com>

jamal wrote:
> On Fri, 2009-09-11 at 22:42 +0100, Jamie Lokier wrote:
> 
> > One of the uses of fanotify is as a security or auditing mechanism.
> > That can't tolerate gaps.
> > 
> > It's fundemantally different from inotify in one important respect:
> > inotify apps can recover from losing events by checking what they are
> > watching.
> > 
> > The fanotify application will know that it missed events, but what
> > happens to the other application which _caused_ those events?  Does it
> > get to do things it shouldn't, or hide them from the fanotify app, by
> > simply overloading the system?  Or the opposite, does it get access
> > denied - spurious file errors when the system is overloaded?
> > 
> > There's no way to handle that by dropping events.  A transport
> > mechanism can be dropped (say skbs), but the event itself has to be
> > kept, and then retried.
> > 
> >
> > Since you have to keep an event object around until it's handled,
> > there's no point tying it to an unreliable delivery mechanism which
> > you'd have to wrap a retry mechanism around.
> > 
> > In other words, that part of netlink is a poor match.  It would match
> > inotify much better.
> 
> Reliability is something that you should build in. Netlink provides you
> all the necessary tools. What you are asking for here is essentially 
> reliable multicasting.

Almost.  It's reliable multicasting plus unicast responses which must
be waited for.  That changes things.

> You dont have infinite memory, therefore there
> will be times when you will overload one of the users, and they wont
> have sufficient buffer space and then you have to retransmit.

If you have enough memory to remember _what_ to retransmit, then you
have enough memory to buffer a fixed-size message.  It just depends on
how you do the buffering.  To say netlink drops the message and you
can retry is just saying that the buffering is happening one step
earlier, before netlink.  That's what I mean by netlink being a
pointless complication for this, because you can just as easily write
code which gets to the message to userspace without going through
netlink and with no chance of it being dropped.

> Is the current proposed mechanism capable of reliably multicasting
> without need for retransmit?

Yes.  It uses positive acknowledge and flow control, because these
match naturally with what fanotify does at the next higher level.

The process generating the multicast (e.g. trying to write a file) is
blocked until the receiver gets the message, handles it and
acknowledges with a "yes you can" or "no you can't" response.

That's part of fanotify's design.  The pattern conveniently has no
issues with using unbounded memory for message, because the sending
process is blocked.

> > Speaking of skbs, how fast and compact are they for this?
> 
> They are largish relative to say if you trimmed down to basic necessity.
> But then you get a lot of the buffer management aspects for free.
> In this case, the concept of multicasting is built in so for one event
> to be sent to X users - you only need one skb.

True you only need one skb.  But netlink doesn't handle waiting for
positive acknowledge responses from every receiver, and combining
their value, does it?  You can't really take advantage of netlink's
built in multicast, because to known when it has all the responses,
the fanotify layer has to track the subscriber list itself anyway.

What I'm saying is perhaps skbs are useful for fanotify, but I don't
know that netlink's multicasting is useful.  But storing the messages
in skbs for transmission, and using parts of netlink to manage them,
and to provide some of the API, that might be useful.

> > Eric's explained that it would be normal for _every_ file operation on
> > some systems to trigger a fanotify event and possibly wait on the
> > response, or at least in major directory trees on the filesystem.
> > Even if it's just for the fanotify app to say "oh I don't care about
> > that file, carry on".
> > 
> 
> That doesnt sound very scalable. Should it not be you get nothing unless
> you register for interest in something?

You do get nothing unless you register interest.  The problem is
there's no way to register interest on just a subtree, so the fanotify
approach is let you register for events on the whole filesystem, and
let the userspace daemon filter paths.  At least it's decisions can be
cached, although I'm not sure how that works when multiple processes
want to monitor overlapping parts of the filesystem.

It doesn't sound scalable to me, either, and that's why I don't like
this part, and described a solution to monitoring subtrees - which
would also solve the problem for inotify.  (Both use fsnotify under
the hood, and that's where subtree notification would go).

Eric's mentioned interest in a way to monitor subtrees, but that
hasn't gone anywhere as far as I know.  He doesn't seem convinced by
my solution - or even that scalability will be an issue.  I think
there's a bit of vision lacking here, and I'll admit I'm more
interested in the inotify uses of fsnotify (being able to detect
changes) than the fanotify uses (being able to _block_ or _modify_
changes).  I think both inotify and fanotify ought to benefit from the
same improvements to file monitoring.

> > File performance is one of those things which really needs to be fast
> > for a good user experience - and it's not unusual to grep the odd
> > 10,000 files here or there (just think of what a kernel developer
> > does), or to replace a few thousand quickly (rpm/dpkg) and things like
> > that.
> > 
> 
> So grepping 10000 files would cause 10000 events? I am not sure how the
> scheme works; filtering of what events get delivered sounds more
> reasonable if it happens in the kernel.

I believe it would cause 10000 events, yes, even if they are files
that userspace policy is not interested in.  Eric, is that right?

However I believe after the first grep, subsequent greps' decisions
would be cached by marking the inodes.  I'm not sure what happens if
two fanotify monitors both try marking the inodes.

Arguably if a fanotify monitor is running before those files are in
page cache anyway, then I/O may dominate, and when the files are
cached, fanotify has already cached it's decisions in the kernel.
However fanotify is synchronous: each new file access involves a round
trip to the fanotify userspace and back before it can proceed, so
there's quite a lot of IPC and scheduling too.  Without testing, it's
hard to guess how it'll really perform.

> > While skbs and netlink aren't that slow, I suspect they're an order of
> > magnitude or two slower than, say, epoll or inotify at passing events
> > around.
> 
> not familiar with inotify.

inotify is like dnotify, and like a signal or epoll: a message that
something happened.  You register interest in individual files or
directories only, and inotify does not (yet) provide a way to monitor
the whole filesystem or a subtree.

fanotify is different: it provides access control, and can _refuse_
attempts to read file X, or even modify the file before permitting the
file to be read.

> Theres a difference between events which are abbreviated in the form
> "hey some read happened on fd you are listening on" vs "hey a read
> of file X for 16 bytes at offset 200 by process Y just occured while
> at the same time process Z was writting at offset 2000". The later
> (which netlink will give you) includes a lot more attribute details
> which could be filtered or can be extended to include a lot
> more. The former(what epoll will give you) is merely a signal.

Firstly, it's really hard to retain the ordering of userspace events
like that in a useful way, given the non-determinstic parallelism
going on with multiple processes doing I/O do the same file :-)

Second, you can't really pump messages with that much detail into
netlink and let _it_ filter them to userspace; that would be too much
processing.  You'd have to have some way of not generating that much
detail except when it's been requested, and preferably only for files
you want it for.

But this part is irrelevant to fanotify, because there's no plan or
intention to provide that much detail about I/O.

If you want, feel free to provide a stracenotify subsystem to track
everything in detail :-)

-- Jamie

^ permalink raw reply

* [PATCH resend] sky2: Make sure both ports initialize correctly
From: Mike McCormack @ 2009-09-13 23:08 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

Rework sky2_probe() so that both ports are allocated and deallocated in the
 same way.  Fail if allocating either port fails as both are assumed to be
 allocated in many places.

Signed-off-by: Mike McCormack <mikem@ring3k.org>
---
 drivers/net/sky2.c |   71 +++++++++++++++++++++++++--------------------------
 1 files changed, 35 insertions(+), 36 deletions(-)

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index c8ebc03..9c3fa49 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -4422,11 +4422,11 @@ static const char *sky2_name(u8 chipid, char *buf, int sz)
 static int __devinit sky2_probe(struct pci_dev *pdev,
 				const struct pci_device_id *ent)
 {
-	struct net_device *dev;
 	struct sky2_hw *hw;
 	int err, using_dac = 0, wol_default;
 	u32 reg;
 	char buf1[16];
+	int i;
 
 	err = pci_enable_device(pdev);
 	if (err) {
@@ -4512,17 +4512,30 @@ static int __devinit sky2_probe(struct pci_dev *pdev,
 
 	err = sky2_init(hw);
 	if (err)
-		goto err_out_iounmap;
+		goto err_out_free_pci;
 
 	dev_info(&pdev->dev, "Yukon-2 %s chip revision %d\n",
 		 sky2_name(hw->chip_id, buf1, sizeof(buf1)), hw->chip_rev);
 
 	sky2_reset(hw);
 
-	dev = sky2_init_netdev(hw, 0, using_dac, wol_default);
-	if (!dev) {
-		err = -ENOMEM;
-		goto err_out_free_pci;
+	for (i=0; i<hw->ports; i++)
+	{
+		struct net_device *dev;
+
+		dev = sky2_init_netdev(hw, 0, using_dac, wol_default);
+		if (!dev) {
+			err = -ENOMEM;
+			goto err_out_free_netdev;
+		}
+
+		err = register_netdev(dev);
+		if (err) {
+			dev_err(&pdev->dev, "cannot register net device\n");
+			goto err_out_free_netdev;
+		}
+
+		sky2_show_addr(dev);
 	}
 
 	if (!disable_msi && pci_enable_msi(pdev) == 0) {
@@ -4530,44 +4543,21 @@ static int __devinit sky2_probe(struct pci_dev *pdev,
 		if (err == -EOPNOTSUPP)
  			pci_disable_msi(pdev);
 		else if (err)
-			goto err_out_free_netdev;
+			goto err_out_disable_msi;
  	}
 
-	err = register_netdev(dev);
-	if (err) {
-		dev_err(&pdev->dev, "cannot register net device\n");
-		goto err_out_free_netdev;
-	}
-
-	netif_napi_add(dev, &hw->napi, sky2_poll, NAPI_WEIGHT);
+	netif_napi_add(hw->dev[0], &hw->napi, sky2_poll, NAPI_WEIGHT);
 
 	err = request_irq(pdev->irq, sky2_intr,
 			  (hw->flags & SKY2_HW_USE_MSI) ? 0 : IRQF_SHARED,
-			  dev->name, hw);
+			  hw->dev[0]->name, hw);
 	if (err) {
 		dev_err(&pdev->dev, "cannot assign irq %d\n", pdev->irq);
-		goto err_out_unregister;
+		goto err_out_disable_msi;
 	}
 	sky2_write32(hw, B0_IMSK, Y2_IS_BASE);
 	napi_enable(&hw->napi);
 
-	sky2_show_addr(dev);
-
-	if (hw->ports > 1) {
-		struct net_device *dev1;
-
-		dev1 = sky2_init_netdev(hw, 1, using_dac, wol_default);
-		if (!dev1)
-			dev_warn(&pdev->dev, "allocation for second device failed\n");
-		else if ((err = register_netdev(dev1))) {
-			dev_warn(&pdev->dev,
-				 "register of second port failed (%d)\n", err);
-			hw->dev[1] = NULL;
-			free_netdev(dev1);
-		} else
-			sky2_show_addr(dev1);
-	}
-
 	setup_timer(&hw->watchdog_timer, sky2_watchdog, (unsigned long) hw);
 	INIT_WORK(&hw->restart_work, sky2_restart);
 
@@ -4575,12 +4565,21 @@ static int __devinit sky2_probe(struct pci_dev *pdev,
 
 	return 0;
 
-err_out_unregister:
+err_out_disable_msi:
 	if (hw->flags & SKY2_HW_USE_MSI)
 		pci_disable_msi(pdev);
-	unregister_netdev(dev);
+
 err_out_free_netdev:
-	free_netdev(dev);
+	for (i=0; i<hw->ports; i++) {
+		struct net_device *dev = hw->dev[i];
+
+		if (dev) {
+			if (dev->reg_state == NETREG_REGISTERED)
+				unregister_netdev(dev);
+			free_netdev(dev);
+		}
+	}
+
 err_out_free_pci:
 	sky2_write8(hw, B0_CTST, CS_RST_SET);
 	pci_free_consistent(pdev, STATUS_LE_BYTES, hw->st_le, hw->st_dma);
-- 
1.5.6.5


^ permalink raw reply related

* [PATCH resend] sky2: Account for VLAN tag in tx_le_req
From: Mike McCormack @ 2009-09-13 23:07 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

With VLAN enabled, tx_le_req could return fewer list elements than required,
 leading to a potential wrap around of the transmit ring buffer.

Signed-off-by: Mike McCormack <mikem@ring3k.org>
---
 drivers/net/sky2.c |   14 +++++++++++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index 00bc65a..c8ebc03 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -1563,7 +1563,7 @@ static inline int tx_avail(const struct sky2_port *sky2)
 }
 
 /* Estimate of number of transmit list elements required */
-static unsigned tx_le_req(const struct sk_buff *skb)
+static unsigned tx_le_req(const struct sk_buff *skb, int vlan_tag)
 {
 	unsigned count;
 
@@ -1573,6 +1573,9 @@ static unsigned tx_le_req(const struct sk_buff *skb)
 	if (skb_is_gso(skb))
 		++count;
 
+	else if (sizeof(dma_addr_t) == sizeof(u32) && vlan_tag)
+		++count;
+
 	if (skb->ip_summed == CHECKSUM_PARTIAL)
 		++count;
 
@@ -1611,8 +1614,13 @@ static netdev_tx_t sky2_xmit_frame(struct sk_buff *skb,
 	u16 slot;
 	u16 mss;
 	u8 ctrl;
+	unsigned vlan_tag = 0;
+
+#ifdef SKY2_VLAN_TAG_USED
+	vlan_tag = sky2->vlgrp && vlan_tx_tag_present(skb);
+#endif
 
- 	if (unlikely(tx_avail(sky2) < tx_le_req(skb)))
+ 	if (unlikely(tx_avail(sky2) < tx_le_req(skb, vlan_tag)))
   		return NETDEV_TX_BUSY;
 
 	len = skb_headlen(skb);
@@ -1657,7 +1665,7 @@ static netdev_tx_t sky2_xmit_frame(struct sk_buff *skb,
 	ctrl = 0;
 #ifdef SKY2_VLAN_TAG_USED
 	/* Add VLAN tag, can piggyback on LRGLEN or ADDR64 */
-	if (sky2->vlgrp && vlan_tx_tag_present(skb)) {
+	if (vlan_tag) {
 		if (!le) {
 			le = get_tx_le(sky2, &slot);
 			le->addr = 0;
-- 
1.5.6.5



^ permalink raw reply related

* Re: [PATCH 5/5] Updating documentation accordingly
From: Gerrit Renker @ 2009-09-13 18:50 UTC (permalink / raw)
  To: Ivo Calado; +Cc: dccp, netdev
In-Reply-To: <4AA6A26A.6080808@embedded.ufcg.edu.br>

| Updating documentation accordingly

+/* As defined at section 8.6.1. of RFC 4342 */
 #define TFRC_LOSS_INTERVALS_OPT_MAX_LENGTH	28
+/* Specified on section 8.7. of CCID4 draft */
 #define TFRC_DROP_OPT_MAX_LENGTH		84
Need to update the references to CCID4 draft with RFC 5622. See also 4/5.

It would be more helpful to include the documentation for structs with
the patches that use these structs.

^ permalink raw reply

* Re: [PATCH 4/5] Adds options DROPPED PACKETS and LOSS INTERVALS to receiver
From: Gerrit Renker @ 2009-09-13 18:41 UTC (permalink / raw)
  To: Ivo Calado; +Cc: dccp, netdev
In-Reply-To: <4AA6A267.7080500@embedded.ufcg.edu.br>

| Adds options DROPPED PACKETS and LOSS INTERVALS to receiver. In this patch is added the
| mechanism of gathering information about loss intervals and storing it, for later
| construction of these two options.
This also needs some more work, please see inline.

--- dccp_tree_work5.orig/net/dccp/ccids/lib/packet_history_sp.c
+++ dccp_tree_work5/net/dccp/ccids/lib/packet_history_sp.c
@@ -233,7 +233,9 @@
 }
 
 /* return 1 if a new loss event has been identified */
-static int __two_after_loss(struct tfrc_rx_hist *h, struct sk_buff *skb, u32 n3)
+static int __two_after_loss(struct tfrc_rx_hist *h,
+			    struct sk_buff *skb, u32 n3,
+			    bool *new_loss)
 {
 	u64 s0 = tfrc_rx_hist_loss_prev(h)->tfrchrx_seqno,
 	    s1 = tfrc_rx_hist_entry(h, 1)->tfrchrx_seqno,
@@ -245,6 +247,7 @@
 		tfrc_sp_rx_hist_entry_from_skb(tfrc_rx_hist_entry(h, 3),
 					       skb, n3);
 		h->num_losses = dccp_loss_count(s2, s3, n3);
+		*new_loss = 1;
 		return 1;
 	}
 
@@ -259,6 +262,7 @@
 					       skb, n3);
 		h->loss_count = 3;
 		h->num_losses = dccp_loss_count(s1, s3, n3);
+		*new_loss = 1;
 		return 1;
 	}
 
@@ -284,6 +288,7 @@
 			tfrc_sp_rx_hist_entry_from_skb(
 					tfrc_rx_hist_loss_prev(h), skb, n3);
 
+		*new_loss = 0;
 		return 0;
 	}
 
@@ -297,6 +302,7 @@
 	h->loss_count = 3;
 	h->num_losses = dccp_loss_count(s0, s3, n3);
 
+	*new_loss = 1;
 	return 1;
 }
The 'new_loss' parameter is completely redundant, since it duplicates the
return value of the function.


@@ -348,11 +354,14 @@
  *  operations when loss_count is greater than 0 after calling this function.
  */
 bool tfrc_sp_rx_congestion_event(struct tfrc_rx_hist *h,
-			      struct tfrc_loss_hist *lh,
-	 struct sk_buff *skb, const u64 ndp,
-  u32 (*first_li)(struct sock *), struct sock *sk)
+				 struct tfrc_loss_hist *lh,
+				 struct tfrc_loss_data *ld,
+				 struct sk_buff *skb, const u64 ndp,
+				 u32 (*first_li)(struct sock *),
+				 struct sock *sk)
 {
 	bool new_event = false;
+	bool new_loss = false;
 
 	if (tfrc_sp_rx_hist_duplicate(h, skb))
 		return 0;
@@ -365,12 +374,13 @@
 		__one_after_loss(h, skb, ndp);
 	} else if (h->loss_count != 2) {
 		DCCP_BUG("invalid loss_count %d", h->loss_count);
-	} else if (__two_after_loss(h, skb, ndp)) {
+	} else if (__two_after_loss(h, skb, ndp, &new_loss)) {
 		/*
 		* Update Loss Interval database and recycle RX records
 		*/
 		new_event = tfrc_sp_lh_interval_add(lh, h, first_li, sk,
 						dccp_hdr(skb)->dccph_ccval);
+		tfrc_sp_update_li_data(ld, h, skb, new_loss, new_event);
 		__three_after_loss(h);
 
 	} else if (dccp_data_packet(skb) && dccp_skb_is_ecn_ce(skb)) {
@@ -396,6 +406,8 @@
 		}
 	}
 
+	tfrc_sp_update_li_data(ld, h, skb, new_loss, new_event);
+
I don't understand why 'tfrc_sp_update_li_data' is called twice, one call seems
to be redundant. What it seems to be wanting to do is
	bool new_loss = false;

	//...
	} else if (__two_after_loss(h, skb, ndp)) {
		new_loss  = true;
		new_event = tfrc_sp_lh_interval_add(...);
		// ...
	}
	// ...
	tfrc_sp_update_li_data(ld, h, skb, new_loss, new_event);
	


--- dccp_tree_work5.orig/net/dccp/ccids/lib/loss_interval_sp.c
+++ dccp_tree_work5/net/dccp/ccids/lib/loss_interval_sp.c
 
+void tfrc_sp_ld_cleanup(struct tfrc_loss_data *ld)
+{
+	struct tfrc_loss_data_entry *next, *h = ld->head;
+
+	if (!h)
+		return;
+
+	while (h) {
+		next = h->next;
+		kmem_cache_free(tfrc_ld_slab, h);
+		h = next;
+	}
+
+	ld->head = NULL;
+	ld->counter = 0;
+}
The if(!h) statement is not needed above and prevents resetting 
the head/counter values.


+void tfrc_sp_ld_prepare_data(u8 loss_count, struct tfrc_loss_data *ld)
+{
+	u8 *li_ofs, *d_ofs;
+	struct tfrc_loss_data_entry *e;
+	u16 count;
+
+	li_ofs = &ld->loss_intervals_opts[0];
+	d_ofs = &ld->drop_opts[0];
+
+	count = 0;
+	e = ld->head;
+
+	*li_ofs = loss_count + 1;
+	li_ofs++;
+
+	while (e != NULL) {
+
+		if (count < TFRC_LOSS_INTERVALS_OPT_MAX_LENGTH) {
+			*li_ofs = ((htonl(e->lossless_length) & 0x00FFFFFF)<<8);
+			li_ofs += 3;
+			*li_ofs = ((e->ecn_nonce_sum&0x1) << 31) &
+				  (htonl((e->loss_length & 0x00FFFFFF))<<8);
==> This does not seem right: mixing htonl with non-htonl data, using '&' where
    probably '|' is meant.
+			li_ofs += 3;
+			*li_ofs = ((htonl(e->data_length) & 0x00FFFFFF)<<8);
==> I think you mean "e->data_length & 0xFFFFFF".

+			li_ofs += 3;
+		}
+
+		if (count < TFRC_DROP_OPT_MAX_LENGTH) {
+			*d_ofs = (htonl(e->drop_count) & 0x00FFFFFF)<<8;
+			d_ofs += 3;
+		}
+
+		if ((count >= TFRC_LOSS_INTERVALS_OPT_MAX_LENGTH) &&
+		    (count >= TFRC_DROP_OPT_MAX_LENGTH))
+			break;
==> This could be better handled using 'else' cases above, it seems the '&&' means '||'.
+
+		count++;
+		e = e->next;
+	}
+}

+void tfrc_sp_update_li_data(struct tfrc_loss_data *ld,
+			    struct tfrc_rx_hist *rh,
+			    struct sk_buff *skb,
+			    bool new_loss, bool new_event)
+{
+	struct tfrc_loss_data_entry *new, *h;
+
+	if (!dccp_data_packet(skb))
+		return;
+
+	if (ld->head == NULL) {
+		new = tfrc_ld_add_new(ld);
+		if (unlikely(new == NULL)) {
+			DCCP_CRIT("Cannot allocate new loss data registry.");
+			return;
+		}
+
+		if (new_loss) {
+			new->drop_count = rh->num_losses;
+			new->lossless_length = 1;
+			new->loss_length = rh->num_losses;
+
+			if (dccp_data_packet(skb))
+				new->data_length = 1;
+
+			if (dccp_data_packet(skb) && dccp_skb_is_ecn_ect1(skb))
+				new->ecn_nonce_sum = 1;
+			else
+				new->ecn_nonce_sum = 0;
+		} else {
+			new->drop_count = 0;
+			new->lossless_length = 1;
+			new->loss_length = 0;
+
+			if (dccp_data_packet(skb))
+				new->data_length = 1;
+
+			if (dccp_data_packet(skb) && dccp_skb_is_ecn_ect1(skb))
+				new->ecn_nonce_sum = 1;
+			else
+				new->ecn_nonce_sum = 0;
+		}
+
+		return;
+	}
+
+	if (new_event) {
+		new = tfrc_ld_add_new(ld);
+		if (unlikely(new == NULL)) {
+			DCCP_CRIT("Cannot allocate new loss data registry. \
+					Cleaning up.");
+			tfrc_sp_ld_cleanup(ld);
+			return;
+		}
+
+		new->drop_count = rh->num_losses;
+		new->lossless_length = (ld->last_loss_count - rh->loss_count);
+		new->loss_length = rh->num_losses;
+
+		new->ecn_nonce_sum = 0;
+		new->data_length = 0;
+
+		while (ld->last_loss_count > rh->loss_count) {
+			ld->last_loss_count--;
+
+			if (ld->sto_is_data & (1 << (ld->last_loss_count))) {
+				new->data_length++;
+
+				if (ld->sto_ecn & (1 << (ld->last_loss_count)))
+					new->ecn_nonce_sum =
+						!new->ecn_nonce_sum;
+			}
+		}
+
+		return;
+	}
+
+	h = ld->head;
+
+	if (rh->loss_count > ld->last_loss_count) {
+		ld->last_loss_count = rh->loss_count;
+
+		if (dccp_data_packet(skb))
+			ld->sto_is_data |= (1 << (ld->last_loss_count - 1));
+
+		if (dccp_skb_is_ecn_ect1(skb))
+			ld->sto_ecn |= (1 << (ld->last_loss_count - 1));
+
+		return;
+	}
+
+	if (new_loss) {
+		h->drop_count += rh->num_losses;
+		h->lossless_length = (ld->last_loss_count - rh->loss_count);
+		h->loss_length += h->lossless_length + rh->num_losses;
+
+		h->ecn_nonce_sum = 0;
+		h->data_length = 0;
+
+		while (ld->last_loss_count > rh->loss_count) {
+			ld->last_loss_count--;
+
+			if (ld->sto_is_data&(1 << (ld->last_loss_count))) {
+				h->data_length++;
+
+				if (ld->sto_ecn & (1 << (ld->last_loss_count)))
+					h->ecn_nonce_sum = !h->ecn_nonce_sum;
+			}
+		}
+
+		return;
+	}
+
+	if (ld->last_loss_count > rh->loss_count) {
+		while (ld->last_loss_count > rh->loss_count) {
+			ld->last_loss_count--;
+
+			h->lossless_length++;
+
+			if (ld->sto_is_data & (1 << (ld->last_loss_count))) {
+				h->data_length++;
+
+				if (ld->sto_ecn & (1 << (ld->last_loss_count)))
+					h->ecn_nonce_sum = !h->ecn_nonce_sum;
+			}
+		}
+
+		return;
+	}
+
+	h->lossless_length++;
+
+	if (dccp_data_packet(skb)) {
+		h->data_length++;
+
+		if (dccp_skb_is_ecn_ect1(skb))
+			h->ecn_nonce_sum = !h->ecn_nonce_sum;
+	}
+}
Due to the 
       if (!dccp_data_packet(skb))
               return;
at the begin of the function, all subsequent 'dccp_data_packet(skb)' are unnecessary.
Almost every 'if' statement ends in 'return', this seems ad-hoc and could be reduced
by adding if-else-if-else-if..., which would probably also reduce the code duplication.


@@ -244,8 +463,11 @@
 	tfrc_lh_slab = kmem_cache_create("tfrc_sp_li_hist",
 					 sizeof(struct tfrc_loss_interval), 0,
 					 SLAB_HWCACHE_ALIGN, NULL);
+	tfrc_ld_slab = kmem_cache_create("tfrc_sp_li_data",
+					 sizeof(struct tfrc_loss_data_entry), 0,
+					 SLAB_HWCACHE_ALIGN, NULL);
 
-	if ((tfrc_lh_slab != NULL))
+	if ((tfrc_lh_slab != NULL) || (tfrc_ld_slab != NULL))
 		return 0;
 
 	if (tfrc_lh_slab != NULL) {
@@ -253,6 +475,11 @@
 		tfrc_lh_slab = NULL;
 	}
 
+	if (tfrc_ld_slab != NULL) {
+		kmem_cache_destroy(tfrc_ld_slab);
+		tfrc_ld_slab = NULL;
+	}
+
 	return -ENOBUFS;
 }
The condition above should be '&&', not '||'. Suggested alternative:

+	if (tfrc_lh_slab == NULL)
+		goto lh_failed;
+
+	tfrc_ld_slab = kmem_cache_create("tfrc_sp_li_data",
+					 sizeof(struct tfrc_loss_data_entry), 0,
+					 SLAB_HWCACHE_ALIGN, NULL);
+	if (tfrc_ld_slab != NULL)
+		return 0;
+
+	kmem_cache_destroy(tfrc_lh_slab);
+	tfrc_lh_slab = NULL;
+lh_failed:
+	return -ENOBUFS;
 }
 



--- dccp_tree_work5.orig/net/dccp/ccids/lib/loss_interval_sp.h
+++ dccp_tree_work5/net/dccp/ccids/lib/loss_interval_sp.h
@@ -70,13 +70,52 @@
 struct tfrc_rx_hist;
 #endif
 
+struct tfrc_loss_data_entry {
+	struct tfrc_loss_data_entry	*next;
+	u32				lossless_length:24;
+	u8				ecn_nonce_sum:1;
+	u32				loss_length:24;
+	u32				data_length:24;
+	u32				drop_count:24;
+};
According to RFC 4342, 8.6.1, Loss Length is a 23-bit number, i.e.
	u32				loss_length:23;




+#define TFRC_LOSS_INTERVALS_OPT_MAX_LENGTH	28
+#define TFRC_DROP_OPT_MAX_LENGTH		84
+#define TFRC_LI_OPT_SZ	\
+	(2 + TFRC_LOSS_INTERVALS_OPT_MAX_LENGTH*9)
+#define TFRC_DROPPED_OPT_SZ \
+	(1 + TFRC_DROP_OPT_MAX_LENGTH*3)
It would be good to have a reminder where the numbers come from, i.e.
 * the "28" comes from RFC 4342, 8.6
 * the "84" comes from RFC 5622, 8.7
 * the "9" again is from RFC 4342, 8.6
 * the 1 + TFRC_DROP_OPT_MAX_LENGTH*3 = DCCP_SINGLE_OPT_MAXLEN (linux/dccp.h)


+struct tfrc_loss_data {
+	struct tfrc_loss_data_entry	*head;
+	u16				counter;
+	u8				loss_intervals_opts[TFRC_LI_OPT_SZ];
+	u8				drop_opts[TFRC_DROPPED_OPT_SZ];
+	u8				last_loss_count;
+	u8				sto_ecn;
+	u8				sto_is_data;
+};



+static inline void tfrc_ld_init(struct tfrc_loss_data *ld)
+{
+	memset(ld, 0, sizeof(struct tfrc_loss_data));
+}
A tip from CodingStyle - using "sizeof(*ld)" will continue to work if there
are changes in the interface.


--- dccp_tree_work5.orig/net/dccp/dccp.h
+++ dccp_tree_work5/net/dccp/dccp.h
@@ -403,6 +403,16 @@
 	return (DCCP_SKB_CB(skb)->dccpd_ecn & INET_ECN_MASK) == INET_ECN_CE;
 }
 
+static inline bool dccp_skb_is_ecn_ect0(const struct sk_buff *skb)
+{
+	return (DCCP_SKB_CB(skb)->dccpd_ecn & INET_ECN_MASK) == INET_ECN_ECT_0;
+}
+
+static inline bool dccp_skb_is_ecn_ect1(const struct sk_buff *skb)
+{
+	return (DCCP_SKB_CB(skb)->dccpd_ecn & INET_ECN_MASK) == INET_ECN_ECT_0;
+}
The routines are not needed, because the dccpd_ecn field is (deliberately) only
2 bits wide. In the second case it should have been INET_ECN_ECT_1.

^ permalink raw reply

* Re: [PATCH 3/5] Implement TFRC-SP calc of mean length of loss intervals, accordingly to section 3 of RFC 4828
From: Gerrit Renker @ 2009-09-13 17:28 UTC (permalink / raw)
  To: Ivo Calado; +Cc: dccp, netdev
In-Reply-To: <4AA6A263.8000106@embedded.ufcg.edu.br>

| Implement TFRC-SP calc of mean length of loss intervals accordingly to
| section 3 of RFC 4828
Sorry this also has problems.

--- dccp_tree_work5.orig/net/dccp/ccids/lib/loss_interval_sp.c
+++ dccp_tree_work5/net/dccp/ccids/lib/loss_interval_sp.c
@@ -67,10 +67,11 @@
 		}
 }
 
-static void tfrc_sp_lh_calc_i_mean(struct tfrc_loss_hist *lh)
+static void tfrc_sp_lh_calc_i_mean(struct tfrc_loss_hist *lh, __u8 curr_ccval)
 {
 	u32 i_i, i_tot0 = 0, i_tot1 = 0, w_tot = 0;
 	int i, k = tfrc_lh_length(lh) - 1; /* k is as in rfc3448bis, 5.4 */
+	u32 losses;
 
 	if (k <= 0)
 		return;
@@ -78,6 +79,15 @@
 	for (i = 0; i <= k; i++) {
 		i_i = tfrc_lh_get_interval(lh, i);
 
+		if (SUB16(curr_ccval,
+		    tfrc_lh_get_loss_interval(lh, i)->li_ccval) <= 8) {
+
+			losses = tfrc_lh_get_loss_interval(lh, i)->li_losses;
+
+			if (losses > 0)
+				i_i = div64_u64(i_i, losses);
+		}
+
(Both 'i_i' and 'losses' are u32, so could write "i_i /= losses" instead.)

However, this computation is done in the wrong place: here it is already too
late. The N/K in RFC 4828 refers to entering each individual interval I_0, so
the division (and the check whether this is a "short" interval) needs to be
done when adding the interval, in tfrc_sp_lh_interval_add(). (There also exists
a special rule for the open interval I_0, see RFC 5348, 5.3.)

A second problem is a divide-by-zero condition encoded in the above: if
i_i < losses, the result is 0, if all intervals are 0 then I_mean = 0, and
thus p = 1/I_mean triggers a panic. In all other cases, the integer arithmetic
has the effect of floor(N/K), i.e. it decreases the interval length.

A way out (which does not fix the truncation problem) is to round up:

	if (losses > 0)
		i_i = DIV_ROUND_UP(i_i, losses);

In fact, we have to do this, to avoid the divide-by-zero condition.

The third problem is that the CCVal can be wrong, i.e. "less than 8" can
also mean that it just wrapped around one or more times. But this is noted
already in RFC 5622, it is a problem of the specification.

@@ -87,6 +97,11 @@
 	}
 
 	lh->i_mean = max(i_tot0, i_tot1) / w_tot;
+	BUG_ON(w_tot == 0);
+	if (SUB16(curr_ccval, tfrc_lh_get_loss_interval(lh, 0)->li_ccval) > 8)
+		lh->i_mean = max(i_tot0, i_tot1) / w_tot;
+	else
+		lh->i_mean = i_tot1 / w_tot;
 }
The line above the BUG_ON is probably a leftover. For the BUG_ON, this would
be too late (division by zero). 
In fact, the BUG_ON is not needed, due to testing for k <= 0, see
http://mirror.celinuxforum.org/gitstat//commit-detail.php?commit=eff253c4272cd2aac95ccff46d3d2e1a495f22b1

@@ -127,7 +142,7 @@
 		return;
 
 	cur->li_length = len;
-	tfrc_sp_lh_calc_i_mean(lh);
+	tfrc_sp_lh_calc_i_mean(lh, dccp_hdr(skb)->dccph_ccval);
 }
Should be division as per above.

--- dccp_tree_work5.orig/net/dccp/ccids/lib/loss_interval_sp.h
+++ dccp_tree_work5/net/dccp/ccids/lib/loss_interval_sp.h
@@ -73,7 +73,8 @@
 extern bool tfrc_sp_lh_interval_add(struct tfrc_loss_hist *,
 				    struct tfrc_rx_hist *,
 				    u32 (*first_li)(struct sock *),
-				    struct sock *);
+				    struct sock *,
+				    __u8 ccval);
This function already has the ccval available, so could use it instead
of just passing it through.

^ permalink raw reply

* Re: [PATCH 2/5] Implement loss counting on TFRC-SP receiver
From: Gerrit Renker @ 2009-09-13 16:12 UTC (permalink / raw)
  To: Ivo Calado; +Cc: dccp, netdev
In-Reply-To: <4AA6A25F.4060100@embedded.ufcg.edu.br>

| Implement loss counting on TFRC-SP receiver. 
| Consider transmission's hole size as loss count.
The implementation of loss counts as the basis for Dropped Packet option has
problems and can in its current form not be used. Please see below.


--- dccp_tree_work4.orig/net/dccp/ccids/lib/loss_interval_sp.c
+++ dccp_tree_work4/net/dccp/ccids/lib/loss_interval_sp.c
@@ -187,6 +187,7 @@
 		s64 len = dccp_delta_seqno(cur->li_seqno, cong_evt_seqno);
 		if ((len <= 0) ||
 		    (!tfrc_lh_closed_check(cur, cong_evt->tfrchrx_ccval))) {
+			cur->li_losses += rh->num_losses;
 			return false;
 		}
This has a multiplicative effect, since rh->num_losses is added to cur->li_losses
each time the condition is evaluated. E.g. if 3 times in a row reordered (earlier)
sequence numbers arrive, or if the CCvals do not differ (high-speed networks),
we end up with 3 * rh->num_losses, which can't be correct.



--- dccp_tree_work4.orig/net/dccp/ccids/lib/packet_history_sp.c
+++ dccp_tree_work4/net/dccp/ccids/lib/packet_history_sp.c
@@ -244,6 +244,7 @@
 		h->loss_count = 3;
 		tfrc_sp_rx_hist_entry_from_skb(tfrc_rx_hist_entry(h, 3),
 					       skb, n3);
+		h->num_losses = dccp_loss_count(s2, s3, n3);
 		return 1;
 	}
This only measures the gap between s2 and s3, but the "hole" starts at s0,
so it would need to be dccp_loss_count(s0, s3, n3). Algorithm is documented at
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/\
ccid3/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt
 
@@ -257,6 +258,7 @@
 		tfrc_sp_rx_hist_entry_from_skb(tfrc_rx_hist_entry(h, 2),
 					       skb, n3);
 		h->loss_count = 3;
+		h->num_losses = dccp_loss_count(s1, s3, n3);
 		return 1;
 	}
In this case, the "hole" is between s0 and s2, it is case VI(d) in the above.
 
@@ -293,6 +295,7 @@
 	h->loss_start = tfrc_rx_hist_index(h, 3);
 	tfrc_sp_rx_hist_entry_from_skb(tfrc_rx_hist_entry(h, 1), skb, n3);
 	h->loss_count = 3;
+	h->num_losses = dccp_loss_count(s0, s3, n3);
 
 	return 1;
 }
Here it is also between s0 and s2, not between s0 and s3. It is case VI(c.3).

However, the above still is a crude approximation, since it only measures between
the last sequence number received before the loss and the third sequence number
after the loss. It would be better to either

 * use the first sequence number after the loss (this can be s1, s2, or s3) or
 * check if there are more holes between the first/second and the second/third
   sequence numbers after the loss.

The second option would be the correct one, it should also take the NDP counts
of each gap into account. And already we have a fairly complicated algorithm.
 		  
Another observation is that this function is only called in packet_history_sp.c,
and only in __two_after_loss(), so that dccp_loss_count() could be made static,
and without the need for the WARN_ON (see below), since in all above cases it is
ensured that the first sequence number argument is "before" the second one.


--- dccp_tree_work4.orig/net/dccp/ccids/lib/packet_history_sp.h
+++ dccp_tree_work4/net/dccp/ccids/lib/packet_history_sp.h
@@ -113,6 +113,7 @@
 	u32			  packet_size,
 				  bytes_recvd;
 	ktime_t			  bytes_start;
+	u8			  num_losses;
 };
No more than 255 losses? NDP count option has space for up to 6 bytes, i.e. 2^48-1.
Suggest u64 for consistency with the other parts.
 

--- dccp_tree_work4.orig/net/dccp/dccp.h
+++ dccp_tree_work4/net/dccp/dccp.h
@@ -168,6 +168,21 @@
 	return (u64)delta <= ndp + 1;
 }
 
+static inline u64 dccp_loss_count(const u64 s1, const u64 s2, const u64 ndp)
+{
+	s64 delta, count;
+
+	delta = dccp_delta_seqno(s1, s2);
+	WARN_ON(delta < 0);
+
+	count = ndp + 1;
+	count -= delta;
+
+	count = (count > 0) ? count : 0;
+
+	return (u64) count;
+}
Let S_A, S_B be sequence numbers such that S_B is "after" S_A, and let
N_B be the NDP count of packet S_B. Then, using module-2^48 arithmetic,
 D = S_B - S_A - 1  is an upper bound of the number of lost data packets,
 D - N_B            is an approximation of the number of lost data 
                    packets (there are cases where this is not exact).

But your formula computes N_B - D, i.e. it negates this value. What you
want is dccp_loss_count(S_A, S_B, N_B) := max(S_B - S_A - 1 - N_B, 0).

A short implementation would be:

/**
 * dccp_loss_count - Approximate the number of data packets lost in a row
 * @s1:   last known sequence number before the loss ('hole')
 * @s2:   first sequence number seen after the 'hole'
 * @ndp:  ndp count associated with packet having sequence number @s2
 */
u64 dccp_loss_count(const u64 s1, const u64 s2, const u64 ndp)
{
	s64 delta = dccp_delta_seqno(s1, s2);
	
	WARN_ON(delta < 0);
	delta -= ndp + 1;

	return delta > 0 ? delta : 0;
}

But then dccp_loss_free reduces to a specialisation of the above:
bool dccp_loss_free(const u64 s1, const u64 s2, const u64 ndp)
{
	return dccp_loss_count(s1, s2, ndp) == 0;
}

But please see above -- the function needs to be called for each hole in a row.

^ permalink raw reply

* INFO: possible circular locking detected [when unloading the bonding driver]
From: Alan Jenkins @ 2009-09-13 12:26 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: linux-kernel, netdev, bonding-devel

I found this warning after loading and unloading all the modules I had
available.  I can reproduce it by hibernating the system, then loading
and unloading the bonding module. 

Regards
Alan

[  123.484225] Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)
[  123.484229] bonding: Warning: either miimon or arp_interval and
arp_ip_target module parameters must be specified, otherwise bonding
will not detect link failures! see bonding.txt for details.
[  125.790929]
[  125.790931] =======================================================
[  125.790936] [ INFO: possible circular locking dependency detected ]
[  125.790940] 2.6.31-rc8debug #50
[  125.790942] -------------------------------------------------------
[  125.790946] modprobe/8372 is trying to acquire lock:
[  125.790948]  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81048ef2>]
cpu_maps_update_begin+0x12/0x20
[  125.790961]
[  125.790962] but task is already holding lock:
[  125.790965]  (rtnl_mutex){+.+.+.}, at: [<ffffffff812d0aa2>]
rtnl_lock+0x12/0x20
[  125.790974]
[  125.790975] which lock already depends on the new lock.
[  125.790976]
[  125.790978]
[  125.790979] the existing dependency chain (in reverse order) is:
[  125.790982]
[  125.790983] -> #3 (rtnl_mutex){+.+.+.}:
[  125.790989]        [<ffffffff810741d9>] validate_chain+0xb99/0x11d0
[  125.790996]        [<ffffffff81074b6e>] __lock_acquire+0x35e/0xa30
[  125.791001]        [<ffffffff810752d7>] lock_acquire+0x97/0x110
[  125.791006]        [<ffffffff8134f046>] mutex_lock_nested+0x56/0x310
[  125.791012]        [<ffffffff812d0aa2>] rtnl_lock+0x12/0x20
[  125.791016]        [<ffffffff812d2079>] linkwatch_event+0x9/0x40
[  125.791021]        [<ffffffff8105cafa>] worker_thread+0x23a/0x350
[  125.791027]        [<ffffffff81060f56>] kthread+0xa6/0xb0
[  125.791032]        [<ffffffff8100ce5a>] child_rip+0xa/0x20
[  125.791037]        [<ffffffffffffffff>] 0xffffffffffffffff
[  125.791051]
[  125.791052] -> #2 ((linkwatch_work).work){+.+.+.}:
[  125.791058]        [<ffffffff810741d9>] validate_chain+0xb99/0x11d0
[  125.791063]        [<ffffffff81074b6e>] __lock_acquire+0x35e/0xa30
[  125.791068]        [<ffffffff810752d7>] lock_acquire+0x97/0x110
[  125.791073]        [<ffffffff8105caf4>] worker_thread+0x234/0x350
[  125.791077]        [<ffffffff81060f56>] kthread+0xa6/0xb0
[  125.791082]        [<ffffffff8100ce5a>] child_rip+0xa/0x20
[  125.791087]        [<ffffffffffffffff>] 0xffffffffffffffff
[  125.791092]
[  125.791093] -> #1 (events){+.+.+.}:
[  125.791098]        [<ffffffff810741d9>] validate_chain+0xb99/0x11d0
[  125.791104]        [<ffffffff81074b6e>] __lock_acquire+0x35e/0xa30
[  125.791108]        [<ffffffff810752d7>] lock_acquire+0x97/0x110
[  125.791114]        [<ffffffff8105b758>]
cleanup_workqueue_thread+0x48/0xd0
[  125.791119]        [<ffffffff8133dffa>] workqueue_cpu_callback+0x9a/0x130
[  125.791125]        [<ffffffff81353dce>] notifier_call_chain+0x5e/0xa0
[  125.791130]        [<ffffffff81065549>]
__raw_notifier_call_chain+0x9/0x10
[  125.791136]        [<ffffffff81065561>] raw_notifier_call_chain+0x11/0x20
[  125.791141]        [<ffffffff8133c213>] _cpu_down+0x173/0x2b0
[  125.791145]        [<ffffffff81049148>] disable_nonboot_cpus+0x98/0x130
[  125.791150]        [<ffffffff81081f97>] hibernation_snapshot+0x187/0x250
[  125.791156]        [<ffffffff81082178>] hibernate+0x118/0x1c0
[  125.791160]        [<ffffffff81080863>] state_store+0xd3/0xf0
[  125.791165]        [<ffffffff81199fe7>] kobj_attr_store+0x17/0x20
[  125.791172]        [<ffffffff8113e7b4>] sysfs_write_file+0xf4/0x150
[  125.791178]        [<ffffffff810e3557>] vfs_write+0xc7/0x1a0
[  125.791184]        [<ffffffff810e3ba0>] sys_write+0x50/0x90
[  125.791189]        [<ffffffff8100bdab>] system_call_fastpath+0x16/0x1b
[  125.791194]        [<ffffffffffffffff>] 0xffffffffffffffff
[  125.791205]
[  125.791205] -> #0 (cpu_add_remove_lock){+.+.+.}:
[  125.791211]        [<ffffffff81073c0d>] validate_chain+0x5cd/0x11d0
[  125.791216]        [<ffffffff81074b6e>] __lock_acquire+0x35e/0xa30
[  125.791221]        [<ffffffff810752d7>] lock_acquire+0x97/0x110
[  125.791225]        [<ffffffff8134f046>] mutex_lock_nested+0x56/0x310
[  125.791230]        [<ffffffff81048ef2>] cpu_maps_update_begin+0x12/0x20
[  125.791235]        [<ffffffff8105b842>] destroy_workqueue+0x22/0xc0
[  125.791240]        [<ffffffffa03afdfa>] bond_uninit+0x3a/0x70 [bonding]
[  125.791252]        [<ffffffff812c5e8d>] rollback_registered+0xbd/0x120
[  125.791258]        [<ffffffff812c5f0d>] unregister_netdevice+0x1d/0x70
[  125.791262]        [<ffffffffa03ad71f>] bond_free_all+0x4f/0xa0 [bonding]
[  125.791273]        [<ffffffffa03ba075>] bonding_exit+0x35/0x3c [bonding]
[  125.791283]        [<ffffffff8107ddc6>] sys_delete_module+0x176/0x230
[  125.791288]        [<ffffffff8100bdab>] system_call_fastpath+0x16/0x1b
[  125.791293]        [<ffffffffffffffff>] 0xffffffffffffffff
[  125.791298]
[  125.791299] other info that might help us debug this:
[  125.791300]
[  125.791304] 1 lock held by modprobe/8372:
[  125.791306]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff812d0aa2>]
rtnl_lock+0x12/0x20
[  125.791314]
[  125.791315] stack backtrace:
[  125.791319] Pid: 8372, comm: modprobe Not tainted 2.6.31-rc8debug #50
[  125.791322] Call Trace:
[  125.791328]  [<ffffffff81073494>] print_circular_bug_tail+0x94/0xe0
[  125.791333]  [<ffffffff81073c0d>] validate_chain+0x5cd/0x11d0
[  125.791338]  [<ffffffff81074b6e>] __lock_acquire+0x35e/0xa30
[  125.791343]  [<ffffffff810752d7>] lock_acquire+0x97/0x110
[  125.791349]  [<ffffffff81048ef2>] ? cpu_maps_update_begin+0x12/0x20
[  125.791354]  [<ffffffff8134f046>] mutex_lock_nested+0x56/0x310
[  125.791358]  [<ffffffff81048ef2>] ? cpu_maps_update_begin+0x12/0x20
[  125.791363]  [<ffffffff8113f974>] ? release_sysfs_dirent+0x54/0xc0
[  125.791368]  [<ffffffff81048ef2>] cpu_maps_update_begin+0x12/0x20
[  125.791373]  [<ffffffff8105b842>] destroy_workqueue+0x22/0xc0
[  125.791383]  [<ffffffffa03afdfa>] bond_uninit+0x3a/0x70 [bonding]
[  125.791388]  [<ffffffff812c5e8d>] rollback_registered+0xbd/0x120
[  125.791397]  [<ffffffffa03abfb6>] ? bond_work_cancel_all+0x16/0x130
[bonding]
[  125.791402]  [<ffffffff812c5f0d>] unregister_netdevice+0x1d/0x70
[  125.791411]  [<ffffffffa03ad71f>] bond_free_all+0x4f/0xa0 [bonding]
[  125.791421]  [<ffffffffa03ba075>] bonding_exit+0x35/0x3c [bonding]
[  125.791425]  [<ffffffff8107ddc6>] sys_delete_module+0x176/0x230
[  125.791430]  [<ffffffff8107186d>] ? trace_hardirqs_on_caller+0x14d/0x1a0
[  125.791435]  [<ffffffff81350a04>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  125.791441]  [<ffffffff8100bdab>] system_call_fastpath+0x16/0x1b


^ permalink raw reply

* Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
From: Michael S. Tsirkin @ 2009-09-13 12:01 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: Ira W. Snyder, netdev, virtualization, kvm, linux-kernel, mingo,
	linux-mm, akpm, hpa, Rusty Russell, s.hetze
In-Reply-To: <4AAA7415.5080204@gmail.com>

On Fri, Sep 11, 2009 at 12:00:21PM -0400, Gregory Haskins wrote:
> FWIW: VBUS handles this situation via the "memctx" abstraction.  IOW,
> the memory is not assumed to be a userspace address.  Rather, it is a
> memctx-specific address, which can be userspace, or any other type
> (including hardware, dma-engine, etc).  As long as the memctx knows how
> to translate it, it will work.

How would permissions be handled? it's easy to allow an app to pass in
virtual addresses in its own address space.  But we can't let the guest
specify physical addresses.

-- 
MST

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [PATCH  kernel 2.6.31] pcnet_cs: add cis of Linksys multifunction pcmcia card
From: Ken Kawasaki @ 2009-09-13  8:22 UTC (permalink / raw)
  To: netdev; +Cc: ken_kawasaki
In-Reply-To: <20090426162929.3d036f26.ken_kawasaki@spring.nifty.jp>


pcnet_cs,serial_cs:
 
add cis of Linksys lan&modem mulitifunction pcmcia card
and some modem card(MT5634ZLX, RS-COM-2P).

  
Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>
 
---

for kernel 2.6.31 

 drivers/net/pcmcia/pcnet_cs.c   |   10 +++++-----
 drivers/serial/serial_cs.c      |   14 +++++++-------
 firmware/Makefile               |    3 ++-
 firmware/WHENCE                 |   12 ++++++++++++
 firmware/cis/MT5634ZLX.cis.ihex |   11 +++++++++++
 firmware/cis/PCMLM28.cis.ihex   |   18 ++++++++++++++++++
 firmware/cis/RS-COM-2P.cis.ihex |   10 ++++++++++
 7 files changed, 65 insertions(+), 13 deletions(-)

diff -uprN linux-2.6.31.orig/drivers/net/pcmcia/pcnet_cs.c linux-2.6.31/drivers/net/pcmcia/pcnet_cs.c
--- linux-2.6.31.orig/drivers/net/pcmcia/pcnet_cs.c	2009-09-10 07:13:59.000000000 +0900
+++ linux-2.6.31/drivers/net/pcmcia/pcnet_cs.c	2009-09-12 19:59:57.000000000 +0900
@@ -1751,11 +1751,11 @@ static struct pcmcia_device_id pcnet_ids
 	PCMCIA_DEVICE_PROD_ID2("EN-6200P2", 0xa996d078),
 	/* too generic! */
 	/* PCMCIA_DEVICE_PROD_ID12("PCMCIA", "10/100 Ethernet Card", 0x281f1c5d, 0x11b0ffc0), */
-	PCMCIA_PFC_DEVICE_CIS_PROD_ID12(0, "PCMCIA", "EN2218-LAN/MODEM", 0x281f1c5d, 0x570f348e, "PCMLM28.cis"),
-	PCMCIA_PFC_DEVICE_CIS_PROD_ID12(0, "PCMCIA", "UE2218-LAN/MODEM", 0x281f1c5d, 0x6fdcacee, "PCMLM28.cis"),
-	PCMCIA_PFC_DEVICE_CIS_PROD_ID12(0, "Psion Dacom", "Gold Card V34 Ethernet", 0xf5f025c2, 0x338e8155, "PCMLM28.cis"),
-	PCMCIA_PFC_DEVICE_CIS_PROD_ID12(0, "Psion Dacom", "Gold Card V34 Ethernet GSM", 0xf5f025c2, 0x4ae85d35, "PCMLM28.cis"),
-	PCMCIA_PFC_DEVICE_CIS_PROD_ID12(0, "LINKSYS", "PCMLM28", 0xf7cb0b07, 0x66881874, "PCMLM28.cis"),
+	PCMCIA_PFC_DEVICE_CIS_PROD_ID12(0, "PCMCIA", "EN2218-LAN/MODEM", 0x281f1c5d, 0x570f348e, "cis/PCMLM28.cis"),
+	PCMCIA_PFC_DEVICE_CIS_PROD_ID12(0, "PCMCIA", "UE2218-LAN/MODEM", 0x281f1c5d, 0x6fdcacee, "cis/PCMLM28.cis"),
+	PCMCIA_PFC_DEVICE_CIS_PROD_ID12(0, "Psion Dacom", "Gold Card V34 Ethernet", 0xf5f025c2, 0x338e8155, "cis/PCMLM28.cis"),
+	PCMCIA_PFC_DEVICE_CIS_PROD_ID12(0, "Psion Dacom", "Gold Card V34 Ethernet GSM", 0xf5f025c2, 0x4ae85d35, "cis/PCMLM28.cis"),
+	PCMCIA_PFC_DEVICE_CIS_PROD_ID12(0, "LINKSYS", "PCMLM28", 0xf7cb0b07, 0x66881874, "cis/PCMLM28.cis"),
 	PCMCIA_MFC_DEVICE_CIS_PROD_ID12(0, "DAYNA COMMUNICATIONS", "LAN AND MODEM MULTIFUNCTION", 0x8fdf8f89, 0xdd5ed9e8, "DP83903.cis"),
 	PCMCIA_MFC_DEVICE_CIS_PROD_ID4(0, "NSC MF LAN/Modem", 0x58fc6056, "DP83903.cis"),
 	PCMCIA_MFC_DEVICE_CIS_MANF_CARD(0, 0x0175, 0x0000, "DP83903.cis"),
diff -uprN linux-2.6.31.orig/drivers/serial/serial_cs.c linux-2.6.31/drivers/serial/serial_cs.c
--- linux-2.6.31.orig/drivers/serial/serial_cs.c	2009-09-10 07:13:59.000000000 +0900
+++ linux-2.6.31/drivers/serial/serial_cs.c	2009-09-12 20:02:33.000000000 +0900
@@ -868,11 +868,11 @@ static struct pcmcia_device_id serial_id
 	PCMCIA_DEVICE_PROD_ID12("PCMCIA   ", "C336MX     ", 0x99bcafe9, 0xaa25bcab),
 	PCMCIA_DEVICE_PROD_ID12("Quatech Inc", "PCMCIA Dual RS-232 Serial Port Card", 0xc4420b35, 0x92abc92f),
 	PCMCIA_DEVICE_PROD_ID12("Quatech Inc", "Dual RS-232 Serial Port PC Card", 0xc4420b35, 0x031a380d),
-	PCMCIA_PFC_DEVICE_CIS_PROD_ID12(1, "PCMCIA", "EN2218-LAN/MODEM", 0x281f1c5d, 0x570f348e, "PCMLM28.cis"),
-	PCMCIA_PFC_DEVICE_CIS_PROD_ID12(1, "PCMCIA", "UE2218-LAN/MODEM", 0x281f1c5d, 0x6fdcacee, "PCMLM28.cis"),
-	PCMCIA_PFC_DEVICE_CIS_PROD_ID12(1, "Psion Dacom", "Gold Card V34 Ethernet", 0xf5f025c2, 0x338e8155, "PCMLM28.cis"),
-	PCMCIA_PFC_DEVICE_CIS_PROD_ID12(1, "Psion Dacom", "Gold Card V34 Ethernet GSM", 0xf5f025c2, 0x4ae85d35, "PCMLM28.cis"),
-	PCMCIA_PFC_DEVICE_CIS_PROD_ID12(1, "LINKSYS", "PCMLM28", 0xf7cb0b07, 0x66881874, "PCMLM28.cis"),
+	PCMCIA_PFC_DEVICE_CIS_PROD_ID12(1, "PCMCIA", "EN2218-LAN/MODEM", 0x281f1c5d, 0x570f348e, "cis/PCMLM28.cis"),
+	PCMCIA_PFC_DEVICE_CIS_PROD_ID12(1, "PCMCIA", "UE2218-LAN/MODEM", 0x281f1c5d, 0x6fdcacee, "cis/PCMLM28.cis"),
+	PCMCIA_PFC_DEVICE_CIS_PROD_ID12(1, "Psion Dacom", "Gold Card V34 Ethernet", 0xf5f025c2, 0x338e8155, "cis/PCMLM28.cis"),
+	PCMCIA_PFC_DEVICE_CIS_PROD_ID12(1, "Psion Dacom", "Gold Card V34 Ethernet GSM", 0xf5f025c2, 0x4ae85d35, "cis/PCMLM28.cis"),
+	PCMCIA_PFC_DEVICE_CIS_PROD_ID12(1, "LINKSYS", "PCMLM28", 0xf7cb0b07, 0x66881874, "cis/PCMLM28.cis"),
 	PCMCIA_MFC_DEVICE_CIS_PROD_ID12(1, "DAYNA COMMUNICATIONS", "LAN AND MODEM MULTIFUNCTION", 0x8fdf8f89, 0xdd5ed9e8, "DP83903.cis"),
 	PCMCIA_MFC_DEVICE_CIS_PROD_ID4(1, "NSC MF LAN/Modem", 0x58fc6056, "DP83903.cis"),
 	PCMCIA_MFC_DEVICE_CIS_MANF_CARD(1, 0x0101, 0x0556, "cis/3CCFEM556.cis"),
@@ -883,10 +883,10 @@ static struct pcmcia_device_id serial_id
 	PCMCIA_DEVICE_CIS_MANF_CARD(0x0192, 0x0710, "SW_7xx_SER.cis"),	/* Sierra Wireless AC710/AC750 GPRS Network Adapter R1 */
 	PCMCIA_DEVICE_CIS_MANF_CARD(0x0192, 0xa555, "SW_555_SER.cis"),  /* Sierra Aircard 555 CDMA 1xrtt Modem -- pre update */
 	PCMCIA_DEVICE_CIS_MANF_CARD(0x013f, 0xa555, "SW_555_SER.cis"),  /* Sierra Aircard 555 CDMA 1xrtt Modem -- post update */
-	PCMCIA_DEVICE_CIS_PROD_ID12("MultiTech", "PCMCIA 56K DataFax", 0x842047ee, 0xc2efcf03, "MT5634ZLX.cis"),
+	PCMCIA_DEVICE_CIS_PROD_ID12("MultiTech", "PCMCIA 56K DataFax", 0x842047ee, 0xc2efcf03, "cis/MT5634ZLX.cis"),
 	PCMCIA_DEVICE_CIS_PROD_ID12("ADVANTECH", "COMpad-32/85B-4", 0x96913a85, 0xcec8f102, "COMpad4.cis"),
 	PCMCIA_DEVICE_CIS_PROD_ID123("ADVANTECH", "COMpad-32/85", "1.0", 0x96913a85, 0x8fbe92ae, 0x0877b627, "COMpad2.cis"),
-	PCMCIA_DEVICE_CIS_PROD_ID2("RS-COM 2P", 0xad20b156, "RS-COM-2P.cis"),
+	PCMCIA_DEVICE_CIS_PROD_ID2("RS-COM 2P", 0xad20b156, "cis/RS-COM-2P.cis"),
 	PCMCIA_DEVICE_CIS_MANF_CARD(0x0013, 0x0000, "GLOBETROTTER.cis"),
 	PCMCIA_DEVICE_PROD_ID12("ELAN DIGITAL SYSTEMS LTD, c1997.","SERIAL CARD: SL100  1.00.",0x19ca78af,0xf964f42b),
 	PCMCIA_DEVICE_PROD_ID12("ELAN DIGITAL SYSTEMS LTD, c1997.","SERIAL CARD: SL100",0x19ca78af,0x71d98e83),
diff -uprN linux-2.6.31.orig/firmware/cis/MT5634ZLX.cis.ihex linux-2.6.31/firmware/cis/MT5634ZLX.cis.ihex
--- linux-2.6.31.orig/firmware/cis/MT5634ZLX.cis.ihex	1970-01-01 09:00:00.000000000 +0900
+++ linux-2.6.31/firmware/cis/MT5634ZLX.cis.ihex	2009-09-12 19:58:04.000000000 +0900
@@ -0,0 +1,11 @@
+:100000000101FF152204014D756C74695465636824
+:100010000050434D4349412035364B2044617461C3
+:10002000466178000000FF20040002010021020266
+:10003000001A05012780FF671B0FCF418B01550177
+:10004000550155AA60F80307281B08970108AA6004
+:10005000F802071B089F0108AA60E803071B08A70E
+:0B0060000108AA60E802071400FF007E
+:00000001FF
+#
+# Replacement CIS for Multitech MT5634ZLX modems
+#
diff -uprN linux-2.6.31.orig/firmware/cis/PCMLM28.cis.ihex linux-2.6.31/firmware/cis/PCMLM28.cis.ihex
--- linux-2.6.31.orig/firmware/cis/PCMLM28.cis.ihex	1970-01-01 09:00:00.000000000 +0900
+++ linux-2.6.31/firmware/cis/PCMLM28.cis.ihex	2009-09-12 19:57:29.000000000 +0900
@@ -0,0 +1,18 @@
+:1000000001030000FF151504014C494E4B53595391
+:100010000050434D4C4D3238000000FF2004430196
+:10002000ABC0210200001A05012FF803031B10E4E6
+:1000300001190155E06100031FF8020730FFFF1BA3
+:100040000BA50108E06120031FF802071B0BA601A6
+:1000500008E06140031FF802071B0BA70108E061DD
+:1000600060031FF802071B0BA80108E06100031FD3
+:10007000E803071B0BA90108E06120031FE8030741
+:100080001B0BAA0108E06140031FE803071B0BAB31
+:100090000108E06160031FE803071B0BAC0108E0E7
+:1000A0006100031FE802071B0BAD0108E06120039C
+:1000B0001FE802071B0BAE0108E06140031FE802C6
+:1000C000071B0BAF0108E06160031FE80207140083
+:0200D000FF002F
+:00000001FF
+#
+# The on-card CIS says it is MFC-compliant, but it is not
+#
diff -uprN linux-2.6.31.orig/firmware/cis/RS-COM-2P.cis.ihex linux-2.6.31/firmware/cis/RS-COM-2P.cis.ihex
--- linux-2.6.31.orig/firmware/cis/RS-COM-2P.cis.ihex	1970-01-01 09:00:00.000000000 +0900
+++ linux-2.6.31/firmware/cis/RS-COM-2P.cis.ihex	2009-09-12 19:58:12.000000000 +0900
@@ -0,0 +1,10 @@
+:1000000001030000FF1516040150434D4349410010
+:1000100052532D434F4D203250000000FF21020269
+:10002000011A0501030001011B0EC18118AA61E834
+:100030000307E8020730B89E1B0B820108AA615033
+:1000400002075802071B0B830108AA6160020768B8
+:0600500002071400FF008E
+:00000001FF
+#
+# Replacement CIS for dual-serial-port IO card
+#
diff -uprN linux-2.6.31.orig/firmware/Makefile linux-2.6.31/firmware/Makefile
--- linux-2.6.31.orig/firmware/Makefile	2009-09-10 07:13:59.000000000 +0900
+++ linux-2.6.31/firmware/Makefile	2009-09-12 20:12:11.000000000 +0900
@@ -47,9 +47,10 @@ fw-shipped-$(CONFIG_DVB_TTUSB_BUDGET) +=
 fw-shipped-$(CONFIG_E100) += e100/d101m_ucode.bin e100/d101s_ucode.bin \
 			     e100/d102e_ucode.bin
 fw-shipped-$(CONFIG_MYRI_SBUS) += myricom/lanai.bin
-fw-shipped-$(CONFIG_PCMCIA_PCNET) += cis/LA-PCM.cis
+fw-shipped-$(CONFIG_PCMCIA_PCNET) += cis/LA-PCM.cis cis/PCMLM28.cis
 fw-shipped-$(CONFIG_PCMCIA_3C589) += cis/3CXEM556.cis
 fw-shipped-$(CONFIG_PCMCIA_3C574) += cis/3CCFEM556.cis
+fw-shipped-$(CONFIG_SERIAL_8250_CS) += cis/MT5634ZLX.cis cis/RS-COM-2P.cis
 fw-shipped-$(CONFIG_PCMCIA_SMC91C92) += ositech/Xilinx7OD.bin
 fw-shipped-$(CONFIG_SCSI_ADVANSYS) += advansys/mcode.bin advansys/38C1600.bin \
 				      advansys/3550.bin advansys/38C0800.bin
diff -uprN linux-2.6.31.orig/firmware/WHENCE linux-2.6.31/firmware/WHENCE
--- linux-2.6.31.orig/firmware/WHENCE	2009-09-10 07:13:59.000000000 +0900
+++ linux-2.6.31/firmware/WHENCE	2009-09-12 20:09:40.000000000 +0900
@@ -579,6 +579,7 @@ Found in hex form in kernel source.
 Driver: PCMCIA_PCNET - NE2000 compatible PCMCIA adapter
 
 File: cis/LA-PCM.cis
+      cis/PCMLM28.cis
 
 Licence: GPL
 
@@ -606,6 +607,17 @@ Originally developed by the pcmcia-cs pr
 
 --------------------------------------------------------------------------
 
+Driver: SERIAL_8250_CS - Serial PCMCIA adapter
+
+File: cis/MT5634ZLX.cis
+      cis/RS-COM-2P.cis
+
+Licence: GPL
+
+Originally developed by the pcmcia-cs project
+
+--------------------------------------------------------------------------
+
 Driver: PCMCIA_SMC91C92 - SMC 91Cxx PCMCIA
 
 File: ositech/Xilinx7OD.bin


^ permalink raw reply

* [PATCH net-next-2.6] Have atalk_route_packet() return NET_RX_SUCCESS not NET_XMIT_SUCCESS
From: Mark Smith @ 2009-09-13  6:48 UTC (permalink / raw)
  To: netdev, acme, davem

Have atalk_route_packet() return NET_RX_SUCCESS not NET_XMIT_SUCCESS

atalk_route_packet() returns NET_RX_DROP if it's call to
aarp_send_ddp() returns NET_XMIT_DROP. If aarp_send_ddp() returns
anything else atalk_route_packet() should return NET_RX_SUCCESS, not
NET_XMIT_SUCCESS.

Signed-off-by: Mark Smith <markzzzsmith@yahoo.com.au>

diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index 4a6ff2b..b1a4290 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -1372,7 +1372,7 @@ static int atalk_route_packet(struct sk_buff *skb, struct net_device *dev,
 
 	if (aarp_send_ddp(rt->dev, skb, &ta, NULL) == NET_XMIT_DROP)
 		return NET_RX_DROP;
-	return NET_XMIT_SUCCESS;
+	return NET_RX_SUCCESS;
 free_it:
 	kfree_skb(skb);
 drop:

^ permalink raw reply related

* Re: [net-next PATCH] etherdevice.h: random_ether_addr update
From: Mark Smith @ 2009-09-13  6:39 UTC (permalink / raw)
  To: Joe Perches
  Cc: Stephen Hemminger, David Miller, jeffrey.t.kirsher, netdev, gospo,
	gregory.v.rose, donald.c.skidmore
In-Reply-To: <1252822152.4400.108.camel@Joe-Laptop.home>

On Sat, 12 Sep 2009 23:09:12 -0700
Joe Perches <joe@perches.com> wrote:

> On Sun, 2009-09-13 at 13:17 +0930, Mark Smith wrote:
> > On Sat, 12 Sep 2009 17:44:46 -0700
> > Joe Perches <joe@perches.com> wrote:
> > > Avoiding an initial octet of "02", which is partially
> > > assigned to 3Com and others, might be useful.
> > I wouldn't necessarily disagree. I would say that if that path was
> > taken, then you'd probably also want to be avoiding all the other
> > well known mac addresses that do or can fall within the locally
> > assigned range e.g. DECnet 0xAA addresses, Microsoft's  use of
> > 02:01:00:00:00:00 and similar addresses for their Network Load
> > Balancing software, the unicast version of the CF:00:00:00:00:00
> > multicast address use for ECTP, the unicast version of the
> > 33:33:xx:xx:xx:xx IPv6 ND multicast ranges etc.
> 
> The existing code already has the first wire bit cleared so it
> is not multicast

Agreed. However I think that if there is a well-known multicast address
that has 0x02 set, there is a future, slight possibility that unicast
addresses might be assigned out of that same LA space, and so they're
worth avoiding. It probably seems a bit paranoid, but LAs are supposed
to be private use only in the first place, and not supposed to be seen
outside of the organisation or entity assigning them (e.g. Microsoft
should have got an OUI to use with their NLB product). If people are
crossing those privacy boundaries with LA multicast addresses, I'd
suggest they might be willing to do it with LA unicast addresses in the
future too - and hopefully they'd restrict themselves to the OUI bytes
they've used for their multicast addresses.

> and has the locally assigned bit set so the
> first octet is a multiple of 2.
> 
> The suggested patch requires an initial octet >= 0x04.
> 
> Skipping AA seems a good idea.
> 
> > Having thought about this issue a bit before, another thought might be
> > to have somebody get the Linux kernel its own OUI,
> 
> That's been suggested.
> 
> > > Not drawing from entropy I think useful, but it's debatable.
> > I'm guessing there are other things in the kernel that would be taking
> > away far more entropy, far more often. IIRC, TCP connection initial
> > sequence number selection would be one example.
> 
> These MAC assignments are generally done at system startup
> when entropy often isn't available and possibly should be
> conserved.
> 
> Maybe this:
> 

I'd suggest documenting in the comment why 0x02 or 0xaa are special
values that have been avoided.

> Signed-off-by: Joe Perches <joe@perches.com>
> 
> diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
> index 3d7a668..40233db 100644
> --- a/include/linux/etherdevice.h
> +++ b/include/linux/etherdevice.h
> @@ -118,12 +118,30 @@ static inline int is_valid_ether_addr(const u8 *addr)
>   *
>   * Generate a random Ethernet address (MAC) that is not multicast
>   * and has the local assigned bit set.
> + * Does not assign a leading octet of 0x02 or 0xaa.
>   */
>  static inline void random_ether_addr(u8 *addr)
>  {
> -	get_random_bytes (addr, ETH_ALEN);
> -	addr [0] &= 0xfe;	/* clear multicast bit */
> -	addr [0] |= 0x02;	/* set local assignment bit (IEEE802) */
> +	u32 val;
> +
> +	/* not calling get_random_bytes to avoid using entropy */
> +	do {
> +		val = random32();
> +		addr[0] = val;
> +		addr[0] &= 0xfe;	/* clear multicast bit */
> +		addr[0] |= 0x02;	/* set local assignment bit (IEEE802) */
> +	} while (addr[0] == 0x02 || addr[0] == 0xaa);
> +
> +	val >>= 8;
> +	addr[1] = val;
> +	val >>= 8;
> +	addr[2] = val;
> +	val >>= 8;
> +	addr[3] = val;
> +	val = random32();
> +	addr[4] = val;
> +	val >>= 8;
> +	addr[5] = val;
>  }
>  
>  /**
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox