* [patch,rfc] allow registration of multiple netpolls per interface
@ 2005-06-21 21:41 Jeff Moyer
2005-06-21 22:52 ` Matt Mackall
0 siblings, 1 reply; 6+ messages in thread
From: Jeff Moyer @ 2005-06-21 21:41 UTC (permalink / raw)
To: mpm; +Cc: netdev, netdev, linux-kernel
Hi,
This patch restores functionality that was removed when the recursive
->poll bug was fixed. Namely, it allows multiple netpoll clients to
register against the same network interface.
In order to put things into perspective, I'm going to provide some
background information. So, here is how things used to work:
Multiple users of the netpoll interface could register themselves to send
packets over the same interface. Any number of these netpoll clients could
register an rx_hook, as well. However, only the very first in the list
(hence the last one that registered), that matched the incoming interface,
would be called when a packet arrived. The reason for this was not design,
it was an oversight in the implementation. In practice, however, no one
ever stumbled over this. (There are more subtleties when dealing with
multiple rx_hooks registered to the same interface, but we'll ignore these,
since no one ever ran into such problems.)
Note that each netpoll client that registered an rx_hook was put on a
netpoll_rx_list. This list was protected by a spinlock, and so operations
which touched the rx routines would incur a locking penalty and a list
traversal. I am mentioning this because the list and associated lock were
removed when the code was refactored, and the patches I propose will
reintroduce the lock, but not the list.
Moving to what we have today:
Multiple netpoll clients can register to send packets over the same
interface. That's right, you can actually do this. However, there are
ugly side effects. Because we now have a pointer from the net_device to a
struct netpoll, the last netpoll client to register will be pointed to by
the net_device->np. What this means is that if you had two clients, the
first registers an rx_hook and the second does not, then the netpoll code
will not know that any device has actually registered an rx_hook (since the
np pointer in the struct net_device is overwritten)! As a result, no
incoming packets will be delivered to the registered rx routine. This is
clearly undesirable behaviour.
So what does the patch do?
I created a new structure:
struct netpoll_info {
spinlock_t poll_lock;
int poll_owner;
int rx_flags;
spinlock_t rx_lock;
struct netpoll *rx_np; /* netpoll that registered an rx_hook */
};
This is the structure which gets pointed to by the net_device. All of the
flags and locks which are specific to the INTERFACE go here. Any variables
which must be kept per struct netpoll were left in the struct netpoll. So
now, we have a cleaner separation of data and its scope.
Since we never really supported having more than one struct netpoll
register an rx_hook, I got rid of the rx_list. This is replaced by a
single pointer in the netpoll_info structure (np_rx). We still need to
protect addition or removal of the rx_np pointer, and so keep the lock
(rx_lock). There is one lock per struct net_device, and I am certain that
it will be 0 contention, as rx_np will only be changed during an insmod or
rmmod. If people think this would be a good rcu candidate, let me know and
I'll change it to use that locking scheme.
In the process of making these changes, I've fixed a couple other minor
bugs [1]. These fixes are included in this patch, but I will break them
out if people agree with this approach.
I have tested this by registering multiple netpoll clients, and verifying
that they both function properly. I have not yet tried registering an
rx_hook, but I believe the code should be sufficient to handle that case.
And so, here is the full patch. I'd appreciate comments. Once we've
reached consensus, I will resubmit as a patch series.
Oh, and I've cc'd both netdev@oss.sgi.com and @vger.kernel.org. Is it safe
to just use the vger list?
Thanks,
Jeff
[1] netpoll_poll_unlock unlocked and then set the poll_owner. I've
reversed the order of those operations. The netpoll_cleanup code could
dereference a null pointer, that was fixed by virtue of being very
different in the new case.
--- linux-2.6.12-rc6/net/core/netpoll.c.orig 2005-06-20 19:51:56.000000000 -0400
+++ linux-2.6.12-rc6/net/core/netpoll.c 2005-06-21 16:03:22.409620400 -0400
@@ -131,18 +131,19 @@ static int checksum_udp(struct sk_buff *
static void poll_napi(struct netpoll *np)
{
int budget = 16;
+ struct netpoll_info *npinfo = np->dev->npinfo;
if (test_bit(__LINK_STATE_RX_SCHED, &np->dev->state) &&
- np->poll_owner != smp_processor_id() &&
- spin_trylock(&np->poll_lock)) {
- np->rx_flags |= NETPOLL_RX_DROP;
+ npinfo->poll_owner != smp_processor_id() &&
+ spin_trylock(&npinfo->poll_lock)) {
+ npinfo->rx_flags |= NETPOLL_RX_DROP;
atomic_inc(&trapped);
np->dev->poll(np->dev, &budget);
atomic_dec(&trapped);
- np->rx_flags &= ~NETPOLL_RX_DROP;
- spin_unlock(&np->poll_lock);
+ npinfo->rx_flags &= ~NETPOLL_RX_DROP;
+ spin_unlock(&npinfo->poll_lock);
}
}
@@ -245,6 +246,7 @@ repeat:
static void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
{
int status;
+ struct netpoll_info *npinfo;
repeat:
if(!np || !np->dev || !netif_running(np->dev)) {
@@ -253,7 +255,8 @@ repeat:
}
/* avoid recursion */
- if(np->poll_owner == smp_processor_id() ||
+ npinfo = np->dev->npinfo;
+ if(npinfo->poll_owner == smp_processor_id() ||
np->dev->xmit_lock_owner == smp_processor_id()) {
if (np->drop)
np->drop(skb);
@@ -346,7 +349,15 @@ static void arp_reply(struct sk_buff *sk
int size, type = ARPOP_REPLY, ptype = ETH_P_ARP;
u32 sip, tip;
struct sk_buff *send_skb;
- struct netpoll *np = skb->dev->np;
+ struct netpoll *np;
+ struct netpoll_info *npinfo = skb->dev->npinfo;
+
+ if (!npinfo) return;
+
+ spin_lock_irqsave(&npinfo->rx_lock, flags);
+ if (npinfo->rx_np->dev == skb->dev)
+ np = npinfo->rx_np;
+ spin_unlock_irqrestore(&npinfo->rx_lock, flags);
if (!np) return;
@@ -429,9 +440,9 @@ int __netpoll_rx(struct sk_buff *skb)
int proto, len, ulen;
struct iphdr *iph;
struct udphdr *uh;
- struct netpoll *np = skb->dev->np;
+ struct netpoll *np = skb->dev->npinfo->rx_np;
- if (!np->rx_hook)
+ if (!np)
goto out;
if (skb->dev->type != ARPHRD_ETHER)
goto out;
@@ -611,9 +622,8 @@ int netpoll_setup(struct netpoll *np)
{
struct net_device *ndev = NULL;
struct in_device *in_dev;
-
- np->poll_lock = SPIN_LOCK_UNLOCKED;
- np->poll_owner = -1;
+ struct netpoll_info *npinfo;
+ unsigned long flags;
if (np->dev_name)
ndev = dev_get_by_name(np->dev_name);
@@ -624,7 +634,17 @@ int netpoll_setup(struct netpoll *np)
}
np->dev = ndev;
- ndev->np = np;
+ if (!ndev->npinfo) {
+ npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL);
+ if (!npinfo)
+ goto release;
+
+ npinfo->rx_np = NULL;
+ npinfo->poll_lock = SPIN_LOCK_UNLOCKED;
+ npinfo->poll_owner = -1;
+ npinfo->rx_lock = SPIN_LOCK_UNLOCKED;
+ } else
+ npinfo = ndev->npinfo;
if (!ndev->poll_controller) {
printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n",
@@ -692,13 +712,20 @@ int netpoll_setup(struct netpoll *np)
np->name, HIPQUAD(np->local_ip));
}
- if(np->rx_hook)
- np->rx_flags = NETPOLL_RX_ENABLED;
+ if(np->rx_hook) {
+ spin_lock_irqsave(&npinfo->rx_lock, flags);
+ npinfo->rx_flags |= NETPOLL_RX_ENABLED;
+ npinfo->rx_np = np;
+ spin_unlock_irqsave(&npinfo->rx_lock, flags);
+ }
+ /* last thing to do is link it to the net device structure */
+ ndev->npinfo = npinfo;
return 0;
release:
- ndev->np = NULL;
+ if (!ndev->npinfo)
+ kfree(npinfo);
np->dev = NULL;
dev_put(ndev);
return -1;
@@ -706,9 +733,17 @@ int netpoll_setup(struct netpoll *np)
void netpoll_cleanup(struct netpoll *np)
{
- if (np->dev)
- np->dev->np = NULL;
- dev_put(np->dev);
+ struct netpoll_info *npinfo;
+
+ if (np->dev) {
+ npinfo = np->dev->npinfo;
+ if (npinfo && npinfo->rx_np == np) {
+ npinfo->rx_np = NULL;
+ npinfo->rx_flags &= ~NETPOLL_RX_ENABLED;
+ }
+ dev_put(np->dev);
+ }
+
np->dev = NULL;
}
--- linux-2.6.12-rc6/net/core/dev.c.orig 2005-06-20 19:51:59.000000000 -0400
+++ linux-2.6.12-rc6/net/core/dev.c 2005-06-21 13:53:51.583407710 -0400
@@ -1656,6 +1656,7 @@ int netif_receive_skb(struct sk_buff *sk
unsigned short type;
/* if we've gotten here through NAPI, check netpoll */
+ /* how else can we get here? --phro */
if (skb->dev->poll && netpoll_rx(skb))
return NET_RX_DROP;
--- linux-2.6.12-rc6/include/linux/netpoll.h.orig 2005-06-20 19:51:47.000000000 -0400
+++ linux-2.6.12-rc6/include/linux/netpoll.h 2005-06-21 15:29:48.994422229 -0400
@@ -16,14 +16,19 @@ struct netpoll;
struct netpoll {
struct net_device *dev;
char dev_name[16], *name;
- int rx_flags;
void (*rx_hook)(struct netpoll *, int, char *, int);
void (*drop)(struct sk_buff *skb);
u32 local_ip, remote_ip;
u16 local_port, remote_port;
unsigned char local_mac[6], remote_mac[6];
+};
+
+struct netpoll_info {
spinlock_t poll_lock;
int poll_owner;
+ int rx_flags;
+ spinlock_t rx_lock;
+ struct netpoll *rx_np; /* netpoll that registered an rx_hook */
};
void netpoll_poll(struct netpoll *np);
@@ -39,22 +44,35 @@ void netpoll_queue(struct sk_buff *skb);
#ifdef CONFIG_NETPOLL
static inline int netpoll_rx(struct sk_buff *skb)
{
- return skb->dev->np && skb->dev->np->rx_flags && __netpoll_rx(skb);
+ struct netpoll_info *npinfo = skb->dev->npinfo;
+ unsigned long flags;
+ int ret = 0;
+
+ if (!npinfo || (!npinfo->rx_np && !npinfo->rx_flags))
+ return 0;
+
+ spin_lock_irqsave(&npinfo->rx_lock, flags);
+ /* check rx_flags again with the lock held */
+ if (npinfo->rx_flags && __netpoll_rx(skb))
+ ret = 1;
+ spin_unlock_irqrestore(&npinfo->rx_lock, flags);
+
+ return ret;
}
static inline void netpoll_poll_lock(struct net_device *dev)
{
- if (dev->np) {
- spin_lock(&dev->np->poll_lock);
- dev->np->poll_owner = smp_processor_id();
+ if (dev->npinfo) {
+ spin_lock(&dev->npinfo->poll_lock);
+ dev->npinfo->poll_owner = smp_processor_id();
}
}
static inline void netpoll_poll_unlock(struct net_device *dev)
{
- if (dev->np) {
- spin_unlock(&dev->np->poll_lock);
- dev->np->poll_owner = -1;
+ if (dev->npinfo) {
+ dev->npinfo->poll_owner = -1;
+ spin_unlock(&dev->npinfo->poll_lock);
}
}
--- linux-2.6.12-rc6/include/linux/netdevice.h.orig 2005-06-20 20:26:21.000000000 -0400
+++ linux-2.6.12-rc6/include/linux/netdevice.h 2005-06-21 14:46:52.093190854 -0400
@@ -41,7 +41,7 @@
struct divert_blk;
struct vlan_group;
struct ethtool_ops;
-struct netpoll;
+struct netpoll_info;
/* source back-compat hooks */
#define SET_ETHTOOL_OPS(netdev,ops) \
( (netdev)->ethtool_ops = (ops) )
@@ -468,7 +468,7 @@ struct net_device
unsigned char *haddr);
int (*neigh_setup)(struct net_device *dev, struct neigh_parms *);
#ifdef CONFIG_NETPOLL
- struct netpoll *np;
+ struct netpoll_info *npinfo;
#endif
#ifdef CONFIG_NET_POLL_CONTROLLER
void (*poll_controller)(struct net_device *dev);
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [patch,rfc] allow registration of multiple netpolls per interface
2005-06-21 21:41 [patch,rfc] allow registration of multiple netpolls per interface Jeff Moyer
@ 2005-06-21 22:52 ` Matt Mackall
2005-06-22 11:47 ` Jeff Moyer
0 siblings, 1 reply; 6+ messages in thread
From: Matt Mackall @ 2005-06-21 22:52 UTC (permalink / raw)
To: Jeff Moyer; +Cc: netdev, netdev, linux-kernel
On Tue, Jun 21, 2005 at 05:41:34PM -0400, Jeff Moyer wrote:
> Hi,
>
> This patch restores functionality that was removed when the recursive
> ->poll bug was fixed. Namely, it allows multiple netpoll clients to
> register against the same network interface.
Thanks. I've been neglecting this for a bit while I've been busy with
other things.
> In order to put things into perspective, I'm going to provide some
> background information. So, here is how things used to work:
>
> Multiple users of the netpoll interface could register themselves to send
> packets over the same interface. Any number of these netpoll clients could
> register an rx_hook, as well. However, only the very first in the list
> (hence the last one that registered), that matched the incoming interface,
> would be called when a packet arrived. The reason for this was not design,
> it was an oversight in the implementation. In practice, however, no one
> ever stumbled over this. (There are more subtleties when dealing with
> multiple rx_hooks registered to the same interface, but we'll ignore these,
> since no one ever ran into such problems.)
Hmm. It's conceivable we'll want netdump and kgdb on the same
interface so we'll have to address this eventually..
> Note that each netpoll client that registered an rx_hook was put on a
> netpoll_rx_list. This list was protected by a spinlock, and so operations
> which touched the rx routines would incur a locking penalty and a list
> traversal. I am mentioning this because the list and associated lock were
> removed when the code was refactored, and the patches I propose will
> reintroduce the lock, but not the list.
..so we'll probably want the list back in some form. Sigh.
> Moving to what we have today:
>
> Multiple netpoll clients can register to send packets over the same
> interface. That's right, you can actually do this. However, there are
> ugly side effects. Because we now have a pointer from the net_device to a
> struct netpoll, the last netpoll client to register will be pointed to by
> the net_device->np. What this means is that if you had two clients, the
> first registers an rx_hook and the second does not, then the netpoll code
> will not know that any device has actually registered an rx_hook (since the
> np pointer in the struct net_device is overwritten)! As a result, no
> incoming packets will be delivered to the registered rx routine. This is
> clearly undesirable behaviour.
>
> So what does the patch do?
>
> I created a new structure:
>
> struct netpoll_info {
> spinlock_t poll_lock;
> int poll_owner;
> int rx_flags;
> spinlock_t rx_lock;
> struct netpoll *rx_np; /* netpoll that registered an rx_hook */
> };
>
> This is the structure which gets pointed to by the net_device. All of the
> flags and locks which are specific to the INTERFACE go here. Any variables
> which must be kept per struct netpoll were left in the struct netpoll. So
> now, we have a cleaner separation of data and its scope.
>
> Since we never really supported having more than one struct netpoll
> register an rx_hook, I got rid of the rx_list. This is replaced by a
> single pointer in the netpoll_info structure (np_rx). We still need to
> protect addition or removal of the rx_np pointer, and so keep the lock
> (rx_lock). There is one lock per struct net_device, and I am certain that
> it will be 0 contention, as rx_np will only be changed during an insmod or
> rmmod. If people think this would be a good rcu candidate, let me know and
> I'll change it to use that locking scheme.
It might be simpler to have a single lock here..?
> In the process of making these changes, I've fixed a couple other minor
> bugs [1]. These fixes are included in this patch, but I will break them
> out if people agree with this approach.
>
> I have tested this by registering multiple netpoll clients, and verifying
> that they both function properly. I have not yet tried registering an
> rx_hook, but I believe the code should be sufficient to handle that case.
>
> And so, here is the full patch. I'd appreciate comments. Once we've
> reached consensus, I will resubmit as a patch series.
I think the general idea is sound. So let's take a look at the patch itself.
> Oh, and I've cc'd both netdev@oss.sgi.com and @vger.kernel.org. Is it safe
> to just use the vger list?
Yes.
> [1] netpoll_poll_unlock unlocked and then set the poll_owner. I've
> reversed the order of those operations. The netpoll_cleanup code could
> dereference a null pointer, that was fixed by virtue of being very
> different in the new case.
Ok, let's fix the lock ordering bit first.
> --- linux-2.6.12-rc6/net/core/netpoll.c.orig 2005-06-20 19:51:56.000000000 -0400
> +++ linux-2.6.12-rc6/net/core/netpoll.c 2005-06-21 16:03:22.409620400 -0400
> @@ -131,18 +131,19 @@ static int checksum_udp(struct sk_buff *
> static void poll_napi(struct netpoll *np)
> {
> int budget = 16;
> + struct netpoll_info *npinfo = np->dev->npinfo;
As a minor point of style, I like to put the "get my private info"
lines first.
> @@ -245,6 +246,7 @@ repeat:
> static void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
> {
> int status;
> + struct netpoll_info *npinfo;
>
> repeat:
> if(!np || !np->dev || !netif_running(np->dev)) {
> @@ -253,7 +255,8 @@ repeat:
> }
>
> /* avoid recursion */
> - if(np->poll_owner == smp_processor_id() ||
> + npinfo = np->dev->npinfo;
Again, the npinfo assignment ought to happen as soon as possible.
> + if(npinfo->poll_owner == smp_processor_id() ||
> np->dev->xmit_lock_owner == smp_processor_id()) {
> if (np->drop)
> np->drop(skb);
> @@ -346,7 +349,15 @@ static void arp_reply(struct sk_buff *sk
> int size, type = ARPOP_REPLY, ptype = ETH_P_ARP;
> u32 sip, tip;
> struct sk_buff *send_skb;
> - struct netpoll *np = skb->dev->np;
> + struct netpoll *np;
> + struct netpoll_info *npinfo = skb->dev->npinfo;
> +
> + if (!npinfo) return;
We should only be replying to ARPs if we're trapped, right? How do we
get here with npinfo unset?
The return ought to be on a separate line, btw.
> + spin_lock_irqsave(&npinfo->rx_lock, flags);
> + if (npinfo->rx_np->dev == skb->dev)
> + np = npinfo->rx_np;
> + spin_unlock_irqrestore(&npinfo->rx_lock, flags);
And I think that means we don't need the lock here either.
> if (!np) return;
And the same question and style criticism of my own code.
> @@ -429,9 +440,9 @@ int __netpoll_rx(struct sk_buff *skb)
> int proto, len, ulen;
> struct iphdr *iph;
> struct udphdr *uh;
> - struct netpoll *np = skb->dev->np;
> + struct netpoll *np = skb->dev->npinfo->rx_np;
>
> - if (!np->rx_hook)
> + if (!np)
> goto out;
> if (skb->dev->type != ARPHRD_ETHER)
> goto out;
> @@ -611,9 +622,8 @@ int netpoll_setup(struct netpoll *np)
> {
> struct net_device *ndev = NULL;
> struct in_device *in_dev;
> -
> - np->poll_lock = SPIN_LOCK_UNLOCKED;
> - np->poll_owner = -1;
> + struct netpoll_info *npinfo;
> + unsigned long flags;
>
> if (np->dev_name)
> ndev = dev_get_by_name(np->dev_name);
> @@ -624,7 +634,17 @@ int netpoll_setup(struct netpoll *np)
> }
>
> np->dev = ndev;
> - ndev->np = np;
> + if (!ndev->npinfo) {
> + npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL);
> + if (!npinfo)
> + goto release;
> +
> + npinfo->rx_np = NULL;
> + npinfo->poll_lock = SPIN_LOCK_UNLOCKED;
> + npinfo->poll_owner = -1;
> + npinfo->rx_lock = SPIN_LOCK_UNLOCKED;
> + } else
> + npinfo = ndev->npinfo;
>
> if (!ndev->poll_controller) {
> printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n",
> @@ -692,13 +712,20 @@ int netpoll_setup(struct netpoll *np)
> np->name, HIPQUAD(np->local_ip));
> }
>
> - if(np->rx_hook)
> - np->rx_flags = NETPOLL_RX_ENABLED;
> + if(np->rx_hook) {
> + spin_lock_irqsave(&npinfo->rx_lock, flags);
> + npinfo->rx_flags |= NETPOLL_RX_ENABLED;
> + npinfo->rx_np = np;
> + spin_unlock_irqsave(&npinfo->rx_lock, flags);
> + }
> + /* last thing to do is link it to the net device structure */
> + ndev->npinfo = npinfo;
>
> return 0;
>
> release:
> - ndev->np = NULL;
> + if (!ndev->npinfo)
> + kfree(npinfo);
> np->dev = NULL;
> dev_put(ndev);
> return -1;
> @@ -706,9 +733,17 @@ int netpoll_setup(struct netpoll *np)
>
> void netpoll_cleanup(struct netpoll *np)
> {
> - if (np->dev)
> - np->dev->np = NULL;
> - dev_put(np->dev);
> + struct netpoll_info *npinfo;
> +
> + if (np->dev) {
> + npinfo = np->dev->npinfo;
> + if (npinfo && npinfo->rx_np == np) {
> + npinfo->rx_np = NULL;
> + npinfo->rx_flags &= ~NETPOLL_RX_ENABLED;
> + }
> + dev_put(np->dev);
> + }
> +
> np->dev = NULL;
> }
>
> --- linux-2.6.12-rc6/net/core/dev.c.orig 2005-06-20 19:51:59.000000000 -0400
> +++ linux-2.6.12-rc6/net/core/dev.c 2005-06-21 13:53:51.583407710 -0400
> @@ -1656,6 +1656,7 @@ int netif_receive_skb(struct sk_buff *sk
> unsigned short type;
>
> /* if we've gotten here through NAPI, check netpoll */
> + /* how else can we get here? --phro */
We can get here in the usual route of non-NAPI delivery, IIRC.
> if (skb->dev->poll && netpoll_rx(skb))
> return NET_RX_DROP;
>
> --- linux-2.6.12-rc6/include/linux/netpoll.h.orig 2005-06-20 19:51:47.000000000 -0400
> +++ linux-2.6.12-rc6/include/linux/netpoll.h 2005-06-21 15:29:48.994422229 -0400
> @@ -16,14 +16,19 @@ struct netpoll;
> struct netpoll {
> struct net_device *dev;
> char dev_name[16], *name;
> - int rx_flags;
> void (*rx_hook)(struct netpoll *, int, char *, int);
> void (*drop)(struct sk_buff *skb);
> u32 local_ip, remote_ip;
> u16 local_port, remote_port;
> unsigned char local_mac[6], remote_mac[6];
> +};
> +
> +struct netpoll_info {
> spinlock_t poll_lock;
> int poll_owner;
> + int rx_flags;
> + spinlock_t rx_lock;
> + struct netpoll *rx_np; /* netpoll that registered an rx_hook */
> };
>
> void netpoll_poll(struct netpoll *np);
> @@ -39,22 +44,35 @@ void netpoll_queue(struct sk_buff *skb);
> #ifdef CONFIG_NETPOLL
> static inline int netpoll_rx(struct sk_buff *skb)
> {
> - return skb->dev->np && skb->dev->np->rx_flags && __netpoll_rx(skb);
> + struct netpoll_info *npinfo = skb->dev->npinfo;
> + unsigned long flags;
> + int ret = 0;
> +
> + if (!npinfo || (!npinfo->rx_np && !npinfo->rx_flags))
> + return 0;
> +
> + spin_lock_irqsave(&npinfo->rx_lock, flags);
> + /* check rx_flags again with the lock held */
> + if (npinfo->rx_flags && __netpoll_rx(skb))
> + ret = 1;
> + spin_unlock_irqrestore(&npinfo->rx_lock, flags);
> +
> + return ret;
> }
This is perhaps a problem due to cache line bouncing. Perhaps we can
use an atomic op and a memory barrier instead?
> static inline void netpoll_poll_lock(struct net_device *dev)
> {
> - if (dev->np) {
> - spin_lock(&dev->np->poll_lock);
> - dev->np->poll_owner = smp_processor_id();
> + if (dev->npinfo) {
> + spin_lock(&dev->npinfo->poll_lock);
> + dev->npinfo->poll_owner = smp_processor_id();
> }
> }
>
> static inline void netpoll_poll_unlock(struct net_device *dev)
> {
> - if (dev->np) {
> - spin_unlock(&dev->np->poll_lock);
> - dev->np->poll_owner = -1;
> + if (dev->npinfo) {
> + dev->npinfo->poll_owner = -1;
> + spin_unlock(&dev->npinfo->poll_lock);
> }
> }
>
> --- linux-2.6.12-rc6/include/linux/netdevice.h.orig 2005-06-20 20:26:21.000000000 -0400
> +++ linux-2.6.12-rc6/include/linux/netdevice.h 2005-06-21 14:46:52.093190854 -0400
> @@ -41,7 +41,7 @@
> struct divert_blk;
> struct vlan_group;
> struct ethtool_ops;
> -struct netpoll;
> +struct netpoll_info;
> /* source back-compat hooks */
> #define SET_ETHTOOL_OPS(netdev,ops) \
> ( (netdev)->ethtool_ops = (ops) )
> @@ -468,7 +468,7 @@ struct net_device
> unsigned char *haddr);
> int (*neigh_setup)(struct net_device *dev, struct neigh_parms *);
> #ifdef CONFIG_NETPOLL
> - struct netpoll *np;
> + struct netpoll_info *npinfo;
> #endif
> #ifdef CONFIG_NET_POLL_CONTROLLER
> void (*poll_controller)(struct net_device *dev);
--
Mathematics is the supreme nostalgia of our time.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [patch,rfc] allow registration of multiple netpolls per interface
2005-06-21 22:52 ` Matt Mackall
@ 2005-06-22 11:47 ` Jeff Moyer
2005-06-22 17:01 ` Matt Mackall
0 siblings, 1 reply; 6+ messages in thread
From: Jeff Moyer @ 2005-06-22 11:47 UTC (permalink / raw)
To: Matt Mackall; +Cc: netdev, netdev, linux-kernel
==> Regarding Re: [patch,rfc] allow registration of multiple netpolls per interface; Matt Mackall <mpm@selenic.com> adds:
mpm> On Tue, Jun 21, 2005 at 05:41:34PM -0400, Jeff Moyer wrote:
>> Hi,
>>
>> This patch restores functionality that was removed when the recursive
-> poll bug was fixed. Namely, it allows multiple netpoll clients to
>> register against the same network interface.
mpm> Thanks. I've been neglecting this for a bit while I've been busy with
mpm> other things.
>> In order to put things into perspective, I'm going to provide some
>> background information. So, here is how things used to work:
>>
>> Multiple users of the netpoll interface could register themselves to send
>> packets over the same interface. Any number of these netpoll clients could
>> register an rx_hook, as well. However, only the very first in the list
>> (hence the last one that registered), that matched the incoming interface,
>> would be called when a packet arrived. The reason for this was not design,
>> it was an oversight in the implementation. In practice, however, no one
>> ever stumbled over this. (There are more subtleties when dealing with
>> multiple rx_hooks registered to the same interface, but we'll ignore these,
>> since no one ever ran into such problems.)
mpm> Hmm. It's conceivable we'll want netdump and kgdb on the same
mpm> interface so we'll have to address this eventually..
Well, do you want to address it eventually, or now? As I said, it's never
bitten anyone before.
>> Note that each netpoll client that registered an rx_hook was put on a
>> netpoll_rx_list. This list was protected by a spinlock, and so operations
>> which touched the rx routines would incur a locking penalty and a list
>> traversal. I am mentioning this because the list and associated lock were
>> removed when the code was refactored, and the patches I propose will
>> reintroduce the lock, but not the list.
mpm> ..so we'll probably want the list back in some form. Sigh.
>> Moving to what we have today:
>>
>> Multiple netpoll clients can register to send packets over the same
>> interface. That's right, you can actually do this. However, there are
>> ugly side effects. Because we now have a pointer from the net_device to a
>> struct netpoll, the last netpoll client to register will be pointed to by
>> the net_device->np. What this means is that if you had two clients, the
>> first registers an rx_hook and the second does not, then the netpoll code
>> will not know that any device has actually registered an rx_hook (since the
>> np pointer in the struct net_device is overwritten)! As a result, no
>> incoming packets will be delivered to the registered rx routine. This is
>> clearly undesirable behaviour.
>>
>> So what does the patch do?
>>
>> I created a new structure:
>>
>> struct netpoll_info {
>> spinlock_t poll_lock;
>> int poll_owner;
>> int rx_flags;
>> spinlock_t rx_lock;
>> struct netpoll *rx_np; /* netpoll that registered an rx_hook */
>> };
>>
>> This is the structure which gets pointed to by the net_device. All of the
>> flags and locks which are specific to the INTERFACE go here. Any variables
>> which must be kept per struct netpoll were left in the struct netpoll. So
>> now, we have a cleaner separation of data and its scope.
>>
>> Since we never really supported having more than one struct netpoll
>> register an rx_hook, I got rid of the rx_list. This is replaced by a
>> single pointer in the netpoll_info structure (np_rx). We still need to
>> protect addition or removal of the rx_np pointer, and so keep the lock
>> (rx_lock). There is one lock per struct net_device, and I am certain that
>> it will be 0 contention, as rx_np will only be changed during an insmod or
>> rmmod. If people think this would be a good rcu candidate, let me know and
>> I'll change it to use that locking scheme.
mpm> It might be simpler to have a single lock here..?
Maybe. You can't really have netpoll code running on multiple cpus at the
same time, right? This is the rx path, remember, so the other cpu should
be spinning on the poll_lock.
Keeping separate locks would allow you to unregister a struct netpoll
associated with another net device without causing lock contention. This
is a very minor win, obviously.
I still feel like this npinfo struct is the right place for this, though.
If you're strongly opposed to that, I'll change it.
>> In the process of making these changes, I've fixed a couple other minor
>> bugs [1]. These fixes are included in this patch, but I will break them
>> out if people agree with this approach.
>>
>> I have tested this by registering multiple netpoll clients, and verifying
>> that they both function properly. I have not yet tried registering an
>> rx_hook, but I believe the code should be sufficient to handle that case.
>>
>> And so, here is the full patch. I'd appreciate comments. Once we've
>> reached consensus, I will resubmit as a patch series.
mpm> I think the general idea is sound. So let's take a look at the patch itself.
>> Oh, and I've cc'd both netdev@oss.sgi.com and @vger.kernel.org. Is it safe
>> to just use the vger list?
mpm> Yes.
>> [1] netpoll_poll_unlock unlocked and then set the poll_owner. I've
>> reversed the order of those operations. The netpoll_cleanup code could
>> dereference a null pointer, that was fixed by virtue of being very
>> different in the new case.
mpm> Ok, let's fix the lock ordering bit first.
>> --- linux-2.6.12-rc6/net/core/netpoll.c.orig 2005-06-20 19:51:56.000000000 -0400
>> +++ linux-2.6.12-rc6/net/core/netpoll.c 2005-06-21 16:03:22.409620400 -0400
>> @@ -131,18 +131,19 @@ static int checksum_udp(struct sk_buff *
>> static void poll_napi(struct netpoll *np)
>> {
>> int budget = 16;
>> + struct netpoll_info *npinfo = np->dev->npinfo;
mpm> As a minor point of style, I like to put the "get my private info"
mpm> lines first.
Quite the minor nit! It's the second line in the function! I'll fix it,
though. ;)
>> @@ -245,6 +246,7 @@ repeat:
>> static void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
>> {
>> int status;
>> + struct netpoll_info *npinfo;
>>
>> repeat:
>> if(!np || !np->dev || !netif_running(np->dev)) {
>> @@ -253,7 +255,8 @@ repeat:
>> }
>>
>> /* avoid recursion */
>> - if(np->poll_owner == smp_processor_id() ||
>> + npinfo = np->dev->npinfo;
mpm> Again, the npinfo assignment ought to happen as soon as possible.
This is as soon as possible. Note that above we check to see if np and
np->dev are valid pointers. We can't get to the npinfo struct before we
know that.
>> + if(npinfo->poll_owner == smp_processor_id() ||
np-> dev->xmit_lock_owner == smp_processor_id()) {
>> if (np->drop)
np-> drop(skb);
>> @@ -346,7 +349,15 @@ static void arp_reply(struct sk_buff *sk
>> int size, type = ARPOP_REPLY, ptype = ETH_P_ARP;
>> u32 sip, tip;
>> struct sk_buff *send_skb;
>> - struct netpoll *np = skb->dev->np;
>> + struct netpoll *np;
>> + struct netpoll_info *npinfo = skb->dev->npinfo;
>> +
>> + if (!npinfo) return;
mpm> We should only be replying to ARPs if we're trapped, right? How do we
mpm> get here with npinfo unset?
Good point.
mpm> The return ought to be on a separate line, btw.
Agreed.
>> + spin_lock_irqsave(&npinfo->rx_lock, flags);
>> + if (npinfo->rx_np->dev == skb->dev)
>> + np = npinfo->rx_np;
>> + spin_unlock_irqrestore(&npinfo->rx_lock, flags);
mpm> And I think that means we don't need the lock here either.
Sure we do. We need to protect against rmmod's.
>> if (!np) return;
mpm> And the same question and style criticism of my own code.
;)
>> @@ -429,9 +440,9 @@ int __netpoll_rx(struct sk_buff *skb)
>> int proto, len, ulen;
>> struct iphdr *iph;
>> struct udphdr *uh;
>> - struct netpoll *np = skb->dev->np;
>> + struct netpoll *np = skb->dev->npinfo->rx_np;
>>
>> - if (!np->rx_hook)
>> + if (!np)
>> goto out;
>> if (skb->dev->type != ARPHRD_ETHER)
>> goto out;
>> @@ -611,9 +622,8 @@ int netpoll_setup(struct netpoll *np)
>> {
>> struct net_device *ndev = NULL;
>> struct in_device *in_dev;
>> -
>> - np->poll_lock = SPIN_LOCK_UNLOCKED;
>> - np->poll_owner = -1;
>> + struct netpoll_info *npinfo;
>> + unsigned long flags;
>>
>> if (np->dev_name)
>> ndev = dev_get_by_name(np->dev_name);
>> @@ -624,7 +634,17 @@ int netpoll_setup(struct netpoll *np)
>> }
>>
np-> dev = ndev;
>> - ndev->np = np;
>> + if (!ndev->npinfo) {
>> + npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL);
>> + if (!npinfo)
>> + goto release;
>> +
>> + npinfo->rx_np = NULL;
>> + npinfo->poll_lock = SPIN_LOCK_UNLOCKED;
>> + npinfo->poll_owner = -1;
>> + npinfo->rx_lock = SPIN_LOCK_UNLOCKED;
>> + } else
>> + npinfo = ndev->npinfo;
>>
>> if (!ndev->poll_controller) {
>> printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n",
>> @@ -692,13 +712,20 @@ int netpoll_setup(struct netpoll *np)
np-> name, HIPQUAD(np->local_ip));
>> }
>>
>> - if(np->rx_hook)
>> - np->rx_flags = NETPOLL_RX_ENABLED;
>> + if(np->rx_hook) {
>> + spin_lock_irqsave(&npinfo->rx_lock, flags);
>> + npinfo->rx_flags |= NETPOLL_RX_ENABLED;
>> + npinfo->rx_np = np;
>> + spin_unlock_irqsave(&npinfo->rx_lock, flags);
>> + }
>> + /* last thing to do is link it to the net device structure */
>> + ndev->npinfo = npinfo;
>>
>> return 0;
>>
>> release:
>> - ndev->np = NULL;
>> + if (!ndev->npinfo)
>> + kfree(npinfo);
np-> dev = NULL;
>> dev_put(ndev);
>> return -1;
>> @@ -706,9 +733,17 @@ int netpoll_setup(struct netpoll *np)
>>
>> void netpoll_cleanup(struct netpoll *np)
>> {
>> - if (np->dev)
>> - np->dev->np = NULL;
>> - dev_put(np->dev);
>> + struct netpoll_info *npinfo;
>> +
>> + if (np->dev) {
>> + npinfo = np->dev->npinfo;
>> + if (npinfo && npinfo->rx_np == np) {
>> + npinfo->rx_np = NULL;
>> + npinfo->rx_flags &= ~NETPOLL_RX_ENABLED;
>> + }
>> + dev_put(np->dev);
>> + }
>> +
np-> dev = NULL;
>> }
>>
>> --- linux-2.6.12-rc6/net/core/dev.c.orig 2005-06-20 19:51:59.000000000 -0400
>> +++ linux-2.6.12-rc6/net/core/dev.c 2005-06-21 13:53:51.583407710 -0400
>> @@ -1656,6 +1656,7 @@ int netif_receive_skb(struct sk_buff *sk
>> unsigned short type;
>>
>> /* if we've gotten here through NAPI, check netpoll */
>> + /* how else can we get here? --phro */
mpm> We can get here in the usual route of non-NAPI delivery, IIRC.
I couldn't find that path. I'll look again.
>> if (skb->dev->poll && netpoll_rx(skb))
>> return NET_RX_DROP;
>>
>> --- linux-2.6.12-rc6/include/linux/netpoll.h.orig 2005-06-20 19:51:47.000000000 -0400
>> +++ linux-2.6.12-rc6/include/linux/netpoll.h 2005-06-21 15:29:48.994422229 -0400
>> @@ -16,14 +16,19 @@ struct netpoll;
>> struct netpoll {
>> struct net_device *dev;
>> char dev_name[16], *name;
>> - int rx_flags;
>> void (*rx_hook)(struct netpoll *, int, char *, int);
>> void (*drop)(struct sk_buff *skb);
>> u32 local_ip, remote_ip;
>> u16 local_port, remote_port;
>> unsigned char local_mac[6], remote_mac[6];
>> +};
>> +
>> +struct netpoll_info {
>> spinlock_t poll_lock;
>> int poll_owner;
>> + int rx_flags;
>> + spinlock_t rx_lock;
>> + struct netpoll *rx_np; /* netpoll that registered an rx_hook */
>> };
>>
>> void netpoll_poll(struct netpoll *np);
>> @@ -39,22 +44,35 @@ void netpoll_queue(struct sk_buff *skb);
>> #ifdef CONFIG_NETPOLL
>> static inline int netpoll_rx(struct sk_buff *skb)
>> {
>> - return skb->dev->np && skb->dev->np->rx_flags && __netpoll_rx(skb);
>> + struct netpoll_info *npinfo = skb->dev->npinfo;
>> + unsigned long flags;
>> + int ret = 0;
>> +
>> + if (!npinfo || (!npinfo->rx_np && !npinfo->rx_flags))
>> + return 0;
>> +
>> + spin_lock_irqsave(&npinfo->rx_lock, flags);
>> + /* check rx_flags again with the lock held */
>> + if (npinfo->rx_flags && __netpoll_rx(skb))
>> + ret = 1;
>> + spin_unlock_irqrestore(&npinfo->rx_lock, flags);
>> +
>> + return ret;
>> }
mpm> This is perhaps a problem due to cache line bouncing. Perhaps we can
mpm> use an atomic op and a memory barrier instead?
It really should be a 0 contention lock. Let's not optimize something that
doesn't need it. If we find that it causes problems, I'll be more than
happy to fix it.
Thanks for the review, Matt. I'll put together another patch, test it, and
repost later today.
-Jeff
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [patch,rfc] allow registration of multiple netpolls per interface
2005-06-22 11:47 ` Jeff Moyer
@ 2005-06-22 17:01 ` Matt Mackall
2005-06-22 21:05 ` Jeff Moyer
0 siblings, 1 reply; 6+ messages in thread
From: Matt Mackall @ 2005-06-22 17:01 UTC (permalink / raw)
To: Jeff Moyer; +Cc: netdev, netdev, linux-kernel
On Wed, Jun 22, 2005 at 07:47:37AM -0400, Jeff Moyer wrote:
> mpm> Hmm. It's conceivable we'll want netdump and kgdb on the same
> mpm> interface so we'll have to address this eventually..
>
> Well, do you want to address it eventually, or now? As I said, it's never
> bitten anyone before.
Later's fine. I just don't want to design it out by accident again.
> >> struct netpoll_info {
> >> spinlock_t poll_lock;
> >> int poll_owner;
> >> int rx_flags;
> >> spinlock_t rx_lock;
> >> struct netpoll *rx_np; /* netpoll that registered an rx_hook */
> >> };
> >>
> >> This is the structure which gets pointed to by the net_device. All of the
> >> flags and locks which are specific to the INTERFACE go here. Any variables
> >> which must be kept per struct netpoll were left in the struct netpoll. So
> >> now, we have a cleaner separation of data and its scope.
> >>
> >> Since we never really supported having more than one struct netpoll
> >> register an rx_hook, I got rid of the rx_list. This is replaced by a
> >> single pointer in the netpoll_info structure (np_rx). We still need to
> >> protect addition or removal of the rx_np pointer, and so keep the lock
> >> (rx_lock). There is one lock per struct net_device, and I am certain that
> >> it will be 0 contention, as rx_np will only be changed during an insmod or
> >> rmmod. If people think this would be a good rcu candidate, let me know and
> >> I'll change it to use that locking scheme.
>
> mpm> It might be simpler to have a single lock here..?
>
> Maybe. You can't really have netpoll code running on multiple cpus at the
> same time, right? This is the rx path, remember, so the other cpu should
> be spinning on the poll_lock.
>
> Keeping separate locks would allow you to unregister a struct netpoll
> associated with another net device without causing lock contention. This
> is a very minor win, obviously.
>
> I still feel like this npinfo struct is the right place for this, though.
> If you're strongly opposed to that, I'll change it.
No, certainly having it in npinfo makes sense. I just was wondering if
we really need two locks in there.
> >> + spin_lock_irqsave(&npinfo->rx_lock, flags);
> >> + if (npinfo->rx_np->dev == skb->dev)
> >> + np = npinfo->rx_np;
> >> + spin_unlock_irqrestore(&npinfo->rx_lock, flags);
>
> mpm> And I think that means we don't need the lock here either.
>
> Sure we do. We need to protect against rmmod's.
How can we have an rmmmod when we're trapped?
> >> static inline int netpoll_rx(struct sk_buff *skb)
> >> {
> >> - return skb->dev->np && skb->dev->np->rx_flags && __netpoll_rx(skb);
> >> + struct netpoll_info *npinfo = skb->dev->npinfo;
> >> + unsigned long flags;
> >> + int ret = 0;
> >> +
> >> + if (!npinfo || (!npinfo->rx_np && !npinfo->rx_flags))
> >> + return 0;
> >> +
> >> + spin_lock_irqsave(&npinfo->rx_lock, flags);
> >> + /* check rx_flags again with the lock held */
> >> + if (npinfo->rx_flags && __netpoll_rx(skb))
> >> + ret = 1;
> >> + spin_unlock_irqrestore(&npinfo->rx_lock, flags);
> >> +
> >> + return ret;
> >> }
>
> mpm> This is perhaps a problem due to cache line bouncing. Perhaps we can
> mpm> use an atomic op and a memory barrier instead?
>
> It really should be a 0 contention lock. Let's not optimize something that
> doesn't need it. If we find that it causes problems, I'll be more than
> happy to fix it.
Ok, fair enough.
--
Mathematics is the supreme nostalgia of our time.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [patch,rfc] allow registration of multiple netpolls per interface
2005-06-22 17:01 ` Matt Mackall
@ 2005-06-22 21:05 ` Jeff Moyer
2005-06-22 21:27 ` Matt Mackall
0 siblings, 1 reply; 6+ messages in thread
From: Jeff Moyer @ 2005-06-22 21:05 UTC (permalink / raw)
To: Matt Mackall; +Cc: netdev, netdev, linux-kernel
==> Regarding Re: [patch,rfc] allow registration of multiple netpolls per interface; Matt Mackall <mpm@selenic.com> adds:
mpm> On Wed, Jun 22, 2005 at 07:47:37AM -0400, Jeff Moyer wrote:
mpm> Hmm. It's conceivable we'll want netdump and kgdb on the same
mpm> interface so we'll have to address this eventually..
>>
>> Well, do you want to address it eventually, or now? As I said, it's never
>> bitten anyone before.
mpm> Later's fine. I just don't want to design it out by accident again.
OK.
>> >> struct netpoll_info {
>> >> spinlock_t poll_lock;
>> >> int poll_owner;
>> >> int rx_flags;
>> >> spinlock_t rx_lock;
>> >> struct netpoll *rx_np; /* netpoll that registered an rx_hook */
>> >> };
[snip]
mpm> It might be simpler to have a single lock here..?
>>
>> Maybe. You can't really have netpoll code running on multiple cpus at the
>> same time, right? This is the rx path, remember, so the other cpu should
>> be spinning on the poll_lock.
>>
>> Keeping separate locks would allow you to unregister a struct netpoll
>> associated with another net device without causing lock contention. This
>> is a very minor win, obviously.
>>
>> I still feel like this npinfo struct is the right place for this, though.
>> If you're strongly opposed to that, I'll change it.
mpm> No, certainly having it in npinfo makes sense. I just was wondering if
mpm> we really need two locks in there.
Oh, I misunderstood. Well, one protects recursing into the driver's poll
routine, the other protects access to the np_rx pointer, which may later
become a list. I don't think we can lump these two together, do you?
>> >> + spin_lock_irqsave(&npinfo->rx_lock, flags);
>> >> + if (npinfo->rx_np->dev == skb->dev)
>> >> + np = npinfo->rx_np;
>> >> + spin_unlock_irqrestore(&npinfo->rx_lock, flags);
>>
mpm> And I think that means we don't need the lock here either.
>>
>> Sure we do. We need to protect against rmmod's.
mpm> How can we have an rmmmod when we're trapped?
Looking over the code, I don't see what would prevent this. Could you
point me the code which prevents this?
It's a good thing we're discussing it, since I found that I didn't take the
lock in netpoll_cleanup.
Okay, so here's the full patch again, with the changes we've discussed.
I've also included an interdiff. As you can see, the first version I sent
didn't have some basic compile fixes, sorry about that. Anyway, I have
booted and tested this version with multiple netpoll clients.
Barring any negative feedback, I'll break this up and send it as a patch
series.
Thanks,
Jeff
(Interdiff first)
diff -u linux-2.6.12-rc6/net/core/netpoll.c linux-2.6.12-rc6/net/core/netpoll.c
--- linux-2.6.12-rc6/net/core/netpoll.c 2005-06-21 16:03:22.409620400 -0400
+++ linux-2.6.12-rc6/net/core/netpoll.c 2005-06-22 16:51:24.336062231 -0400
@@ -130,8 +130,8 @@
*/
static void poll_napi(struct netpoll *np)
{
- int budget = 16;
struct netpoll_info *npinfo = np->dev->npinfo;
+ int budget = 16;
if (test_bit(__LINK_STATE_RX_SCHED, &np->dev->state) &&
npinfo->poll_owner != smp_processor_id() &&
@@ -344,22 +344,22 @@
static void arp_reply(struct sk_buff *skb)
{
+ struct netpoll_info *npinfo = skb->dev->npinfo;
struct arphdr *arp;
unsigned char *arp_ptr;
int size, type = ARPOP_REPLY, ptype = ETH_P_ARP;
u32 sip, tip;
+ unsigned long flags;
struct sk_buff *send_skb;
- struct netpoll *np;
- struct netpoll_info *npinfo = skb->dev->npinfo;
-
- if (!npinfo) return;
+ struct netpoll *np = NULL;
spin_lock_irqsave(&npinfo->rx_lock, flags);
- if (npinfo->rx_np->dev == skb->dev)
+ if (npinfo->rx_np && npinfo->rx_np->dev == skb->dev)
np = npinfo->rx_np;
spin_unlock_irqrestore(&npinfo->rx_lock, flags);
- if (!np) return;
+ if (!np)
+ return;
/* No arp on this interface */
if (skb->dev->flags & IFF_NOARP)
@@ -716,7 +716,7 @@
spin_lock_irqsave(&npinfo->rx_lock, flags);
npinfo->rx_flags |= NETPOLL_RX_ENABLED;
npinfo->rx_np = np;
- spin_unlock_irqsave(&npinfo->rx_lock, flags);
+ spin_unlock_irqrestore(&npinfo->rx_lock, flags);
}
/* last thing to do is link it to the net device structure */
ndev->npinfo = npinfo;
@@ -734,12 +734,15 @@
void netpoll_cleanup(struct netpoll *np)
{
struct netpoll_info *npinfo;
+ unsigned long flags;
if (np->dev) {
npinfo = np->dev->npinfo;
if (npinfo && npinfo->rx_np == np) {
+ spin_lock_irqsave(&npinfo->rx_lock, flags);
npinfo->rx_np = NULL;
npinfo->rx_flags &= ~NETPOLL_RX_ENABLED;
+ spin_unlock_irqrestore(&npinfo->rx_lock, flags);
}
dev_put(np->dev);
}
reverted:
--- linux-2.6.12-rc6/net/core/dev.c 2005-06-21 13:53:51.583407710 -0400
+++ linux-2.6.12-rc6/net/core/dev.c.orig 2005-06-20 19:51:59.000000000 -0400
@@ -1656,7 +1656,6 @@
unsigned short type;
/* if we've gotten here through NAPI, check netpoll */
- /* how else can we get here? --phro */
if (skb->dev->poll && netpoll_rx(skb))
return NET_RX_DROP;
And now, the full diff:
--- linux-2.6.12-rc6/net/core/netpoll.c.orig 2005-06-20 19:51:56.000000000 -0400
+++ linux-2.6.12-rc6/net/core/netpoll.c 2005-06-22 16:51:24.336062231 -0400
@@ -130,19 +130,20 @@ static int checksum_udp(struct sk_buff *
*/
static void poll_napi(struct netpoll *np)
{
+ struct netpoll_info *npinfo = np->dev->npinfo;
int budget = 16;
if (test_bit(__LINK_STATE_RX_SCHED, &np->dev->state) &&
- np->poll_owner != smp_processor_id() &&
- spin_trylock(&np->poll_lock)) {
- np->rx_flags |= NETPOLL_RX_DROP;
+ npinfo->poll_owner != smp_processor_id() &&
+ spin_trylock(&npinfo->poll_lock)) {
+ npinfo->rx_flags |= NETPOLL_RX_DROP;
atomic_inc(&trapped);
np->dev->poll(np->dev, &budget);
atomic_dec(&trapped);
- np->rx_flags &= ~NETPOLL_RX_DROP;
- spin_unlock(&np->poll_lock);
+ npinfo->rx_flags &= ~NETPOLL_RX_DROP;
+ spin_unlock(&npinfo->poll_lock);
}
}
@@ -245,6 +246,7 @@ repeat:
static void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
{
int status;
+ struct netpoll_info *npinfo;
repeat:
if(!np || !np->dev || !netif_running(np->dev)) {
@@ -253,7 +255,8 @@ repeat:
}
/* avoid recursion */
- if(np->poll_owner == smp_processor_id() ||
+ npinfo = np->dev->npinfo;
+ if(npinfo->poll_owner == smp_processor_id() ||
np->dev->xmit_lock_owner == smp_processor_id()) {
if (np->drop)
np->drop(skb);
@@ -341,14 +344,22 @@ void netpoll_send_udp(struct netpoll *np
static void arp_reply(struct sk_buff *skb)
{
+ struct netpoll_info *npinfo = skb->dev->npinfo;
struct arphdr *arp;
unsigned char *arp_ptr;
int size, type = ARPOP_REPLY, ptype = ETH_P_ARP;
u32 sip, tip;
+ unsigned long flags;
struct sk_buff *send_skb;
- struct netpoll *np = skb->dev->np;
+ struct netpoll *np = NULL;
+
+ spin_lock_irqsave(&npinfo->rx_lock, flags);
+ if (npinfo->rx_np && npinfo->rx_np->dev == skb->dev)
+ np = npinfo->rx_np;
+ spin_unlock_irqrestore(&npinfo->rx_lock, flags);
- if (!np) return;
+ if (!np)
+ return;
/* No arp on this interface */
if (skb->dev->flags & IFF_NOARP)
@@ -429,9 +440,9 @@ int __netpoll_rx(struct sk_buff *skb)
int proto, len, ulen;
struct iphdr *iph;
struct udphdr *uh;
- struct netpoll *np = skb->dev->np;
+ struct netpoll *np = skb->dev->npinfo->rx_np;
- if (!np->rx_hook)
+ if (!np)
goto out;
if (skb->dev->type != ARPHRD_ETHER)
goto out;
@@ -611,9 +622,8 @@ int netpoll_setup(struct netpoll *np)
{
struct net_device *ndev = NULL;
struct in_device *in_dev;
-
- np->poll_lock = SPIN_LOCK_UNLOCKED;
- np->poll_owner = -1;
+ struct netpoll_info *npinfo;
+ unsigned long flags;
if (np->dev_name)
ndev = dev_get_by_name(np->dev_name);
@@ -624,7 +634,17 @@ int netpoll_setup(struct netpoll *np)
}
np->dev = ndev;
- ndev->np = np;
+ if (!ndev->npinfo) {
+ npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL);
+ if (!npinfo)
+ goto release;
+
+ npinfo->rx_np = NULL;
+ npinfo->poll_lock = SPIN_LOCK_UNLOCKED;
+ npinfo->poll_owner = -1;
+ npinfo->rx_lock = SPIN_LOCK_UNLOCKED;
+ } else
+ npinfo = ndev->npinfo;
if (!ndev->poll_controller) {
printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n",
@@ -692,13 +712,20 @@ int netpoll_setup(struct netpoll *np)
np->name, HIPQUAD(np->local_ip));
}
- if(np->rx_hook)
- np->rx_flags = NETPOLL_RX_ENABLED;
+ if(np->rx_hook) {
+ spin_lock_irqsave(&npinfo->rx_lock, flags);
+ npinfo->rx_flags |= NETPOLL_RX_ENABLED;
+ npinfo->rx_np = np;
+ spin_unlock_irqrestore(&npinfo->rx_lock, flags);
+ }
+ /* last thing to do is link it to the net device structure */
+ ndev->npinfo = npinfo;
return 0;
release:
- ndev->np = NULL;
+ if (!ndev->npinfo)
+ kfree(npinfo);
np->dev = NULL;
dev_put(ndev);
return -1;
@@ -706,9 +733,20 @@ int netpoll_setup(struct netpoll *np)
void netpoll_cleanup(struct netpoll *np)
{
- if (np->dev)
- np->dev->np = NULL;
- dev_put(np->dev);
+ struct netpoll_info *npinfo;
+ unsigned long flags;
+
+ if (np->dev) {
+ npinfo = np->dev->npinfo;
+ if (npinfo && npinfo->rx_np == np) {
+ spin_lock_irqsave(&npinfo->rx_lock, flags);
+ npinfo->rx_np = NULL;
+ npinfo->rx_flags &= ~NETPOLL_RX_ENABLED;
+ spin_unlock_irqrestore(&npinfo->rx_lock, flags);
+ }
+ dev_put(np->dev);
+ }
+
np->dev = NULL;
}
--- linux-2.6.12-rc6/include/linux/netpoll.h.orig 2005-06-20 19:51:47.000000000 -0400
+++ linux-2.6.12-rc6/include/linux/netpoll.h 2005-06-21 15:29:48.000000000 -0400
@@ -16,14 +16,19 @@ struct netpoll;
struct netpoll {
struct net_device *dev;
char dev_name[16], *name;
- int rx_flags;
void (*rx_hook)(struct netpoll *, int, char *, int);
void (*drop)(struct sk_buff *skb);
u32 local_ip, remote_ip;
u16 local_port, remote_port;
unsigned char local_mac[6], remote_mac[6];
+};
+
+struct netpoll_info {
spinlock_t poll_lock;
int poll_owner;
+ int rx_flags;
+ spinlock_t rx_lock;
+ struct netpoll *rx_np; /* netpoll that registered an rx_hook */
};
void netpoll_poll(struct netpoll *np);
@@ -39,22 +44,35 @@ void netpoll_queue(struct sk_buff *skb);
#ifdef CONFIG_NETPOLL
static inline int netpoll_rx(struct sk_buff *skb)
{
- return skb->dev->np && skb->dev->np->rx_flags && __netpoll_rx(skb);
+ struct netpoll_info *npinfo = skb->dev->npinfo;
+ unsigned long flags;
+ int ret = 0;
+
+ if (!npinfo || (!npinfo->rx_np && !npinfo->rx_flags))
+ return 0;
+
+ spin_lock_irqsave(&npinfo->rx_lock, flags);
+ /* check rx_flags again with the lock held */
+ if (npinfo->rx_flags && __netpoll_rx(skb))
+ ret = 1;
+ spin_unlock_irqrestore(&npinfo->rx_lock, flags);
+
+ return ret;
}
static inline void netpoll_poll_lock(struct net_device *dev)
{
- if (dev->np) {
- spin_lock(&dev->np->poll_lock);
- dev->np->poll_owner = smp_processor_id();
+ if (dev->npinfo) {
+ spin_lock(&dev->npinfo->poll_lock);
+ dev->npinfo->poll_owner = smp_processor_id();
}
}
static inline void netpoll_poll_unlock(struct net_device *dev)
{
- if (dev->np) {
- spin_unlock(&dev->np->poll_lock);
- dev->np->poll_owner = -1;
+ if (dev->npinfo) {
+ dev->npinfo->poll_owner = -1;
+ spin_unlock(&dev->npinfo->poll_lock);
}
}
--- linux-2.6.12-rc6/include/linux/netdevice.h.orig 2005-06-20 20:26:21.000000000 -0400
+++ linux-2.6.12-rc6/include/linux/netdevice.h 2005-06-21 14:46:52.000000000 -0400
@@ -41,7 +41,7 @@
struct divert_blk;
struct vlan_group;
struct ethtool_ops;
-struct netpoll;
+struct netpoll_info;
/* source back-compat hooks */
#define SET_ETHTOOL_OPS(netdev,ops) \
( (netdev)->ethtool_ops = (ops) )
@@ -468,7 +468,7 @@ struct net_device
unsigned char *haddr);
int (*neigh_setup)(struct net_device *dev, struct neigh_parms *);
#ifdef CONFIG_NETPOLL
- struct netpoll *np;
+ struct netpoll_info *npinfo;
#endif
#ifdef CONFIG_NET_POLL_CONTROLLER
void (*poll_controller)(struct net_device *dev);
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [patch,rfc] allow registration of multiple netpolls per interface
2005-06-22 21:05 ` Jeff Moyer
@ 2005-06-22 21:27 ` Matt Mackall
0 siblings, 0 replies; 6+ messages in thread
From: Matt Mackall @ 2005-06-22 21:27 UTC (permalink / raw)
To: Jeff Moyer; +Cc: netdev, netdev, linux-kernel
On Wed, Jun 22, 2005 at 05:05:15PM -0400, Jeff Moyer wrote:
> mpm> It might be simpler to have a single lock here..?
> >>
> >> Maybe. You can't really have netpoll code running on multiple cpus at the
> >> same time, right? This is the rx path, remember, so the other cpu should
> >> be spinning on the poll_lock.
> >>
> >> Keeping separate locks would allow you to unregister a struct netpoll
> >> associated with another net device without causing lock contention. This
> >> is a very minor win, obviously.
> >>
> >> I still feel like this npinfo struct is the right place for this, though.
> >> If you're strongly opposed to that, I'll change it.
>
> mpm> No, certainly having it in npinfo makes sense. I just was wondering if
> mpm> we really need two locks in there.
>
> Oh, I misunderstood. Well, one protects recursing into the driver's poll
> routine, the other protects access to the np_rx pointer, which may later
> become a list. I don't think we can lump these two together, do you?
I don't see why we couldn't, but let's worry about it later.
> >> >> + spin_lock_irqsave(&npinfo->rx_lock, flags);
> >> >> + if (npinfo->rx_np->dev == skb->dev)
> >> >> + np = npinfo->rx_np;
> >> >> + spin_unlock_irqrestore(&npinfo->rx_lock, flags);
> >>
> mpm> And I think that means we don't need the lock here either.
> >>
> >> Sure we do. We need to protect against rmmod's.
>
> mpm> How can we have an rmmmod when we're trapped?
>
> Looking over the code, I don't see what would prevent this. Could you
> point me the code which prevents this?
I forgot we overloaded trapped for dealing with NAPI. Formerly
trapping meant "I'm stopping the box, drop every packet that's not
addressed to me" which also implied no one should be pulling the rug
out from under us.
> (Interdiff first)
Looks fine.
--
Mathematics is the supreme nostalgia of our time.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2005-06-22 21:27 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-06-21 21:41 [patch,rfc] allow registration of multiple netpolls per interface Jeff Moyer
2005-06-21 22:52 ` Matt Mackall
2005-06-22 11:47 ` Jeff Moyer
2005-06-22 17:01 ` Matt Mackall
2005-06-22 21:05 ` Jeff Moyer
2005-06-22 21:27 ` Matt Mackall
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).