* [PATCH] 8390: fix CONFIG_LOCKDEP error, 2.6.24-rc7
From: Frank Rowand @ 2008-01-15 22:23 UTC (permalink / raw)
To: p_gortmaker; +Cc: netdev
From: Frank Rowand <frank.rowand@am.sony.com>
Turning on CONFIG_LOCKDEP for CONFIG_PREEMPT invokes a path which may
sleep with IRQs disabled. Change disable_irq_nosync_lockdep() to
disable_irq_nosync(), etc. Note the comment near the top of
drivers/net/lib8390.c, which is an lkml email from Alan Cox, pre-saging
the need of this patch.
Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
---
drivers/net/lib8390.c | 14 7 + 7 - 0 !
1 files changed, 7 insertions(+), 7 deletions(-)
Index: linux-2.6.24-rc5/drivers/net/lib8390.c
===================================================================
--- linux-2.6.24-rc5.orig/drivers/net/lib8390.c
+++ linux-2.6.24-rc5/drivers/net/lib8390.c
@@ -284,7 +284,7 @@ static void ei_tx_timeout(struct net_dev
/* Ugly but a reset can be slow, yet must be protected */
- disable_irq_nosync_lockdep(dev->irq);
+ disable_irq_nosync(dev->irq);
spin_lock(&ei_local->page_lock);
/* Try to restart the card. Perhaps the user has fixed something. */
@@ -292,7 +292,7 @@ static void ei_tx_timeout(struct net_dev
__NS8390_init(dev, 1);
spin_unlock(&ei_local->page_lock);
- enable_irq_lockdep(dev->irq);
+ enable_irq(dev->irq);
netif_wake_queue(dev);
}
@@ -334,7 +334,7 @@ static int ei_start_xmit(struct sk_buff
* Slow phase with lock held.
*/
- disable_irq_nosync_lockdep_irqsave(dev->irq, &flags);
+ disable_irq_nosync(dev->irq);
spin_lock(&ei_local->page_lock);
@@ -373,7 +373,7 @@ static int ei_start_xmit(struct sk_buff
netif_stop_queue(dev);
ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR);
spin_unlock(&ei_local->page_lock);
- enable_irq_lockdep_irqrestore(dev->irq, &flags);
+ enable_irq(dev->irq);
ei_local->stat.tx_errors++;
return 1;
}
@@ -414,7 +414,7 @@ static int ei_start_xmit(struct sk_buff
ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR);
spin_unlock(&ei_local->page_lock);
- enable_irq_lockdep_irqrestore(dev->irq, &flags);
+ enable_irq(dev->irq);
dev_kfree_skb (skb);
ei_local->stat.tx_bytes += send_length;
@@ -530,9 +530,9 @@ static irqreturn_t __ei_interrupt(int ir
#ifdef CONFIG_NET_POLL_CONTROLLER
static void __ei_poll(struct net_device *dev)
{
- disable_irq_lockdep(dev->irq);
+ disable_irq(dev->irq);
__ei_interrupt(dev->irq, dev);
- enable_irq_lockdep(dev->irq);
+ enable_irq(dev->irq);
}
#endif
^ permalink raw reply
* Re: Packetlost when "tc qdisc del dev eth0 root"
From: Jarek Poplawski @ 2008-01-15 22:32 UTC (permalink / raw)
To: slavon; +Cc: Patrick McHardy, netdev
In-Reply-To: <20080116004602.zn4y94e8sg0w4o8k@mail.bigtelecom.ru>
On Wed, Jan 16, 2008 at 12:46:02AM +0300, slavon@bigtelecom.ru wrote:
...
> Hmmm... i found way to fix this for me... but its not look good
>
> Scheme look like:
> Root - prio bands 3 priomap 0 0 0 0 ....
> --- Class 1
> --- Class 2
> -------- Copy of all table (Last this qdisc be root)
> --- Class 3
> -------- Copy of all table (Last this qdisc be root)
>
> 2. Add filter to root - flowid all packets to class 2
> 3. Delete qdisc at class 3
> 4. Create all table on class 3 (~20k qdiscs and 20k classes)
> 5. Replace filter on root - flowid all packets to class 3
> 6. If need update go to step 3, but use class 2
>
> All work good... and packets not dropeed =)
> But i have above 45 k classes and qdiscs.... After some time i will need
> patch to up max qdisc and classes more then 65k (> 0xfffe) =)))
> Also i have very bad TC commands performance then i have more then 10k rules.
Right! I get your idea (not all details...), and this really shows
there is needed something simpler for this.
Thanks,
Jarek P.
^ permalink raw reply
* Re: [PATCH] IPv6 support for NFS server
From: J. Bruce Fields @ 2008-01-15 22:32 UTC (permalink / raw)
To: Aurélien Charbon; +Cc: netdev ML, Brian Haley, Mailing list NFSv4
In-Reply-To: <475ED028.2010109@ext.bull.net>
[-- Attachment #1: Type: text/plain, Size: 921 bytes --]
On Tue, Dec 11, 2007 at 07:00:08PM +0100, Aurélien Charbon wrote:
> Brian Haley wrote:
>
>> In an email back on October 29th I sent-out a similar patch with a new
>> ipv6_addr_set_v4mapped() inline - it might be useful to pull that
>> piece into your patch since it cleans it up a bit to get rid of the
>> ipv6_addr_set() calls. I can re-send you that patch off-line if you
>> can't find it.
>>
>> -Brian
>>
>>
> Thanks Brian. I forgot to include your changes in my tree.
> OK Bruce you can take this one.
One trivial note: I'd prefer patches inline with the message, instead of
attached. If you need to attach it, please add From:, Subject: and a
patch comment in the standard format. Something like git-format-patch
will do all that stuff for you.
E.g. see below (also with a minor whitespace problem fixed--fun
scripts/checkpatch.pl before submitting and it'll catch that.)
--b.
[-- Attachment #2: TMP --]
[-- Type: text/plain, Size: 12802 bytes --]
>From c19360e877cfa1576dce98792cd513665deaa2ec Mon Sep 17 00:00:00 2001
From: =?utf-8?q?Aur=C3=A9lien=20Charbon?= <aurelien.charbon@ext.bull.net>
Date: Fri, 21 Dec 2007 16:44:46 +0100
Subject: [PATCH] IPv6 support for NFS server
Prepare for IPv6 text-based mounts and exports.
Tested with IPv4 network and basic nfs ops (mount, file creation and
modification), and with unmodified nfs-utils-1.1.1 + CITI_NFS4_ALL
patch.
Signed-off-by: Aurelien Charbon <aurelien.charbon@bull.net>
Cc: Neil Brown <neilb@suse.de>
---
fs/nfsd/export.c | 10 ++-
fs/nfsd/nfsctl.c | 19 ++++++-
include/linux/sunrpc/svcauth.h | 4 +-
include/net/ipv6.h | 9 +++
net/sunrpc/svcauth_unix.c | 119 +++++++++++++++++++++++++++-------------
5 files changed, 116 insertions(+), 45 deletions(-)
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index cbbc594..e29b431 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -35,6 +35,7 @@
#include <linux/lockd/bind.h>
#include <linux/sunrpc/msg_prot.h>
#include <linux/sunrpc/gss_api.h>
+#include <net/ipv6.h>
#define NFSDDBG_FACILITY NFSDDBG_EXPORT
@@ -1556,6 +1557,7 @@ exp_addclient(struct nfsctl_client *ncp)
{
struct auth_domain *dom;
int i, err;
+ struct in6_addr addr6;
/* First, consistency check. */
err = -EINVAL;
@@ -1574,9 +1576,11 @@ exp_addclient(struct nfsctl_client *ncp)
goto out_unlock;
/* Insert client into hashtable. */
- for (i = 0; i < ncp->cl_naddr; i++)
- auth_unix_add_addr(ncp->cl_addrlist[i], dom);
-
+ for (i = 0; i < ncp->cl_naddr; i++) {
+ /* Mapping address */
+ ipv6_addr_set_v4mapped(ncp->cl_addrlist[i].s_addr, &addr6);
+ auth_unix_add_addr(&addr6, dom);
+ }
auth_unix_forget_old(dom);
auth_domain_put(dom);
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index e307972..a8f7a90 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -37,6 +37,7 @@
#include <linux/nfsd/syscall.h>
#include <asm/uaccess.h>
+#include <net/ipv6.h>
/*
* We have a single directory with 9 nodes in it.
@@ -222,6 +223,7 @@ static ssize_t write_getfs(struct file *file, char *buf, size_t size)
struct auth_domain *clp;
int err = 0;
struct knfsd_fh *res;
+ struct in6_addr in6;
if (size < sizeof(*data))
return -EINVAL;
@@ -236,7 +238,13 @@ static ssize_t write_getfs(struct file *file, char *buf, size_t size)
res = (struct knfsd_fh*)buf;
exp_readlock();
- if (!(clp = auth_unix_lookup(sin->sin_addr)))
+
+ /* IPv6 address mapping */
+ ipv6_addr_set_v4mapped(
+ (((struct sockaddr_in *)&data->gd_addr)->sin_addr.s_addr),
+ &in6);
+
+ if (!(clp = auth_unix_lookup(&in6)))
err = -EPERM;
else {
err = exp_rootfh(clp, data->gd_path, res, data->gd_maxlen);
@@ -257,6 +265,7 @@ static ssize_t write_getfd(struct file *file, char *buf, size_t size)
int err = 0;
struct knfsd_fh fh;
char *res;
+ struct in6_addr in6;
if (size < sizeof(*data))
return -EINVAL;
@@ -271,7 +280,13 @@ static ssize_t write_getfd(struct file *file, char *buf, size_t size)
res = buf;
sin = (struct sockaddr_in *)&data->gd_addr;
exp_readlock();
- if (!(clp = auth_unix_lookup(sin->sin_addr)))
+ /* IPv6 address mapping */
+ ipv6_addr_set_v4mapped(
+ (((struct sockaddr_in *)&data->gd_addr)->sin_addr.s_addr),
+ &in6
+ );
+
+ if (!(clp = auth_unix_lookup(&in6)))
err = -EPERM;
else {
err = exp_rootfh(clp, data->gd_path, &fh, NFS_FHSIZE);
diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h
index 22e1ef8..9e6fb86 100644
--- a/include/linux/sunrpc/svcauth.h
+++ b/include/linux/sunrpc/svcauth.h
@@ -120,10 +120,10 @@ extern void svc_auth_unregister(rpc_authflavor_t flavor);
extern struct auth_domain *unix_domain_find(char *name);
extern void auth_domain_put(struct auth_domain *item);
-extern int auth_unix_add_addr(struct in_addr addr, struct auth_domain *dom);
+extern int auth_unix_add_addr(struct in6_addr *addr, struct auth_domain *dom);
extern struct auth_domain *auth_domain_lookup(char *name, struct auth_domain *new);
extern struct auth_domain *auth_domain_find(char *name);
-extern struct auth_domain *auth_unix_lookup(struct in_addr addr);
+extern struct auth_domain *auth_unix_lookup(struct in6_addr *addr);
extern int auth_unix_forget_old(struct auth_domain *dom);
extern void svcauth_unix_purge(void);
extern void svcauth_unix_info_release(void *);
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index ae328b6..9394710 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -400,6 +400,15 @@ static inline int ipv6_addr_v4mapped(const struct in6_addr *a)
a->s6_addr32[2] == htonl(0x0000ffff));
}
+static inline void ipv6_addr_set_v4mapped(const __be32 addr,
+ struct in6_addr *v4mapped)
+{
+ ipv6_addr_set(v4mapped,
+ 0, 0,
+ htonl(0x0000FFFF),
+ addr);
+}
+
/*
* find the first different bit between two addresses
* length of address must be a multiple of 32bits
diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index 4114794..5fe8f1f 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -11,7 +11,8 @@
#include <linux/hash.h>
#include <linux/string.h>
#include <net/sock.h>
-
+#include <net/ipv6.h>
+#include <linux/kernel.h>
#define RPCDBG_FACILITY RPCDBG_AUTH
@@ -84,7 +85,7 @@ static void svcauth_unix_domain_release(struct auth_domain *dom)
struct ip_map {
struct cache_head h;
char m_class[8]; /* e.g. "nfsd" */
- struct in_addr m_addr;
+ struct in6_addr m_addr;
struct unix_domain *m_client;
int m_add_change;
};
@@ -112,12 +113,19 @@ static inline int hash_ip(__be32 ip)
return (hash ^ (hash>>8)) & 0xff;
}
#endif
+static inline int hash_ip6(struct in6_addr ip)
+{
+ return (hash_ip(ip.s6_addr32[0]) ^
+ hash_ip(ip.s6_addr32[1]) ^
+ hash_ip(ip.s6_addr32[2]) ^
+ hash_ip(ip.s6_addr32[3]));
+}
static int ip_map_match(struct cache_head *corig, struct cache_head *cnew)
{
struct ip_map *orig = container_of(corig, struct ip_map, h);
struct ip_map *new = container_of(cnew, struct ip_map, h);
return strcmp(orig->m_class, new->m_class) == 0
- && orig->m_addr.s_addr == new->m_addr.s_addr;
+ && ipv6_addr_equal(&orig->m_addr, &new->m_addr);
}
static void ip_map_init(struct cache_head *cnew, struct cache_head *citem)
{
@@ -125,7 +133,7 @@ static void ip_map_init(struct cache_head *cnew, struct cache_head *citem)
struct ip_map *item = container_of(citem, struct ip_map, h);
strcpy(new->m_class, item->m_class);
- new->m_addr.s_addr = item->m_addr.s_addr;
+ ipv6_addr_copy(&new->m_addr, &item->m_addr);
}
static void update(struct cache_head *cnew, struct cache_head *citem)
{
@@ -149,22 +157,24 @@ static void ip_map_request(struct cache_detail *cd,
struct cache_head *h,
char **bpp, int *blen)
{
- char text_addr[20];
+ char text_addr[40];
struct ip_map *im = container_of(h, struct ip_map, h);
- __be32 addr = im->m_addr.s_addr;
-
- snprintf(text_addr, 20, "%u.%u.%u.%u",
- ntohl(addr) >> 24 & 0xff,
- ntohl(addr) >> 16 & 0xff,
- ntohl(addr) >> 8 & 0xff,
- ntohl(addr) >> 0 & 0xff);
+ if (ipv6_addr_v4mapped(&(im->m_addr))) {
+ snprintf(text_addr, 20, NIPQUAD_FMT,
+ ntohl(im->m_addr.s6_addr32[3]) >> 24 & 0xff,
+ ntohl(im->m_addr.s6_addr32[3]) >> 16 & 0xff,
+ ntohl(im->m_addr.s6_addr32[3]) >> 8 & 0xff,
+ ntohl(im->m_addr.s6_addr32[3]) >> 0 & 0xff);
+ } else {
+ snprintf(text_addr, 40, NIP6_FMT, NIP6(im->m_addr));
+ }
qword_add(bpp, blen, im->m_class);
qword_add(bpp, blen, text_addr);
(*bpp)[-1] = '\n';
}
-static struct ip_map *ip_map_lookup(char *class, struct in_addr addr);
+static struct ip_map *ip_map_lookup(char *class, struct in6_addr *addr);
static int ip_map_update(struct ip_map *ipm, struct unix_domain *udom, time_t expiry);
static int ip_map_parse(struct cache_detail *cd,
@@ -175,10 +185,10 @@ static int ip_map_parse(struct cache_detail *cd,
* for scratch: */
char *buf = mesg;
int len;
- int b1,b2,b3,b4;
+ int b1, b2, b3, b4, b5, b6, b7, b8;
char c;
char class[8];
- struct in_addr addr;
+ struct in6_addr addr;
int err;
struct ip_map *ipmp;
@@ -197,9 +207,26 @@ static int ip_map_parse(struct cache_detail *cd,
len = qword_get(&mesg, buf, mlen);
if (len <= 0) return -EINVAL;
- if (sscanf(buf, "%u.%u.%u.%u%c", &b1, &b2, &b3, &b4, &c) != 4)
+ if (sscanf(buf, NIPQUAD_FMT "%c", &b1, &b2, &b3, &b4, &c) == 4) {
+ addr.s6_addr32[0] = 0;
+ addr.s6_addr32[1] = 0;
+ addr.s6_addr32[2] = htonl(0xffff);
+ addr.s6_addr32[3] =
+ htonl((((((b1<<8)|b2)<<8)|b3)<<8)|b4);
+ } else if (sscanf(buf, NIP6_FMT "%c",
+ &b1, &b2, &b3, &b4, &b5, &b6, &b7, &b8, &c) == 8) {
+ addr.s6_addr16[0] = htons(b1);
+ addr.s6_addr16[1] = htons(b2);
+ addr.s6_addr16[2] = htons(b3);
+ addr.s6_addr16[3] = htons(b4);
+ addr.s6_addr16[4] = htons(b5);
+ addr.s6_addr16[5] = htons(b6);
+ addr.s6_addr16[6] = htons(b7);
+ addr.s6_addr16[7] = htons(b8);
+ } else
return -EINVAL;
+
expiry = get_expiry(&mesg);
if (expiry ==0)
return -EINVAL;
@@ -215,10 +242,7 @@ static int ip_map_parse(struct cache_detail *cd,
} else
dom = NULL;
- addr.s_addr =
- htonl((((((b1<<8)|b2)<<8)|b3)<<8)|b4);
-
- ipmp = ip_map_lookup(class,addr);
+ ipmp = ip_map_lookup(class, &addr);
if (ipmp) {
err = ip_map_update(ipmp,
container_of(dom, struct unix_domain, h),
@@ -238,7 +262,7 @@ static int ip_map_show(struct seq_file *m,
struct cache_head *h)
{
struct ip_map *im;
- struct in_addr addr;
+ struct in6_addr addr;
char *dom = "-no-domain-";
if (h == NULL) {
@@ -247,20 +271,24 @@ static int ip_map_show(struct seq_file *m,
}
im = container_of(h, struct ip_map, h);
/* class addr domain */
- addr = im->m_addr;
+ ipv6_addr_copy(&addr, &im->m_addr);
if (test_bit(CACHE_VALID, &h->flags) &&
!test_bit(CACHE_NEGATIVE, &h->flags))
dom = im->m_client->h.name;
- seq_printf(m, "%s %d.%d.%d.%d %s\n",
- im->m_class,
- ntohl(addr.s_addr) >> 24 & 0xff,
- ntohl(addr.s_addr) >> 16 & 0xff,
- ntohl(addr.s_addr) >> 8 & 0xff,
- ntohl(addr.s_addr) >> 0 & 0xff,
- dom
- );
+ if (ipv6_addr_v4mapped(&addr)) {
+ seq_printf(m, "%s" NIPQUAD_FMT "%s\n",
+ im->m_class,
+ ntohl(addr.s6_addr32[3]) >> 24 & 0xff,
+ ntohl(addr.s6_addr32[3]) >> 16 & 0xff,
+ ntohl(addr.s6_addr32[3]) >> 8 & 0xff,
+ ntohl(addr.s6_addr32[3]) >> 0 & 0xff,
+ dom);
+ } else {
+ seq_printf(m, "%s" NIP6_FMT "%s\n",
+ im->m_class, NIP6(addr), dom);
+ }
return 0;
}
@@ -280,16 +308,16 @@ struct cache_detail ip_map_cache = {
.alloc = ip_map_alloc,
};
-static struct ip_map *ip_map_lookup(char *class, struct in_addr addr)
+static struct ip_map *ip_map_lookup(char *class, struct in6_addr *addr)
{
struct ip_map ip;
struct cache_head *ch;
strcpy(ip.m_class, class);
- ip.m_addr = addr;
+ ipv6_addr_copy(&ip.m_addr, addr);
ch = sunrpc_cache_lookup(&ip_map_cache, &ip.h,
hash_str(class, IP_HASHBITS) ^
- hash_ip(addr.s_addr));
+ hash_ip6(*addr));
if (ch)
return container_of(ch, struct ip_map, h);
@@ -318,14 +346,14 @@ static int ip_map_update(struct ip_map *ipm, struct unix_domain *udom, time_t ex
ch = sunrpc_cache_update(&ip_map_cache,
&ip.h, &ipm->h,
hash_str(ipm->m_class, IP_HASHBITS) ^
- hash_ip(ipm->m_addr.s_addr));
+ hash_ip6(ipm->m_addr));
if (!ch)
return -ENOMEM;
cache_put(ch, &ip_map_cache);
return 0;
}
-int auth_unix_add_addr(struct in_addr addr, struct auth_domain *dom)
+int auth_unix_add_addr(struct in6_addr *addr, struct auth_domain *dom)
{
struct unix_domain *udom;
struct ip_map *ipmp;
@@ -352,7 +380,7 @@ int auth_unix_forget_old(struct auth_domain *dom)
return 0;
}
-struct auth_domain *auth_unix_lookup(struct in_addr addr)
+struct auth_domain *auth_unix_lookup(struct in6_addr *addr)
{
struct ip_map *ipm;
struct auth_domain *rv;
@@ -641,9 +669,24 @@ static int unix_gid_find(uid_t uid, struct group_info **gip,
int
svcauth_unix_set_client(struct svc_rqst *rqstp)
{
- struct sockaddr_in *sin = svc_addr_in(rqstp);
+ struct sockaddr_in *sin;
+ struct sockaddr_in6 *sin6, sin6_storage;
struct ip_map *ipm;
+ switch (rqstp->rq_addr.ss_family) {
+ case AF_INET:
+ sin = svc_addr_in(rqstp);
+ sin6 = &sin6_storage;
+ ipv6_addr_set(&sin6->sin6_addr, 0, 0,
+ htonl(0x0000FFFF), sin->sin_addr.s_addr);
+ break;
+ case AF_INET6:
+ sin6 = svc_addr_in6(rqstp);
+ break;
+ default:
+ BUG();
+ }
+
rqstp->rq_client = NULL;
if (rqstp->rq_proc == 0)
return SVC_OK;
@@ -651,7 +694,7 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
ipm = ip_map_cached_get(rqstp);
if (ipm == NULL)
ipm = ip_map_lookup(rqstp->rq_server->sv_program->pg_class,
- sin->sin_addr);
+ &sin6->sin6_addr);
if (ipm == NULL)
return SVC_DENIED;
--
1.5.4.rc2.60.gb2e62
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
NFSv4 mailing list
NFSv4@linux-nfs.org
http://linux-nfs.org/cgi-bin/mailman/listinfo/nfsv4
^ permalink raw reply related
* Re: Packetlost when "tc qdisc del dev eth0 root"
From: Jarek Poplawski @ 2008-01-15 22:49 UTC (permalink / raw)
To: slavon; +Cc: Patrick McHardy, netdev
In-Reply-To: <20080116010459.676cchrf8ko4wk8o@mail.bigtelecom.ru>
On Wed, Jan 16, 2008 at 01:04:59AM +0300, slavon@bigtelecom.ru wrote:
> Good night! =)
Good morning! ;)
>
> Sorry... i was wrong...
> I see that problem more serious....
>
> Lets see to scheme
>
> Class 1
> ---qdisc
> ------- 10k classes
> Class 2
> ---qdisc
> ------- 10k classes
>
> All traffic go to class 2... class 1 qdisc not have packets and if we
> delete it - packets not lost... in theory... lets try delete class 1 qdisc
> (all childrens delete too)...
> PC freeze on 2-5 seconds... its not forward any traffic at this moment...
> its great tree lock?
>
> Its normal or code need to more accurate lock?
I don't think it's normal. On the other hand I've never had such
problems... Probably this all isn't optimized for such big operations,
so maybe you should try to do this with some smaller chunks?
Jarek P.
^ permalink raw reply
* Re: [PATCH] IPv6 support for NFS server
From: J. Bruce Fields @ 2008-01-15 23:16 UTC (permalink / raw)
To: Aurélien Charbon; +Cc: netdev ML, Brian Haley, Mailing list NFSv4
In-Reply-To: <20080115223221.GE5028@fieldses.org>
Mostly just more trivial stuff for now, apologies....:
On Tue, Jan 15, 2008 at 05:32:21PM -0500, J. Bruce Fields wrote:
> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
> index cbbc594..e29b431 100644
> --- a/fs/nfsd/export.c
> +++ b/fs/nfsd/export.c
> @@ -35,6 +35,7 @@
> #include <linux/lockd/bind.h>
> #include <linux/sunrpc/msg_prot.h>
> #include <linux/sunrpc/gss_api.h>
> +#include <net/ipv6.h>
>
> #define NFSDDBG_FACILITY NFSDDBG_EXPORT
>
> @@ -1556,6 +1557,7 @@ exp_addclient(struct nfsctl_client *ncp)
> {
> struct auth_domain *dom;
> int i, err;
> + struct in6_addr addr6;
>
> /* First, consistency check. */
> err = -EINVAL;
> @@ -1574,9 +1576,11 @@ exp_addclient(struct nfsctl_client *ncp)
> goto out_unlock;
>
> /* Insert client into hashtable. */
> - for (i = 0; i < ncp->cl_naddr; i++)
> - auth_unix_add_addr(ncp->cl_addrlist[i], dom);
> -
> + for (i = 0; i < ncp->cl_naddr; i++) {
> + /* Mapping address */
> + ipv6_addr_set_v4mapped(ncp->cl_addrlist[i].s_addr, &addr6);
I think the name of the function explains well enough what it's doing,
so the preceding comment is superfluous.
> + auth_unix_add_addr(&addr6, dom);
> + }
> auth_unix_forget_old(dom);
> auth_domain_put(dom);
>
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index e307972..a8f7a90 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -37,6 +37,7 @@
> #include <linux/nfsd/syscall.h>
>
> #include <asm/uaccess.h>
> +#include <net/ipv6.h>
>
> /*
> * We have a single directory with 9 nodes in it.
> @@ -222,6 +223,7 @@ static ssize_t write_getfs(struct file *file, char *buf, size_t size)
> struct auth_domain *clp;
> int err = 0;
> struct knfsd_fh *res;
> + struct in6_addr in6;
>
> if (size < sizeof(*data))
> return -EINVAL;
> @@ -236,7 +238,13 @@ static ssize_t write_getfs(struct file *file, char *buf, size_t size)
> res = (struct knfsd_fh*)buf;
>
> exp_readlock();
> - if (!(clp = auth_unix_lookup(sin->sin_addr)))
> +
> + /* IPv6 address mapping */
> + ipv6_addr_set_v4mapped(
> + (((struct sockaddr_in *)&data->gd_addr)->sin_addr.s_addr),
> + &in6);
The case there appears to already have been done in the assignment of
"sin" a few lines above; so couldn't this last line just be written:
ipv6_addr_set_v4mapped(sin->sin_addr.s_addr, &in6);
?
> +
> + if (!(clp = auth_unix_lookup(&in6)))
> err = -EPERM;
I'd rather assignments be made separately from tests, so:
clp = auth_unix_lookup(&in6);
if (!clp)
err = -EPERM;
Yeah, I know, that's not what the original code did, but as long as
we're modifying that line anyway....
> else {
> err = exp_rootfh(clp, data->gd_path, res, data->gd_maxlen);
> @@ -257,6 +265,7 @@ static ssize_t write_getfd(struct file *file, char *buf, size_t size)
> int err = 0;
> struct knfsd_fh fh;
> char *res;
> + struct in6_addr in6;
>
> if (size < sizeof(*data))
> return -EINVAL;
> @@ -271,7 +280,13 @@ static ssize_t write_getfd(struct file *file, char *buf, size_t size)
> res = buf;
> sin = (struct sockaddr_in *)&data->gd_addr;
> exp_readlock();
> - if (!(clp = auth_unix_lookup(sin->sin_addr)))
> + /* IPv6 address mapping */
> + ipv6_addr_set_v4mapped(
> + (((struct sockaddr_in *)&data->gd_addr)->sin_addr.s_addr),
> + &in6
> + );
> +
> + if (!(clp = auth_unix_lookup(&in6)))
> err = -EPERM;
See both comments above.
> else {
> err = exp_rootfh(clp, data->gd_path, &fh, NFS_FHSIZE);
> diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h
> index 22e1ef8..9e6fb86 100644
> --- a/include/linux/sunrpc/svcauth.h
> +++ b/include/linux/sunrpc/svcauth.h
> @@ -120,10 +120,10 @@ extern void svc_auth_unregister(rpc_authflavor_t flavor);
>
> extern struct auth_domain *unix_domain_find(char *name);
> extern void auth_domain_put(struct auth_domain *item);
> -extern int auth_unix_add_addr(struct in_addr addr, struct auth_domain *dom);
> +extern int auth_unix_add_addr(struct in6_addr *addr, struct auth_domain *dom);
> extern struct auth_domain *auth_domain_lookup(char *name, struct auth_domain *new);
> extern struct auth_domain *auth_domain_find(char *name);
> -extern struct auth_domain *auth_unix_lookup(struct in_addr addr);
> +extern struct auth_domain *auth_unix_lookup(struct in6_addr *addr);
> extern int auth_unix_forget_old(struct auth_domain *dom);
> extern void svcauth_unix_purge(void);
> extern void svcauth_unix_info_release(void *);
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index ae328b6..9394710 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -400,6 +400,15 @@ static inline int ipv6_addr_v4mapped(const struct in6_addr *a)
> a->s6_addr32[2] == htonl(0x0000ffff));
> }
>
> +static inline void ipv6_addr_set_v4mapped(const __be32 addr,
> + struct in6_addr *v4mapped)
> +{
> + ipv6_addr_set(v4mapped,
> + 0, 0,
> + htonl(0x0000FFFF),
> + addr);
If the function call will fit on one line, don't break it into separate
lines unless there's a really good reason to.
> +}
> +
> /*
> * find the first different bit between two addresses
> * length of address must be a multiple of 32bits
> diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
> index 4114794..5fe8f1f 100644
> --- a/net/sunrpc/svcauth_unix.c
> +++ b/net/sunrpc/svcauth_unix.c
> @@ -11,7 +11,8 @@
> #include <linux/hash.h>
> #include <linux/string.h>
> #include <net/sock.h>
> -
> +#include <net/ipv6.h>
> +#include <linux/kernel.h>
> #define RPCDBG_FACILITY RPCDBG_AUTH
>
>
> @@ -84,7 +85,7 @@ static void svcauth_unix_domain_release(struct auth_domain *dom)
> struct ip_map {
> struct cache_head h;
> char m_class[8]; /* e.g. "nfsd" */
> - struct in_addr m_addr;
> + struct in6_addr m_addr;
> struct unix_domain *m_client;
> int m_add_change;
> };
> @@ -112,12 +113,19 @@ static inline int hash_ip(__be32 ip)
> return (hash ^ (hash>>8)) & 0xff;
> }
> #endif
> +static inline int hash_ip6(struct in6_addr ip)
> +{
> + return (hash_ip(ip.s6_addr32[0]) ^
> + hash_ip(ip.s6_addr32[1]) ^
> + hash_ip(ip.s6_addr32[2]) ^
> + hash_ip(ip.s6_addr32[3]));
> +}
> static int ip_map_match(struct cache_head *corig, struct cache_head *cnew)
> {
> struct ip_map *orig = container_of(corig, struct ip_map, h);
> struct ip_map *new = container_of(cnew, struct ip_map, h);
> return strcmp(orig->m_class, new->m_class) == 0
> - && orig->m_addr.s_addr == new->m_addr.s_addr;
> + && ipv6_addr_equal(&orig->m_addr, &new->m_addr);
> }
> static void ip_map_init(struct cache_head *cnew, struct cache_head *citem)
> {
> @@ -125,7 +133,7 @@ static void ip_map_init(struct cache_head *cnew, struct cache_head *citem)
> struct ip_map *item = container_of(citem, struct ip_map, h);
>
> strcpy(new->m_class, item->m_class);
> - new->m_addr.s_addr = item->m_addr.s_addr;
> + ipv6_addr_copy(&new->m_addr, &item->m_addr);
> }
> static void update(struct cache_head *cnew, struct cache_head *citem)
> {
> @@ -149,22 +157,24 @@ static void ip_map_request(struct cache_detail *cd,
> struct cache_head *h,
> char **bpp, int *blen)
> {
> - char text_addr[20];
> + char text_addr[40];
> struct ip_map *im = container_of(h, struct ip_map, h);
> - __be32 addr = im->m_addr.s_addr;
> -
> - snprintf(text_addr, 20, "%u.%u.%u.%u",
> - ntohl(addr) >> 24 & 0xff,
> - ntohl(addr) >> 16 & 0xff,
> - ntohl(addr) >> 8 & 0xff,
> - ntohl(addr) >> 0 & 0xff);
>
> + if (ipv6_addr_v4mapped(&(im->m_addr))) {
> + snprintf(text_addr, 20, NIPQUAD_FMT,
> + ntohl(im->m_addr.s6_addr32[3]) >> 24 & 0xff,
> + ntohl(im->m_addr.s6_addr32[3]) >> 16 & 0xff,
> + ntohl(im->m_addr.s6_addr32[3]) >> 8 & 0xff,
> + ntohl(im->m_addr.s6_addr32[3]) >> 0 & 0xff);
> + } else {
> + snprintf(text_addr, 40, NIP6_FMT, NIP6(im->m_addr));
> + }
OK, so if a given ipv6 address is in the range of ipv4-mapped addresses,
then the upcall will look just like an existing upcall, otherwise it's
going to have an ipv6 address in it. Got it.
> qword_add(bpp, blen, im->m_class);
> qword_add(bpp, blen, text_addr);
> (*bpp)[-1] = '\n';
> }
>
> -static struct ip_map *ip_map_lookup(char *class, struct in_addr addr);
> +static struct ip_map *ip_map_lookup(char *class, struct in6_addr *addr);
> static int ip_map_update(struct ip_map *ipm, struct unix_domain *udom, time_t expiry);
>
> static int ip_map_parse(struct cache_detail *cd,
> @@ -175,10 +185,10 @@ static int ip_map_parse(struct cache_detail *cd,
> * for scratch: */
> char *buf = mesg;
> int len;
> - int b1,b2,b3,b4;
> + int b1, b2, b3, b4, b5, b6, b7, b8;
> char c;
> char class[8];
> - struct in_addr addr;
> + struct in6_addr addr;
> int err;
>
> struct ip_map *ipmp;
> @@ -197,9 +207,26 @@ static int ip_map_parse(struct cache_detail *cd,
> len = qword_get(&mesg, buf, mlen);
> if (len <= 0) return -EINVAL;
>
> - if (sscanf(buf, "%u.%u.%u.%u%c", &b1, &b2, &b3, &b4, &c) != 4)
> + if (sscanf(buf, NIPQUAD_FMT "%c", &b1, &b2, &b3, &b4, &c) == 4) {
> + addr.s6_addr32[0] = 0;
> + addr.s6_addr32[1] = 0;
> + addr.s6_addr32[2] = htonl(0xffff);
> + addr.s6_addr32[3] =
> + htonl((((((b1<<8)|b2)<<8)|b3)<<8)|b4);
> + } else if (sscanf(buf, NIP6_FMT "%c",
> + &b1, &b2, &b3, &b4, &b5, &b6, &b7, &b8, &c) == 8) {
> + addr.s6_addr16[0] = htons(b1);
> + addr.s6_addr16[1] = htons(b2);
> + addr.s6_addr16[2] = htons(b3);
> + addr.s6_addr16[3] = htons(b4);
> + addr.s6_addr16[4] = htons(b5);
> + addr.s6_addr16[5] = htons(b6);
> + addr.s6_addr16[6] = htons(b7);
> + addr.s6_addr16[7] = htons(b8);
> + } else
> return -EINVAL;
And the downcall will accept either ipv4 or ipv6. OK, makes sense, I
think.
>
> +
Extra blank line?
> expiry = get_expiry(&mesg);
> if (expiry ==0)
> return -EINVAL;
> @@ -215,10 +242,7 @@ static int ip_map_parse(struct cache_detail *cd,
> } else
> dom = NULL;
>
> - addr.s_addr =
> - htonl((((((b1<<8)|b2)<<8)|b3)<<8)|b4);
> -
> - ipmp = ip_map_lookup(class,addr);
> + ipmp = ip_map_lookup(class, &addr);
> if (ipmp) {
> err = ip_map_update(ipmp,
> container_of(dom, struct unix_domain, h),
> @@ -238,7 +262,7 @@ static int ip_map_show(struct seq_file *m,
> struct cache_head *h)
> {
> struct ip_map *im;
> - struct in_addr addr;
> + struct in6_addr addr;
> char *dom = "-no-domain-";
>
> if (h == NULL) {
> @@ -247,20 +271,24 @@ static int ip_map_show(struct seq_file *m,
> }
> im = container_of(h, struct ip_map, h);
> /* class addr domain */
> - addr = im->m_addr;
> + ipv6_addr_copy(&addr, &im->m_addr);
>
> if (test_bit(CACHE_VALID, &h->flags) &&
> !test_bit(CACHE_NEGATIVE, &h->flags))
> dom = im->m_client->h.name;
>
> - seq_printf(m, "%s %d.%d.%d.%d %s\n",
> - im->m_class,
> - ntohl(addr.s_addr) >> 24 & 0xff,
> - ntohl(addr.s_addr) >> 16 & 0xff,
> - ntohl(addr.s_addr) >> 8 & 0xff,
> - ntohl(addr.s_addr) >> 0 & 0xff,
> - dom
> - );
> + if (ipv6_addr_v4mapped(&addr)) {
> + seq_printf(m, "%s" NIPQUAD_FMT "%s\n",
> + im->m_class,
> + ntohl(addr.s6_addr32[3]) >> 24 & 0xff,
> + ntohl(addr.s6_addr32[3]) >> 16 & 0xff,
> + ntohl(addr.s6_addr32[3]) >> 8 & 0xff,
> + ntohl(addr.s6_addr32[3]) >> 0 & 0xff,
> + dom);
> + } else {
> + seq_printf(m, "%s" NIP6_FMT "%s\n",
> + im->m_class, NIP6(addr), dom);
> + }
> return 0;
> }
>
> @@ -280,16 +308,16 @@ struct cache_detail ip_map_cache = {
> .alloc = ip_map_alloc,
> };
>
> -static struct ip_map *ip_map_lookup(char *class, struct in_addr addr)
> +static struct ip_map *ip_map_lookup(char *class, struct in6_addr *addr)
> {
> struct ip_map ip;
> struct cache_head *ch;
>
> strcpy(ip.m_class, class);
> - ip.m_addr = addr;
> + ipv6_addr_copy(&ip.m_addr, addr);
> ch = sunrpc_cache_lookup(&ip_map_cache, &ip.h,
> hash_str(class, IP_HASHBITS) ^
> - hash_ip(addr.s_addr));
> + hash_ip6(*addr));
>
> if (ch)
> return container_of(ch, struct ip_map, h);
> @@ -318,14 +346,14 @@ static int ip_map_update(struct ip_map *ipm, struct unix_domain *udom, time_t ex
> ch = sunrpc_cache_update(&ip_map_cache,
> &ip.h, &ipm->h,
> hash_str(ipm->m_class, IP_HASHBITS) ^
> - hash_ip(ipm->m_addr.s_addr));
> + hash_ip6(ipm->m_addr));
> if (!ch)
> return -ENOMEM;
> cache_put(ch, &ip_map_cache);
> return 0;
> }
>
> -int auth_unix_add_addr(struct in_addr addr, struct auth_domain *dom)
> +int auth_unix_add_addr(struct in6_addr *addr, struct auth_domain *dom)
> {
> struct unix_domain *udom;
> struct ip_map *ipmp;
> @@ -352,7 +380,7 @@ int auth_unix_forget_old(struct auth_domain *dom)
> return 0;
> }
>
> -struct auth_domain *auth_unix_lookup(struct in_addr addr)
> +struct auth_domain *auth_unix_lookup(struct in6_addr *addr)
> {
> struct ip_map *ipm;
> struct auth_domain *rv;
> @@ -641,9 +669,24 @@ static int unix_gid_find(uid_t uid, struct group_info **gip,
> int
> svcauth_unix_set_client(struct svc_rqst *rqstp)
> {
> - struct sockaddr_in *sin = svc_addr_in(rqstp);
> + struct sockaddr_in *sin;
> + struct sockaddr_in6 *sin6, sin6_storage;
> struct ip_map *ipm;
>
> + switch (rqstp->rq_addr.ss_family) {
> + case AF_INET:
> + sin = svc_addr_in(rqstp);
> + sin6 = &sin6_storage;
> + ipv6_addr_set(&sin6->sin6_addr, 0, 0,
> + htonl(0x0000FFFF), sin->sin_addr.s_addr);
> + break;
> + case AF_INET6:
> + sin6 = svc_addr_in6(rqstp);
> + break;
> + default:
> + BUG();
> + }
> +
> rqstp->rq_client = NULL;
> if (rqstp->rq_proc == 0)
> return SVC_OK;
> @@ -651,7 +694,7 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
> ipm = ip_map_cached_get(rqstp);
> if (ipm == NULL)
> ipm = ip_map_lookup(rqstp->rq_server->sv_program->pg_class,
> - sin->sin_addr);
> + &sin6->sin6_addr);
>
> if (ipm == NULL)
> return SVC_DENIED;
Anyway, no big problem is jumping out at me; if you fix up the small
things I mentioned above then I'll give it another reading.
Thanks! Sorry for taking so long to take a look at this.
--b.
^ permalink raw reply
* Re: Packetlost when "tc qdisc del dev eth0 root"
From: Glen Turner @ 2008-01-15 23:16 UTC (permalink / raw)
To: slavon; +Cc: netdev
In-Reply-To: <20080116004602.zn4y94e8sg0w4o8k@mail.bigtelecom.ru>
On Wed, 2008-01-16 at 00:46 +0300, slavon@bigtelecom.ru wrote:
> But i have above 45 k classes and qdiscs.... After some time i will
> need patch to up max qdisc and classes more then 65k (> 0xfffe) =)))
> Also i have very bad TC commands performance then i have more then 10k rules.
In contrast a "brand name" router will support 4 to 16 queues
per (sub-)interface. Your large number of queues exceeds
expectations.
What are you trying to do? You may be better off inventing a
new qdisc to meet your need (eg, to do per-IP traffic shaping
or, less complexly, a traffic shaping based on the value of mark
which might offend DiffServ purists) or have, say, 1000 output
rates based on a marking and use the ipset feature of Netfilter
to set the mark. Using 1000 rates gives an error of 0.1% which
is unlikely to be noticed by your customers given the larger
effects of shaping on TCP performance but is beneath the
level where you are noticing performance issues.
^ permalink raw reply
* [PATCH] ne2k: add minimal ethtool setting support
From: Stephen Hemminger @ 2008-01-15 23:48 UTC (permalink / raw)
To: Jeff Garzik, Paul Gortmaker; +Cc: netdev
Add minimal ethtool settings support for ne2k driver. This is needed
for KVM/QEMU environment where ne2k seems to be the simplest stupid
hardware used.
Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
--- a/drivers/net/ne2k-pci.c 2008-01-15 11:21:02.000000000 -0800
+++ b/drivers/net/ne2k-pci.c 2008-01-15 15:43:17.000000000 -0800
@@ -634,8 +634,21 @@ static void ne2k_pci_get_drvinfo(struct
strcpy(info->bus_info, pci_name(pci_dev));
}
+static int ne2k_pci_get_settings(struct net_device *dev,
+ struct ethtool_cmd *cmd)
+{
+ cmd->speed = SPEED_10;
+ cmd->duplex = (ei_status.ne2k_flags & FORCE_FDX)
+ ? DUPLEX_FULL : DUPLEX_HALF;
+ cmd->port = PORT_TP;
+ cmd->transceiver = XCVR_INTERNAL;
+ cmd->autoneg = AUTONEG_DISABLE;
+ return 0;
+}
+
static const struct ethtool_ops ne2k_pci_ethtool_ops = {
.get_drvinfo = ne2k_pci_get_drvinfo,
+ .get_settings = ne2k_get_settings,
};
static void __devexit ne2k_pci_remove_one (struct pci_dev *pdev)
--
Stephen Hemminger <stephen.hemminger@vyatta.com>
^ permalink raw reply
* Re: [Bugme-new] [Bug 9758] New: net_device refcnt bug when NFQUEUEing bridged packets
From: Andrew Morton @ 2008-01-15 23:56 UTC (permalink / raw)
To: netdev; +Cc: bugme-daemon, jckn
In-Reply-To: <bug-9758-10286@http.bugzilla.kernel.org/>
On Tue, 15 Jan 2008 15:28:31 -0800 (PST)
bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=9758
>
> Summary: net_device refcnt bug when NFQUEUEing bridged packets
> Product: Networking
> Version: 2.5
> KernelVersion: 2.6.24-rc7
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Netfilter/Iptables
> AssignedTo: networking_netfilter-iptables@kernel-bugs.osdl.org
> ReportedBy: jckn@gmx.net
>
>
> The bug is probably around since the combination bridge+NFQUEUE is possible,
> and does not depend on distro or environment:
>
> Packets that are to be sent out over a bridge device are skb_clone()d in
> br_loop() before traversing the appropriate (FORWARD/OUTPUT) NF chain.
> The copies made by skb_clone() share their nf_bridge metadata with the
> original, which is no problem usually.
> If however one or more packets of a br_loop() run end up in a NFQUEUE,
> their shared nf_bridge metadata causes trouble when they are about to be
> reinjected: nf_reinject() decrements the net_device refcounts that were
> previously upped when queueing the packet in __nf_queue(), but as
> skb->nf_bridge->physoutdev points to the same device for all these
> packets, most (if not all) of them will affect the wrong refcnt.
>
> (I originally encountered the bug on a Xen host because the hypervisor
> refused to shutdown a virtual device with non-zero refcount... but it is
> perfectly reproducible with a standard kernel, too, although it was a
> bit more tedious to create a test scenario, involving a couple of UMLs.)
>
> I'd suggest to make a real copy of the nf_bridge member in br_loop() if
> CONFIG_BRIDGE_NETFILTER is defined, remedying the entanglement. I'd go ahead
> and create a patch, but I'm unsure as to where that logic should be
> implemented.
^ permalink raw reply
* Re: [PATCH 2/3] virtio: Net header needs gso_hdr_len
From: Herbert Xu @ 2008-01-16 0:06 UTC (permalink / raw)
To: Rusty Russell; +Cc: netdev, virtualization
In-Reply-To: <200801152143.37112.rusty@rustcorp.com.au>
Rusty Russell <rusty@rustcorp.com.au> wrote:
> It's far easier to deal with GSO if we don't have to parse the packet
> to figure out the header length. Add the field to the virtio_net_hdr
> struct (and fix the spaces that somehow crept in there).
>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> ---
> drivers/net/virtio_net.c | 4 +++-
> include/linux/virtio_net.h | 11 ++++++-----
> 2 files changed, 9 insertions(+), 6 deletions(-)
>
> diff -r 24ef33a4ab14 drivers/net/virtio_net.c
> --- a/drivers/net/virtio_net.c Tue Jan 15 16:59:58 2008 +1100
> +++ b/drivers/net/virtio_net.c Tue Jan 15 21:21:40 2008 +1100
> @@ -126,6 +126,7 @@ static void receive_skb(struct net_devic
> /* Header must be checked, and gso_segs computed. */
> skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY;
> skb_shinfo(skb)->gso_segs = 0;
> + skb_set_transport_header(skb, hdr->gso_hdr_len);
Why do we need this? When receiving GSO packets from an untrusted
source the network stack will fill in the transport header offset
after verifying that the headers are sane.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: questions on NAPI processing latency and dropped network packets
From: Herbert Xu @ 2008-01-16 0:17 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: cfriesen, davem, netdev, linux-kernel
In-Reply-To: <20080115202905.GA2680@ami.dom.local>
Jarek Poplawski <jarkao2@gmail.com> wrote:
>
> So, it was more a rhetorical trick (sorry!) to suggest, that such a
> business model of being always late with kernels might be quite
> practical and reasonable for many companies, but looks like the
> worst possible development model for Linux.
Well people are always going to operate on this model for commercial
reasons. FWIW I used to work for a company that stuck to a specific
version of the Linux kernel, and I suppose I still do even now :)
But the important thing is that if you're going to do that, then the
cost that comes with it should be borne by the company and not the
community.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
From: Zhang, Yanmin @ 2008-01-16 0:34 UTC (permalink / raw)
To: Herbert Xu; +Cc: Ilpo Järvinen, LKML, Netdev
In-Reply-To: <20080114105307.GA22866@gondor.apana.org.au>
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=utf-8, Size: 554 bytes --]
On Mon, 2008-01-14 at 21:53 +1100, Herbert Xu wrote:
> On Mon, Jan 14, 2008 at 08:44:40AM +0000, Ilpo Järvinen wrote:
> >
> > > > I tried to use bisect to locate the bad patch between 2.6.22 and 2.6.23-rc1,
> > > > but the bisected kernel wasn't stable and went crazy.
> >
> > TCP work between that is very much non-existing.
>
> Make sure you haven't switched between SLAB/SLUB while testing this.
I can make sure. In addition, I tried both SLAB and SLUB and make sure the
regression is still there if CONFIG_SLAB=y.
Thanks,
-yanmin
^ permalink raw reply
* [PATCH] ethtool: update license field in specfile to be correctly defined
From: Jesse Brandeburg @ 2008-01-16 2:33 UTC (permalink / raw)
To: jeff; +Cc: jesse.brandeburg, netdev
From: Jesse Brandeburg <jesse.brandeburg@intel.com>
The ethtool spec file needs to have the correct syntax license line.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---
ethtool.spec.in | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/ethtool.spec.in b/ethtool.spec.in
index 1705d7f..4ff736a 100644
--- a/ethtool.spec.in
+++ b/ethtool.spec.in
@@ -5,7 +5,7 @@ Group : Utilities
Summary : A tool for setting ethernet parameters
-Copyright : GPL
+License : GPL
URL : http://sourceforge.net/projects/gkernel/
Buildroot : %{_tmppath}/%{name}-%{version}
^ permalink raw reply related
* Re: [PATCH 3/4] bonding: Fix work rearming
From: Makito SHIOKAWA @ 2008-01-16 3:19 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: netdev
In-Reply-To: <20080115090532.GB1696@ff.dom.local>
This patch is supposing a case that bond_mii_monitor() is invoked in
bond_open(), and after that, 0 is set to miimon via sysfs (see same place on
other monitors).
Though message in bonding_store_miimon() says miimon value 1-INT_MAX rejected,
but it looks like 0 can be accepted and monitor must be stopped in that case.
>> Change code not to rearm bond_mii_monitor() when value 0 is set for miimon.
>>
>> Signed-off-by: Makito SHIOKAWA <mshiokawa@miraclelinux.com>
>> ---
>> drivers/net/bonding/bond_main.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -2388,7 +2388,8 @@ void bond_mii_monitor(struct work_struct
>>
>> delay = ((bond->params.miimon * HZ) / 1000) ? : 1;
>> read_unlock(&bond->lock);
>> - queue_delayed_work(bond->wq, &bond->mii_work, delay);
>> + if (bond->params.miimon)
>> + queue_delayed_work(bond->wq, &bond->mii_work, delay);
>> }
>
> Maybe I miss something, but is this bond_mii_monitor() function
> supposed to be ever started if (!bond->params.miimon)? (IOW: isn't
> it enough to control this where the parameter is changed only?)
>
> Regards,
> Jarek P.
--
Makito SHIOKAWA
MIRACLE LINUX CORPORATION
^ permalink raw reply
* Re: [PATCH 3/4] bonding: Fix work rearming
From: Makito SHIOKAWA @ 2008-01-16 3:28 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: netdev
In-Reply-To: <20080115090532.GB1696@ff.dom.local>
> Though message in bonding_store_miimon() says miimon value 1-INT_MAX
> rejected, but it looks like 0 can be accepted and monitor must be
> stopped in that case.
=> miimon value *not in range* 1-INT_MAX
--
Makito SHIOKAWA
MIRACLE LINUX CORPORATION
^ permalink raw reply
* Re: [PATCH net-2.6.25 4/7][ATM]: [br2864] routed support
From: Chung-Chi Lo @ 2008-01-16 4:16 UTC (permalink / raw)
To: chas williams - CONTRACTOR; +Cc: netdev, davem
In-Reply-To: <200712300107.lBU17O8i003566@cmf.nrl.navy.mil>
On Dec 30, 2007 9:07 AM, chas williams - CONTRACTOR
<chas@cmf.nrl.navy.mil> wrote:
> commit fea6b121bcc150fc91186e5012466c91944ce64d
> Author: Eric Kinzie <ekinzie@cmf.nrl.navy.mil>
> Date: Fri Oct 26 08:05:08 2007 -0400
>
> [ATM]: [br2864] routed support
>
> From: Eric Kinzie <ekinzie@cmf.nrl.navy.mil>
> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
>
> @@ -320,35 +361,50 @@ static void br2684_push(struct atm_vcc *atmvcc, struct sk_buff *skb)
> } else {
> /* first 2 chars should be 0 */
> if (*((u16 *) (skb->data)) != 0) {
> brdev->stats.rx_errors++;
> dev_kfree_skb(skb);
> return;
> }
> + skb_pull(skb, BR2684_PAD_LEN + ETH_HLEN); /* pad, dstmac, srcmac, ethtype */
> + skb->protocol = eth_type_trans(skb, net_dev);
> }
Question to decode the encaps=VCMUX and bridge mode:
This line
skb_pull((skb, BR2684_PAD_LEN + ETH_HLEN);
should be replaced by
skb_pull((skb, BR2684_PAD_LEN);
?
By the way, this routed mode patch doesn't include encaps=VCMUX and
RFC2684 routed
protocol decapsulation?
--
Lino, Chung-Chi Lo
^ permalink raw reply
* Re: [PATCH 2/3] virtio: Net header needs gso_hdr_len
From: Rusty Russell @ 2008-01-16 4:19 UTC (permalink / raw)
To: Herbert Xu; +Cc: netdev, virtualization
In-Reply-To: <E1JEvnR-0006ax-00@gondolin.me.apana.org.au>
On Wednesday 16 January 2008 11:06:21 Herbert Xu wrote:
> Rusty Russell <rusty@rustcorp.com.au> wrote:
> > It's far easier to deal with GSO if we don't have to parse the packet
> > to figure out the header length. Add the field to the virtio_net_hdr
> > struct (and fix the spaces that somehow crept in there).
> >
> > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > ---
> > drivers/net/virtio_net.c | 4 +++-
> > include/linux/virtio_net.h | 11 ++++++-----
> > 2 files changed, 9 insertions(+), 6 deletions(-)
> >
> > diff -r 24ef33a4ab14 drivers/net/virtio_net.c
> > --- a/drivers/net/virtio_net.c Tue Jan 15 16:59:58 2008 +1100
> > +++ b/drivers/net/virtio_net.c Tue Jan 15 21:21:40 2008 +1100
> > @@ -126,6 +126,7 @@ static void receive_skb(struct net_devic
> > /* Header must be checked, and gso_segs computed. */
> > skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY;
> > skb_shinfo(skb)->gso_segs = 0;
> > + skb_set_transport_header(skb, hdr->gso_hdr_len);
>
> Why do we need this? When receiving GSO packets from an untrusted
> source the network stack will fill in the transport header offset
> after verifying that the headers are sane.
Thanks for clarifying; it simplifies things.
I'll re-test and resend.
Rusty.
^ permalink raw reply
* RE: [PATCH 1/3] drivers/misc :UCC based TDM driver for MPC83xx platforms.
From: Aggrwal Poonam @ 2008-01-16 4:20 UTC (permalink / raw)
To: Andrew Morton
Cc: rubini, linux-kernel, linuxppc-dev, netdev, kumar.gala,
Barkowski Michael, Phillips Kim, Kalra Ashish, Cutler Richard
In-Reply-To: <20080114131441.878979d2.akpm@linux-foundation.org>
Thanks Morton for your comments,
I shall incorporate them and reesnd the patch.
With Regards
Poonam
-----Original Message-----
From: Andrew Morton [mailto:akpm@linux-foundation.org]
Sent: Tuesday, January 15, 2008 2:45 AM
To: Aggrwal Poonam
Cc: rubini@vision.unipv.it; linux-kernel@vger.kernel.org;
linuxppc-dev@ozlabs.org; netdev@vger.kernel.org;
kumar.gala@freescale.co; Barkowski Michael; Phillips Kim; Kalra Ashish;
Cutler Richard
Subject: Re: [PATCH 1/3] drivers/misc :UCC based TDM driver for MPC83xx
platforms.
On Mon, 10 Dec 2007 17:34:44 +0530 (IST)
Poonam_Aggrwal-b10812 <b10812@freescale.com> wrote:
> From: Poonam Aggrwal <b10812@freescale.com>
>
> The UCC TDM driver basically multiplexes and demultiplexes data from
> different channels. It can interface with for example SLIC kind of
> devices to receive TDM data demultiplex it and send to upper
> applications. At the transmit end it receives data for different
> channels multiplexes it and sends them on the TDM channel. It
> internally uses TSA( Time Slot Assigner) which does multiplexing and
> demultiplexing, UCC to perform SDMA between host buffers and the TSA,
CMX to connect TSA to UCC.
>
> This driver will run on MPC8323E-RDB platforms.
>
> ...
>
> +#define PREV_PHASE(x) ((x == 0) ? MAX_PHASE : (x - 1)) #define
> +NEXT_PHASE(x) (((x + 1) > MAX_PHASE) ? 0 : (x + 1))
These macros can reference their arg more than once and are hence
dangerous. What does PREV_PHASE(foo++) do to foo?
And, in general: do not implement in cpp that which could have been
implemented in C.
> +static struct ucc_tdm_info utdm_primary_info = {
> + .uf_info = {
> + .tsa = 1,
> + .cdp = 1,
> + .cds = 1,
> + .ctsp = 1,
> + .ctss = 1,
> + .revd = 1,
> + .urfs = 0x128,
> + .utfs = 0x128,
> + .utfet = 0,
> + .utftt = 0x128,
> + .ufpt = 256,
> + .ttx_trx =
UCC_FAST_GUMR_TRANSPARENT_TTX_TRX_TRANSPARENT,
> + .tenc = UCC_FAST_TX_ENCODING_NRZ,
> + .renc = UCC_FAST_RX_ENCODING_NRZ,
> + .tcrc = UCC_FAST_16_BIT_CRC,
> + .synl = UCC_FAST_SYNC_LEN_NOT_USED,
> + },
> + .ucc_busy = 0,
> +};
> +
> +static struct ucc_tdm_info utdm_info[8];
> +
> +static void dump_siram(struct tdm_ctrl *tdm_c) { #if defined(DEBUG)
Microscopic note: kernel code tends to do
#ifdef FOO
if only one identifier is being tested and
#if defined(FOO) && defined(BAR)
if more than one is being tested.
There is no rational reason for this ;)
> + int i;
> + u16 phy_num_ts;
> +
> + phy_num_ts = tdm_c->physical_num_ts;
> +
> + pr_debug("SI TxRAM dump\n");
> + /* each slot entry in SI RAM is of 2 bytes */
> + for (i = 0; i < phy_num_ts * 2; i++)
> + pr_debug("%x ", in_8(&qe_immr->sir.tx[i]));
> + pr_debug("\nSI RxRAM dump\n");
> + for (i = 0; i < phy_num_ts * 2; i++)
> + pr_debug("%x ", in_8(&qe_immr->sir.rx[i]));
> + pr_debug("\n");
> +#endif
> +}
> +
> +/*
> + * converts u-law compressed samples to linear PCM
> + * If the CONFIG_TDM_LINEAR_PCM flag is not set the
> + * TDM driver receives u-law compressed data from the
> + * SLIC device. This function converts the compressed
> + * data to linear PCM and sends it to upper layers.
> + */
> +static inline int ulaw2int(unsigned char log) {
> + u32 sign, segment, temp, quant;
> + int val;
> +
> + temp = log ^ 0xFF;
> + sign = (temp & 0x80) >> 7;
> + segment = (temp & 0x70) >> 4;
> + quant = temp & 0x0F;
> + quant <<= 1;
> + quant += 33;
> + quant <<= segment;
> + if (sign)
> + val = 33 - quant;
> + else
> + val = quant - 33;
> +
> + val *= 4;
> + return val;
> +}
> +
> +/*
> + * converts linear PCM samples to u-law compressed format.
> + * If the CONFIG_TDM_LINEAR_PCM flag is not set the
> + * TDM driver calls this function to convert the PCM samples
> + * to u-law compressed format before sending them to SLIC
> + * device.
> + */
> +static inline u8 int2ulaw(short linear) {
> + u8 quant, ret;
> + u16 output, absol, temp;
> + u32 i, sign;
> + char segment;
> +
> + ret = 0;
> + if (linear >= 0)
> + linear = (linear >> 2);
> + else
> + linear = (0xc000 | (linear >> 2));
> +
> + absol = abs(linear) + 33;
> + temp = absol;
> + sign = (linear >= 0) ? 1 : 0;
> + for (i = 0; i < 16; i++) {
> + output = temp & 0x8000;
> + if (output)
> + break;
> + temp <<= 1;
> + }
> + segment = 11 - i;
> + quant = (absol >> segment) & 0x0F;
> + segment--;
> + segment <<= 4;
> + output = segment + quant;
> + if (absol > 8191)
> + output = 0x7F;
> + if (sign)
> + ret ^= 0xFF;
> + else
> + ret ^= 0x7F;
> + return ret;
> +}
hrm, how many copies of ulaw/alaw conversion functions do we need in the
tree before someone writes a library function for it?
> + out_be16(&rx_bd->status, bd_status);
> + out_be32(&rx_bd->buf,
> + tdm_c->dma_input_addr + i * SAMPLE_DEPTH * act_num_ts);
> +
> + bd_status = (u16) ((T_R | T_CM | T_W) >> 16);
> + bd_len = SAMPLE_DEPTH * act_num_ts;
> + out_be16(&tx_bd->length, bd_len);
> + out_be16(&tx_bd->status, bd_status);
> + out_be32(&tx_bd->buf,
> + tdm_c->dma_output_addr + i * SAMPLE_DEPTH *
act_num_ts);
> +
> + config_si(tdm_c);
> +
> + setbits32(&qe_immr->ic.qimr, (0x80000000 >> ucc));
The compiler treats 0xNNN constants as unsigned so this works OK. I'd
have put a UL on the end of the constant to be sure ;)
> +static int tdm_start(struct tdm_ctrl *tdm_c) {
> + if (request_irq(tdm_c->ut_info->uf_info.irq, tdm_isr,
> + 0, "tdm", tdm_c)) {
> + printk(KERN_ERR "%s: request_irq for tdm_isr failed\n",
> + __FUNCTION__);
> + return -ENODEV;
> + }
> +
> + ucc_fast_enable(tdm_c->uf_private, COMM_DIR_RX | COMM_DIR_TX);
> +
> +#if !defined(CONFIG_TDM_LINEAR_PCM)
> + pr_info("%s 8-bit u-law compressed mode active\n",
__FUNCTION__);
> +#else
> + pr_info("%s 16-bit linear pcm mode active with"
> + " slots 0 & 2\n", __FUNCTION__);
> +#endif
Is this the sort of thing which should be controlled at compile-time?
I'd have thought that a runtime control would be more appropriate (a
sysfs knob or a module parameter). Or just work it out automagically?
> + dump_siram(tdm_c);
> + dump_ucc(tdm_c);
> +
> + setbits8(&(qe_immr->si1.siglmr1_h), (0x1 << tdm_c->tdm_port));
> + pr_info("%s UCC based TDM enabled\n", __FUNCTION__);
> +
> + return 0;
> +}
>
> ...
>
> +static void tdm_read(u32 driver_handle, short chn_id, short
*pcm_buffer,
> + short
len)
> +{
> + int i;
> + u32 phase_rx;
> + /* point to where to start for the current phase data processing
*/
> + u32 temp_rx;
> +
> + struct tdm_ctrl *tdm_c = (struct tdm_ctrl *)(driver_handle);
eek. What are we doing here, casting a 32-bit quantity to a kernel
pointer?
a) Seems to rule out ever using this driver on a 64-bit system
b) It's generally suspicious and indicates that some rethinking is
needed.
> +#if !defined(CONFIG_TDM_LINEAR_PCM)
> + u8 *input_tdm_buffer = tdm_c->tdm_input_data;
> +
> +#else
> + u16 *input_tdm_buffer =
> + (u16 *)tdm_c->tdm_input_data;
> +
> +#endif
> + phase_rx = tdm_c->phase_rx;
> + phase_rx = PREV_PHASE(phase_rx);
> +
> + temp_rx = phase_rx * SAMPLE_DEPTH * EFF_ACTIVE_CH;
> +
> +#if defined(UCC_CACHE_SNOOPING_DISABLED)
> + flush_dcache_range((size_t) &input_tdm_buffer[temp_rx],
> + (size_t) &input_tdm_buffer[temp_rx +
> + SAMPLE_DEPTH *
ACTIVE_CH]);
> +#endif
Again, is it appropriate that this behaviour be determined at
compile-time?
This is very user- and packager- and distributor-unfriendly.
> + for (i = 0; i < len; i++) {
> +#if !defined(CONFIG_TDM_LINEAR_PCM)
> + pcm_buffer[i] =
> + ulaw2int(input_tdm_buffer[i * EFF_ACTIVE_CH +
> + temp_rx + chn_id]);
> +#else
> + pcm_buffer[i] =
> + input_tdm_buffer[i * EFF_ACTIVE_CH + temp_rx +
chn_id]; #endif
> +
> + }
> +
> +}
> +
> +static int ucc_tdm_probe(struct of_device *ofdev,
> + const struct of_device_id *match) {
> + struct device_node *np = ofdev->node;
> + struct resource res;
> + const unsigned int *prop;
> + u32 ucc_num, device_num, err, ret = 0;
> + struct device_node *np_tmp = NULL;
> + dma_addr_t physaddr;
> + void *tdm_buff;
> + struct ucc_tdm_info *ut_info;
> +
> + prop = of_get_property(np, "device-id", NULL);
> + ucc_num = *prop - 1;
> + if ((ucc_num < 0) || (ucc_num > 7))
> + return -ENODEV;
> +
> + ut_info = &utdm_info[ucc_num];
> + if (ut_info == NULL) {
> + printk(KERN_ERR "additional data missing\n");
> + return -ENODEV;
> + }
> + if (ut_info->ucc_busy) {
> + printk(KERN_ERR "UCC in use by another TDM driver
instance\n");
> + return -EBUSY;
> + }
> +
> + ut_info->ucc_busy = 1;
> + tdm_ctrl[num_tdm_devices++] =
> + kzalloc(sizeof(struct tdm_ctrl), GFP_KERNEL);
Shouldn't this check for (num_tdm_devices > MAX_NUM_TDM_DEVICES))?
> + if (!tdm_ctrl[num_tdm_devices - 1]) {
> + printk(KERN_ERR "%s: no memory to allocate for"
> + " tdm control structure\n", __FUNCTION__);
> + num_tdm_devices--;
> + return -ENOMEM;
> + }
> + device_num = num_tdm_devices - 1;
> +
> + tdm_ctrl[device_num]->device = &ofdev->dev;
> + tdm_ctrl[device_num]->ut_info = ut_info;
> +
> + tdm_ctrl[device_num]->ut_info->uf_info.ucc_num = ucc_num;
> +
> + prop = of_get_property(np, "fsl,tdm-num", NULL);
> + if (prop == NULL) {
> + ret = -EINVAL;
> + goto get_property_error;
> + }
>
> ...
>
> +
> +#define SET_RX_SI_RAM(n, val) \
> + out_be16((u16 *)&qe_immr->sir.rx[(n)*2], (u16)(val))
> +
> +#define SET_TX_SI_RAM(n, val) \
> + out_be16((u16 *)&qe_immr->sir.tx[(n)*2], (u16)(val))
I don't think there's anything which requires that these be imlemented
in the preprocessor?
> +struct tdm_cfg {
> + u8 com_pin; /* Common receive and transmit pins
> + * 0 = separate pins
> + * 1 = common pins
> + */
> +
> + u8 fr_sync_level; /* SLx bit Frame Sync Polarity
> + * 0 = L1R/TSYNC active logic "1"
> + * 1 = L1R/TSYNC active logic "0"
> + */
> +
> + u8 clk_edge; /* CEx bit Tx Rx Clock Edge
> + * 0 = TX data on rising edge of clock
> + * RX data on falling edge
> + * 1 = TX data on falling edge of clock
> + * RX data on rising edge
> + */
> +
> + u8 fr_sync_edge; /* FEx bit Frame sync edge
> + * Determine when the sync pulses are
sampled
> + * 0 = Falling edge
> + * 1 = Rising edge
> + */
> +
> + u8 rx_fr_sync_delay; /* TFSDx/RFSDx bits Frame Sync Delay
> + * 00 = no bit delay
> + * 01 = 1 bit delay
> + * 10 = 2 bit delay
> + * 11 = 3 bit delay
> + */
> +
> + u8 tx_fr_sync_delay; /* TFSDx/RFSDx bits Frame Sync Delay
> + * 00 = no bit delay
> + * 01 = 1 bit delay
> + * 10 = 2 bit delay
> + * 11 = 3 bit delay
> + */
> +
> + u8 active_num_ts; /* Number of active time slots in TDM
> + * assume same active Rx/Tx time slots
> + */
> +};
Nice commenting.
^ permalink raw reply
* Re: [Bugme-new] [Bug 9758] New: net_device refcnt bug when NFQUEUEing bridged packets
From: Patrick McHardy @ 2008-01-16 4:54 UTC (permalink / raw)
To: jckn, Netfilter Development Mailinglist
Cc: Andrew Morton, netdev, bugme-daemon
In-Reply-To: <20080115155655.d1a24eaf.akpm@linux-foundation.org>
[-- Attachment #1: Type: text/plain, Size: 2067 bytes --]
Andrew Morton wrote:
> On Tue, 15 Jan 2008 15:28:31 -0800 (PST)
> bugme-daemon@bugzilla.kernel.org wrote:
>
>> http://bugzilla.kernel.org/show_bug.cgi?id=9758
>>
>> The bug is probably around since the combination bridge+NFQUEUE is possible,
>> and does not depend on distro or environment:
>>
>> Packets that are to be sent out over a bridge device are skb_clone()d in
>> br_loop() before traversing the appropriate (FORWARD/OUTPUT) NF chain.
>> The copies made by skb_clone() share their nf_bridge metadata with the
>> original, which is no problem usually.
>> If however one or more packets of a br_loop() run end up in a NFQUEUE,
>> their shared nf_bridge metadata causes trouble when they are about to be
>> reinjected: nf_reinject() decrements the net_device refcounts that were
>> previously upped when queueing the packet in __nf_queue(), but as
>> skb->nf_bridge->physoutdev points to the same device for all these
>> packets, most (if not all) of them will affect the wrong refcnt.
>>
>> (I originally encountered the bug on a Xen host because the hypervisor
>> refused to shutdown a virtual device with non-zero refcount... but it is
>> perfectly reproducible with a standard kernel, too, although it was a
>> bit more tedious to create a test scenario, involving a couple of UMLs.)
>>
>> I'd suggest to make a real copy of the nf_bridge member in br_loop() if
>> CONFIG_BRIDGE_NETFILTER is defined, remedying the entanglement. I'd go ahead
>> and create a patch, but I'm unsure as to where that logic should be
>> implemented.
Very nice catch, that explains quite a few bug reports about
refcnt leaks. Your patch looks correct and performs the copying
in the logically correct place, it would be nicer to keep this
crap limited to bridge netfilter however.
What should work is to perform the copying in br_netfilter.c
at the spots where phsyoutdev is assigned. As an optimization
we should be able to avoid the copying in most cases by
checking that the bridge info has a refcount above 1.
Could you test whether this patch also fixes the problem?
[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1545 bytes --]
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 0e884fe..9759bd7 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -142,6 +142,22 @@ static inline struct nf_bridge_info *nf_bridge_alloc(struct sk_buff *skb)
return skb->nf_bridge;
}
+static inline struct nf_bridge_info *nf_bridge_unshare(struct sk_buff *skb)
+{
+ struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+
+ if (atomic_read(&nf_bridge->use) > 1) {
+ struct nf_bridge_info *tmp = nf_bridge_alloc(skb);
+
+ if (tmp) {
+ memcpy(tmp, nf_bridge, sizeof(struct nf_bridge_info));
+ nf_bridge_put(nf_bridge);
+ }
+ nf_bridge = tmp;
+ }
+ return nf_bridge;
+}
+
static inline void nf_bridge_push_encap_header(struct sk_buff *skb)
{
unsigned int len = nf_bridge_encap_header_len(skb);
@@ -637,6 +653,11 @@ static unsigned int br_nf_forward_ip(unsigned int hook, struct sk_buff *skb,
if (!skb->nf_bridge)
return NF_ACCEPT;
+ /* Need exclusive nf_bridge_info since we might have multiple
+ * different physoutdevs. */
+ if (!nf_bridge_unshare(skb))
+ return NF_DROP;
+
parent = bridge_parent(out);
if (!parent)
return NF_DROP;
@@ -718,6 +739,11 @@ static unsigned int br_nf_local_out(unsigned int hook, struct sk_buff *skb,
if (!skb->nf_bridge)
return NF_ACCEPT;
+ /* Need exclusive nf_bridge_info since we might have multiple
+ * different physoutdevs. */
+ if (!nf_bridge_unshare(skb))
+ return NF_DROP;
+
nf_bridge = skb->nf_bridge;
if (!(nf_bridge->mask & BRNF_BRIDGED_DNAT))
return NF_ACCEPT;
^ permalink raw reply related
* Re: [Bugme-new] [Bug 9758] New: net_device refcnt bug when NFQUEUEing bridged packets
From: Patrick McHardy @ 2008-01-16 4:59 UTC (permalink / raw)
To: jckn, Netfilter Development Mailinglist
Cc: Andrew Morton, netdev, bugme-daemon
In-Reply-To: <478D8DF5.7080901@trash.net>
[-- Attachment #1: Type: text/plain, Size: 676 bytes --]
Patrick McHardy wrote:
> Very nice catch, that explains quite a few bug reports about
> refcnt leaks. Your patch looks correct and performs the copying
> in the logically correct place, it would be nicer to keep this
> crap limited to bridge netfilter however.
>
> What should work is to perform the copying in br_netfilter.c
> at the spots where phsyoutdev is assigned. As an optimization
> we should be able to avoid the copying in most cases by
> checking that the bridge info has a refcount above 1.
>
> Could you test whether this patch also fixes the problem?
That patch had a bug, we need to set the refcount of the
new bridge info to 1 after performing the copy.
[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1575 bytes --]
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 0e884fe..141f069 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -142,6 +142,23 @@ static inline struct nf_bridge_info *nf_bridge_alloc(struct sk_buff *skb)
return skb->nf_bridge;
}
+static inline struct nf_bridge_info *nf_bridge_unshare(struct sk_buff *skb)
+{
+ struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+
+ if (atomic_read(&nf_bridge->use) > 1) {
+ struct nf_bridge_info *tmp = nf_bridge_alloc(skb);
+
+ if (tmp) {
+ memcpy(tmp, nf_bridge, sizeof(struct nf_bridge_info));
+ atomic_set(&tmp->use, 1);
+ nf_bridge_put(nf_bridge);
+ }
+ nf_bridge = tmp;
+ }
+ return nf_bridge;
+}
+
static inline void nf_bridge_push_encap_header(struct sk_buff *skb)
{
unsigned int len = nf_bridge_encap_header_len(skb);
@@ -637,6 +654,11 @@ static unsigned int br_nf_forward_ip(unsigned int hook, struct sk_buff *skb,
if (!skb->nf_bridge)
return NF_ACCEPT;
+ /* Need exclusive nf_bridge_info since we might have multiple
+ * different physoutdevs. */
+ if (!nf_bridge_unshare(skb))
+ return NF_DROP;
+
parent = bridge_parent(out);
if (!parent)
return NF_DROP;
@@ -718,6 +740,11 @@ static unsigned int br_nf_local_out(unsigned int hook, struct sk_buff *skb,
if (!skb->nf_bridge)
return NF_ACCEPT;
+ /* Need exclusive nf_bridge_info since we might have multiple
+ * different physoutdevs. */
+ if (!nf_bridge_unshare(skb))
+ return NF_DROP;
+
nf_bridge = skb->nf_bridge;
if (!(nf_bridge->mask & BRNF_BRIDGED_DNAT))
return NF_ACCEPT;
^ permalink raw reply related
* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
From: David Miller @ 2008-01-16 5:02 UTC (permalink / raw)
To: jesse.brandeburg; +Cc: slavon, elendil, netdev, linux-kernel
In-Reply-To: <36D9DB17C6DE9E40B059440DB8D95F52042FA541@orsmsx418.amr.corp.intel.com>
From: "Brandeburg, Jesse" <jesse.brandeburg@intel.com>
Date: Tue, 15 Jan 2008 13:53:43 -0800
> The tx code has an "early exit" that tries to limit the amount of tx
> packets handled in a single poll loop and requires napi or interrupt
> rescheduling based on the return value from e1000_clean_tx_irq.
That explains everything, thanks Jesse.
Ok, here is the patch I'll propose to fix this. The goal is to make
it as simple as possible without regressing the thing we were trying
to fix.
Something more sophisticated can be done later.
Three of the 5 Intel drivers had the TX breakout logic. e1000,
e1000e, and ixgbe. e100 and ixgb did not, so they don't have any
problems we need to fix here.
What the fix does is behave as if the budget was fully consumed if
*_clean_tx_irq() returns true.
The only valid way to return from ->poll() without copleting the NAPI
poll is by returning work_done == budget. That signals to the caller
that the NAPI instance has not been descheduled and therefore the
caller fully owns the NAPI context.
This does mean that for these drivers any time TX work is done, we'll
loop at least one extra time in the ->poll() loop of net_rx_work() but
that is historically what these drivers have caused to happen for
years.
For 2.6.25 or similar I would suggest investigating courses of action
to bring closure and consistency to this:
1) Determine whether the loop breakout is actually necessary.
Jesse explained to me that they had seen a case where a
thread on one cpu feeding the TX ring could keep a thread
on another cpu constantly running the *_clean_tx_irq() code
in a loop.
I find this hard to believe since even the slowest CPU should be
able to free up TX entries faster than they can be transmitted on
gigabit links :-)
2) If the investigation in #1 deems the breakout logic is necessary,
then consistently amongst all the 5 drivers a policy should be
implemented which is integrated with the NAPI budgetting logic.
For example, the simplest thing to do is to pass the budget and the
"work_done" thing down into *_clean_tx_irq() and break out if it is
exceeded.
As a further refinement we can say that TX work is about 1/4 the
expense of RX work and adjust the budget checking logic to match
that.
[NET]: Fix TX timeout regression in Intel drivers.
This fixes a regression added by changeset
53e52c729cc169db82a6105fac7a166e10c2ec36 ("[NET]: Make ->poll()
breakout consistent in Intel ethernet drivers.")
As pointed out by Jesse Brandeburg, for three of the drivers edited
above there is breakout logic in the *_clean_tx_irq() code to prevent
running TX reclaim forever. If this occurs, we have to elide NAPI
poll completion or else those TX events will never be serviced.
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 13d57b0..0c9a6f7 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -3919,7 +3919,7 @@ e1000_clean(struct napi_struct *napi, int budget)
{
struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
struct net_device *poll_dev = adapter->netdev;
- int work_done = 0;
+ int tx_cleaned = 0, work_done = 0;
/* Must NOT use netdev_priv macro here. */
adapter = poll_dev->priv;
@@ -3929,14 +3929,17 @@ e1000_clean(struct napi_struct *napi, int budget)
* simultaneously. A failure obtaining the lock means
* tx_ring[0] is currently being cleaned anyway. */
if (spin_trylock(&adapter->tx_queue_lock)) {
- e1000_clean_tx_irq(adapter,
- &adapter->tx_ring[0]);
+ tx_cleaned = e1000_clean_tx_irq(adapter,
+ &adapter->tx_ring[0]);
spin_unlock(&adapter->tx_queue_lock);
}
adapter->clean_rx(adapter, &adapter->rx_ring[0],
&work_done, budget);
+ if (tx_cleaned)
+ work_done = budget;
+
/* If budget not fully consumed, exit the polling mode */
if (work_done < budget) {
if (likely(adapter->itr_setting & 3))
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 4a6fc74..2ab3bfb 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -1384,7 +1384,7 @@ static int e1000_clean(struct napi_struct *napi, int budget)
{
struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
struct net_device *poll_dev = adapter->netdev;
- int work_done = 0;
+ int tx_cleaned = 0, work_done = 0;
/* Must NOT use netdev_priv macro here. */
adapter = poll_dev->priv;
@@ -1394,12 +1394,15 @@ static int e1000_clean(struct napi_struct *napi, int budget)
* simultaneously. A failure obtaining the lock means
* tx_ring is currently being cleaned anyway. */
if (spin_trylock(&adapter->tx_queue_lock)) {
- e1000_clean_tx_irq(adapter);
+ tx_cleaned = e1000_clean_tx_irq(adapter);
spin_unlock(&adapter->tx_queue_lock);
}
adapter->clean_rx(adapter, &work_done, budget);
+ if (tx_cleaned)
+ work_done = budget;
+
/* If budget not fully consumed, exit the polling mode */
if (work_done < budget) {
if (adapter->itr_setting & 3)
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index a564916..de3f45e 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -1468,13 +1468,16 @@ static int ixgbe_clean(struct napi_struct *napi, int budget)
struct ixgbe_adapter *adapter = container_of(napi,
struct ixgbe_adapter, napi);
struct net_device *netdev = adapter->netdev;
- int work_done = 0;
+ int tx_cleaned = 0, work_done = 0;
/* In non-MSIX case, there is no multi-Tx/Rx queue */
- ixgbe_clean_tx_irq(adapter, adapter->tx_ring);
+ tx_cleaned = ixgbe_clean_tx_irq(adapter, adapter->tx_ring);
ixgbe_clean_rx_irq(adapter, &adapter->rx_ring[0], &work_done,
budget);
+ if (tx_cleaned)
+ work_done = budget;
+
/* If budget not fully consumed, exit the polling mode */
if (work_done < budget) {
netif_rx_complete(netdev, napi);
^ permalink raw reply related
* Re: Packetlost when "tc qdisc del dev eth0 root"
From: Patrick McHardy @ 2008-01-16 5:02 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Badalian Vyacheslav, netdev
In-Reply-To: <478D226E.1050209@gmail.com>
Jarek Poplawski wrote:
> Patrick McHardy wrote, On 01/15/2008 05:05 PM:
>
>> Badalian Vyacheslav wrote:
>
> ...
>
>> Yes, packets in the old qdisc are lost.
>>
>>> Maybe if tc do changes - need create second queue (hash of rules or how
>>> you named it?) and do changes at it. Then replace old queue rules by
>>> created new.
>>> Logic -
>>> 1. Do snapshot
>>> 2. Do changes in shapshot
>>> 3. All new packets go to snapshot
>>> 4. If old queue not have packets - delete it.
>>> 5. Snapshot its default.
>>
>> That doesn't really work since qdiscs keep internal state that
>> in large parts depends on the packets queued. Take the qlen as
>> a simple example, the new qdisc doesn't know about the packets
>> in the old one and will exceed the limit.
>
> But, some similar alternative to killing packets 'to death' could
> be imagined, I suppose (in the future, of course!). So, e.g. doing
> the switch automatically after last packet has been dequeued (maybe
> even with some 'special' function/mode for this). After all even
> with accuracy lost, it could be less visible for clients than
> current way?
This would need support from the qdiscs to do it properly. Looks
non-trivial for HTB/HFSC/CBQ, but the others shouldn't be that hard.
^ permalink raw reply
* Re: Not understand some in htb_do_events function
From: David Miller @ 2008-01-16 5:07 UTC (permalink / raw)
To: kaber; +Cc: devik, slavon, netdev
In-Reply-To: <478CD741.7040004@trash.net>
From: Patrick McHardy <kaber@trash.net>
Date: Tue, 15 Jan 2008 16:54:41 +0100
> Martin Devera wrote:
> > to drain extra events asap. It the time of writing I was not able to
> > come with better solution and there were more bugs related to this
> > part of code than now.
>
> So this was meant to protect against endless loops?
I think, as Martin tried to explain further, it's trying
to avoid cases where the amount of work is extremely large.
Heuristics like this are by definition going to hit cases
where the value chosen is wrong or inappropriate. On fast
cpus maybe a value of 50000 instead of 500 would work "best"
Therefore I would suggest removing the limit altogether for now and in
the long term work on whatever deficiencies in the algorithm make this
expensive enough to warrant limits in the first place.
^ permalink raw reply
* Re: Packetlost when "tc qdisc del dev eth0 root"
From: Patrick McHardy @ 2008-01-16 5:12 UTC (permalink / raw)
To: slavon; +Cc: Jarek Poplawski, netdev
In-Reply-To: <20080116010459.676cchrf8ko4wk8o@mail.bigtelecom.ru>
slavon@bigtelecom.ru wrote:
> Good night! =)
>
> Sorry... i was wrong...
> I see that problem more serious....
>
> Lets see to scheme
>
> Class 1
> ---qdisc
> ------- 10k classes
> Class 2
> ---qdisc
> ------- 10k classes
>
> All traffic go to class 2... class 1 qdisc not have packets and if we
> delete it - packets not lost... in theory... lets try delete class 1
> qdisc (all childrens delete too)...
> PC freeze on 2-5 seconds... its not forward any traffic at this
> moment... its great tree lock?
>
> Its normal or code need to more accurate lock?
htb class destruction can be quite expansive if one of the rb trees
needs to be rebalanced. Doing that for 10000 classes would explain
the delay.
^ permalink raw reply
* Re: Not understand some in htb_do_events function
From: Patrick McHardy @ 2008-01-16 5:15 UTC (permalink / raw)
To: Martin Devera; +Cc: Badalian Vyacheslav, netdev
In-Reply-To: <478D2C85.2090008@cdi.cz>
Martin Devera wrote:
>>
>> So this was meant to protect against endless loops?
>>
>>> We want way to smooth big burst of events over more dequeue invocations
>>> in order to not slow dequeue too much. Constant 500 is max. allowed
>>> "slowdown" of dequeue.
>>> Any bright idea how to do it more elegant, Patrick ?
>>
>>
>> Unfortunately not, but I believe simply removing the limit
>> completely would be better than picking an arbitary value.
>
> Grrr my comp crashed while I was writing this mail. Well the second
> attempt.
> When we allow unlimited events per dequeue, then there is case where
> all N classes in qdisc can be in the event queue with the same target
> time. Then they will all be acted on in the loop within single dequeue,
> costing us say some milliseconds. Additionaly, it tends to repeat itself
> then in cycles.
I see.
> Maybe it is acceptable but it seemed to me as rather big latency.
> Thus I wanted to do only limited work per dequeue call. One possibility
> is to remove the limit and "see what happend in wild".
>
> What do u think about to do limited no of transitions and then schedule
> tasklet to do the rest (again in limited buckets) ?
Alternatively we could just remove the printk and do what you
suggested first, return q->now + 1 to immediately continue
processing, but send out a packet first.
^ permalink raw reply
* Re: [PATCH 1/4] bonding: Fix work initing and cancelling
From: Makito SHIOKAWA @ 2008-01-16 5:17 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: netdev, Makito SHIOKAWA
In-Reply-To: <20080115105601.GC1696@ff.dom.local>
> I wonder why don't you use cancel_delayed_work_sync() here (and in a
> few other places), like in bond_work_cancel_all() from patch 2/4?
In bond_close(), you can't use cancel_delayed_work_sync() because you can't
unlock RTNL in it. (So, on current implementation, it becomes work cancel is
not ensured on bond_close().)
In sysfs, I think you are right, thanks. So, I will modify the patch as below
(other patches won't be affected).
---
drivers/net/bonding/bond_main.c | 46 ++++++++++++++++++++------------------
drivers/net/bonding/bond_sysfs.c | 34 ++++++++--------------------
drivers/net/bonding/bonding.h | 3 +-
3 files changed, 37 insertions(+), 46 deletions(-)
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2692,7 +2692,7 @@ out:
void bond_loadbalance_arp_mon(struct work_struct *work)
{
struct bonding *bond = container_of(work, struct bonding,
- arp_work.work);
+ lb_arp_work.work);
struct slave *slave, *oldcurrent;
int do_failover = 0;
int delta_in_ticks;
@@ -2803,7 +2803,7 @@ void bond_loadbalance_arp_mon(struct wor
re_arm:
if (bond->params.arp_interval)
- queue_delayed_work(bond->wq, &bond->arp_work, delta_in_ticks);
+ queue_delayed_work(bond->wq, &bond->lb_arp_work, delta_in_ticks);
out:
read_unlock(&bond->lock);
}
@@ -2826,7 +2826,7 @@ out:
void bond_activebackup_arp_mon(struct work_struct *work)
{
struct bonding *bond = container_of(work, struct bonding,
- arp_work.work);
+ ab_arp_work.work);
struct slave *slave;
int delta_in_ticks;
int i;
@@ -3060,7 +3060,7 @@ void bond_activebackup_arp_mon(struct wo
re_arm:
if (bond->params.arp_interval) {
- queue_delayed_work(bond->wq, &bond->arp_work, delta_in_ticks);
+ queue_delayed_work(bond->wq, &bond->ab_arp_work, delta_in_ticks);
}
out:
read_unlock(&bond->lock);
@@ -3679,30 +3679,23 @@ static int bond_open(struct net_device *
return -1;
}
- INIT_DELAYED_WORK(&bond->alb_work, bond_alb_monitor);
queue_delayed_work(bond->wq, &bond->alb_work, 0);
}
if (bond->params.miimon) { /* link check interval, in milliseconds. */
- INIT_DELAYED_WORK(&bond->mii_work, bond_mii_monitor);
queue_delayed_work(bond->wq, &bond->mii_work, 0);
}
if (bond->params.arp_interval) { /* arp interval, in milliseconds. */
if (bond->params.mode == BOND_MODE_ACTIVEBACKUP)
- INIT_DELAYED_WORK(&bond->arp_work,
- bond_activebackup_arp_mon);
+ queue_delayed_work(bond->wq, &bond->ab_arp_work, 0);
else
- INIT_DELAYED_WORK(&bond->arp_work,
- bond_loadbalance_arp_mon);
-
- queue_delayed_work(bond->wq, &bond->arp_work, 0);
+ queue_delayed_work(bond->wq, &bond->lb_arp_work, 0);
if (bond->params.arp_validate)
bond_register_arp(bond);
}
if (bond->params.mode == BOND_MODE_8023AD) {
- INIT_DELAYED_WORK(&bond->ad_work, bond_3ad_state_machine_handler);
queue_delayed_work(bond->wq, &bond->ad_work, 0);
/* register to receive LACPDUs */
bond_register_lacpdu(bond);
@@ -3736,7 +3729,10 @@ static int bond_close(struct net_device
}
if (bond->params.arp_interval) { /* arp interval, in milliseconds. */
- cancel_delayed_work(&bond->arp_work);
+ if (bond->params.mode == BOND_MODE_ACTIVEBACKUP)
+ cancel_delayed_work(&bond->ab_arp_work);
+ else
+ cancel_delayed_work(&bond->lb_arp_work);
}
switch (bond->params.mode) {
@@ -4416,6 +4412,12 @@ static int bond_init(struct net_device *
if (!bond->wq)
return -ENOMEM;
+ INIT_DELAYED_WORK(&bond->alb_work, bond_alb_monitor);
+ INIT_DELAYED_WORK(&bond->mii_work, bond_mii_monitor);
+ INIT_DELAYED_WORK(&bond->ab_arp_work, bond_activebackup_arp_mon);
+ INIT_DELAYED_WORK(&bond->lb_arp_work, bond_loadbalance_arp_mon);
+ INIT_DELAYED_WORK(&bond->ad_work, bond_3ad_state_machine_handler);
+
/* Initialize pointers */
bond->first_slave = NULL;
bond->curr_active_slave = NULL;
@@ -4498,18 +4500,20 @@ static void bond_work_cancel_all(struct
bond->kill_timers = 1;
write_unlock_bh(&bond->lock);
- if (bond->params.miimon && delayed_work_pending(&bond->mii_work))
+ if (bond->params.miimon)
cancel_delayed_work(&bond->mii_work);
- if (bond->params.arp_interval && delayed_work_pending(&bond->arp_work))
- cancel_delayed_work(&bond->arp_work);
+ if (bond->params.arp_interval) {
+ if (bond->params.mode == BOND_MODE_ACTIVEBACKUP)
+ cancel_delayed_work(&bond->ab_arp_work);
+ else
+ cancel_delayed_work(&bond->lb_arp_work);
+ }
- if (bond->params.mode == BOND_MODE_ALB &&
- delayed_work_pending(&bond->alb_work))
+ if (bond->params.mode == BOND_MODE_ALB)
cancel_delayed_work(&bond->alb_work);
- if (bond->params.mode == BOND_MODE_8023AD &&
- delayed_work_pending(&bond->ad_work))
+ if (bond->params.mode == BOND_MODE_8023AD)
cancel_delayed_work(&bond->ad_work);
}
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -643,10 +643,7 @@ static ssize_t bonding_store_arp_interva
"%s Disabling MII monitoring.\n",
bond->dev->name, bond->dev->name);
bond->params.miimon = 0;
- if (delayed_work_pending(&bond->mii_work)) {
- cancel_delayed_work(&bond->mii_work);
- flush_workqueue(bond->wq);
- }
+ cancel_delayed_work_sync(&bond->mii_work);
}
if (!bond->params.arp_targets[0]) {
printk(KERN_INFO DRV_NAME
@@ -660,16 +657,10 @@ static ssize_t bonding_store_arp_interva
* timer will get fired off when the open function
* is called.
*/
- if (!delayed_work_pending(&bond->arp_work)) {
- if (bond->params.mode == BOND_MODE_ACTIVEBACKUP)
- INIT_DELAYED_WORK(&bond->arp_work,
- bond_activebackup_arp_mon);
- else
- INIT_DELAYED_WORK(&bond->arp_work,
- bond_loadbalance_arp_mon);
-
- queue_delayed_work(bond->wq, &bond->arp_work, 0);
- }
+ if (bond->params.mode == BOND_MODE_ACTIVEBACKUP)
+ queue_delayed_work(bond->wq, &bond->ab_arp_work, 0);
+ else
+ queue_delayed_work(bond->wq, &bond->lb_arp_work, 0);
}
out:
@@ -1022,10 +1013,10 @@ static ssize_t bonding_store_miimon(stru
bond->params.arp_validate =
BOND_ARP_VALIDATE_NONE;
}
- if (delayed_work_pending(&bond->arp_work)) {
- cancel_delayed_work(&bond->arp_work);
- flush_workqueue(bond->wq);
- }
+ if (bond->params.mode == BOND_MODE_ACTIVEBACKUP)
+ cancel_delayed_work_sync(&bond->ab_arp_work);
+ else
+ cancel_delayed_work_sync(&bond->lb_arp_work);
}
if (bond->dev->flags & IFF_UP) {
@@ -1034,12 +1025,7 @@ static ssize_t bonding_store_miimon(stru
* timer will get fired off when the open function
* is called.
*/
- if (!delayed_work_pending(&bond->mii_work)) {
- INIT_DELAYED_WORK(&bond->mii_work,
- bond_mii_monitor);
- queue_delayed_work(bond->wq,
- &bond->mii_work, 0);
- }
+ queue_delayed_work(bond->wq, &bond->mii_work, 0);
}
}
out:
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -206,7 +206,8 @@ struct bonding {
struct packet_type arp_mon_pt;
struct workqueue_struct *wq;
struct delayed_work mii_work;
- struct delayed_work arp_work;
+ struct delayed_work ab_arp_work;
+ struct delayed_work lb_arp_work;
struct delayed_work alb_work;
struct delayed_work ad_work;
};
--
Makito SHIOKAWA
MIRACLE LINUX CORPORATION
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox