* Re: [PATCH 2/3] virtio: Net header needs gso_hdr_len
From: Herbert Xu @ 2008-01-16 0:06 UTC
To: Rusty Russell; +Cc: netdev, virtualization
In-Reply-To: <200801152143.37112.rusty@rustcorp.com.au>
Rusty Russell <rusty@rustcorp.com.au> wrote:
> It's far easier to deal with GSO if we don't have to parse the packet
> to figure out the header length. Add the field to the virtio_net_hdr
> struct (and fix the spaces that somehow crept in there).
>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> ---
> drivers/net/virtio_net.c | 4 +++-
> include/linux/virtio_net.h | 11 ++++++-----
> 2 files changed, 9 insertions(+), 6 deletions(-)
>
> diff -r 24ef33a4ab14 drivers/net/virtio_net.c
> --- a/drivers/net/virtio_net.c Tue Jan 15 16:59:58 2008 +1100
> +++ b/drivers/net/virtio_net.c Tue Jan 15 21:21:40 2008 +1100
> @@ -126,6 +126,7 @@ static void receive_skb(struct net_devic
> /* Header must be checked, and gso_segs computed. */
> skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY;
> skb_shinfo(skb)->gso_segs = 0;
> + skb_set_transport_header(skb, hdr->gso_hdr_len);
Why do we need this? When receiving GSO packets from an untrusted
source the network stack will fill in the transport header offset
after verifying that the headers are sane.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [Bugme-new] [Bug 9758] New: net_device refcnt bug when NFQUEUEing bridged packets
From: Andrew Morton @ 2008-01-15 23:56 UTC
To: netdev; +Cc: bugme-daemon, jckn
In-Reply-To: <bug-9758-10286@http.bugzilla.kernel.org/>
On Tue, 15 Jan 2008 15:28:31 -0800 (PST)
bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=9758
>
> Summary: net_device refcnt bug when NFQUEUEing bridged packets
> Product: Networking
> Version: 2.5
> KernelVersion: 2.6.24-rc7
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Netfilter/Iptables
> AssignedTo: networking_netfilter-iptables@kernel-bugs.osdl.org
> ReportedBy: jckn@gmx.net
>
>
> The bug has probably been around for as long as the combination bridge+NFQUEUE has been possible,
> and does not depend on distro or environment:
>
> Packets that are to be sent out over a bridge device are skb_clone()d in
> br_loop() before traversing the appropriate (FORWARD/OUTPUT) NF chain.
> The copies made by skb_clone() share their nf_bridge metadata with the
> original, which is no problem usually.
> If however one or more packets of a br_loop() run end up in a NFQUEUE,
> their shared nf_bridge metadata causes trouble when they are about to be
> reinjected: nf_reinject() decrements the net_device refcounts that were
> previously upped when queueing the packet in __nf_queue(), but as
> skb->nf_bridge->physoutdev points to the same device for all these
> packets, most (if not all) of them will affect the wrong refcnt.
>
> (I originally encountered the bug on a Xen host because the hypervisor
> refused to shut down a virtual device with non-zero refcount... but it is
> perfectly reproducible with a standard kernel, too, although it was a
> bit more tedious to create a test scenario, involving a couple of UMLs.)
>
> I'd suggest making a real copy of the nf_bridge member in br_loop() if
> CONFIG_BRIDGE_NETFILTER is defined, remedying the entanglement. I'd go ahead
> and create a patch, but I'm unsure as to where that logic should be
> implemented.
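
The refcount mix-up described in the report can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: the toy_*
structures and functions below are made-up stand-ins, not the kernel's
sk_buff, nf_bridge_info or net_device, and queueing/reinjection is reduced
to a bare increment/decrement of the device that physoutdev points at.

```c
/* Toy stand-ins (hypothetical); they only model the fields the report mentions. */
struct toy_dev { const char *name; int refcnt; };
struct toy_bridge_info { struct toy_dev *physoutdev; };
struct toy_skb { struct toy_bridge_info *nf_bridge; };

/* Queueing takes a reference on whatever physoutdev points at right now. */
static void toy_queue(struct toy_skb *skb)
{
	skb->nf_bridge->physoutdev->refcnt++;
}

/* Reinjection drops a reference on whatever physoutdev points at right now. */
static void toy_reinject(struct toy_skb *skb)
{
	skb->nf_bridge->physoutdev->refcnt--;
}

/*
 * Flood one packet to two ports, queue both copies, then reinject both.
 * private_copy == 0: both copies share one bridge-info block (the bug).
 * private_copy != 0: each copy gets its own block (the suggested remedy).
 */
static void toy_flood(int private_copy, struct toy_dev *a, struct toy_dev *b)
{
	struct toy_bridge_info shared = { 0 }, priv_a = { 0 }, priv_b = { 0 };
	struct toy_skb skb_a, skb_b;

	skb_a.nf_bridge = private_copy ? &priv_a : &shared;
	skb_b.nf_bridge = private_copy ? &priv_b : &shared;

	skb_a.nf_bridge->physoutdev = a;
	toy_queue(&skb_a);		/* reference taken on a */
	skb_b.nf_bridge->physoutdev = b;	/* overwrites a when shared */
	toy_queue(&skb_b);		/* reference taken on b */

	toy_reinject(&skb_a);		/* shared case: drops b, not a */
	toy_reinject(&skb_b);
}
```

In the shared case the first device is left with a leaked reference and the
second one goes negative; with private copies both counts return to zero.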
* [PATCH] ne2k: add minimal ethtool setting support
From: Stephen Hemminger @ 2008-01-15 23:48 UTC
To: Jeff Garzik, Paul Gortmaker; +Cc: netdev
Add minimal ethtool settings support to the ne2k driver. This is needed
for KVM/QEMU environments, where ne2k seems to be the simplest, dumbest
hardware in use.
Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
--- a/drivers/net/ne2k-pci.c 2008-01-15 11:21:02.000000000 -0800
+++ b/drivers/net/ne2k-pci.c 2008-01-15 15:43:17.000000000 -0800
@@ -634,8 +634,21 @@ static void ne2k_pci_get_drvinfo(struct
strcpy(info->bus_info, pci_name(pci_dev));
}
+static int ne2k_pci_get_settings(struct net_device *dev,
+ struct ethtool_cmd *cmd)
+{
+ cmd->speed = SPEED_10;
+ cmd->duplex = (ei_status.ne2k_flags & FORCE_FDX)
+ ? DUPLEX_FULL : DUPLEX_HALF;
+ cmd->port = PORT_TP;
+ cmd->transceiver = XCVR_INTERNAL;
+ cmd->autoneg = AUTONEG_DISABLE;
+ return 0;
+}
+
static const struct ethtool_ops ne2k_pci_ethtool_ops = {
.get_drvinfo = ne2k_pci_get_drvinfo,
+ .get_settings = ne2k_pci_get_settings,
};
static void __devexit ne2k_pci_remove_one (struct pci_dev *pdev)
--
Stephen Hemminger <stephen.hemminger@vyatta.com>
* Re: Packetlost when "tc qdisc del dev eth0 root"
From: Glen Turner @ 2008-01-15 23:16 UTC
To: slavon; +Cc: netdev
In-Reply-To: <20080116004602.zn4y94e8sg0w4o8k@mail.bigtelecom.ru>
On Wed, 2008-01-16 at 00:46 +0300, slavon@bigtelecom.ru wrote:
> But I have over 45k classes and qdiscs.... After some time I will
> need a patch to raise the max number of qdiscs and classes above 65k (> 0xfffe) =)))
> Also, I get very bad TC command performance when I have more than 10k rules.
In contrast a "brand name" router will support 4 to 16 queues
per (sub-)interface. Your large number of queues exceeds
expectations.
What are you trying to do? You may be better off inventing a
new qdisc to meet your need (eg, to do per-IP traffic shaping
or, less complexly, a traffic shaping based on the value of mark
which might offend DiffServ purists) or have, say, 1000 output
rates based on a marking and use the ipset feature of Netfilter
to set the mark. Using 1000 rates gives an error of 0.1% which
is unlikely to be noticed by your customers given the larger
effects of shaping on TCP performance but is beneath the
level where you are noticing performance issues.
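
The 0.1% figure can be checked with a little arithmetic. The sketch below
assumes the 1000 rates are evenly spaced steps of the link rate and that a
requested rate is rounded down to the nearest step; neither detail is
spelled out above, and the function names are made up for illustration.

```c
#include <stdint.h>

#define NUM_RATES 1000u	/* number of distinct output rates */

/*
 * Hypothetical helper: pick the mark (0..NUM_RATES) whose rate is the
 * largest step that does not exceed the requested rate.
 */
static uint32_t rate_to_mark(uint64_t want_bps, uint64_t link_bps)
{
	if (want_bps >= link_bps)
		return NUM_RATES;
	return (uint32_t)(want_bps * NUM_RATES / link_bps);	/* rounds down */
}

/* Rate actually delivered for a given mark. */
static uint64_t mark_to_rate(uint32_t mark, uint64_t link_bps)
{
	return link_bps * mark / NUM_RATES;
}
```

For example, on a 100 Mbit/s link a request for 12,345,678 bit/s maps to
mark 123 and is shaped to 12,300,000 bit/s; the shortfall is always less
than one step, i.e. less than 0.1% of the link rate.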
* Re: [PATCH] IPv6 support for NFS server
From: J. Bruce Fields @ 2008-01-15 23:16 UTC
To: Aurélien Charbon; +Cc: netdev ML, Brian Haley, Mailing list NFSv4
In-Reply-To: <20080115223221.GE5028@fieldses.org>
Mostly just more trivial stuff for now, apologies....:
On Tue, Jan 15, 2008 at 05:32:21PM -0500, J. Bruce Fields wrote:
> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
> index cbbc594..e29b431 100644
> --- a/fs/nfsd/export.c
> +++ b/fs/nfsd/export.c
> @@ -35,6 +35,7 @@
> #include <linux/lockd/bind.h>
> #include <linux/sunrpc/msg_prot.h>
> #include <linux/sunrpc/gss_api.h>
> +#include <net/ipv6.h>
>
> #define NFSDDBG_FACILITY NFSDDBG_EXPORT
>
> @@ -1556,6 +1557,7 @@ exp_addclient(struct nfsctl_client *ncp)
> {
> struct auth_domain *dom;
> int i, err;
> + struct in6_addr addr6;
>
> /* First, consistency check. */
> err = -EINVAL;
> @@ -1574,9 +1576,11 @@ exp_addclient(struct nfsctl_client *ncp)
> goto out_unlock;
>
> /* Insert client into hashtable. */
> - for (i = 0; i < ncp->cl_naddr; i++)
> - auth_unix_add_addr(ncp->cl_addrlist[i], dom);
> -
> + for (i = 0; i < ncp->cl_naddr; i++) {
> + /* Mapping address */
> + ipv6_addr_set_v4mapped(ncp->cl_addrlist[i].s_addr, &addr6);
I think the name of the function explains well enough what it's doing,
so the preceding comment is superfluous.
> + auth_unix_add_addr(&addr6, dom);
> + }
> auth_unix_forget_old(dom);
> auth_domain_put(dom);
>
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index e307972..a8f7a90 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -37,6 +37,7 @@
> #include <linux/nfsd/syscall.h>
>
> #include <asm/uaccess.h>
> +#include <net/ipv6.h>
>
> /*
> * We have a single directory with 9 nodes in it.
> @@ -222,6 +223,7 @@ static ssize_t write_getfs(struct file *file, char *buf, size_t size)
> struct auth_domain *clp;
> int err = 0;
> struct knfsd_fh *res;
> + struct in6_addr in6;
>
> if (size < sizeof(*data))
> return -EINVAL;
> @@ -236,7 +238,13 @@ static ssize_t write_getfs(struct file *file, char *buf, size_t size)
> res = (struct knfsd_fh*)buf;
>
> exp_readlock();
> - if (!(clp = auth_unix_lookup(sin->sin_addr)))
> +
> + /* IPv6 address mapping */
> + ipv6_addr_set_v4mapped(
> + (((struct sockaddr_in *)&data->gd_addr)->sin_addr.s_addr),
> + &in6);
The cast there appears to already have been done in the assignment of
"sin" a few lines above; so couldn't this last line just be written:
ipv6_addr_set_v4mapped(sin->sin_addr.s_addr, &in6);
?
> +
> + if (!(clp = auth_unix_lookup(&in6)))
> err = -EPERM;
I'd rather assignments be made separately from tests, so:
clp = auth_unix_lookup(&in6);
if (!clp)
err = -EPERM;
Yeah, I know, that's not what the original code did, but as long as
we're modifying that line anyway....
> else {
> err = exp_rootfh(clp, data->gd_path, res, data->gd_maxlen);
> @@ -257,6 +265,7 @@ static ssize_t write_getfd(struct file *file, char *buf, size_t size)
> int err = 0;
> struct knfsd_fh fh;
> char *res;
> + struct in6_addr in6;
>
> if (size < sizeof(*data))
> return -EINVAL;
> @@ -271,7 +280,13 @@ static ssize_t write_getfd(struct file *file, char *buf, size_t size)
> res = buf;
> sin = (struct sockaddr_in *)&data->gd_addr;
> exp_readlock();
> - if (!(clp = auth_unix_lookup(sin->sin_addr)))
> + /* IPv6 address mapping */
> + ipv6_addr_set_v4mapped(
> + (((struct sockaddr_in *)&data->gd_addr)->sin_addr.s_addr),
> + &in6
> + );
> +
> + if (!(clp = auth_unix_lookup(&in6)))
> err = -EPERM;
See both comments above.
> else {
> err = exp_rootfh(clp, data->gd_path, &fh, NFS_FHSIZE);
> diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h
> index 22e1ef8..9e6fb86 100644
> --- a/include/linux/sunrpc/svcauth.h
> +++ b/include/linux/sunrpc/svcauth.h
> @@ -120,10 +120,10 @@ extern void svc_auth_unregister(rpc_authflavor_t flavor);
>
> extern struct auth_domain *unix_domain_find(char *name);
> extern void auth_domain_put(struct auth_domain *item);
> -extern int auth_unix_add_addr(struct in_addr addr, struct auth_domain *dom);
> +extern int auth_unix_add_addr(struct in6_addr *addr, struct auth_domain *dom);
> extern struct auth_domain *auth_domain_lookup(char *name, struct auth_domain *new);
> extern struct auth_domain *auth_domain_find(char *name);
> -extern struct auth_domain *auth_unix_lookup(struct in_addr addr);
> +extern struct auth_domain *auth_unix_lookup(struct in6_addr *addr);
> extern int auth_unix_forget_old(struct auth_domain *dom);
> extern void svcauth_unix_purge(void);
> extern void svcauth_unix_info_release(void *);
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index ae328b6..9394710 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -400,6 +400,15 @@ static inline int ipv6_addr_v4mapped(const struct in6_addr *a)
> a->s6_addr32[2] == htonl(0x0000ffff));
> }
>
> +static inline void ipv6_addr_set_v4mapped(const __be32 addr,
> + struct in6_addr *v4mapped)
> +{
> + ipv6_addr_set(v4mapped,
> + 0, 0,
> + htonl(0x0000FFFF),
> + addr);
If the function call will fit on one line, don't break it into separate
lines unless there's a really good reason to.
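Written on one line, the body would presumably read
"ipv6_addr_set(v4mapped, 0, 0, htonl(0x0000FFFF), addr);". For anyone who
wants to see what address that produces, here is a rough userspace
analogue. It is a sketch only: toy_set_v4mapped() is a made-up name, and it
uses the POSIX struct in6_addr byte array rather than the kernel's
s6_addr32 words.

```c
#include <netinet/in.h>
#include <string.h>

/*
 * Userspace analogue (hypothetical) of the helper in the patch:
 * builds ::ffff:a.b.c.d from an IPv4 address in network byte order.
 */
static void toy_set_v4mapped(struct in_addr v4, struct in6_addr *v6)
{
	memset(v6, 0, sizeof(*v6));		/* first 80 bits are zero */
	v6->s6_addr[10] = 0xff;			/* then 16 bits of ones */
	v6->s6_addr[11] = 0xff;
	memcpy(&v6->s6_addr[12], &v4.s_addr, sizeof(v4.s_addr));
}
```

The last four bytes are the IPv4 address unchanged, which is why the
lookup code can treat every client as an in6_addr.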
> +}
> +
> /*
> * find the first different bit between two addresses
> * length of address must be a multiple of 32bits
> diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
> index 4114794..5fe8f1f 100644
> --- a/net/sunrpc/svcauth_unix.c
> +++ b/net/sunrpc/svcauth_unix.c
> @@ -11,7 +11,8 @@
> #include <linux/hash.h>
> #include <linux/string.h>
> #include <net/sock.h>
> -
> +#include <net/ipv6.h>
> +#include <linux/kernel.h>
> #define RPCDBG_FACILITY RPCDBG_AUTH
>
>
> @@ -84,7 +85,7 @@ static void svcauth_unix_domain_release(struct auth_domain *dom)
> struct ip_map {
> struct cache_head h;
> char m_class[8]; /* e.g. "nfsd" */
> - struct in_addr m_addr;
> + struct in6_addr m_addr;
> struct unix_domain *m_client;
> int m_add_change;
> };
> @@ -112,12 +113,19 @@ static inline int hash_ip(__be32 ip)
> return (hash ^ (hash>>8)) & 0xff;
> }
> #endif
> +static inline int hash_ip6(struct in6_addr ip)
> +{
> + return (hash_ip(ip.s6_addr32[0]) ^
> + hash_ip(ip.s6_addr32[1]) ^
> + hash_ip(ip.s6_addr32[2]) ^
> + hash_ip(ip.s6_addr32[3]));
> +}
> static int ip_map_match(struct cache_head *corig, struct cache_head *cnew)
> {
> struct ip_map *orig = container_of(corig, struct ip_map, h);
> struct ip_map *new = container_of(cnew, struct ip_map, h);
> return strcmp(orig->m_class, new->m_class) == 0
> - && orig->m_addr.s_addr == new->m_addr.s_addr;
> + && ipv6_addr_equal(&orig->m_addr, &new->m_addr);
> }
> static void ip_map_init(struct cache_head *cnew, struct cache_head *citem)
> {
> @@ -125,7 +133,7 @@ static void ip_map_init(struct cache_head *cnew, struct cache_head *citem)
> struct ip_map *item = container_of(citem, struct ip_map, h);
>
> strcpy(new->m_class, item->m_class);
> - new->m_addr.s_addr = item->m_addr.s_addr;
> + ipv6_addr_copy(&new->m_addr, &item->m_addr);
> }
> static void update(struct cache_head *cnew, struct cache_head *citem)
> {
> @@ -149,22 +157,24 @@ static void ip_map_request(struct cache_detail *cd,
> struct cache_head *h,
> char **bpp, int *blen)
> {
> - char text_addr[20];
> + char text_addr[40];
> struct ip_map *im = container_of(h, struct ip_map, h);
> - __be32 addr = im->m_addr.s_addr;
> -
> - snprintf(text_addr, 20, "%u.%u.%u.%u",
> - ntohl(addr) >> 24 & 0xff,
> - ntohl(addr) >> 16 & 0xff,
> - ntohl(addr) >> 8 & 0xff,
> - ntohl(addr) >> 0 & 0xff);
>
> + if (ipv6_addr_v4mapped(&(im->m_addr))) {
> + snprintf(text_addr, 20, NIPQUAD_FMT,
> + ntohl(im->m_addr.s6_addr32[3]) >> 24 & 0xff,
> + ntohl(im->m_addr.s6_addr32[3]) >> 16 & 0xff,
> + ntohl(im->m_addr.s6_addr32[3]) >> 8 & 0xff,
> + ntohl(im->m_addr.s6_addr32[3]) >> 0 & 0xff);
> + } else {
> + snprintf(text_addr, 40, NIP6_FMT, NIP6(im->m_addr));
> + }
OK, so if a given ipv6 address is in the range of ipv4-mapped addresses,
then the upcall will look just like an existing upcall, otherwise it's
going to have an ipv6 address in it. Got it.
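To spell out my reading of that branch, a userspace sketch of the same
decision follows. toy_format_addr() is a made-up name; the dotted-quad and
eight-group layouts are meant to match what NIPQUAD_FMT and NIP6_FMT print,
as far as I understand those macros.

```c
#include <netinet/in.h>
#include <stdio.h>

/*
 * Hypothetical helper mirroring the branch in ip_map_request():
 * v4-mapped addresses are printed as a dotted quad, everything else
 * as eight full 16-bit hex groups (no "::" compression).
 */
static void toy_format_addr(const struct in6_addr *a, char *buf, size_t len)
{
	const unsigned char *s = a->s6_addr;

	if (IN6_IS_ADDR_V4MAPPED(a))
		snprintf(buf, len, "%u.%u.%u.%u",
			 s[12], s[13], s[14], s[15]);
	else
		snprintf(buf, len,
			 "%02x%02x:%02x%02x:%02x%02x:%02x%02x:"
			 "%02x%02x:%02x%02x:%02x%02x:%02x%02x",
			 s[0], s[1], s[2], s[3], s[4], s[5], s[6], s[7],
			 s[8], s[9], s[10], s[11], s[12], s[13], s[14], s[15]);
}
```

The longest output is 39 characters plus the terminating NUL, which is
where the 40-byte text_addr buffer comes from.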
> qword_add(bpp, blen, im->m_class);
> qword_add(bpp, blen, text_addr);
> (*bpp)[-1] = '\n';
> }
>
> -static struct ip_map *ip_map_lookup(char *class, struct in_addr addr);
> +static struct ip_map *ip_map_lookup(char *class, struct in6_addr *addr);
> static int ip_map_update(struct ip_map *ipm, struct unix_domain *udom, time_t expiry);
>
> static int ip_map_parse(struct cache_detail *cd,
> @@ -175,10 +185,10 @@ static int ip_map_parse(struct cache_detail *cd,
> * for scratch: */
> char *buf = mesg;
> int len;
> - int b1,b2,b3,b4;
> + int b1, b2, b3, b4, b5, b6, b7, b8;
> char c;
> char class[8];
> - struct in_addr addr;
> + struct in6_addr addr;
> int err;
>
> struct ip_map *ipmp;
> @@ -197,9 +207,26 @@ static int ip_map_parse(struct cache_detail *cd,
> len = qword_get(&mesg, buf, mlen);
> if (len <= 0) return -EINVAL;
>
> - if (sscanf(buf, "%u.%u.%u.%u%c", &b1, &b2, &b3, &b4, &c) != 4)
> + if (sscanf(buf, NIPQUAD_FMT "%c", &b1, &b2, &b3, &b4, &c) == 4) {
> + addr.s6_addr32[0] = 0;
> + addr.s6_addr32[1] = 0;
> + addr.s6_addr32[2] = htonl(0xffff);
> + addr.s6_addr32[3] =
> + htonl((((((b1<<8)|b2)<<8)|b3)<<8)|b4);
> + } else if (sscanf(buf, NIP6_FMT "%c",
> + &b1, &b2, &b3, &b4, &b5, &b6, &b7, &b8, &c) == 8) {
> + addr.s6_addr16[0] = htons(b1);
> + addr.s6_addr16[1] = htons(b2);
> + addr.s6_addr16[2] = htons(b3);
> + addr.s6_addr16[3] = htons(b4);
> + addr.s6_addr16[4] = htons(b5);
> + addr.s6_addr16[5] = htons(b6);
> + addr.s6_addr16[6] = htons(b7);
> + addr.s6_addr16[7] = htons(b8);
> + } else
> return -EINVAL;
And the downcall will accept either ipv4 or ipv6. OK, makes sense, I
think.
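Roughly, as a userspace sketch (toy_parse_addr() is a made-up name, and the
range check on the IPv4 octets is my addition, not something the patch
does). Like the patch, it only accepts the full eight-group IPv6 form, not
the "::" shorthand.

```c
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical parser mirroring the two sscanf() attempts in
 * ip_map_parse(): dotted quad first, then eight hex groups.
 * Returns 0 on success, -1 if neither form matches.
 */
static int toy_parse_addr(const char *text, struct in6_addr *out)
{
	unsigned int b[8];
	char c;
	int i;

	memset(out, 0, sizeof(*out));

	/* Exactly four conversions means no trailing garbage was seen. */
	if (sscanf(text, "%u.%u.%u.%u%c",
		   &b[0], &b[1], &b[2], &b[3], &c) == 4) {
		for (i = 0; i < 4; i++)
			if (b[i] > 255)
				return -1;	/* not in the patch */
		out->s6_addr[10] = 0xff;
		out->s6_addr[11] = 0xff;
		for (i = 0; i < 4; i++)
			out->s6_addr[12 + i] = (unsigned char)b[i];
		return 0;
	}

	if (sscanf(text, "%4x:%4x:%4x:%4x:%4x:%4x:%4x:%4x%c",
		   &b[0], &b[1], &b[2], &b[3],
		   &b[4], &b[5], &b[6], &b[7], &c) == 8) {
		for (i = 0; i < 8; i++) {
			out->s6_addr[2 * i] = (unsigned char)(b[i] >> 8);
			out->s6_addr[2 * i + 1] = (unsigned char)(b[i] & 0xff);
		}
		return 0;
	}

	return -1;
}
```

An IPv4 string ends up stored as the v4-mapped address, so existing
userspace that only ever writes dotted quads keeps working.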
>
> +
Extra blank line?
> expiry = get_expiry(&mesg);
> if (expiry ==0)
> return -EINVAL;
> @@ -215,10 +242,7 @@ static int ip_map_parse(struct cache_detail *cd,
> } else
> dom = NULL;
>
> - addr.s_addr =
> - htonl((((((b1<<8)|b2)<<8)|b3)<<8)|b4);
> -
> - ipmp = ip_map_lookup(class,addr);
> + ipmp = ip_map_lookup(class, &addr);
> if (ipmp) {
> err = ip_map_update(ipmp,
> container_of(dom, struct unix_domain, h),
> @@ -238,7 +262,7 @@ static int ip_map_show(struct seq_file *m,
> struct cache_head *h)
> {
> struct ip_map *im;
> - struct in_addr addr;
> + struct in6_addr addr;
> char *dom = "-no-domain-";
>
> if (h == NULL) {
> @@ -247,20 +271,24 @@ static int ip_map_show(struct seq_file *m,
> }
> im = container_of(h, struct ip_map, h);
> /* class addr domain */
> - addr = im->m_addr;
> + ipv6_addr_copy(&addr, &im->m_addr);
>
> if (test_bit(CACHE_VALID, &h->flags) &&
> !test_bit(CACHE_NEGATIVE, &h->flags))
> dom = im->m_client->h.name;
>
> - seq_printf(m, "%s %d.%d.%d.%d %s\n",
> - im->m_class,
> - ntohl(addr.s_addr) >> 24 & 0xff,
> - ntohl(addr.s_addr) >> 16 & 0xff,
> - ntohl(addr.s_addr) >> 8 & 0xff,
> - ntohl(addr.s_addr) >> 0 & 0xff,
> - dom
> - );
> + if (ipv6_addr_v4mapped(&addr)) {
> + seq_printf(m, "%s " NIPQUAD_FMT " %s\n",
> + im->m_class,
> + ntohl(addr.s6_addr32[3]) >> 24 & 0xff,
> + ntohl(addr.s6_addr32[3]) >> 16 & 0xff,
> + ntohl(addr.s6_addr32[3]) >> 8 & 0xff,
> + ntohl(addr.s6_addr32[3]) >> 0 & 0xff,
> + dom);
> + } else {
> + seq_printf(m, "%s " NIP6_FMT " %s\n",
> + im->m_class, NIP6(addr), dom);
> + }
> return 0;
> }
>
> @@ -280,16 +308,16 @@ struct cache_detail ip_map_cache = {
> .alloc = ip_map_alloc,
> };
>
> -static struct ip_map *ip_map_lookup(char *class, struct in_addr addr)
> +static struct ip_map *ip_map_lookup(char *class, struct in6_addr *addr)
> {
> struct ip_map ip;
> struct cache_head *ch;
>
> strcpy(ip.m_class, class);
> - ip.m_addr = addr;
> + ipv6_addr_copy(&ip.m_addr, addr);
> ch = sunrpc_cache_lookup(&ip_map_cache, &ip.h,
> hash_str(class, IP_HASHBITS) ^
> - hash_ip(addr.s_addr));
> + hash_ip6(*addr));
>
> if (ch)
> return container_of(ch, struct ip_map, h);
> @@ -318,14 +346,14 @@ static int ip_map_update(struct ip_map *ipm, struct unix_domain *udom, time_t ex
> ch = sunrpc_cache_update(&ip_map_cache,
> &ip.h, &ipm->h,
> hash_str(ipm->m_class, IP_HASHBITS) ^
> - hash_ip(ipm->m_addr.s_addr));
> + hash_ip6(ipm->m_addr));
> if (!ch)
> return -ENOMEM;
> cache_put(ch, &ip_map_cache);
> return 0;
> }
>
> -int auth_unix_add_addr(struct in_addr addr, struct auth_domain *dom)
> +int auth_unix_add_addr(struct in6_addr *addr, struct auth_domain *dom)
> {
> struct unix_domain *udom;
> struct ip_map *ipmp;
> @@ -352,7 +380,7 @@ int auth_unix_forget_old(struct auth_domain *dom)
> return 0;
> }
>
> -struct auth_domain *auth_unix_lookup(struct in_addr addr)
> +struct auth_domain *auth_unix_lookup(struct in6_addr *addr)
> {
> struct ip_map *ipm;
> struct auth_domain *rv;
> @@ -641,9 +669,24 @@ static int unix_gid_find(uid_t uid, struct group_info **gip,
> int
> svcauth_unix_set_client(struct svc_rqst *rqstp)
> {
> - struct sockaddr_in *sin = svc_addr_in(rqstp);
> + struct sockaddr_in *sin;
> + struct sockaddr_in6 *sin6, sin6_storage;
> struct ip_map *ipm;
>
> + switch (rqstp->rq_addr.ss_family) {
> + case AF_INET:
> + sin = svc_addr_in(rqstp);
> + sin6 = &sin6_storage;
> + ipv6_addr_set(&sin6->sin6_addr, 0, 0,
> + htonl(0x0000FFFF), sin->sin_addr.s_addr);
> + break;
> + case AF_INET6:
> + sin6 = svc_addr_in6(rqstp);
> + break;
> + default:
> + BUG();
> + }
> +
> rqstp->rq_client = NULL;
> if (rqstp->rq_proc == 0)
> return SVC_OK;
> @@ -651,7 +694,7 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
> ipm = ip_map_cached_get(rqstp);
> if (ipm == NULL)
> ipm = ip_map_lookup(rqstp->rq_server->sv_program->pg_class,
> - sin->sin_addr);
> + &sin6->sin6_addr);
>
> if (ipm == NULL)
> return SVC_DENIED;
Anyway, no big problem is jumping out at me; if you fix up the small
things I mentioned above then I'll give it another reading.
Thanks! Sorry for taking so long to take a look at this.
--b.
* Re: Packetlost when "tc qdisc del dev eth0 root"
From: Jarek Poplawski @ 2008-01-15 22:49 UTC
To: slavon; +Cc: Patrick McHardy, netdev
In-Reply-To: <20080116010459.676cchrf8ko4wk8o@mail.bigtelecom.ru>
On Wed, Jan 16, 2008 at 01:04:59AM +0300, slavon@bigtelecom.ru wrote:
> Good night! =)
Good morning! ;)
>
> Sorry... I was wrong...
> I see that the problem is more serious....
>
> Let's look at this scheme:
>
> Class 1
> ---qdisc
> ------- 10k classes
> Class 2
> ---qdisc
> ------- 10k classes
>
> All traffic goes to class 2... the class 1 qdisc has no packets, so if we
> delete it, no packets should be lost... in theory... Let's try deleting the
> class 1 qdisc (all its children are deleted too)...
> The PC freezes for 2-5 seconds... it does not forward any traffic during that time...
> Is it one big tree lock?
>
> Is this normal, or does the code need more precise locking?
I don't think it's normal. On the other hand I've never had such
problems... Probably this all isn't optimized for such big operations,
so maybe you should try to do this with some smaller chunks?
Jarek P.
* Re: [PATCH] IPv6 support for NFS server
From: J. Bruce Fields @ 2008-01-15 22:32 UTC
To: Aurélien Charbon; +Cc: netdev ML, Brian Haley, Mailing list NFSv4
In-Reply-To: <475ED028.2010109@ext.bull.net>
[-- Attachment #1: Type: text/plain, Size: 921 bytes --]
On Tue, Dec 11, 2007 at 07:00:08PM +0100, Aurélien Charbon wrote:
> Brian Haley wrote:
>
>> In an email back on October 29th I sent-out a similar patch with a new
>> ipv6_addr_set_v4mapped() inline - it might be useful to pull that
>> piece into your patch since it cleans it up a bit to get rid of the
>> ipv6_addr_set() calls. I can re-send you that patch off-line if you
>> can't find it.
>>
>> -Brian
>>
>>
> Thanks Brian. I forgot to include your changes in my tree.
> OK Bruce you can take this one.
One trivial note: I'd prefer patches inline with the message, instead of
attached. If you need to attach it, please add From:, Subject: and a
patch comment in the standard format. Something like git-format-patch
will do all that stuff for you.
E.g. see below (also with a minor whitespace problem fixed--run
scripts/checkpatch.pl before submitting and it'll catch that.)
--b.
[-- Attachment #2: TMP --]
[-- Type: text/plain, Size: 12802 bytes --]
From c19360e877cfa1576dce98792cd513665deaa2ec Mon Sep 17 00:00:00 2001
From: Aurélien Charbon <aurelien.charbon@ext.bull.net>
Date: Fri, 21 Dec 2007 16:44:46 +0100
Subject: [PATCH] IPv6 support for NFS server
Prepare for IPv6 text-based mounts and exports.
Tested with IPv4 network and basic nfs ops (mount, file creation and
modification), and with unmodified nfs-utils-1.1.1 + CITI_NFS4_ALL
patch.
Signed-off-by: Aurelien Charbon <aurelien.charbon@bull.net>
Cc: Neil Brown <neilb@suse.de>
---
fs/nfsd/export.c | 10 ++-
fs/nfsd/nfsctl.c | 19 ++++++-
include/linux/sunrpc/svcauth.h | 4 +-
include/net/ipv6.h | 9 +++
net/sunrpc/svcauth_unix.c | 119 +++++++++++++++++++++++++++-------------
5 files changed, 116 insertions(+), 45 deletions(-)
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index cbbc594..e29b431 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -35,6 +35,7 @@
#include <linux/lockd/bind.h>
#include <linux/sunrpc/msg_prot.h>
#include <linux/sunrpc/gss_api.h>
+#include <net/ipv6.h>
#define NFSDDBG_FACILITY NFSDDBG_EXPORT
@@ -1556,6 +1557,7 @@ exp_addclient(struct nfsctl_client *ncp)
{
struct auth_domain *dom;
int i, err;
+ struct in6_addr addr6;
/* First, consistency check. */
err = -EINVAL;
@@ -1574,9 +1576,11 @@ exp_addclient(struct nfsctl_client *ncp)
goto out_unlock;
/* Insert client into hashtable. */
- for (i = 0; i < ncp->cl_naddr; i++)
- auth_unix_add_addr(ncp->cl_addrlist[i], dom);
-
+ for (i = 0; i < ncp->cl_naddr; i++) {
+ /* Mapping address */
+ ipv6_addr_set_v4mapped(ncp->cl_addrlist[i].s_addr, &addr6);
+ auth_unix_add_addr(&addr6, dom);
+ }
auth_unix_forget_old(dom);
auth_domain_put(dom);
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index e307972..a8f7a90 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -37,6 +37,7 @@
#include <linux/nfsd/syscall.h>
#include <asm/uaccess.h>
+#include <net/ipv6.h>
/*
* We have a single directory with 9 nodes in it.
@@ -222,6 +223,7 @@ static ssize_t write_getfs(struct file *file, char *buf, size_t size)
struct auth_domain *clp;
int err = 0;
struct knfsd_fh *res;
+ struct in6_addr in6;
if (size < sizeof(*data))
return -EINVAL;
@@ -236,7 +238,13 @@ static ssize_t write_getfs(struct file *file, char *buf, size_t size)
res = (struct knfsd_fh*)buf;
exp_readlock();
- if (!(clp = auth_unix_lookup(sin->sin_addr)))
+
+ /* IPv6 address mapping */
+ ipv6_addr_set_v4mapped(
+ (((struct sockaddr_in *)&data->gd_addr)->sin_addr.s_addr),
+ &in6);
+
+ if (!(clp = auth_unix_lookup(&in6)))
err = -EPERM;
else {
err = exp_rootfh(clp, data->gd_path, res, data->gd_maxlen);
@@ -257,6 +265,7 @@ static ssize_t write_getfd(struct file *file, char *buf, size_t size)
int err = 0;
struct knfsd_fh fh;
char *res;
+ struct in6_addr in6;
if (size < sizeof(*data))
return -EINVAL;
@@ -271,7 +280,13 @@ static ssize_t write_getfd(struct file *file, char *buf, size_t size)
res = buf;
sin = (struct sockaddr_in *)&data->gd_addr;
exp_readlock();
- if (!(clp = auth_unix_lookup(sin->sin_addr)))
+ /* IPv6 address mapping */
+ ipv6_addr_set_v4mapped(
+ (((struct sockaddr_in *)&data->gd_addr)->sin_addr.s_addr),
+ &in6
+ );
+
+ if (!(clp = auth_unix_lookup(&in6)))
err = -EPERM;
else {
err = exp_rootfh(clp, data->gd_path, &fh, NFS_FHSIZE);
diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h
index 22e1ef8..9e6fb86 100644
--- a/include/linux/sunrpc/svcauth.h
+++ b/include/linux/sunrpc/svcauth.h
@@ -120,10 +120,10 @@ extern void svc_auth_unregister(rpc_authflavor_t flavor);
extern struct auth_domain *unix_domain_find(char *name);
extern void auth_domain_put(struct auth_domain *item);
-extern int auth_unix_add_addr(struct in_addr addr, struct auth_domain *dom);
+extern int auth_unix_add_addr(struct in6_addr *addr, struct auth_domain *dom);
extern struct auth_domain *auth_domain_lookup(char *name, struct auth_domain *new);
extern struct auth_domain *auth_domain_find(char *name);
-extern struct auth_domain *auth_unix_lookup(struct in_addr addr);
+extern struct auth_domain *auth_unix_lookup(struct in6_addr *addr);
extern int auth_unix_forget_old(struct auth_domain *dom);
extern void svcauth_unix_purge(void);
extern void svcauth_unix_info_release(void *);
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index ae328b6..9394710 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -400,6 +400,15 @@ static inline int ipv6_addr_v4mapped(const struct in6_addr *a)
a->s6_addr32[2] == htonl(0x0000ffff));
}
+static inline void ipv6_addr_set_v4mapped(const __be32 addr,
+ struct in6_addr *v4mapped)
+{
+ ipv6_addr_set(v4mapped,
+ 0, 0,
+ htonl(0x0000FFFF),
+ addr);
+}
+
/*
* find the first different bit between two addresses
* length of address must be a multiple of 32bits
diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index 4114794..5fe8f1f 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -11,7 +11,8 @@
#include <linux/hash.h>
#include <linux/string.h>
#include <net/sock.h>
-
+#include <net/ipv6.h>
+#include <linux/kernel.h>
#define RPCDBG_FACILITY RPCDBG_AUTH
@@ -84,7 +85,7 @@ static void svcauth_unix_domain_release(struct auth_domain *dom)
struct ip_map {
struct cache_head h;
char m_class[8]; /* e.g. "nfsd" */
- struct in_addr m_addr;
+ struct in6_addr m_addr;
struct unix_domain *m_client;
int m_add_change;
};
@@ -112,12 +113,19 @@ static inline int hash_ip(__be32 ip)
return (hash ^ (hash>>8)) & 0xff;
}
#endif
+static inline int hash_ip6(struct in6_addr ip)
+{
+ return (hash_ip(ip.s6_addr32[0]) ^
+ hash_ip(ip.s6_addr32[1]) ^
+ hash_ip(ip.s6_addr32[2]) ^
+ hash_ip(ip.s6_addr32[3]));
+}
static int ip_map_match(struct cache_head *corig, struct cache_head *cnew)
{
struct ip_map *orig = container_of(corig, struct ip_map, h);
struct ip_map *new = container_of(cnew, struct ip_map, h);
return strcmp(orig->m_class, new->m_class) == 0
- && orig->m_addr.s_addr == new->m_addr.s_addr;
+ && ipv6_addr_equal(&orig->m_addr, &new->m_addr);
}
static void ip_map_init(struct cache_head *cnew, struct cache_head *citem)
{
@@ -125,7 +133,7 @@ static void ip_map_init(struct cache_head *cnew, struct cache_head *citem)
struct ip_map *item = container_of(citem, struct ip_map, h);
strcpy(new->m_class, item->m_class);
- new->m_addr.s_addr = item->m_addr.s_addr;
+ ipv6_addr_copy(&new->m_addr, &item->m_addr);
}
static void update(struct cache_head *cnew, struct cache_head *citem)
{
@@ -149,22 +157,24 @@ static void ip_map_request(struct cache_detail *cd,
struct cache_head *h,
char **bpp, int *blen)
{
- char text_addr[20];
+ char text_addr[40];
struct ip_map *im = container_of(h, struct ip_map, h);
- __be32 addr = im->m_addr.s_addr;
-
- snprintf(text_addr, 20, "%u.%u.%u.%u",
- ntohl(addr) >> 24 & 0xff,
- ntohl(addr) >> 16 & 0xff,
- ntohl(addr) >> 8 & 0xff,
- ntohl(addr) >> 0 & 0xff);
+ if (ipv6_addr_v4mapped(&(im->m_addr))) {
+ snprintf(text_addr, 20, NIPQUAD_FMT,
+ ntohl(im->m_addr.s6_addr32[3]) >> 24 & 0xff,
+ ntohl(im->m_addr.s6_addr32[3]) >> 16 & 0xff,
+ ntohl(im->m_addr.s6_addr32[3]) >> 8 & 0xff,
+ ntohl(im->m_addr.s6_addr32[3]) >> 0 & 0xff);
+ } else {
+ snprintf(text_addr, 40, NIP6_FMT, NIP6(im->m_addr));
+ }
qword_add(bpp, blen, im->m_class);
qword_add(bpp, blen, text_addr);
(*bpp)[-1] = '\n';
}
-static struct ip_map *ip_map_lookup(char *class, struct in_addr addr);
+static struct ip_map *ip_map_lookup(char *class, struct in6_addr *addr);
static int ip_map_update(struct ip_map *ipm, struct unix_domain *udom, time_t expiry);
static int ip_map_parse(struct cache_detail *cd,
@@ -175,10 +185,10 @@ static int ip_map_parse(struct cache_detail *cd,
* for scratch: */
char *buf = mesg;
int len;
- int b1,b2,b3,b4;
+ int b1, b2, b3, b4, b5, b6, b7, b8;
char c;
char class[8];
- struct in_addr addr;
+ struct in6_addr addr;
int err;
struct ip_map *ipmp;
@@ -197,9 +207,26 @@ static int ip_map_parse(struct cache_detail *cd,
len = qword_get(&mesg, buf, mlen);
if (len <= 0) return -EINVAL;
- if (sscanf(buf, "%u.%u.%u.%u%c", &b1, &b2, &b3, &b4, &c) != 4)
+ if (sscanf(buf, NIPQUAD_FMT "%c", &b1, &b2, &b3, &b4, &c) == 4) {
+ addr.s6_addr32[0] = 0;
+ addr.s6_addr32[1] = 0;
+ addr.s6_addr32[2] = htonl(0xffff);
+ addr.s6_addr32[3] =
+ htonl((((((b1<<8)|b2)<<8)|b3)<<8)|b4);
+ } else if (sscanf(buf, NIP6_FMT "%c",
+ &b1, &b2, &b3, &b4, &b5, &b6, &b7, &b8, &c) == 8) {
+ addr.s6_addr16[0] = htons(b1);
+ addr.s6_addr16[1] = htons(b2);
+ addr.s6_addr16[2] = htons(b3);
+ addr.s6_addr16[3] = htons(b4);
+ addr.s6_addr16[4] = htons(b5);
+ addr.s6_addr16[5] = htons(b6);
+ addr.s6_addr16[6] = htons(b7);
+ addr.s6_addr16[7] = htons(b8);
+ } else
return -EINVAL;
+
expiry = get_expiry(&mesg);
if (expiry ==0)
return -EINVAL;
@@ -215,10 +242,7 @@ static int ip_map_parse(struct cache_detail *cd,
} else
dom = NULL;
- addr.s_addr =
- htonl((((((b1<<8)|b2)<<8)|b3)<<8)|b4);
-
- ipmp = ip_map_lookup(class,addr);
+ ipmp = ip_map_lookup(class, &addr);
if (ipmp) {
err = ip_map_update(ipmp,
container_of(dom, struct unix_domain, h),
@@ -238,7 +262,7 @@ static int ip_map_show(struct seq_file *m,
struct cache_head *h)
{
struct ip_map *im;
- struct in_addr addr;
+ struct in6_addr addr;
char *dom = "-no-domain-";
if (h == NULL) {
@@ -247,20 +271,24 @@ static int ip_map_show(struct seq_file *m,
}
im = container_of(h, struct ip_map, h);
/* class addr domain */
- addr = im->m_addr;
+ ipv6_addr_copy(&addr, &im->m_addr);
if (test_bit(CACHE_VALID, &h->flags) &&
!test_bit(CACHE_NEGATIVE, &h->flags))
dom = im->m_client->h.name;
- seq_printf(m, "%s %d.%d.%d.%d %s\n",
- im->m_class,
- ntohl(addr.s_addr) >> 24 & 0xff,
- ntohl(addr.s_addr) >> 16 & 0xff,
- ntohl(addr.s_addr) >> 8 & 0xff,
- ntohl(addr.s_addr) >> 0 & 0xff,
- dom
- );
+ if (ipv6_addr_v4mapped(&addr)) {
+ seq_printf(m, "%s " NIPQUAD_FMT " %s\n",
+ im->m_class,
+ ntohl(addr.s6_addr32[3]) >> 24 & 0xff,
+ ntohl(addr.s6_addr32[3]) >> 16 & 0xff,
+ ntohl(addr.s6_addr32[3]) >> 8 & 0xff,
+ ntohl(addr.s6_addr32[3]) >> 0 & 0xff,
+ dom);
+ } else {
+ seq_printf(m, "%s " NIP6_FMT " %s\n",
+ im->m_class, NIP6(addr), dom);
+ }
return 0;
}
@@ -280,16 +308,16 @@ struct cache_detail ip_map_cache = {
.alloc = ip_map_alloc,
};
-static struct ip_map *ip_map_lookup(char *class, struct in_addr addr)
+static struct ip_map *ip_map_lookup(char *class, struct in6_addr *addr)
{
struct ip_map ip;
struct cache_head *ch;
strcpy(ip.m_class, class);
- ip.m_addr = addr;
+ ipv6_addr_copy(&ip.m_addr, addr);
ch = sunrpc_cache_lookup(&ip_map_cache, &ip.h,
hash_str(class, IP_HASHBITS) ^
- hash_ip(addr.s_addr));
+ hash_ip6(*addr));
if (ch)
return container_of(ch, struct ip_map, h);
@@ -318,14 +346,14 @@ static int ip_map_update(struct ip_map *ipm, struct unix_domain *udom, time_t ex
ch = sunrpc_cache_update(&ip_map_cache,
&ip.h, &ipm->h,
hash_str(ipm->m_class, IP_HASHBITS) ^
- hash_ip(ipm->m_addr.s_addr));
+ hash_ip6(ipm->m_addr));
if (!ch)
return -ENOMEM;
cache_put(ch, &ip_map_cache);
return 0;
}
-int auth_unix_add_addr(struct in_addr addr, struct auth_domain *dom)
+int auth_unix_add_addr(struct in6_addr *addr, struct auth_domain *dom)
{
struct unix_domain *udom;
struct ip_map *ipmp;
@@ -352,7 +380,7 @@ int auth_unix_forget_old(struct auth_domain *dom)
return 0;
}
-struct auth_domain *auth_unix_lookup(struct in_addr addr)
+struct auth_domain *auth_unix_lookup(struct in6_addr *addr)
{
struct ip_map *ipm;
struct auth_domain *rv;
@@ -641,9 +669,24 @@ static int unix_gid_find(uid_t uid, struct group_info **gip,
int
svcauth_unix_set_client(struct svc_rqst *rqstp)
{
- struct sockaddr_in *sin = svc_addr_in(rqstp);
+ struct sockaddr_in *sin;
+ struct sockaddr_in6 *sin6, sin6_storage;
struct ip_map *ipm;
+ switch (rqstp->rq_addr.ss_family) {
+ case AF_INET:
+ sin = svc_addr_in(rqstp);
+ sin6 = &sin6_storage;
+ ipv6_addr_set(&sin6->sin6_addr, 0, 0,
+ htonl(0x0000FFFF), sin->sin_addr.s_addr);
+ break;
+ case AF_INET6:
+ sin6 = svc_addr_in6(rqstp);
+ break;
+ default:
+ BUG();
+ }
+
rqstp->rq_client = NULL;
if (rqstp->rq_proc == 0)
return SVC_OK;
@@ -651,7 +694,7 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
ipm = ip_map_cached_get(rqstp);
if (ipm == NULL)
ipm = ip_map_lookup(rqstp->rq_server->sv_program->pg_class,
- sin->sin_addr);
+ &sin6->sin6_addr);
if (ipm == NULL)
return SVC_DENIED;
--
1.5.4.rc2.60.gb2e62
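For readers following the address handling in this patch: every client address is stored as a struct in6_addr, and IPv4 peers are represented in the IPv4-mapped form ::ffff:a.b.c.d. Below is a minimal user-space sketch of that mapping and of the "print a dotted quad if mapped" logic. The helper names v4_to_mapped() and show_addr() are illustrative assumptions and are not part of the patch, which does the equivalent in-kernel with ipv6_addr_set() and ipv6_addr_v4mapped().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Build the IPv4-mapped IPv6 address ::ffff:a.b.c.d from an IPv4 address.
 * Hypothetical helper for illustration only. */
static void v4_to_mapped(const struct in_addr *v4, struct in6_addr *v6)
{
	assert(v4 != NULL && v6 != NULL);
	memset(v6, 0, sizeof(*v6));               /* bytes 0..9 stay zero */
	v6->s6_addr[10] = 0xff;                   /* bytes 10..11 are 0xffff */
	v6->s6_addr[11] = 0xff;
	memcpy(&v6->s6_addr[12], &v4->s_addr, 4); /* already in network order */
}

/* Format a dotted quad for mapped addresses and full IPv6 text otherwise.
 * Returns buf on success, NULL on failure (as inet_ntop() does). */
static const char *show_addr(const struct in6_addr *v6, char *buf, size_t len)
{
	assert(v6 != NULL && buf != NULL);
	if (IN6_IS_ADDR_V4MAPPED(v6)) {
		struct in_addr v4;

		memcpy(&v4.s_addr, &v6->s6_addr[12], 4);
		return inet_ntop(AF_INET, &v4, buf, (socklen_t)len);
	}
	return inet_ntop(AF_INET6, v6, buf, (socklen_t)len);
}
```

Storing a single address type keeps the cache hash and comparison code uniform; only the parse and show paths need to know about the two textual forms.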
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
NFSv4 mailing list
NFSv4@linux-nfs.org
http://linux-nfs.org/cgi-bin/mailman/listinfo/nfsv4
^ permalink raw reply related
* Re: Packetlost when "tc qdisc del dev eth0 root"
From: Jarek Poplawski @ 2008-01-15 22:32 UTC (permalink / raw)
To: slavon; +Cc: Patrick McHardy, netdev
In-Reply-To: <20080116004602.zn4y94e8sg0w4o8k@mail.bigtelecom.ru>
On Wed, Jan 16, 2008 at 12:46:02AM +0300, slavon@bigtelecom.ru wrote:
...
> Hmmm... I found a way to fix this for me, but it does not look good.
>
> The scheme looks like this:
> Root - prio bands 3 priomap 0 0 0 0 ....
> --- Class 1
> --- Class 2
> -------- Copy of the whole table (this qdisc was previously the root)
> --- Class 3
> -------- Copy of the whole table (this qdisc was previously the root)
>
> 2. Add a filter to root - flowid all packets to class 2
> 3. Delete the qdisc at class 3
> 4. Create the whole table on class 3 (~20k qdiscs and 20k classes)
> 5. Replace the filter on root - flowid all packets to class 3
> 6. If an update is needed, go to step 3, but use class 2
>
> All works well... and packets are not dropped =)
> But I have over 45k classes and qdiscs.... After some time I will need
> a patch to raise the max number of qdiscs and classes above 65k (> 0xfffe) =)))
> Also, TC command performance is very bad when I have more than 10k rules.
Right! I get your idea (not all the details...), and this really shows
that something simpler is needed for this.
Thanks,
Jarek P.
^ permalink raw reply
* [PATCH] 8390: fix CONFIG_LOCKDEP error, 2.6.24-rc7
From: Frank Rowand @ 2008-01-15 22:23 UTC (permalink / raw)
To: p_gortmaker; +Cc: netdev
From: Frank Rowand <frank.rowand@am.sony.com>
Turning on CONFIG_LOCKDEP for CONFIG_PREEMPT invokes a path which may
sleep with IRQs disabled. Change disable_irq_nosync_lockdep() to
disable_irq_nosync(), etc. Note the comment near the top of
drivers/net/lib8390.c, which is an lkml email from Alan Cox, presaging
the need for this patch.
Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
---
drivers/net/lib8390.c | 14 7 + 7 - 0 !
1 files changed, 7 insertions(+), 7 deletions(-)
Index: linux-2.6.24-rc5/drivers/net/lib8390.c
===================================================================
--- linux-2.6.24-rc5.orig/drivers/net/lib8390.c
+++ linux-2.6.24-rc5/drivers/net/lib8390.c
@@ -284,7 +284,7 @@ static void ei_tx_timeout(struct net_dev
/* Ugly but a reset can be slow, yet must be protected */
- disable_irq_nosync_lockdep(dev->irq);
+ disable_irq_nosync(dev->irq);
spin_lock(&ei_local->page_lock);
/* Try to restart the card. Perhaps the user has fixed something. */
@@ -292,7 +292,7 @@ static void ei_tx_timeout(struct net_dev
__NS8390_init(dev, 1);
spin_unlock(&ei_local->page_lock);
- enable_irq_lockdep(dev->irq);
+ enable_irq(dev->irq);
netif_wake_queue(dev);
}
@@ -334,7 +334,7 @@ static int ei_start_xmit(struct sk_buff
* Slow phase with lock held.
*/
- disable_irq_nosync_lockdep_irqsave(dev->irq, &flags);
+ disable_irq_nosync(dev->irq);
spin_lock(&ei_local->page_lock);
@@ -373,7 +373,7 @@ static int ei_start_xmit(struct sk_buff
netif_stop_queue(dev);
ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR);
spin_unlock(&ei_local->page_lock);
- enable_irq_lockdep_irqrestore(dev->irq, &flags);
+ enable_irq(dev->irq);
ei_local->stat.tx_errors++;
return 1;
}
@@ -414,7 +414,7 @@ static int ei_start_xmit(struct sk_buff
ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR);
spin_unlock(&ei_local->page_lock);
- enable_irq_lockdep_irqrestore(dev->irq, &flags);
+ enable_irq(dev->irq);
dev_kfree_skb (skb);
ei_local->stat.tx_bytes += send_length;
@@ -530,9 +530,9 @@ static irqreturn_t __ei_interrupt(int ir
#ifdef CONFIG_NET_POLL_CONTROLLER
static void __ei_poll(struct net_device *dev)
{
- disable_irq_lockdep(dev->irq);
+ disable_irq(dev->irq);
__ei_interrupt(dev->irq, dev);
- enable_irq_lockdep(dev->irq);
+ enable_irq(dev->irq);
}
#endif
^ permalink raw reply
* Re: Packetlost when "tc qdisc del dev eth0 root"
From: slavon @ 2008-01-15 22:04 UTC (permalink / raw)
To: slavon; +Cc: Jarek Poplawski, Patrick McHardy, netdev
In-Reply-To: <20080116004602.zn4y94e8sg0w4o8k@mail.bigtelecom.ru>
Good night! =)
Sorry... I was wrong...
I see that the problem is more serious....
Let's look at this scheme:
Class 1
---qdisc
------- 10k classes
Class 2
---qdisc
------- 10k classes
All traffic goes to class 2. The class 1 qdisc holds no packets, so if we
delete it, no packets should be lost... in theory. Let's try deleting the
class 1 qdisc (all its children are deleted too)...
The PC freezes for 2-5 seconds and does not forward any traffic during
that time. Is this the big tree lock?
Is this normal, or does the code need finer-grained locking?
Thanks!
> Quoting Jarek Poplawski <jarkao2@gmail.com>:
>
>> Patrick McHardy wrote, On 01/15/2008 05:05 PM:
>>
>>> Badalian Vyacheslav wrote:
>>
>> ...
>>
>>> Yes, packets in the old qdisc are lost.
>>>
>>>> Maybe when tc makes changes it should create a second queue (hash of
>>>> rules, or whatever you call it) and make the changes there, then
>>>> replace the old queue rules with the newly created ones.
>>>> Logic:
>>>> 1. Take a snapshot
>>>> 2. Make the changes in the snapshot
>>>> 3. All new packets go to the snapshot
>>>> 4. When the old queue has no packets left, delete it.
>>>> 5. The snapshot becomes the default.
>>>
>>>
>>> That doesn't really work since qdiscs keep internal state that
>>> in large parts depends on the packets queued. Take the qlen as
>>> a simple example, the new qdisc doesn't know about the packets
>>> in the old one and will exceed the limit.
>>
>> But, some similar alternative to killing packets 'to death' could
>> be imagined, I suppose (in the future, of course!). So, e.g. doing
>> the switch automatically after last packet has been dequeued (maybe
>> even with some 'special' function/mode for this). After all even
>> with accuracy lost, it could be less visible for clients than
>> current way?
>>
>> Regards,
>> Jarek P.
>
> Hmmm... I found a way to fix this for me, but it does not look good.
>
> The scheme looks like this:
> Root - prio bands 3 priomap 0 0 0 0 ....
> --- Class 1
> --- Class 2
> -------- Copy of the whole table (this qdisc was previously the root)
> --- Class 3
> -------- Copy of the whole table (this qdisc was previously the root)
>
> 2. Add a filter to root - flowid all packets to class 2
> 3. Delete the qdisc at class 3
> 4. Create the whole table on class 3 (~20k qdiscs and 20k classes)
> 5. Replace the filter on root - flowid all packets to class 3
> 6. If an update is needed, go to step 3, but use class 2
>
> All works well... and packets are not dropped =)
> But I have over 45k classes and qdiscs.... After some time I will need
> a patch to raise the max number of qdiscs and classes above 65k (> 0xfffe) =)))
> Also, TC command performance is very bad when I have more than 10k rules.
>
> Thanks =)
>
> ----------------------------------------------------------------
> This message was sent using IMP, the Internet Messaging Program.
^ permalink raw reply
* Re: Not understand some in htb_do_events function
From: Martin Devera @ 2008-01-15 21:58 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Badalian Vyacheslav, netdev
In-Reply-To: <478CD741.7040004@trash.net>
>
> So this was meant to protect against endless loops?
>
>> We want a way to smooth a big burst of events over several dequeue
>> invocations in order not to slow dequeue too much. The constant 500 is
>> the max. allowed "slowdown" of dequeue.
>> Any bright idea how to do it more elegantly, Patrick?
>
>
> Unfortunately not, but I believe simply removing the limit
> completely would be better than picking an arbitrary value.
Grrr, my computer crashed while I was writing this mail. Well, here is
the second attempt.
When we allow unlimited events per dequeue, there is a case where
all N classes in the qdisc can be in the event queue with the same target
time. They will then all be acted on in the loop within a single dequeue,
costing us, say, some milliseconds. Additionally, this tends to repeat
itself in cycles.
Maybe that is acceptable, but it seemed to me like rather high latency.
Thus I wanted to do only limited work per dequeue call. One possibility
is to remove the limit and see what happens in the wild.
What do you think about doing a limited number of transitions and then
scheduling a tasklet to do the rest (again in limited buckets)?
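The "limited work per call, defer the rest" idea can be sketched in user-space C as follows. This is only an illustration under stated assumptions: the event queue is modelled as a sorted array of target times, the per-event work is reduced to advancing an index, and the names event_queue and do_events_bounded() are hypothetical and do not come from sch_htb.c. The return value tells the caller whether due events were left over, which is the point where a tasklet (or similar deferred work) would be scheduled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EVENTS_PER_CALL 500  /* the constant discussed in this thread */

struct event_queue {
	const long *when;  /* event target times, sorted ascending */
	size_t head;       /* index of the next unprocessed event */
	size_t count;      /* total number of events */
};

/* Process due events, but never more than MAX_EVENTS_PER_CALL per call.
 * Returns 1 if due events remain (the caller would defer the rest instead
 * of stalling dequeue), 0 if everything due has been handled. */
static int do_events_bounded(struct event_queue *q, long now, size_t *handled)
{
	size_t n = 0;

	assert(q != NULL && handled != NULL);
	while (q->head < q->count && q->when[q->head] <= now) {
		if (n == MAX_EVENTS_PER_CALL) {
			*handled = n;
			return 1;   /* budget exhausted, work left over */
		}
		q->head++;          /* stand-in for the class mode transition */
		n++;
	}
	*handled = n;
	return 0;
}
```

With 1200 events sharing one target time, three calls are needed: two handle 500 events each and report leftover work, the third handles the remaining 200. The worst-case cost of a single call stays bounded regardless of N.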
^ permalink raw reply
* RE: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
From: Brandeburg, Jesse @ 2008-01-15 21:53 UTC (permalink / raw)
To: slavon, Frans Pop; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <20080115190458.rxt3yhb2o8o404kc@mail.bigtelecom.ru>
slavon@bigtelecom.ru wrote:
> Quoting Frans Pop <elendil@planet.nl>:
>>> (Note this isn't the final correct patch we should apply. There is
>>> no reason why this revert back to the older ->poll() logic here
>>> should have any effect on the TX hang triggering...)
>>
>> s/no reason/no obvious reason/ ? ;-)
The tx code has an "early exit" that tries to limit the amount of tx
packets handled in a single poll loop and requires napi or interrupt
rescheduling based on the return value from e1000_clean_tx_irq.
see this code in e1000_clean_tx_irq
4005 #ifdef CONFIG_E1000_NAPI
4006 #define E1000_TX_WEIGHT 64
4007 		/* weight of a sort for tx, to avoid endless transmit cleanup */
4008 		if (count++ == E1000_TX_WEIGHT) break;
4009 #endif
I think that is probably related. For a test you could apply the
original patch, and remove this "break" just by commenting out line
4008. This would guarantee all tx work is cleaned at every e1000_clean call.
Jesse
^ permalink raw reply
* Re: Packetlost when "tc qdisc del dev eth0 root"
From: slavon @ 2008-01-15 21:46 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Patrick McHardy, Badalian Vyacheslav, netdev
In-Reply-To: <478D226E.1050209@gmail.com>
Quoting Jarek Poplawski <jarkao2@gmail.com>:
> Patrick McHardy wrote, On 01/15/2008 05:05 PM:
>
>> Badalian Vyacheslav wrote:
>
> ...
>
>> Yes, packets in the old qdisc are lost.
>>
>>> Maybe when tc makes changes it should create a second queue (hash of
>>> rules, or whatever you call it) and make the changes there, then
>>> replace the old queue rules with the newly created ones.
>>> Logic:
>>> 1. Take a snapshot
>>> 2. Make the changes in the snapshot
>>> 3. All new packets go to the snapshot
>>> 4. When the old queue has no packets left, delete it.
>>> 5. The snapshot becomes the default.
>>
>>
>> That doesn't really work since qdiscs keep internal state that
>> in large parts depends on the packets queued. Take the qlen as
>> a simple example, the new qdisc doesn't know about the packets
>> in the old one and will exceed the limit.
>
> But, some similar alternative to killing packets 'to death' could
> be imagined, I suppose (in the future, of course!). So, e.g. doing
> the switch automatically after last packet has been dequeued (maybe
> even with some 'special' function/mode for this). After all even
> with accuracy lost, it could be less visible for clients than
> current way?
>
> Regards,
> Jarek P.
Hmmm... I found a way to fix this for me, but it does not look good.
The scheme looks like this:
Root - prio bands 3 priomap 0 0 0 0 ....
--- Class 1
--- Class 2
-------- Copy of the whole table (this qdisc was previously the root)
--- Class 3
-------- Copy of the whole table (this qdisc was previously the root)
2. Add a filter to root - flowid all packets to class 2
3. Delete the qdisc at class 3
4. Create the whole table on class 3 (~20k qdiscs and 20k classes)
5. Replace the filter on root - flowid all packets to class 3
6. If an update is needed, go to step 3, but use class 2
All works well... and packets are not dropped =)
But I have over 45k classes and qdiscs.... After some time I will need
a patch to raise the max number of qdiscs and classes above 65k (> 0xfffe) =)))
Also, TC command performance is very bad when I have more than 10k rules.
Thanks =)
^ permalink raw reply
* Re: [RFC 6/6] fib_trie: combine leaf and info
From: Eric Dumazet @ 2008-01-15 21:16 UTC (permalink / raw)
To: Robert Olsson; +Cc: Stephen Hemminger, David Miller, robert.olsson, netdev
In-Reply-To: <18317.5368.952926.46132@robur.slu.se>
Robert Olsson a écrit :
>
> Stephen Hemminger writes:
> > This is how I did it:
>
> Yes looks like an elegant solution. Did you even test it?
> Maybe we see some effects in just dumping a full table?
>
> Anyway lookup should be tested in some way. We can do a lot
> of analyzing before getting to the right entry: local_table
> backtracking, main lookup w. ev. backtracking etc. So
> hopefully we get paid for this work.
>
> Also it might be an idea to do some analysis of the fib_aliases
> list. Maybe the trick can be done again? ;)
>
Back in 2.6.9 times, sizeof(bi_alias) was 16 bytes on i386
Nowadays, 64/128 bytes are the norm :(
SLAB_HWCACHE_ALIGN is not our friend.
^ permalink raw reply
* Re: Packetlost when "tc qdisc del dev eth0 root"
From: Jarek Poplawski @ 2008-01-15 21:15 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Badalian Vyacheslav, netdev
In-Reply-To: <478CD9D6.3000504@trash.net>
Patrick McHardy wrote, On 01/15/2008 05:05 PM:
> Badalian Vyacheslav wrote:
...
> Yes, packets in the old qdisc are lost.
>
>> Maybe when tc makes changes it should create a second queue (hash of
>> rules, or whatever you call it) and make the changes there, then
>> replace the old queue rules with the newly created ones.
>> Logic:
>> 1. Take a snapshot
>> 2. Make the changes in the snapshot
>> 3. All new packets go to the snapshot
>> 4. When the old queue has no packets left, delete it.
>> 5. The snapshot becomes the default.
>
>
> That doesn't really work since qdiscs keep internal state that
> in large parts depends on the packets queued. Take the qlen as
> a simple example, the new qdisc doesn't know about the packets
> in the old one and will exceed the limit.
But, some similar alternative to killing packets 'to death' could
be imagined, I suppose (in the future, of course!). So, e.g. doing
the switch automatically after last packet has been dequeued (maybe
even with some 'special' function/mode for this). After all even
with accuracy lost, it could be less visible for clients than
current way?
Regards,
Jarek P.
^ permalink raw reply
* Is there an easy way for non-privileged users to determine an interface's speed?
From: Mark Seger @ 2008-01-15 21:07 UTC (permalink / raw)
To: netdev
I think the current answer is they can't but I also wanted to confirm it
with this list. I had hoped I might be able to find it in /proc, /sys
or with a utility like ethtool or even ifconfig, but alas it's nowhere
to be seen. So if I'm correct, that leads to the second question of why
not? Would it be that tough to put it in under /sys/class/net/*? One
of the reasons I ask is that I'd like to be able to tell if a network
pipe is nearing its capacity and so be able to alert someone about it
when it occurs.
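For illustration, here is what an unprivileged reader could look like if the kernel did export the speed under /sys/class/net/<ifname>/. This is a sketch under a loud assumption: it presumes a "speed" attribute reporting Mb/s exists at that path, which, per the message above, was not available on the kernels in question. The function name read_link_speed_mbps() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Read the link speed in Mb/s from /sys/class/net/<ifname>/speed.
 * ASSUMPTION: the running kernel exposes this attribute. Returns -1 if
 * the file is missing, unreadable, or does not contain a number. */
static long read_link_speed_mbps(const char *ifname)
{
	char path[256];
	long speed = -1;
	FILE *f;

	assert(ifname != NULL);
	if (snprintf(path, sizeof(path), "/sys/class/net/%s/speed", ifname)
	    >= (int)sizeof(path))
		return -1;          /* interface name too long for the buffer */
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fscanf(f, "%ld", &speed) != 1)
		speed = -1;
	fclose(f);
	return speed;
}
```

A monitoring tool would compare the measured byte rate against this value to decide when a link is nearing capacity, and fall back to "unknown" when the function returns -1.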
-mark
^ permalink raw reply
* Re: [PATCH 1/3] skb_partial_csum_set
From: Rusty Russell @ 2008-01-15 21:03 UTC (permalink / raw)
To: David Miller; +Cc: netdev, virtualization
In-Reply-To: <20080115.031422.06112893.davem@davemloft.net>
On Tuesday 15 January 2008 22:14:22 David Miller wrote:
> From: Rusty Russell <rusty@rustcorp.com.au>
> Date: Tue, 15 Jan 2008 21:41:55 +1100
>
> > Implement skb_partial_csum_set, for setting partial csums on untrusted
> > packets.
> >
> > Use it in virtio_net (replacing buggy version there), it's also going
> > to be used by TAP for partial csum support.
> >
> > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>
> Looks fine to me.
>
> Acked-by: David S. Miller <davem@davemloft.net>
>
> If you like I can merge this into my net-2.6.25 tree, or alternatively
> if it makes your life easier you then you can handle it yourself.
Thanks, that will reduce coordination pain.
Cheers,
Rusty.
^ permalink raw reply
* SO_RCVBUF doesn't change receiver advertised window
From: Ritesh Kumar @ 2008-01-15 20:36 UTC (permalink / raw)
To: netdev
Hi,
I am using linux 2.6.20 and am trying to limit the receiver window
size for a TCP connection. However, it seems that auto tuning is not
turning itself off even after I use the syscall
rwin=65536
setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rwin, sizeof(rwin));
and verify using
getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rwin, &rwin_size);
that RCVBUF indeed is getting set (the value returned from getsockopt
is double that, 131072).
The above calls are made before connect() on the client side and
before bind(), accept() on the server side. Bulk data is being sent
from the client to the server. The client and the server machines also
have tcp_moderate_rcvbuf set to 0 (though I don't think that's really
needed; setting a value for SO_RCVBUF should automatically turn off auto
tuning).
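The call sequence described above can be written as a small self-contained helper. It is a sketch, not the poster's actual program: the function name set_rcvbuf_before_connect() is made up for illustration. As documented in socket(7), Linux doubles the requested value to leave room for bookkeeping overhead and caps the request at net.core.rmem_max, which matches the 65536 -> 131072 readback reported here.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Request a receive buffer of `bytes` on a fresh TCP socket, before any
 * connect()/listen(), and read back what the kernel actually stored.
 * Returns 0 on success and fills *effective; returns -1 on error. */
static int set_rcvbuf_before_connect(int bytes, int *effective)
{
	socklen_t len = sizeof(*effective);
	int sock = socket(AF_INET, SOCK_STREAM, 0);
	int rc = -1;

	assert(effective != NULL);
	if (sock < 0)
		return -1;
	if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == 0 &&
	    getsockopt(sock, SOL_SOCKET, SO_RCVBUF, effective, &len) == 0)
		rc = 0;
	close(sock);   /* a real client would go on to connect() here */
	return rc;
}
```

On a server the option has to be set on the listening socket before listen(), since accepted sockets inherit it.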
However the tcp trace shows the SYN, SYN/ACK and the first few packets as:
14:34:18.831703 IP 192.168.1.153.45038 > 192.168.2.204.9999: S
3947298186:3947298186(0) win 5840 <mss 1460,sackOK,timestamp 2842625
0,nop,wscale 5>
14:34:18.836000 IP 192.168.2.204.9999 > 192.168.1.153.45038: S
3955381015:3955381015(0) ack 3947298187 win 5792 <mss
1460,sackOK,timestamp 2843649 2842625,nop,wscale 2>
14:34:18.837654 IP 192.168.1.153.45038 > 192.168.2.204.9999: . ack 1
win 183 <nop,nop,timestamp 2842634 2843649>
14:34:18.837849 IP 192.168.1.153.45038 > 192.168.2.204.9999: .
1:1449(1448) ack 1 win 183 <nop,nop,timestamp 2842634 2843649>
14:34:18.837851 IP 192.168.1.153.45038 > 192.168.2.204.9999: P
1449:1461(12) ack 1 win 183 <nop,nop,timestamp 2842634 2843649>
14:34:18.839001 IP 192.168.2.204.9999 > 192.168.1.153.45038: . ack
1449 win 2172 <nop,nop,timestamp 2843652 2842634>
14:34:18.839011 IP 192.168.2.204.9999 > 192.168.1.153.45038: . ack
1461 win 2172 <nop,nop,timestamp 2843652 2842634>
14:34:18.840875 IP 192.168.1.153.45038 > 192.168.2.204.9999: .
1461:2909(1448) ack 1 win 183 <nop,nop,timestamp 2842637 2843652>
14:34:18.840997 IP 192.168.1.153.45038 > 192.168.2.204.9999: .
2909:4357(1448) ack 1 win 183 <nop,nop,timestamp 2842637 2843652>
14:34:18.841120 IP 192.168.1.153.45038 > 192.168.2.204.9999: .
4357:5805(1448) ack 1 win 183 <nop,nop,timestamp 2842637 2843652>
14:34:18.841244 IP 192.168.1.153.45038 > 192.168.2.204.9999: .
5805:7253(1448) ack 1 win 183 <nop,nop,timestamp 2842637 2843652>
14:34:18.841388 IP 192.168.2.204.9999 > 192.168.1.153.45038: . ack
2909 win 2896 <nop,nop,timestamp 2843655 2842637>
14:34:18.841399 IP 192.168.2.204.9999 > 192.168.1.153.45038: . ack
4357 win 3620 <nop,nop,timestamp 2843655 2842637>
14:34:18.841413 IP 192.168.2.204.9999 > 192.168.1.153.45038: . ack
5805 win 4344 <nop,nop,timestamp 2843655 2842637>
As you can see, the SYN and SYN/ACK show receive windows of 5840 and
5792, and the receiver's window then grows automatically from 2172 to
4344 here, and further up to 24214 in the later part of the trace.
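One point worth keeping in mind when reading these numbers: assuming tcpdump printed the raw 16-bit window field, as it normally does, the values after the handshake are scaled by the window scale option each side sent (RFC 1323). The receiver 192.168.2.204 sent wscale 2 in its SYN/ACK, so "win 2172" stands for 8688 bytes and the later 24214 for 96856 bytes. A tiny sketch of the arithmetic (the function name scaled_window() is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Effective receive window in bytes for a non-SYN segment: the 16-bit
 * window field shifted left by the scale factor that the advertising
 * host sent in its own SYN or SYN/ACK. SYN segments themselves carry
 * an unscaled window. */
static uint32_t scaled_window(uint16_t win_field, unsigned int wscale)
{
	assert(wscale <= 14);   /* RFC 1323 caps the shift count at 14 */
	return (uint32_t)win_field << wscale;
}
```

By the same rule the sender's "win 183" with wscale 5 corresponds to 5856 bytes.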
The values for the tcp sysctl variables are given below:
/proc/sys/net/ipv4/tcp_moderate_rcvbuf 0
/proc/sys/net/ipv4/tcp_mem 32768 43690 65536
/proc/sys/net/ipv4/tcp_rmem 4096 87380 1398080
/proc/sys/net/ipv4/tcp_wmem 4096 16384 1398080
/proc/sys/net/core/rmem_max 131071
/proc/sys/net/core/wmem_max 131071
/proc/sys/net/core/wmem_default 109568
/proc/sys/net/core/rmem_default 109568
I will really appreciate your help,
Ritesh
^ permalink raw reply
* [PATCH 02/03] ISATAP V2 (ndisc.c; route.c changes)
From: Templin, Fred L @ 2008-01-15 19:59 UTC (permalink / raw)
To: netdev; +Cc: YOSHIFUJI Hideaki / 吉藤英明
In-Reply-To: <39C363776A4E8C4A94691D2BD9D1C9A1029EDDAC@XCH-NW-7V2.nw.nos.boeing.com>
This patch updates the Linux Intra-Site Automatic Tunnel Addressing
Protocol (ISATAP) implementation. It places the ISATAP potential router
list (PRL) in the kernel and adds three new private ioctls for PRL
management. The diffs are specific to the netdev net-2.6.25 development
tree taken by "git pull" on 1/14/08.
Signed-off-by: Fred L. Templin <fred.l.templin@boeing.com>
--- net-2.6.25/net/ipv6/ndisc.c.orig 2008-01-14 15:35:55.000000000 -0800
+++ net-2.6.25/net/ipv6/ndisc.c 2008-01-15 09:02:23.000000000 -0800
@@ -1090,6 +1090,12 @@ static void ndisc_router_discovery(struc
return;
}
+ if (skb->rtr_type == RTRTYPE_HOST) {
+ ND_PRINTK2(KERN_WARNING
+ "ICMPv6 RA: from host or unauthorized router\n");
+ return;
+ }
+
/*
* set the RA_RECV flag in the interface
*/
@@ -1113,6 +1119,10 @@ static void ndisc_router_discovery(struc
return;
}
+ /* skip link-specific parameters from interior routers */
+ if (skb->rtr_type == RTRTYPE_INTERIOR)
+ goto skip_linkparms;
+
if (in6_dev->if_flags & IF_RS_SENT) {
/*
* flag that an RA was received after an RS was sent
@@ -1227,6 +1237,8 @@ skip_defrtr:
}
}
+skip_linkparms:
+
/*
* Process options.
*/
@@ -1266,6 +1278,10 @@ skip_defrtr:
}
#endif
+ /* skip link-specific ndopts from interior routers */
+ if (skb->rtr_type == RTRTYPE_INTERIOR)
+ goto out;
+
if (in6_dev->cnf.accept_ra_pinfo && ndopts.nd_opts_pi) {
struct nd_opt_hdr *p;
for (p = ndopts.nd_opts_pi;
@@ -1329,6 +1345,14 @@ static void ndisc_redirect_rcv(struct sk
int optlen;
u8 *lladdr = NULL;
+ switch (skb->rtr_type) {
+ case RTRTYPE_HOST:
+ case RTRTYPE_INTERIOR:
+ ND_PRINTK2(KERN_WARNING
+ "ICMPv6 Redirect: from host or unauthorized router\n");
+ return;
+ }
+
if (!(ipv6_addr_type(&ipv6_hdr(skb)->saddr) & IPV6_ADDR_LINKLOCAL)) {
ND_PRINTK2(KERN_WARNING
"ICMPv6 Redirect: source address is not link-local.\n");
--- net-2.6.25/net/ipv6/route.c.orig 2008-01-14 15:39:40.000000000 -0800
+++ net-2.6.25/net/ipv6/route.c 2008-01-14 15:39:55.000000000 -0800
@@ -1655,8 +1655,6 @@ struct rt6_info *rt6_get_dflt_router(str
return rt;
}
-EXPORT_SYMBOL(rt6_get_dflt_router);
-
struct rt6_info *rt6_add_dflt_router(struct in6_addr *gwaddr,
struct net_device *dev,
unsigned int pref)
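The policy that the ndisc.c hunks above implement can be summarized in a small stand-alone sketch. The enum values below are hypothetical stand-ins: the real RTRTYPE_* definitions and their numeric values are not shown in this mail, and the function names ra_policy() and redirect_allowed() are invented for illustration.

```c
#include <assert.h>

/* Hypothetical stand-ins for the RTRTYPE_* values used in the patch. */
enum rtr_type { RTR_UNKNOWN, RTR_HOST, RTR_INTERIOR, RTR_BORDER };

enum ra_action {
	RA_DROP,            /* ignore the advertisement entirely */
	RA_SKIP_LINKPARMS,  /* accept, but skip link-specific parameters/options */
	RA_FULL             /* process everything */
};

/* What ndisc_router_discovery() does with an RA, by sender type. */
static enum ra_action ra_policy(enum rtr_type t)
{
	assert(t >= RTR_UNKNOWN && t <= RTR_BORDER);
	switch (t) {
	case RTR_HOST:
		return RA_DROP;
	case RTR_INTERIOR:
		return RA_SKIP_LINKPARMS;
	default:
		return RA_FULL;
	}
}

/* ndisc_redirect_rcv() drops Redirects from hosts and interior routers. */
static int redirect_allowed(enum rtr_type t)
{
	return t != RTR_HOST && t != RTR_INTERIOR;
}
```

Read this way, the patch is a three-level trust model: hosts are ignored, interior routers are believed for routing information only, and everything else is treated as before.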
^ permalink raw reply
* Re: questions on NAPI processing latency and dropped network packets
From: Jarek Poplawski @ 2008-01-15 20:29 UTC (permalink / raw)
To: Chris Friesen; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <478CC76B.1020804@nortel.com>
On Tue, Jan 15, 2008 at 08:47:07AM -0600, Chris Friesen wrote:
> Jarek Poplawski wrote:
>
>> IMHO, checking this with a current stable, which probably you are going
>> to do some day, anyway, should be 100% acceptable: giving some input to
>> netdev, while still working for yourself.
>
> While I would love to do this, it's not that simple.
...Hmm... As a matter of fact, I expected you'd treat my point less
literally... Of course, I know it could be sometimes very hard to get
something working even after upgrading one version, let alone several
at once.
So, it was more a rhetorical trick (sorry!) to suggest, that such a
business model of being always late with kernels might be quite
practical and reasonable for many companies, but looks like the
worst possible development model for Linux.
On the other hand, it seems that neither many nor expensive changes
are needed (a bit more forward-looking thinking?) to make everybody
happy...
Jarek P.
^ permalink raw reply
* Re: [RFC 6/6] fib_trie: combine leaf and info
From: Robert Olsson @ 2008-01-15 20:18 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Eric Dumazet, David Miller, robert.olsson, netdev
In-Reply-To: <20080115094753.32e35823@deepthought>
Stephen Hemminger writes:
> This is how I did it:
Yes looks like an elegant solution. Did you even test it?
Maybe we see some effects in just dumping a full table?
Anyway lookup should be tested in some way. We can do a lot
of analyzing before getting to the right entry: local_table
backtracking, main lookup w. ev. backtracking etc. So
hopefully we get paid for this work.
Also it might be an idea to do some analysis of the fib_aliases
list. Maybe the trick can be done again? ;)
Cheers
--ro
> --- a/net/ipv4/fib_trie.c 2008-01-15 09:14:53.000000000 -0800
> +++ b/net/ipv4/fib_trie.c 2008-01-15 09:21:48.000000000 -0800
> @@ -101,13 +101,6 @@ struct node {
> t_key key;
> };
>
> -struct leaf {
> - unsigned long parent;
> - t_key key;
> - struct hlist_head list;
> - struct rcu_head rcu;
> -};
> -
> struct leaf_info {
> struct hlist_node hlist;
> struct rcu_head rcu;
> @@ -115,6 +108,13 @@ struct leaf_info {
> struct list_head falh;
> };
>
> +struct leaf {
> + unsigned long parent;
> + t_key key;
> + struct hlist_head list;
> + struct rcu_head rcu;
> +};
> +
> struct tnode {
> unsigned long parent;
> t_key key;
> @@ -321,16 +321,6 @@ static void __leaf_free_rcu(struct rcu_h
> kmem_cache_free(trie_leaf_kmem, leaf);
> }
>
> -static void __leaf_info_free_rcu(struct rcu_head *head)
> -{
> - kfree(container_of(head, struct leaf_info, rcu));
> -}
> -
> -static inline void free_leaf_info(struct leaf_info *leaf)
> -{
> - call_rcu(&leaf->rcu, __leaf_info_free_rcu);
> -}
> -
> static struct tnode *tnode_alloc(size_t size)
> {
> struct page *pages;
> @@ -357,7 +347,7 @@ static void __tnode_free_rcu(struct rcu_
> free_pages((unsigned long)tn, get_order(size));
> }
>
> -static inline void tnode_free(struct tnode *tn)
> +static void tnode_free(struct tnode *tn)
> {
> if (IS_LEAF(tn)) {
> struct leaf *l = (struct leaf *) tn;
> @@ -376,16 +366,41 @@ static struct leaf *leaf_new(void)
> return l;
> }
>
> +static void leaf_info_init(struct leaf_info *li, int plen)
> +{
> + li->plen = plen;
> + INIT_LIST_HEAD(&li->falh);
> +}
> +
> +static struct leaf_info *leaf_info_first(struct leaf *l, int plen)
> +{
> + struct leaf_info *li = (struct leaf_info *) (l + 1);
> + leaf_info_init(li, plen);
> + return li;
> +}
> +
> static struct leaf_info *leaf_info_new(int plen)
> {
> struct leaf_info *li = kmalloc(sizeof(struct leaf_info), GFP_KERNEL);
> - if (li) {
> - li->plen = plen;
> - INIT_LIST_HEAD(&li->falh);
> - }
> + if (li)
> + leaf_info_init(li, plen);
> +
> return li;
> }
>
> +static void __leaf_info_free_rcu(struct rcu_head *head)
> +{
> + kfree(container_of(head, struct leaf_info, rcu));
> +}
> +
> +static inline void free_leaf_info(struct leaf *l, struct leaf_info *leaf)
> +{
> + if (leaf == (struct leaf_info *)(l + 1))
> + return;
> +
> + call_rcu(&leaf->rcu, __leaf_info_free_rcu);
> +}
> +
> static struct tnode* tnode_new(t_key key, int pos, int bits)
> {
> size_t sz = sizeof(struct tnode) + (sizeof(struct node *) << bits);
> @@ -1047,18 +1062,13 @@ static struct list_head *fib_insert_node
> insert_leaf_info(&l->list, li);
> goto done;
> }
> - l = leaf_new();
>
> + l = leaf_new();
> if (!l)
> return NULL;
>
> l->key = key;
> - li = leaf_info_new(plen);
> -
> - if (!li) {
> - tnode_free((struct tnode *) l);
> - return NULL;
> - }
> + li = leaf_info_first(l, plen);
>
> fa_head = &li->falh;
> insert_leaf_info(&l->list, li);
> @@ -1091,7 +1101,7 @@ static struct list_head *fib_insert_node
> }
>
> if (!tn) {
> - free_leaf_info(li);
> + free_leaf_info(l, li);
> tnode_free((struct tnode *) l);
> return NULL;
> }
> @@ -1624,7 +1634,7 @@ static int fn_trie_delete(struct fib_tab
>
> if (list_empty(fa_head)) {
> hlist_del_rcu(&li->hlist);
> - free_leaf_info(li);
> + free_leaf_info(l, li);
> }
>
> if (hlist_empty(&l->list))
> @@ -1668,7 +1678,7 @@ static int trie_flush_leaf(struct trie *
>
> if (list_empty(&li->falh)) {
> hlist_del_rcu(&li->hlist);
> - free_leaf_info(li);
> + free_leaf_info(l, li);
> }
> }
> return found;
> @@ -1935,7 +1945,8 @@ void __init fib_hash_init(void)
> fn_alias_kmem = kmem_cache_create("ip_fib_alias", sizeof(struct fib_alias),
> 0, SLAB_PANIC, NULL);
>
> - trie_leaf_kmem = kmem_cache_create("ip_fib_trie", sizeof(struct leaf),
> + trie_leaf_kmem = kmem_cache_create("ip_fib_trie",
> + sizeof(struct leaf) + sizeof(struct leaf_info),
> 0, SLAB_PANIC, NULL);
> }
>
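The trick in the quoted patch, placing the first leaf_info directly behind the leaf in the same allocation, can be modelled in user space as follows. The structures are deliberately simplified (no RCU, no hlist, no kmem_cache), so this is a sketch of the layout idea only; the function names mirror the patch for readability but the bodies are not the kernel code.

```c
#include <assert.h>
#include <stdlib.h>

struct leaf_info {
	struct leaf_info *next;
	int plen;
};

struct leaf {
	unsigned int key;
	struct leaf_info *list;
};

/* The embedded leaf_info must be suitably aligned right after the leaf. */
_Static_assert(sizeof(struct leaf) % _Alignof(struct leaf_info) == 0,
	       "leaf_info would be misaligned behind leaf");

/* One allocation holds the leaf and, right behind it, its first
 * leaf_info, mirroring the (l + 1) cast in the quoted patch. */
static struct leaf *leaf_new(void)
{
	return calloc(1, sizeof(struct leaf) + sizeof(struct leaf_info));
}

static struct leaf_info *leaf_info_first(struct leaf *l, int plen)
{
	struct leaf_info *li;

	assert(l != NULL);
	li = (struct leaf_info *)(l + 1);
	li->plen = plen;
	li->next = NULL;
	return li;
}

/* Only separately allocated infos are freed; the embedded one goes away
 * with the leaf itself. Returns 1 if memory was released, 0 otherwise. */
static int free_leaf_info(struct leaf *l, struct leaf_info *li)
{
	if (li == (struct leaf_info *)(l + 1))
		return 0;
	free(li);
	return 1;
}
```

The common case of one prefix length per leaf then costs a single allocation and a single cache miss, at the price of a pointer comparison on every free.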
^ permalink raw reply
* [PATCH 03/03] ISATAP V2 (sit.c changes)
From: Templin, Fred L @ 2008-01-15 20:00 UTC (permalink / raw)
To: netdev; +Cc: YOSHIFUJI Hideaki / 吉藤英明
In-Reply-To: <39C363776A4E8C4A94691D2BD9D1C9A1029EDDAC@XCH-NW-7V2.nw.nos.boeing.com>
This patch updates the Linux Intra-Site Automatic Tunnel Addressing
Protocol (ISATAP) implementation. It places the ISATAP potential router
list (PRL) in the kernel and adds three new private ioctls for PRL
management. The diffs are specific to the netdev net-2.6.25 development
tree taken by "git pull" on 1/14/08.
Signed-off-by: Fred L. Templin <fred.l.templin@boeing.com>
--- net-2.6.25/net/ipv6/sit.c.orig 2008-01-14 15:33:36.000000000 -0800
+++ net-2.6.25/net/ipv6/sit.c 2008-01-15 10:21:31.000000000 -0800
@@ -16,7 +16,7 @@
* Changes:
* Roger Venning <r.venning@telstra.com>: 6to4 support
* Nate Thompson <nate@thebog.net>: 6to4 support
- * Fred L. Templin <fltemplin@acm.org>: isatap support
+ * Fred Templin <fred.l.templin@boeing.com>: isatap support
*/
#include <linux/module.h>
@@ -200,6 +200,118 @@ failed:
return NULL;
}
+static struct ip_tunnel_prlent *
+ipip6_tunnel_locate_prl(struct ip_tunnel *t, __be32 addr)
+{
+ struct ip_tunnel_prlent *p = (struct ip_tunnel_prlent *)NULL;
+
+ for (p = t->prl; p; p = p->next)
+ if (p->ent.addr == addr)
+ break;
+ return p;
+
+}
+
+static int
+ipip6_tunnel_add_prl(struct ip_tunnel *t, struct ip_tunnel_prladdr *a, int chg)
+{
+ struct ip_tunnel_prlent *p;
+
+ for (p = t->prl; p; p = p->next) {
+ if (p->ent.addr == a->addr) {
+ if (chg) {
+ p->ent = *a;
+ return 0;
+ }
+ return -EEXIST;
+ }
+ }
+
+ if (chg)
+ return -ENXIO;
+
+ if (!(p = kzalloc(sizeof(struct ip_tunnel_prlent), GFP_KERNEL)))
+ return -ENOBUFS;
+
+ p->ent = *a;
+ p->next = t->prl;
+ t->prl = p;
+ return 0;
+}
+
+static int
+ipip6_tunnel_del_prl(struct ip_tunnel *t, struct ip_tunnel_prladdr *a)
+{
+ struct ip_tunnel_prlent *x, **p;
+
+ if (a) {
+ for (p = &t->prl; *p; p = &(*p)->next) {
+ if ((*p)->ent.addr == a->addr) {
+ x = *p;
+ *p = x->next;
+ kfree(x);
+ return 0;
+ }
+ }
+ return -ENXIO;
+ } else {
+ while (t->prl) {
+ x = t->prl;
+ t->prl = t->prl->next;
+ kfree(x);
+ }
+ }
+ return 0;
+}
+
+/* copied directly from anycast.c */
+static int
+ipip6_onlink(struct in6_addr *addr, struct net_device *dev)
+{
+ struct inet6_dev *idev;
+ struct inet6_ifaddr *ifa;
+ int onlink;
+
+ onlink = 0;
+ rcu_read_lock();
+ idev = __in6_dev_get(dev);
+ if (idev) {
+ read_lock_bh(&idev->lock);
+ for (ifa=idev->addr_list; ifa; ifa=ifa->if_next) {
+ onlink = ipv6_prefix_equal(addr, &ifa->addr,
+ ifa->prefix_len);
+ if (onlink)
+ break;
+ }
+ read_unlock_bh(&idev->lock);
+ }
+ rcu_read_unlock();
+ return onlink;
+}
+
+static int
+isatap_chksrc(struct sk_buff *skb, struct iphdr *iph, struct ip_tunnel *t)
+{
+ struct ip_tunnel_prlent *p = ipip6_tunnel_locate_prl(t, iph->saddr);
+ int ok = 1;
+
+ if (p) {
+ if (p->ent.flags & PRL_BORDER)
+ skb->rtr_type = RTRTYPE_BORDER;
+ else
+ skb->rtr_type = RTRTYPE_INTERIOR;
+ } else {
+ struct in6_addr *addr6 = &ipv6_hdr(skb)->saddr;
+ if (ipv6_addr_is_isatap(addr6) &&
+ (addr6->s6_addr32[3] == iph->saddr) &&
+ ipip6_onlink(addr6, t->dev))
+ skb->rtr_type = RTRTYPE_HOST;
+ else
+ ok = 0;
+ }
+ return ok;
+}
+
static void ipip6_tunnel_uninit(struct net_device *dev)
{
if (dev == ipip6_fb_tunnel_dev) {
@@ -209,6 +321,7 @@ static void ipip6_tunnel_uninit(struct n
dev_put(dev);
} else {
ipip6_tunnel_unlink(netdev_priv(dev));
+ ipip6_tunnel_del_prl(netdev_priv(dev), NULL);
dev_put(dev);
}
}
@@ -368,48 +481,6 @@ static inline void ipip6_ecn_decapsulate
IP6_ECN_set_ce(ipv6_hdr(skb));
}
-/* ISATAP (RFC4214) - check source address */
-static int
-isatap_srcok(struct sk_buff *skb, struct iphdr *iph, struct net_device *dev)
-{
- struct neighbour *neigh;
- struct dst_entry *dst;
- struct rt6_info *rt;
- struct flowi fl;
- struct in6_addr *addr6;
- struct in6_addr rtr;
- struct ipv6hdr *iph6;
- int ok = 0;
-
- /* from onlink default router */
- ipv6_addr_set(&rtr, htonl(0xFE800000), 0, 0, 0);
- ipv6_isatap_eui64(rtr.s6_addr + 8, iph->saddr);
- if ((rt = rt6_get_dflt_router(&rtr, dev))) {
- dst_release(&rt->u.dst);
- return 1;
- }
-
- iph6 = ipv6_hdr(skb);
- memset(&fl, 0, sizeof(fl));
- fl.proto = iph6->nexthdr;
- ipv6_addr_copy(&fl.fl6_dst, &iph6->saddr);
- fl.oif = dev->ifindex;
- security_skb_classify_flow(skb, &fl);
-
- dst = ip6_route_output(NULL, &fl);
- if (!dst->error && (dst->dev == dev) && (neigh = dst->neighbour)) {
-
- addr6 = (struct in6_addr*)&neigh->primary_key;
-
- /* from correct previous hop */
- if (ipv6_addr_is_isatap(addr6) &&
- (addr6->s6_addr32[3] == iph->saddr))
- ok = 1;
- }
- dst_release(dst);
- return ok;
-}
-
static int ipip6_rcv(struct sk_buff *skb)
{
struct iphdr *iph;
@@ -430,7 +501,7 @@ static int ipip6_rcv(struct sk_buff *skb
skb->pkt_type = PACKET_HOST;
if ((tunnel->dev->priv_flags & IFF_ISATAP) &&
- !isatap_srcok(skb, iph, tunnel->dev)) {
+ !isatap_chksrc(skb, iph, tunnel)) {
tunnel->stat.rx_errors++;
read_unlock(&ipip6_lock);
kfree_skb(skb);
@@ -710,6 +781,7 @@ ipip6_tunnel_ioctl (struct net_device *d
{
int err = 0;
struct ip_tunnel_parm p;
+ struct ip_tunnel_prladdr prl;
struct ip_tunnel *t;
switch (cmd) {
@@ -809,6 +881,31 @@ ipip6_tunnel_ioctl (struct net_device *d
err = 0;
break;
+ case SIOCADDPRL:
+ case SIOCDELPRL:
+ case SIOCCHGPRL:
+ err = -EPERM;
+ if (!capable(CAP_NET_ADMIN))
+ goto done;
+ err = -EINVAL;
+ if (dev == ipip6_fb_tunnel_dev)
+ goto done;
+ err = -EFAULT;
+ if (copy_from_user(&prl, ifr->ifr_ifru.ifru_data, sizeof(prl)))
+ goto done;
+ err = -ENOENT;
+ if (!(t = netdev_priv(dev)))
+ goto done;
+
+ ipip6_tunnel_unlink(t);
+ if (cmd == SIOCDELPRL)
+ err = ipip6_tunnel_del_prl(t, &prl);
+ else
+ err = ipip6_tunnel_add_prl(t, &prl, cmd == SIOCCHGPRL);
+ ipip6_tunnel_link(t);
+ netdev_state_change(dev);
+ break;
+
default:
err = -EINVAL;
}
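
For readers skimming the diff, the decision made by isatap_chksrc() above can be restated as a small userspace model. This is only a sketch: struct src_info and classify_source() are hypothetical names, and the boolean fields stand in for the PRL lookup, the ipv6_addr_is_isatap() test, the embedded-IPv4 comparison, and ipip6_onlink(). The RTRTYPE_* and PRL_BORDER values are the ones defined in the header patch of this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in the header patch of this series. */
#define RTRTYPE_UNSPEC   0
#define RTRTYPE_HOST     1
#define RTRTYPE_INTERIOR 2
#define RTRTYPE_BORDER   3
#define PRL_BORDER       0x0001

/* Hypothetical summary of what the kernel code looks up per packet. */
struct src_info {
	bool     in_prl;      /* IPv4 source address found in the PRL */
	uint16_t prl_flags;   /* flags of the matching PRL entry, if any */
	bool     isatap_iid;  /* IPv6 source has an ISATAP interface ID */
	bool     v4_embedded; /* low 32 bits of IPv6 source == IPv4 source */
	bool     onlink;      /* IPv6 source matches an on-link prefix */
};

/*
 * Returns true and sets *rtr_type when the packet is accepted;
 * returns false (leaving *rtr_type untouched) when it should be dropped.
 */
static bool classify_source(const struct src_info *s, uint8_t *rtr_type)
{
	if (s->in_prl) {
		/* Known router: border or interior, depending on the flag. */
		*rtr_type = (s->prl_flags & PRL_BORDER) ? RTRTYPE_BORDER
							: RTRTYPE_INTERIOR;
		return true;
	}
	/* Not in the PRL: accept only a consistent, on-link ISATAP host. */
	if (s->isatap_iid && s->v4_embedded && s->onlink) {
		*rtr_type = RTRTYPE_HOST;
		return true;
	}
	return false;
}
```

In this model a sender that is not in the PRL is accepted only if all three host checks pass, which matches the else branch of the patch.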
^ permalink raw reply
* Re: [PATCH] net: EMAC: Fix problem with mtu > 4080 on non TAH equipped 4xx PPC's
From: Eugene Surovegin @ 2008-01-15 20:00 UTC (permalink / raw)
To: Stefan Roese; +Cc: linuxppc-dev, netdev, benh
In-Reply-To: <200801152046.01881.sr@denx.de>
On Tue, Jan 15, 2008 at 08:46:01PM +0100, Stefan Roese wrote:
> On Tuesday 15 January 2008, Eugene Surovegin wrote:
> > On Tue, Jan 15, 2008 at 01:40:09PM +0100, Stefan Roese wrote:
> > > Currently, all non TAH equipped 4xx PPC's call emac_start_xmit() upon
> > > xmit. This routine doesn't check if the frame length exceeds the max.
> > > MAL buffer size.
> > >
> > > This patch now changes the driver to call emac_start_xmit_sg() on all
> > > platforms and not only the TAH equipped ones (440GX). This enables an
> > > MTU of 9000 instead of 4080.
> > >
> > > Tested on Kilauea (405EX) with gbit link -> jumbo frames enabled.
> > >
> > > Signed-off-by: Stefan Roese <sr@denx.de>
> > > ---
> > > Eugene & Ben, do you see any problems with this patch? If not, then I'll
> > > send another version for the newemac driver too.
> >
> > Hmm, so why not make GigE support a condition for hooking the SG version
> > of xmit then? I don't like it when you change behaviour for chips where
> > it is perfectly legal not to do this check because you cannot change the
> > MTU anyway.
>
> OK. But how do we detect GigE support? Seems like GigE enabled devices have
> CONFIG_IBM_EMAC4 defined. If nobody objects I'll fix up another version
> tomorrow.
Look a couple of lines down, where I set the MTU-changing hook. If you cannot
change the MTU, you cannot get big frames.
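
One way to read this suggestion is sketched below as a standalone model, not as driver code. All names here (struct model_dev, model_setup(), xmit_single(), xmit_sg()) are hypothetical, and MODEL_MAX_TX_SIZE simply reuses the 4080 figure mentioned in the thread as an assumed per-buffer limit. The point is only that the scatter/gather path is selected exactly where the MTU can be raised.

```c
#include <stdbool.h>

/* Assumed per-buffer limit, taken from the 4080 figure in this thread. */
#define MODEL_MAX_TX_SIZE 4080

/* A transmit routine returns the number of buffers the frame would use. */
typedef int (*xmit_fn)(int frame_len);

/* Simple path: always one buffer, no length check. */
static int xmit_single(int frame_len)
{
	(void)frame_len;
	return 1;
}

/* SG path: split the frame across as many buffers as needed. */
static int xmit_sg(int frame_len)
{
	return (frame_len + MODEL_MAX_TX_SIZE - 1) / MODEL_MAX_TX_SIZE;
}

struct model_dev {
	bool    has_tah;        /* TAH-equipped (e.g. 440GX) */
	bool    supports_gige;  /* GigE-capable EMAC */
	bool    can_change_mtu; /* set by model_setup() */
	xmit_fn xmit;           /* set by model_setup() */
};

/* Hook the SG transmit path only where big frames are possible at all. */
static void model_setup(struct model_dev *d)
{
	d->can_change_mtu = d->supports_gige;
	d->xmit = (d->has_tah || d->can_change_mtu) ? xmit_sg : xmit_single;
}
```

Under these assumptions a device that cannot change its MTU keeps the unchecked single-buffer path, which is the behaviour being argued for here.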
--
Eugene
^ permalink raw reply
* [PATCH 01/03] ISATAP V2 (header file changes)
From: Templin, Fred L @ 2008-01-15 19:57 UTC (permalink / raw)
To: netdev; +Cc: YOSHIFUJI Hideaki / 吉藤英明
In-Reply-To: <20071129.195459.55971471.yoshfuji@linux-ipv6.org>
This patch updates the Linux Intra-Site Automatic Tunnel Addressing
Protocol (ISATAP) implementation. It places the ISATAP potential router
list (PRL) in the kernel and adds three new private ioctls for PRL
management. The diffs are against the netdev net-2.6.25 development
tree as taken by "git pull" on 2008-01-14.
Signed-off-by: Fred L. Templin <fred.l.templin@boeing.com>
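
As a rough illustration of the add/change/delete semantics behind the three new ioctls, here is a userspace model of the list handling. It is a sketch under assumptions: prl_add() and prl_del() and the struct names are hypothetical stand-ins, calloc()/free() replace the kernel allocator, and no locking is modeled. The error codes follow the ones used in the sit.c part of this series.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-in for struct ip_tunnel_prladdr. */
struct prl_addr {
	uint32_t addr;   /* IPv4 address of the potential router */
	uint16_t flags;  /* e.g. a border-router flag */
	uint16_t rsvd;
};

/* Userspace stand-in for struct ip_tunnel_prlent. */
struct prl_entry {
	struct prl_entry *next;
	struct prl_addr   ent;
};

/* chg == 0: add a new entry; chg != 0: update an existing entry. */
static int prl_add(struct prl_entry **head, const struct prl_addr *a, int chg)
{
	struct prl_entry *p;

	for (p = *head; p; p = p->next) {
		if (p->ent.addr == a->addr) {
			if (!chg)
				return -EEXIST;  /* add of a duplicate */
			p->ent = *a;
			return 0;
		}
	}
	if (chg)
		return -ENXIO;                   /* nothing to change */

	p = calloc(1, sizeof(*p));
	if (!p)
		return -ENOBUFS;
	p->ent = *a;
	p->next = *head;                         /* push on the front */
	*head = p;
	return 0;
}

/* Delete the entry matching a->addr, or flush the list when a is NULL. */
static int prl_del(struct prl_entry **head, const struct prl_addr *a)
{
	struct prl_entry **pp, *x;

	if (!a) {
		while ((x = *head) != NULL) {
			*head = x->next;
			free(x);
		}
		return 0;
	}
	for (pp = head; *pp; pp = &(*pp)->next) {
		if ((*pp)->ent.addr == a->addr) {
			x = *pp;
			*pp = x->next;
			free(x);
			return 0;
		}
	}
	return -ENXIO;
}
```

In this model, adding an address twice yields -EEXIST, changing or deleting an unknown address yields -ENXIO, and passing NULL to prl_del() empties the list, as the tunnel teardown path in the series does.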
--- net-2.6.25/include/linux/skbuff.h.orig 2008-01-14 15:33:36.000000000 -0800
+++ net-2.6.25/include/linux/skbuff.h 2008-01-14 15:43:06.000000000 -0800
@@ -311,7 +311,8 @@ struct sk_buff {
__u16 tc_verd; /* traffic control verdict */
#endif
#endif
- /* 2 byte hole */
+ __u8 rtr_type;
+ /* 1 byte hole */
#ifdef CONFIG_NET_DMA
dma_cookie_t dma_cookie;
--- net-2.6.25/include/linux/if_tunnel.h.orig 2008-01-14 15:33:36.000000000 -0800
+++ net-2.6.25/include/linux/if_tunnel.h 2008-01-14 15:42:14.000000000 -0800
@@ -7,6 +7,9 @@
#define SIOCADDTUNNEL (SIOCDEVPRIVATE + 1)
#define SIOCDELTUNNEL (SIOCDEVPRIVATE + 2)
#define SIOCCHGTUNNEL (SIOCDEVPRIVATE + 3)
+#define SIOCADDPRL (SIOCDEVPRIVATE + 4)
+#define SIOCDELPRL (SIOCDEVPRIVATE + 5)
+#define SIOCCHGPRL (SIOCDEVPRIVATE + 6)
#define GRE_CSUM __constant_htons(0x8000)
#define GRE_ROUTING __constant_htons(0x4000)
@@ -17,9 +20,6 @@
#define GRE_FLAGS __constant_htons(0x00F8)
#define GRE_VERSION __constant_htons(0x0007)
-/* i_flags values for SIT mode */
-#define SIT_ISATAP 0x0001
-
struct ip_tunnel_parm
{
char name[IFNAMSIZ];
@@ -30,5 +30,15 @@ struct ip_tunnel_parm
__be32 o_key;
struct iphdr iph;
};
+/* SIT-mode i_flags */
+#define SIT_ISATAP 0x0001
+
+struct ip_tunnel_prladdr {
+ __be32 addr;
+ __be16 flags;
+ __be16 rsvd;
+};
+/* PRL flags */
+#define PRL_BORDER 0x0001
#endif /* _IF_TUNNEL_H_ */
--- net-2.6.25/include/net/ipip.h.orig 2008-01-14 15:33:36.000000000 -0800
+++ net-2.6.25/include/net/ipip.h 2008-01-14 15:41:21.000000000 -0800
@@ -24,6 +24,13 @@ struct ip_tunnel
int mlink;
struct ip_tunnel_parm parms;
+ struct ip_tunnel_prlent *prl; /* potential router list */
+};
+
+struct ip_tunnel_prlent
+{
+ struct ip_tunnel_prlent *next;
+ struct ip_tunnel_prladdr ent;
};
#define IPTUNNEL_XMIT() do { \
--- net-2.6.25/include/net/ndisc.h.orig 2008-01-14 15:40:28.000000000 -0800
+++ net-2.6.25/include/net/ndisc.h 2008-01-15 08:43:21.000000000 -0800
@@ -12,6 +12,16 @@
#define NDISC_REDIRECT 137
/*
+ * Router type: cross-layer information from link-layer to
+ * IPv6 layer reported by certain link types (e.g., RFC4214).
+ */
+
+#define RTRTYPE_UNSPEC 0 /* unspecified (default) */
+#define RTRTYPE_HOST 1 /* host or unauthorized router */
+#define RTRTYPE_INTERIOR 2 /* site-interior router */
+#define RTRTYPE_BORDER 3 /* site border router */
+
+/*
* ndisc options
*/
^ permalink raw reply
* Re: [PATCH] net: EMAC: Fix problem with mtu > 4080 on non TAH equipped 4xx PPC's
From: Stefan Roese @ 2008-01-15 19:46 UTC (permalink / raw)
To: Eugene Surovegin; +Cc: linuxppc-dev, netdev, benh
In-Reply-To: <20080115173202.GA1268@gate.ebshome.net>
On Tuesday 15 January 2008, Eugene Surovegin wrote:
> On Tue, Jan 15, 2008 at 01:40:09PM +0100, Stefan Roese wrote:
> > Currently, all non TAH equipped 4xx PPC's call emac_start_xmit() upon
> > xmit. This routine doesn't check if the frame length exceeds the max.
> > MAL buffer size.
> >
> > This patch now changes the driver to call emac_start_xmit_sg() on all
> > platforms and not only the TAH equipped ones (440GX). This enables an
> > MTU of 9000 instead of 4080.
> >
> > Tested on Kilauea (405EX) with gbit link -> jumbo frames enabled.
> >
> > Signed-off-by: Stefan Roese <sr@denx.de>
> > ---
> > Eugene & Ben, do you see any problems with this patch? If not, then I'll
> > send another version for the newemac driver too.
>
> Hmm, so why not make GigE support a condition for hooking the SG version
> of xmit then? I don't like it when you change behaviour for chips where
> it is perfectly legal not to do this check because you cannot change the
> MTU anyway.
OK. But how do we detect GigE support? Seems like GigE enabled devices have
CONFIG_IBM_EMAC4 defined. If nobody objects I'll fix up another version
tomorrow.
Thanks.
Best regards,
Stefan
^ permalink raw reply