* Re: [RFC] restore netdev_priv optimization
From: Benjamin Thery @ 2007-08-20 11:51 UTC
To: David Miller; +Cc: shemminger, netdev, ebiederm
In-Reply-To: <20070817.160409.88475506.davem@davemloft.net>
Hi,
David Miller wrote:
> From: Stephen Hemminger <shemminger@linux-foundation.org>
> Date: Fri, 17 Aug 2007 15:40:22 -0700
>
>> Compile tested only!!!
>
> Obviously. The first loopback transmit is guaranteed to crash.
>
> [...]
>
> And this also breaks loopback again, which uses a static struct netdev
> in the kernel image, it doesn't use alloc_netdev(), so egress_subqueue
> of loopback will be NULL.
Talking about loopback, don't you think this could be the right time
to make it behave like any other kind of net device and allocate it
dynamically?
Having a dynamically allocated loopback could make maintenance easier
(removing special cases).
Also, this is something we'll need in order to support multiple loopbacks,
for example for network namespaces.
Eric Biederman has written a nice patch that does this.
I'm using it on 2.6.23-rc2.
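For readers following the netdev_priv side of this thread, here is a minimal userspace sketch of why a dynamically allocated device helps. It assumes, as alloc_netdev() is understood to do, that the device structure and its private area come from one allocation, so the private pointer can be computed rather than loaded. The names demo_netdev, demo_alloc_netdev and demo_netdev_priv and the 32-byte alignment are illustrative assumptions, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_ALIGN 32u  /* assumed alignment of the private area */

/* Hypothetical stand-in for a network device structure. */
struct demo_netdev {
    char name[16];
    size_t priv_size;
};

/* Offset of the private area: struct size rounded up to DEMO_ALIGN. */
static size_t demo_priv_offset(void)
{
    return (sizeof(struct demo_netdev) + DEMO_ALIGN - 1) &
           ~(size_t)(DEMO_ALIGN - 1);
}

/* Allocate the device and its private area as one zeroed block. */
static struct demo_netdev *demo_alloc_netdev(size_t priv_size)
{
    struct demo_netdev *dev = calloc(1, demo_priv_offset() + priv_size);

    if (dev)
        dev->priv_size = priv_size;
    return dev;
}

/* Private data is found by pointer arithmetic, with no pointer load. */
static void *demo_netdev_priv(struct demo_netdev *dev)
{
    assert(dev != NULL);
    return (char *)dev + demo_priv_offset();
}
```

A statically defined device never passes through an allocation path like this one, so anything that path sets up is missing for it; that is the kind of special case dynamic allocation would remove.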
Benjamin
--
B e n j a m i n T h e r y - BULL/DT/Open Software R&D
http://www.bull.com
* Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Andi Kleen @ 2007-08-20 11:07 UTC
To: Felix Marti
Cc: David Miller, sean.hefty, netdev, rdreier, general, linux-kernel,
jeff
In-Reply-To: <8A71B368A89016469F72CD08050AD334018E20C1@maui.asicdesigners.com>
"Felix Marti" <felix@chelsio.com> writes:
> > avoidance gains of TSO and LRO are still a very worthwhile savings.
> So, e.g. with TSO, you're saving about 16 headers (let us say 14 + 20 +
> 20), 864B, when moving ~64KB of payload - looks very much in the
> noise to me.
TSO is beneficial for the software again. The linux code currently
takes several locks and does quite a few function calls for each
packet and using larger packets lowers this overhead. At least with
10GbE saving CPU cycles is still quite important.
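The two positions can be put side by side with a little arithmetic. The sketch below uses the header sizes and the 16-segment figure quoted above; the function names are made up for illustration, and the "one traversal per large send" model of TSO is a deliberate simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Header sizes as quoted in the thread: Ethernet + IP + TCP. */
enum { DEMO_ETH_HDR = 14, DEMO_IP_HDR = 20, DEMO_TCP_HDR = 20 };

/* Bytes of header carried by a given number of on-wire segments. */
static size_t demo_header_bytes(size_t segments)
{
    return segments * (DEMO_ETH_HDR + DEMO_IP_HDR + DEMO_TCP_HDR);
}

/*
 * Trips through the per-packet software path (locks, function calls):
 * one per segment without TSO, one per large send with TSO.
 */
static size_t demo_stack_traversals(size_t segments, int tso)
{
    assert(segments > 0);
    return tso ? 1 : segments;
}
```

With 16 segments, demo_header_bytes() gives the 864 bytes mentioned above, which is small next to 64KB of payload. The point made in the reply is the other number: the per-packet software path runs once instead of 16 times.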
> an option to get 'high performance'
Shouldn't you qualify that?
It is unlikely you really duplicated all the tuning for corner cases
that went over many years into good software TCP stacks in your
hardware. So e.g. for wide area networks with occasional packet loss
the software might well perform better.
-Andi
* Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Evgeniy Polyakov @ 2007-08-20 9:43 UTC
To: Felix Marti
Cc: David Miller, sean.hefty, netdev, rdreier, general, linux-kernel,
jeff
In-Reply-To: <8A71B368A89016469F72CD08050AD334018E20BE@maui.asicdesigners.com>
On Sun, Aug 19, 2007 at 05:47:59PM -0700, Felix Marti (felix@chelsio.com) wrote:
> [Felix Marti] David and Herbert, so you agree that the user<>kernel
> space memory copy overhead is a significant overhead and we want to
> enable zero-copy in both the receive and transmit path? - Yes, copy
It depends. If you need to access that data after it is received, you will get
a cache miss, and performance will not be much better (if at all) than with
a copy.
> avoidance is mainly an API issue and unfortunately the so widely used
> (synchronous) sockets API doesn't make copy avoidance easy, which is one
> area where protocol offload can help. Yes, some apps can resort to
> sendfile() but there are many apps which seem to have trouble switching
> to that API... and what about the receive path?
There are a number of implementations, and all they are suitable for is
implementing recvfile(), since this is likely the only case that can work
without the cache.
And actually the RDMA stack exists and no one said it should be thrown away
_until_ it messes with the main stack. It started to steal ports. What will
happen when it gets all the port space and no new legal network connection
can be opened, although there is no way to show the user who took it?
What will happen if a hardware RDMA connection gets terminated and software
could not free the port? Will RDMA request that connection reset
functions be exported from the stack to drop network connections on the ports
that are supposed to be used by new RDMA connections?
RDMA is not a problem, but how it influences the network stack is.
Let's better think about how to work correctly with the network stack (since
we already have that cr^Wdifferent hardware) instead of saying that
others do bad work and do not allow a shiny new feature to exist.
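The port-space concern can be illustrated with a toy model. This is only a sketch, assuming a single shared bitmap of ports that every user (the host TCP stack or an RDMA provider) must reserve from; the demo_port_* names are hypothetical and nothing here is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PORTS 65536u

/* One bit per port; shared by every user of the port space. */
struct demo_port_space {
    uint8_t used[DEMO_PORTS / 8];
};

static void demo_port_space_init(struct demo_port_space *ps)
{
    assert(ps != NULL);
    memset(ps->used, 0, sizeof(ps->used));
}

/* Reserve a port; returns false if any user already holds it. */
static bool demo_port_reserve(struct demo_port_space *ps, uint16_t port)
{
    uint8_t mask = (uint8_t)(1u << (port % 8));

    assert(ps != NULL);
    if (ps->used[port / 8] & mask)
        return false;
    ps->used[port / 8] |= mask;
    return true;
}

/* Release a port so that another user can reserve it. */
static void demo_port_release(struct demo_port_space *ps, uint16_t port)
{
    assert(ps != NULL);
    ps->used[port / 8] &= (uint8_t)~(1u << (port % 8));
}
```

In this model a second reservation of the same port fails no matter who made the first one, and a holder that never calls demo_port_release() keeps the port unavailable indefinitely. Both questions raised above (who holds a port, and what happens when it is never freed) show up directly.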
--
Evgeniy Polyakov
* Re: PROBLEM: 2.6.23-rc "NETDEV WATCHDOG: eth0: transmit timed out"
From: Karl Meyer @ 2007-08-20 9:25 UTC
To: Francois Romieu; +Cc: linux-kernel, netdev
In-Reply-To: <20070816141122.GA6625@electric-eye.fr.zoreil.com>
The error exists from patch 2 on. I did some network testing with
patch 1 and currently use it and have no errors so far.
From my experience so far, patch 1 should be error free.
2007/8/16, Francois Romieu <romieu@fr.zoreil.com>:
> (please do not remove the netdev Cc:)
>
> Francois Romieu <romieu@fr.zoreil.com> :
> [...]
> > If it does not work I'll dissect 0e4851502f846b13b29b7f88f1250c980d57e944
> > tomorrow.
>
> You will find a tgz archive attached which contains a series of patches
> (0001-... to 0005-...) to walk from 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2
> to 0e4851502f846b13b29b7f88f1250c980d57e944 in smaller steps.
>
> Please apply 0001 on top of 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2. If it
> still works, apply 0002 on top of 0001, etc.
>
> --
> Ueimor
>
>
* Re: [RFT] r8169 changes against 2.6.23-rc3
From: Dirk @ 2007-08-20 8:34 UTC
To: netdev
In-Reply-To: <46C7ADD6.5080707@gmail.com>
On 8/19/07, Bruce Cole <bacole@gmail.com> wrote:
> So it seems that when the driver tries to queue a packet while the
> controller is busy processing the queue, the newly queued packet does
> not get noticed by the controller (until further packet activity occurs).
> Perhaps there is a problem with the memory barriers when adding to the
> TX queue, but I'm a newbie on linux kernel memory barriers.
One thing I noticed a while ago (March) is that flood-pinging (ping -f)
the r8169 host from an external system also increases performance
without changing code.
My original post about the problem:
http://marc.info/?l=linux-netdev&m=117207362010321&w=2
I ended up (until now perhaps :-) disabling the onboard NIC and
adding an e1000 card.
Kind regards,
Dirk
* [PATCH 4/4 - rev2] Initialize and fill IPv6 route age
From: Varun Chandramohan @ 2007-08-20 8:16 UTC
To: davem; +Cc: netdev, kaber, socketcan, shemminger, krkumar2, varuncha
The age field of the IPv6 route structures is initialized with the current timeval at the time of route
creation. When the route dump is called, the route age value stored in the structure is subtracted from the
present timeval and the difference is passed on as the route age.
Signed-off-by: Varun Chandramohan <varunc@linux.vnet.ibm.com>
---
include/net/ip6_fib.h | 1 +
include/net/ip6_route.h | 3 +++
net/ipv6/addrconf.c | 5 +++++
net/ipv6/route.c | 24 ++++++++++++++++++++----
4 files changed, 29 insertions(+), 4 deletions(-)
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index c48ea87..e30a1cf 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -98,6 +98,7 @@ struct rt6_info
u32 rt6i_flags;
u32 rt6i_metric;
+ time_t rt6i_age;
atomic_t rt6i_ref;
struct fib6_table *rt6i_table;
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 5456fdd..fc9716c 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -36,6 +36,9 @@ struct route_info {
#define RT6_LOOKUP_F_REACHABLE 0x2
#define RT6_LOOKUP_F_HAS_SADDR 0x4
+#define RT6_SET_ROUTE_INFO 0x0
+#define RT6_GET_ROUTE_INFO 0x1
+
extern struct rt6_info ip6_null_entry;
#ifdef CONFIG_IPV6_MULTIPLE_TABLES
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 91ef3be..666ec28 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4182,6 +4182,7 @@ EXPORT_SYMBOL(unregister_inet6addr_notif
int __init addrconf_init(void)
{
+ struct timeval tv;
int err = 0;
/* The addrconf netdev notifier requires that loopback_dev
@@ -4209,10 +4210,14 @@ int __init addrconf_init(void)
if (err)
return err;
+ do_gettimeofday(&tv);
ip6_null_entry.rt6i_idev = in6_dev_get(&loopback_dev);
+ ip6_null_entry.rt6i_age = timeval_to_sec(&tv);
#ifdef CONFIG_IPV6_MULTIPLE_TABLES
ip6_prohibit_entry.rt6i_idev = in6_dev_get(&loopback_dev);
+ ip6_prohibit_entry.rt6i_age = timeval_to_sec(&tv);
ip6_blk_hole_entry.rt6i_idev = in6_dev_get(&loopback_dev);
+ ip6_blk_hole_entry.rt6i_age = timeval_to_sec(&tv);
#endif
register_netdevice_notifier(&ipv6_dev_notf);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 55ea80f..9df756c 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -600,7 +600,14 @@ static int __ip6_ins_rt(struct rt6_info
{
int err;
struct fib6_table *table;
+ struct timeval tv;
+ do_gettimeofday(&tv);
+ /* Update the timeval for new routes
+ * We add it here to make it common irrespective
+ * of how the new route is added.
+ */
+ rt->rt6i_age = timeval_to_sec(&tv);
table = rt->rt6i_table;
write_lock_bh(&table->tb6_lock);
err = fib6_add(&table->tb6_root, rt, info);
@@ -2112,6 +2119,7 @@ static inline size_t rt6_nlmsg_size(void
+ nla_total_size(4) /* RTA_IIF */
+ nla_total_size(4) /* RTA_OIF */
+ nla_total_size(4) /* RTA_PRIORITY */
+ + nla_total_size(4) /*RTA_AGE*/
+ RTAX_MAX * nla_total_size(4) /* RTA_METRICS */
+ nla_total_size(sizeof(struct rta_cacheinfo));
}
@@ -2119,10 +2127,11 @@ static inline size_t rt6_nlmsg_size(void
static int rt6_fill_node(struct sk_buff *skb, struct rt6_info *rt,
struct in6_addr *dst, struct in6_addr *src,
int iif, int type, u32 pid, u32 seq,
- int prefix, unsigned int flags)
+ int prefix, unsigned int flags, int dumpflg)
{
struct rtmsg *rtm;
struct nlmsghdr *nlh;
+ struct timeval tv;
long expires;
u32 table;
@@ -2186,6 +2195,13 @@ static int rt6_fill_node(struct sk_buff
if (ipv6_get_saddr(&rt->u.dst, dst, &saddr_buf) == 0)
NLA_PUT(skb, RTA_PREFSRC, 16, &saddr_buf);
}
+
+ if (dumpflg) {
+ do_gettimeofday(&tv);
+ NLA_PUT_U32(skb, RTA_AGE, timeval_to_sec(&tv) - rt->rt6i_age);
+ } else {
+ NLA_PUT_U32(skb, RTA_AGE, rt->rt6i_age);
+ }
if (rtnetlink_put_metrics(skb, rt->u.dst.metrics) < 0)
goto nla_put_failure;
@@ -2223,7 +2239,7 @@ int rt6_dump_route(struct rt6_info *rt,
return rt6_fill_node(arg->skb, rt, NULL, NULL, 0, RTM_NEWROUTE,
NETLINK_CB(arg->cb->skb).pid, arg->cb->nlh->nlmsg_seq,
- prefix, NLM_F_MULTI);
+ prefix, NLM_F_MULTI, RT6_GET_ROUTE_INFO);
}
static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr* nlh, void *arg)
@@ -2288,7 +2304,7 @@ static int inet6_rtm_getroute(struct sk_
err = rt6_fill_node(skb, rt, &fl.fl6_dst, &fl.fl6_src, iif,
RTM_NEWROUTE, NETLINK_CB(in_skb).pid,
- nlh->nlmsg_seq, 0, 0);
+ nlh->nlmsg_seq, 0, 0, RT6_GET_ROUTE_INFO);
if (err < 0) {
kfree_skb(skb);
goto errout;
@@ -2317,7 +2333,7 @@ void inet6_rt_notify(int event, struct r
if (skb == NULL)
goto errout;
- err = rt6_fill_node(skb, rt, NULL, NULL, 0, event, pid, seq, 0, 0);
+ err = rt6_fill_node(skb, rt, NULL, NULL, 0, event, pid, seq, 0, 0, RT6_SET_ROUTE_INFO);
if (err < 0) {
/* -EMSGSIZE implies BUG in rt6_nlmsg_size() */
WARN_ON(err == -EMSGSIZE);
--
1.4.3.4
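The stamping scheme in this patch can be sketched in userspace as follows. This is an illustration only: demo_route, demo_route_insert and demo_age_attr are hypothetical names, and the "dumping" flag stands in for the dumpflg argument the patch adds to rt6_fill_node().

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the rt6i_age field of a route. */
struct demo_route {
    time_t created;
};

/* Stamp the route once, at insertion time. */
static void demo_route_insert(struct demo_route *rt, time_t now)
{
    assert(rt != NULL);
    rt->created = now;
}

/*
 * Value placed in the age attribute: elapsed seconds when dumping,
 * otherwise the stored stamp itself, mirroring the two branches
 * selected by the flag in the patch.
 */
static uint32_t demo_age_attr(const struct demo_route *rt, time_t now,
                              int dumping)
{
    assert(rt != NULL);
    return dumping ? (uint32_t)(now - rt->created)
                   : (uint32_t)rt->created;
}
```

Because the stamp in the patch comes from wall-clock time, a clock step between insertion and dump would distort the reported age; a monotonic time source would avoid that, though that goes beyond what the patch does.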
* [PATCH 3/4 - rev 2] Initialize and populate age field
From: Varun Chandramohan @ 2007-08-20 8:16 UTC
To: davem; +Cc: netdev, kaber, socketcan, shemminger, krkumar2, varuncha
The age field is filled with the current time at the time of creation of the route. When the routes are dumped,
the age value stored in the route structure is subtracted from the current time value, and the difference is the age expressed in seconds.
Signed-off-by: Varun Chandramohan <varunc@linux.vnet.ibm.com>
---
net/ipv4/fib_hash.c | 3 +++
net/ipv4/fib_lookup.h | 3 ++-
net/ipv4/fib_semantics.c | 16 +++++++++++++---
net/ipv4/fib_trie.c | 1 +
4 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c
index 9ad1d9f..228ab27 100644
--- a/net/ipv4/fib_hash.c
+++ b/net/ipv4/fib_hash.c
@@ -448,6 +448,7 @@ static int fn_hash_insert(struct fib_tab
fa->fa_info = fi;
fa->fa_type = cfg->fc_type;
fa->fa_scope = cfg->fc_scope;
+ fa->fa_age = 0;
state = fa->fa_state;
fa->fa_state &= ~FA_S_ACCESSED;
fib_hash_genid++;
@@ -507,6 +508,7 @@ static int fn_hash_insert(struct fib_tab
new_fa->fa_type = cfg->fc_type;
new_fa->fa_scope = cfg->fc_scope;
new_fa->fa_state = 0;
+ new_fa->fa_age = 0;
/*
* Insert new entry to the list.
@@ -697,6 +699,7 @@ fn_hash_dump_bucket(struct sk_buff *skb,
f->fn_key,
fz->fz_order,
fa->fa_tos,
+ &fa->fa_age,
fa->fa_info,
NLM_F_MULTI) < 0) {
cb->args[4] = i;
diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
index eef9eec..c9145b5 100644
--- a/net/ipv4/fib_lookup.h
+++ b/net/ipv4/fib_lookup.h
@@ -13,6 +13,7 @@ struct fib_alias {
u8 fa_type;
u8 fa_scope;
u8 fa_state;
+ time_t fa_age;
};
#define FA_S_ACCESSED 0x01
@@ -27,7 +28,7 @@ extern struct fib_info *fib_create_info(
extern int fib_nh_match(struct fib_config *cfg, struct fib_info *fi);
extern int fib_dump_info(struct sk_buff *skb, u32 pid, u32 seq, int event,
u32 tb_id, u8 type, u8 scope, __be32 dst,
- int dst_len, u8 tos, struct fib_info *fi,
+ int dst_len, u8 tos, time_t *age, struct fib_info *fi,
unsigned int);
extern void rtmsg_fib(int event, __be32 key, struct fib_alias *fa,
int dst_len, u32 tb_id, struct nl_info *info,
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index c434119..1822d92 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -278,7 +278,8 @@ static inline size_t fib_nlmsg_size(stru
+ nla_total_size(4) /* RTA_TABLE */
+ nla_total_size(4) /* RTA_DST */
+ nla_total_size(4) /* RTA_PRIORITY */
- + nla_total_size(4); /* RTA_PREFSRC */
+ + nla_total_size(4) /* RTA_PREFSRC */
+ + nla_total_size(4); /*RTA_AGE*/
/* space for nested metrics */
payload += nla_total_size((RTAX_MAX * nla_total_size(4)));
@@ -313,7 +314,7 @@ void rtmsg_fib(int event, __be32 key, st
err = fib_dump_info(skb, info->pid, seq, event, tb_id,
fa->fa_type, fa->fa_scope, key, dst_len,
- fa->fa_tos, fa->fa_info, nlm_flags);
+ fa->fa_tos, &fa->fa_age, fa->fa_info, nlm_flags);
if (err < 0) {
/* -EMSGSIZE implies BUG in fib_nlmsg_size() */
WARN_ON(err == -EMSGSIZE);
@@ -940,11 +941,12 @@ __be32 __fib_res_prefsrc(struct fib_resu
}
int fib_dump_info(struct sk_buff *skb, u32 pid, u32 seq, int event,
- u32 tb_id, u8 type, u8 scope, __be32 dst, int dst_len, u8 tos,
+ u32 tb_id, u8 type, u8 scope, __be32 dst, int dst_len, u8 tos, time_t *age,
struct fib_info *fi, unsigned int flags)
{
struct nlmsghdr *nlh;
struct rtmsg *rtm;
+ struct timeval tv;
nlh = nlmsg_put(skb, pid, seq, event, sizeof(*rtm), flags);
if (nlh == NULL)
@@ -985,6 +987,14 @@ int fib_dump_info(struct sk_buff *skb, u
NLA_PUT_U32(skb, RTA_FLOW, fi->fib_nh[0].nh_tclassid);
#endif
}
+
+ do_gettimeofday(&tv);
+ if (!*age) {
+ *age = timeval_to_sec(&tv);
+ NLA_PUT_U32(skb, RTA_AGE, *age);
+ } else {
+ NLA_PUT_U32(skb, RTA_AGE, timeval_to_sec(&tv) - *age);
+ }
#ifdef CONFIG_IP_ROUTE_MULTIPATH
if (fi->fib_nhs > 1) {
struct rtnexthop *rtnh;
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 52b2891..82a8bac 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1892,6 +1892,7 @@ static int fn_trie_dump_fa(t_key key, in
xkey,
plen,
fa->fa_tos,
+ &fa->fa_age,
fa->fa_info, 0) < 0) {
cb->args[4] = i;
return -1;
--
1.4.3.4
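The IPv4 variant differs from the IPv6 one in that fa_age starts at 0 and is stamped lazily inside fib_dump_info(). A minimal userspace sketch of that branch follows; demo_fib_age_attr is a hypothetical name, and the time is passed in explicitly so the behaviour is easy to follow.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Returns the value that would go into the age attribute.
 * First call (stamp still 0): record "now" and report the stamp itself.
 * Later calls: report seconds elapsed since the stamp.
 */
static uint32_t demo_fib_age_attr(time_t *age, time_t now)
{
    assert(age != NULL);
    if (*age == 0) {
        *age = now;
        return (uint32_t)*age;
    }
    return (uint32_t)(now - *age);
}
```

Under this scheme the first message for a route carries an absolute timestamp and only later ones carry an elapsed age, so a consumer has to know which of the two it is looking at.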
* [PATCH 2/4 - rev2] Add new timeval_to_sec function
From: Varun Chandramohan @ 2007-08-20 8:15 UTC
To: davem; +Cc: netdev, kaber, socketcan, shemminger, krkumar2, varuncha
A new function for converting timeval to time_t is added in time.h. It's a common function used in different
places.
Signed-off-by: Varun Chandramohan <varunc@linux.vnet.ibm.com>
---
include/linux/time.h | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/include/linux/time.h b/include/linux/time.h
index 6a5f503..1faf65c 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -149,6 +149,18 @@ static inline s64 timeval_to_ns(const st
}
/**
+ * timeval_to_sec - Convert timeval to seconds
+ * @tv: pointer to the timeval variable to be converted
+ *
+ * Returns the seconds representation of timeval parameter.
+ * Note : Here we round up the value. We dont need accuracy.
+ */
+static inline time_t timeval_to_sec(const struct timeval *tv)
+{
+ return (tv->tv_sec + (tv->tv_usec ? 1 : 0));
+}
+
+/**
* ns_to_timespec - Convert nanoseconds to timespec
* @nsec: the nanoseconds value to be converted
*
--
1.4.3.4
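The rounding rule is easy to check outside the kernel. Below is a userspace sketch of the same expression; demo_timeval_to_sec is a hypothetical name used to avoid suggesting that this is the kernel helper itself.

```c
#include <assert.h>
#include <sys/time.h>
#include <time.h>

/* Round a timeval up to whole seconds: any non-zero microsecond part adds one. */
static time_t demo_timeval_to_sec(const struct timeval *tv)
{
    assert(tv != NULL);
    return tv->tv_sec + (tv->tv_usec ? 1 : 0);
}
```

So 10 s and 0 us stays 10, while 10 s and 1 us already becomes 11. Two stamps taken within the same second can therefore differ by one, which is the loss of accuracy the comment in the patch accepts.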
* [PATCH 1/4 - rev2] New attribute RTA_AGE
From: Varun Chandramohan @ 2007-08-20 8:14 UTC
To: davem; +Cc: netdev, kaber, socketcan, shemminger, krkumar2, varuncha
A new attribute, RTA_AGE, is added so that the age value can be exported to user level using netlink.
Signed-off-by: Varun Chandramohan <varunc@linux.vnet.ibm.com>
---
include/linux/rtnetlink.h | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index c91476c..68046a4 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -263,6 +263,7 @@ enum rtattr_type_t
RTA_SESSION,
RTA_MP_ALGO, /* no longer used */
RTA_TABLE,
+ RTA_AGE,
__RTA_MAX
};
--
1.4.3.4
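To show what carrying a 32-bit value as a netlink attribute involves, here is a simplified userspace sketch of the type-length-value layout. It assumes the usual 4-byte attribute header (16-bit length, 16-bit type) and 4-byte alignment; the demo_* names and the DEMO_ATTR_AGE number are invented for the example and are not the real RTA_AGE value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ATTR_AGE 100  /* hypothetical type number, for illustration */

/* Simplified attribute header: length covers header plus payload. */
struct demo_attr_hdr {
    uint16_t len;
    uint16_t type;
};

/* Round a length up to the next multiple of 4. */
static size_t demo_align4(size_t len)
{
    return (len + 3) & ~(size_t)3;
}

/* Space one attribute occupies in a message, padding included. */
static size_t demo_attr_total_size(size_t payload)
{
    return demo_align4(sizeof(struct demo_attr_hdr) + payload);
}

/* Append a u32 attribute; returns bytes written, or 0 if it does not fit. */
static size_t demo_put_u32(uint8_t *buf, size_t room, uint16_t type,
                           uint32_t value)
{
    struct demo_attr_hdr hdr;
    size_t total = demo_attr_total_size(sizeof(value));

    assert(buf != NULL);
    if (room < total)
        return 0;
    hdr.len = (uint16_t)(sizeof(hdr) + sizeof(value));
    hdr.type = type;
    memset(buf, 0, total);
    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + sizeof(hdr), &value, sizeof(value));
    return total;
}
```

A 4-byte payload therefore costs 8 bytes in the message, which is the amount a size estimate has to reserve for each such attribute.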
* [PATCH 0/4 - rev 2] Age Entry For IPv4 & IPv6 Route Table
From: Varun Chandramohan @ 2007-08-20 8:13 UTC
To: davem; +Cc: netdev, kaber, socketcan, shemminger, krkumar2, varuncha
Hi Dave,
This is rev2 of the patch set I sent out some time ago. I have made it against the net-2.6.24 tree. Can you please review it and let me know? There have been a few minor changes since rev1.
Original Message:
According to RFC 4292 (IP Forwarding Table MIB), there is a need for an age entry for all the routes in the routing table. The entry in the RFC is inetCidrRouteAge and oid is inetCidrRouteAge.1.10.
Many SNMP applications require this age entry. So I am adding the age field to the routing table for IPv4 and IPv6 and providing an interface for this value via netlink.
Signed-off-by: Varun Chandramohan <varunc@linux.vnet.ibm.com>
---
* [PATCH] smc911x irq sense request and MPR2 board support
From: Markus Brunner @ 2007-08-20 6:36 UTC
To: jgarzik; +Cc: netdev, Mark Jonas
Hi,
these are the changes to the smc911x driver which were necessary
to get it running on the Magic Panel R2 (smsc9115).
It is a SH3-DSP based board. The other patches are available on
the linuxsh-dev mailing list:
http://marc.info/?l=linuxsh-dev&r=1&b=200708&w=2
It was necessary to set the irq sense to low level.
Therefore the SMC_IRQ_SENSE define was added.
What are the chances for inclusion in 2.6.24?
Signed-off-by: Markus Brunner <super.firetwister@gmail.com>
Signed-off-by: Mark Jonas <toertel@gmail.com>
---
Kconfig | 2 +-
smc911x.c | 2 +-
smc911x.h | 6 ++++++
3 files changed, 8 insertions(+), 2 deletions(-)
--- sh-2.6-intc/drivers/net/Kconfig 2007-08-02 07:05:16.000000000 +0200
+++ sh-2.6/drivers/net/Kconfig 2007-08-03 09:46:20.000000000 +0200
@@ -944,7 +944,7 @@ config SMC911X
tristate "SMSC LAN911[5678] support"
select CRC32
select MII
- depends on ARCH_PXA
+ depends on ARCH_PXA || SUPERH
help
This is a driver for SMSC's LAN911x series of Ethernet chipsets
including the new LAN9115, LAN9116, LAN9117, and LAN9118.
--- sh-2.6-intc/drivers/net/smc911x.c 2007-07-04 21:46:34.000000000 +0200
+++ sh-2.6/drivers/net/smc911x.c 2007-08-14 10:43:16.000000000 +0200
@@ -2084,7 +2084,7 @@ static int __init smc911x_probe(struct n
/* Grab the IRQ */
retval = request_irq(dev->irq, &smc911x_interrupt,
- IRQF_SHARED | IRQF_TRIGGER_FALLING, dev->name, dev);
+ IRQF_SHARED | SMC_IRQ_SENSE, dev->name, dev);
if (retval)
goto err_out;
--- sh-2.6-intc/drivers/net/smc911x.h 2007-07-04 21:46:34.000000000 +0200
+++ sh-2.6/drivers/net/smc911x.h 2007-08-10 13:16:34.000000000 +0200
@@ -36,6 +36,12 @@
#define SMC_USE_PXA_DMA 1
#define SMC_USE_16BIT 0
#define SMC_USE_32BIT 1
+ #define SMC_IRQ_SENSE IRQF_TRIGGER_FALLING
+#elif CONFIG_SH_MAGIC_PANEL_R2
+ #define SMC_USE_SH_DMA 0
+ #define SMC_USE_16BIT 0
+ #define SMC_USE_32BIT 1
+ #define SMC_IRQ_SENSE IRQF_TRIGGER_LOW
#endif
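The effect of the new define can be sketched as a small selection function. The patch makes the choice at compile time through the preprocessor; the version below does it at run time only so that both outcomes can be exercised in one program. The DEMO_ flag values mirror the IRQF_ constants in linux/interrupt.h as far as I know them, and the enum and function names are hypothetical.

```c
#include <assert.h>

/* Local copies of the interrupt flags, prefixed to mark them as such. */
#define DEMO_IRQF_TRIGGER_FALLING 0x00000002u
#define DEMO_IRQF_TRIGGER_LOW     0x00000008u
#define DEMO_IRQF_SHARED          0x00000080u

/* Hypothetical board identifiers standing in for the Kconfig symbols. */
enum demo_board {
    DEMO_BOARD_PXA,             /* existing configuration in the header */
    DEMO_BOARD_MAGIC_PANEL_R2   /* board added by the patch */
};

/* Per-board interrupt sense, the role played by SMC_IRQ_SENSE. */
static unsigned int demo_irq_sense(enum demo_board board)
{
    switch (board) {
    case DEMO_BOARD_MAGIC_PANEL_R2:
        return DEMO_IRQF_TRIGGER_LOW;
    case DEMO_BOARD_PXA:
    default:
        return DEMO_IRQF_TRIGGER_FALLING;
    }
}

/* Flags the driver would hand to request_irq() for a given board. */
static unsigned int demo_request_flags(enum demo_board board)
{
    return DEMO_IRQF_SHARED | demo_irq_sense(board);
}
```

The existing board keeps its falling-edge trigger, so behaviour there is unchanged, and only the new board asks for a low-level trigger.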
* Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: ssufficool @ 2007-08-20 4:31 UTC
To: David Miller; +Cc: jeff, netdev, rdreier, linux-kernel, general
In-Reply-To: <20070819.002337.06589160.davem@davemloft.net>
We implemented a small office solution using Infiniband purely on a
cost-per-performance basis. We have a small cluster of 10 servers for less
than 120K, all from HP.
Pure and simple, Infiniband offers the best price per performance when
considering SAN and MPI consolidation vs F.C. + GbE.
Not limited to top 500 HPC anymore, just those with common sense.
On Sun, 2007-08-19 at 00:23 -0700, David Miller wrote:
> From: "Sean Hefty" <sean.hefty@intel.com>
> Date: Sun, 19 Aug 2007 00:01:07 -0700
>
> > Millions of Infiniband ports are in operation today. Over 25% of the top 500
> > supercomputers use Infiniband. The formation of the OpenFabrics Alliance was
> > pushed and has been continuously funded by an RDMA customer - the US National
> > Labs. RDMA technologies are backed by Cisco, IBM, Intel, QLogic, Sun, Voltaire,
> > Mellanox, NetApp, AMD, Dell, HP, Oracle, Unisys, Emulex, Hitachi, NEC, Fujitsu,
> > LSI, SGI, Sandia, and at least two dozen other companies. IDC expects
> > Infiniband adapter revenue to triple between 2006 and 2011, and switch revenue
> > to increase six-fold (combined revenues of 1 billion).
>
> Scale these numbers with reality and usage.
>
> These vendors pour in huge amounts of money into a relatively small
> number of extremely large cluster installations. Besides the folks
> doing nuke and whole-earth simulations at some government lab, nobody
> cares. And part of the investment is not being done wholly for smart
> economic reasons, but also largely publicity purposes.
>
> So present your great Infiniband numbers with that being admitted up
> front, ok?
>
> Its relevance to Linux as a general purpose operating system that
> should be "good enough" for 99% of the world is close to NIL.
>
> People have been pouring tons of money and research into doing stupid
> things to make clusters go fast, and in such a way that make zero
> sense for general purpose operating systems, for ages. RDMA is just
> one such example.
>
> BTW, I find it ironic that you mention memory bandwidth as a retort,
> as Roland's favorite stateless offload devil, TSO, deals explicitly
> with lowering the per-packet BUS bandwidth usage of TCP. LRO
> offloading does likewise.
* Re: [PATCH] IPv6: Fix kernel panic while send SCTP data with IP fragments
From: Wei Yongjun @ 2007-08-20 2:27 UTC
To: Arnaldo Carvalho de Melo, Wei Yongjun, netdev
In-Reply-To: <20070820021238.GU24792@ghostprotocols.net>
Hi Arnaldo Carvalho de Melo:
> Em Mon, Aug 20, 2007 at 09:28:27AM +0800, Wei Yongjun escreveu:
>
>> If an ICMP6 message with "Packet Too Big" is received after sending SCTP DATA,
>> a kernel panic will occur when the SCTP DATA is sent again.
>>
>> This is because of a bad dest address in the call to skb_copy_bits().
>>
>> The message sequence is like this:
>>
>> Endpoint A Endpoint B
>> <------- SCTP DATA (size=1432)
>> ICMP6 message ------->
>> (Packet Too Big pmtu=1280)
>> <------- Resend SCTP DATA (size=1432)
>> ------------kernel panic---------------
>>
>
> Thanks! I'm to blame for this one, problem was introduced in:
>
> b0e380b1d8a8e0aca215df97702f99815f05c094
>
> @@ -761,7 +762,7 @@ slow_path:
> /*
> * Copy a block of the IP datagram.
> */
> - if (skb_copy_bits(skb, ptr, frag->h.raw, len))
> + if (skb_copy_bits(skb, ptr, skb_transport_header(skb),
> len))
> BUG();
> left -= len;
>
> So please add:
>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
>
> To this patch.
>
> - Arnaldo
>
>
>
>> printing eip:
>> c05be62a
>> *pde = 00000000
>> Oops: 0002 [#1]
>> SMP
>> Modules linked in: scomm l2cap bluetooth ipv6 dm_mirror dm_mod video output sbs battery lp floppy sg i2c_piix4 i2c_core pcnet32 mii button ac parport_pc parport ide_cd cdrom serio_raw mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
>> CPU: 0
>> EIP: 0060:[<c05be62a>] Not tainted VLI
>> EFLAGS: 00010282 (2.6.23-rc2 #1)
>> EIP is at skb_copy_bits+0x4f/0x1ef
>> eax: 000004d0 ebx: ce12a980 ecx: 00000134 edx: cfd5a880
>> esi: c8246858 edi: 00000000 ebp: c0759b14 esp: c0759adc
>> ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068
>> Process swapper (pid: 0, ti=c0759000 task=c06d0340 task.ti=c0713000)
>> Stack: c0759b88 c0405867 ce12a980 c8bff838 c789c084 00000000 00000028 cfd5a880
>> d09f1890 000005dc 0000007b ce12a980 cfd5a880 c8bff838 c0759b88 d09bc521
>> 000004d0 fffff96c 00000200 00000100 c0759b50 cfd5a880 00000246 c0759bd4
>> Call Trace:
>> [<c0405e1d>] show_trace_log_lvl+0x1a/0x2f
>> [<c0405ecd>] show_stack_log_lvl+0x9b/0xa3
>> [<c040608d>] show_registers+0x1b8/0x289
>> [<c0406271>] die+0x113/0x246
>> [<c0625dbc>] do_page_fault+0x4ad/0x57e
>> [<c0624642>] error_code+0x72/0x78
>> [<d09bc521>] ip6_output+0x8e5/0xab2 [ipv6]
>> [<d09bcec1>] ip6_xmit+0x2ea/0x3a3 [ipv6]
>> [<d0a3f2ca>] sctp_v6_xmit+0x248/0x253 [sctp]
>> [<d0a3c934>] sctp_packet_transmit+0x53f/0x5ae [sctp]
>> [<d0a34bf8>] sctp_outq_flush+0x555/0x587 [sctp]
>> [<d0a34d3c>] sctp_retransmit+0xf8/0x10f [sctp]
>> [<d0a3d183>] sctp_icmp_frag_needed+0x57/0x5b [sctp]
>> [<d0a3ece2>] sctp_v6_err+0xcd/0x148 [sctp]
>> [<d09cf1ce>] icmpv6_notify+0xe6/0x167 [ipv6]
>> [<d09d009a>] icmpv6_rcv+0x7d7/0x849 [ipv6]
>> [<d09be240>] ip6_input+0x1dc/0x310 [ipv6]
>> [<d09be965>] ipv6_rcv+0x294/0x2df [ipv6]
>> [<c05c3789>] netif_receive_skb+0x2d2/0x335
>> [<c05c5733>] process_backlog+0x7f/0xd0
>> [<c05c58f6>] net_rx_action+0x96/0x17e
>> [<c042e722>] __do_softirq+0x64/0xcd
>> [<c0406f37>] do_softirq+0x5c/0xac
>> =======================
>> Code: 00 00 29 ca 89 d0 2b 45 e0 89 55 ec 85 c0 7e 35 39 45 08 8b 55 e4 0f 4e 45 08 8b 75 e0 8b 7d dc 89 c1 c1 e9 02 03 b2 a0 00 00 00 <f3> a5 89 c1 83 e1 03 74 02 f3 a4 29 45 08 0f 84 7b 01 00 00 01
>> EIP: [<c05be62a>] skb_copy_bits+0x4f/0x1ef SS:ESP 0068:c0759adc
>> Kernel panic - not syncing: Fatal exception in interrupt
>>
>> Following is the patch.
>>
I have changed it. Thanks.
Regards
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
--- a/net/ipv6/ip6_output.c 2007-08-14 10:36:03.000000000 -0400
+++ b/net/ipv6/ip6_output.c 2007-08-17 15:33:35.000000000 -0400
@@ -794,7 +794,7 @@ slow_path:
/*
* Copy a block of the IP datagram.
*/
- if (skb_copy_bits(skb, ptr, skb_transport_header(skb), len))
+ if (skb_copy_bits(skb, ptr, skb_transport_header(frag), len))
BUG();
left -= len;
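The one-word fix is easier to see in a stripped-down model of the copy. The sketch below is userspace C with hypothetical demo_* names and a fixed-size buffer in place of an skb; it only illustrates that the destination pointer must be derived from the fragment being filled, not from the packet being read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an skb: raw data plus a transport header offset. */
struct demo_buf {
    unsigned char data[64];
    size_t transport_off;
};

static unsigned char *demo_transport_header(struct demo_buf *b)
{
    return b->data + b->transport_off;
}

/*
 * Copy len bytes starting at offset ptr of src into frag.
 * The destination comes from frag, the buffer being filled; using the
 * transport header of src here would write back into the packet that is
 * being read, which is the mistake the patch corrects.
 */
static int demo_copy_block(const struct demo_buf *src, size_t ptr,
                           struct demo_buf *frag, size_t len)
{
    assert(src != NULL && frag != NULL);
    if (ptr + len > sizeof(src->data) ||
        frag->transport_off + len > sizeof(frag->data))
        return -1;
    memcpy(demo_transport_header(frag), src->data + ptr, len);
    return 0;
}
```

In the real code the wrong destination ended in the page fault shown in the oops; the model returns an error for out-of-range copies instead.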
* Re: [PATCH] IPv6: Fix kernel panic while send SCTP data with IP fragments
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2007-08-20 2:29 UTC
To: yjwei, davem; +Cc: netdev, yoshfuji
In-Reply-To: <46C8EE3B.40105@cn.fujitsu.com>
In article <46C8EE3B.40105@cn.fujitsu.com> (at Mon, 20 Aug 2007 09:28:27 +0800), Wei Yongjun <yjwei@cn.fujitsu.com> says:
> If an ICMP6 message with "Packet Too Big" is received after sending SCTP DATA,
> a kernel panic will occur when the SCTP DATA is sent again.
>
> This is because of a bad dest address in the call to skb_copy_bits().
:
> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
--yoshfuji
* Re: [PATCH] IPv6: Fix kernel panic while send SCTP data with IP fragments
From: Arnaldo Carvalho de Melo @ 2007-08-20 2:12 UTC
To: Wei Yongjun; +Cc: netdev
In-Reply-To: <46C8EE3B.40105@cn.fujitsu.com>
Em Mon, Aug 20, 2007 at 09:28:27AM +0800, Wei Yongjun escreveu:
> If an ICMP6 message with "Packet Too Big" is received after sending SCTP DATA,
> a kernel panic will occur when the SCTP DATA is sent again.
>
> This is because of a bad dest address in the call to skb_copy_bits().
>
> The message sequence is like this:
>
> Endpoint A Endpoint B
> <------- SCTP DATA (size=1432)
> ICMP6 message ------->
> (Packet Too Big pmtu=1280)
> <------- Resend SCTP DATA (size=1432)
> ------------kernel panic---------------
Thanks! I'm to blame for this one, problem was introduced in:
b0e380b1d8a8e0aca215df97702f99815f05c094
@@ -761,7 +762,7 @@ slow_path:
/*
* Copy a block of the IP datagram.
*/
- if (skb_copy_bits(skb, ptr, frag->h.raw, len))
+ if (skb_copy_bits(skb, ptr, skb_transport_header(skb),
len))
BUG();
left -= len;
So please add:
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
To this patch.
- Arnaldo
> printing eip:
> c05be62a
> *pde = 00000000
> Oops: 0002 [#1]
> SMP
> Modules linked in: scomm l2cap bluetooth ipv6 dm_mirror dm_mod video output sbs battery lp floppy sg i2c_piix4 i2c_core pcnet32 mii button ac parport_pc parport ide_cd cdrom serio_raw mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
> CPU: 0
> EIP: 0060:[<c05be62a>] Not tainted VLI
> EFLAGS: 00010282 (2.6.23-rc2 #1)
> EIP is at skb_copy_bits+0x4f/0x1ef
> eax: 000004d0 ebx: ce12a980 ecx: 00000134 edx: cfd5a880
> esi: c8246858 edi: 00000000 ebp: c0759b14 esp: c0759adc
> ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068
> Process swapper (pid: 0, ti=c0759000 task=c06d0340 task.ti=c0713000)
> Stack: c0759b88 c0405867 ce12a980 c8bff838 c789c084 00000000 00000028 cfd5a880
> d09f1890 000005dc 0000007b ce12a980 cfd5a880 c8bff838 c0759b88 d09bc521
> 000004d0 fffff96c 00000200 00000100 c0759b50 cfd5a880 00000246 c0759bd4
> Call Trace:
> [<c0405e1d>] show_trace_log_lvl+0x1a/0x2f
> [<c0405ecd>] show_stack_log_lvl+0x9b/0xa3
> [<c040608d>] show_registers+0x1b8/0x289
> [<c0406271>] die+0x113/0x246
> [<c0625dbc>] do_page_fault+0x4ad/0x57e
> [<c0624642>] error_code+0x72/0x78
> [<d09bc521>] ip6_output+0x8e5/0xab2 [ipv6]
> [<d09bcec1>] ip6_xmit+0x2ea/0x3a3 [ipv6]
> [<d0a3f2ca>] sctp_v6_xmit+0x248/0x253 [sctp]
> [<d0a3c934>] sctp_packet_transmit+0x53f/0x5ae [sctp]
> [<d0a34bf8>] sctp_outq_flush+0x555/0x587 [sctp]
> [<d0a34d3c>] sctp_retransmit+0xf8/0x10f [sctp]
> [<d0a3d183>] sctp_icmp_frag_needed+0x57/0x5b [sctp]
> [<d0a3ece2>] sctp_v6_err+0xcd/0x148 [sctp]
> [<d09cf1ce>] icmpv6_notify+0xe6/0x167 [ipv6]
> [<d09d009a>] icmpv6_rcv+0x7d7/0x849 [ipv6]
> [<d09be240>] ip6_input+0x1dc/0x310 [ipv6]
> [<d09be965>] ipv6_rcv+0x294/0x2df [ipv6]
> [<c05c3789>] netif_receive_skb+0x2d2/0x335
> [<c05c5733>] process_backlog+0x7f/0xd0
> [<c05c58f6>] net_rx_action+0x96/0x17e
> [<c042e722>] __do_softirq+0x64/0xcd
> [<c0406f37>] do_softirq+0x5c/0xac
> =======================
> Code: 00 00 29 ca 89 d0 2b 45 e0 89 55 ec 85 c0 7e 35 39 45 08 8b 55 e4 0f 4e 45 08 8b 75 e0 8b 7d dc 89 c1 c1 e9 02 03 b2 a0 00 00 00 <f3> a5 89 c1 83 e1 03 74 02 f3 a4 29 45 08 0f 84 7b 01 00 00 01
> EIP: [<c05be62a>] skb_copy_bits+0x4f/0x1ef SS:ESP 0068:c0759adc
> Kernel panic - not syncing: Fatal exception in interrupt
>
> Following is the patch.
>
> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
>
>
> --- a/net/ipv6/ip6_output.c 2007-08-14 10:36:03.000000000 -0400
> +++ b/net/ipv6/ip6_output.c 2007-08-17 15:33:35.000000000 -0400
> @@ -794,7 +794,7 @@ slow_path:
> /*
> * Copy a block of the IP datagram.
> */
> - if (skb_copy_bits(skb, ptr, skb_transport_header(skb), len))
> + if (skb_copy_bits(skb, ptr, skb_transport_header(frag), len))
> BUG();
> left -= len;
>
>
>
>
^ permalink raw reply
* RE: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCPportsfrom the host TCP port space.
From: Felix Marti @ 2007-08-20 1:45 UTC (permalink / raw)
To: Andi Kleen; +Cc: jeff, netdev, rdreier, linux-kernel, general, David Miller
In-Reply-To: <p73hcmv14wo.fsf@bingen.suse.de>
> -----Original Message-----
> From: ak@suse.de [mailto:ak@suse.de] On Behalf Of Andi Kleen
> Sent: Sunday, August 19, 2007 4:28 PM
> To: Felix Marti
> Cc: David Miller; jeff@garzik.org; netdev@vger.kernel.org;
> rdreier@cisco.com; linux-kernel@vger.kernel.org;
> general@lists.openfabrics.org
> Subject: Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate
> PS_TCPportsfrom the host TCP port space.
>
> "Felix Marti" <felix@chelsio.com> writes:
>
> > what benefits does the TSO infrastructure give the
> > non-TSO capable devices?
>
> It improves performance on software queueing devices between guests
> and hypervisors. This is a more and more important application these
> days. Even when the system running the Hypervisor has a non TSO
> capable device in the end it'll still save CPU cycles this way. Right
> now virtualized IO tends to be much more CPU intensive than direct IO,
> so any help it can get is beneficial.
>
> It also makes loopback faster, although that's probably not that
> useful.
>
> And a lot of the "TSO infrastructure" was needed for zero copy TX
> anyways, which benefits most reasonably modern NICs (anything with
> hardware checksumming).
Hi Andi, yes, you're right. I should have chosen my example more
carefully.
>
> -Andi
^ permalink raw reply
* RE: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCPportsfrom the host TCP port space.
From: Felix Marti @ 2007-08-20 1:41 UTC (permalink / raw)
To: David Miller; +Cc: sean.hefty, netdev, rdreier, general, linux-kernel, jeff
In-Reply-To: <20070819.180540.74750322.davem@davemloft.net>
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Sunday, August 19, 2007 6:06 PM
> To: Felix Marti
> Cc: sean.hefty@intel.com; netdev@vger.kernel.org; rdreier@cisco.com;
> general@lists.openfabrics.org; linux-kernel@vger.kernel.org;
> jeff@garzik.org
> Subject: Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate
> PS_TCPportsfrom the host TCP port space.
>
> From: "Felix Marti" <felix@chelsio.com>
> Date: Sun, 19 Aug 2007 17:47:59 -0700
>
> > [Felix Marti]
>
> Please stop using this to start your replies, thank you.
Better?
>
> > David and Herbert, so you agree that the user<>kernel
> > space memory copy overhead is a significant overhead and we want to
> > enable zero-copy in both the receive and transmit path? - Yes, copy
> > avoidance is mainly an API issue and unfortunately the so widely used
> > (synchronous) sockets API doesn't make copy avoidance easy, which is one
> > area where protocol offload can help. Yes, some apps can resort to
> > sendfile() but there are many apps which seem to have trouble switching
> > to that API... and what about the receive path?
>
> On the send side none of this is an issue. You are either sending
> static content, in which case using sendfile() is trivial, or you're
> generating data dynamically, in which case the data copy is in the
> noise or too small to do zerocopy on. If not, you can use a shared
> mmap to generate your data into, and then sendfile out from that file,
> to avoid the copy that way.
>
> splice() helps a lot too.
>
> Splice has the capability to do away with the receive side too, and
> there are a few receivefile() implementations that could get cleaned
> up and merged in.
I don't believe it is as simple as that. Many apps synthesize their
payload in user space buffers (i.e. malloced memory) and expect to
receive their data in user space buffers _and_ expect the received data
to have a certain alignment and to be contiguous - something not
addressed by these 'new' APIs. Look, people writing HPC apps tend to
take advantage of whatever they can to squeeze some extra performance
out of their apps and they are resorting to protocol offload technology
for a reason, wouldn't you agree?
>
> Also, the I/O bus, more than main memory bandwidth, is still the
> limiting factor in all of this; it is the smallest data pipe for
> communications out to and from the network. So the protocol header
> avoidance gains of TSO and LRO are still very worthwhile savings.
So, e.g. with TSO, you're saving about 16 headers (let us say 14 + 20 +
20 bytes each), 864B, when moving ~64KB of payload - that looks very much
in the noise to me. And again, PCI-E provides more bandwidth than the wire...
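The header arithmetic above can be checked with a few lines of C. This is
only a sketch: the function names are made up for illustration, and reading
"14 + 20 + 20" as Ethernet + IPv4 + TCP headers without options is an
assumption, not something stated in the message.

```c
#include <assert.h>

/* Per-packet header size from the message: 14 + 20 + 20 bytes.
 * Assumed here to mean Ethernet + IPv4 + TCP, no options. */
enum { ETH_HDR = 14, IP_HDR = 20, TCP_HDR = 20 };

/* Total header bytes for `segments` packets on the wire. */
unsigned long header_bytes(unsigned long segments)
{
	return segments * (ETH_HDR + IP_HDR + TCP_HDR);
}

/* Header bytes as a percentage of the payload that was moved. */
double header_overhead_pct(unsigned long segments,
			   unsigned long payload_bytes)
{
	assert(payload_bytes > 0);
	return 100.0 * (double)header_bytes(segments) / (double)payload_bytes;
}
```

With the numbers from the message, header_bytes(16) is 864 and
header_overhead_pct(16, 64 * 1024) comes out at roughly 1.3 percent.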
>
> But even if RDMA increases performance 100 fold, it still doesn't
> avoid the issue that it doesn't fit in with the rest of the networking
> stack and feature set.
>
> Any monkey can change the rules around ("ok I can make it go fast as
> long as you don't need firewalling, packet scheduling, classification,
> and you only need to talk to specific systems that speak this same
> special protocol") to make things go faster. On the other hand, well
> designed solutions can give performance gains within the constraints
> of the full system design and without sacrificing functionality.
While I believe that you should give people an option to get 'high
performance' _instead_ of other features and let them choose whatever
they care about, I really do agree with what you're saying and believe
that offload devices _should_ be integrated with the facilities that you
mention (in fact, offload can do a much better job at lots of things
that you mention ;) ... but you're not letting offload devices integrate
and you're slowing down innovation in this field.
^ permalink raw reply
* [PATCH] IPv6: Fix kernel panic while send SCTP data with IP fragments
From: Wei Yongjun @ 2007-08-20 1:28 UTC (permalink / raw)
To: netdev
If an ICMP6 "Packet Too Big" message is received after SCTP DATA has been
sent, the kernel panics when the SCTP DATA is sent again.
This is caused by a bad destination address in the call to skb_copy_bits().
The message sequence is as follows:
Endpoint A                              Endpoint B
               <-------   SCTP DATA (size=1432)
ICMP6 message  ------->
(Packet Too Big pmtu=1280)
               <-------   Resend SCTP DATA (size=1432)
------------kernel panic---------------
printing eip:
c05be62a
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in: scomm l2cap bluetooth ipv6 dm_mirror dm_mod video output sbs battery lp floppy sg i2c_piix4 i2c_core pcnet32 mii button ac parport_pc parport ide_cd cdrom serio_raw mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU: 0
EIP: 0060:[<c05be62a>] Not tainted VLI
EFLAGS: 00010282 (2.6.23-rc2 #1)
EIP is at skb_copy_bits+0x4f/0x1ef
eax: 000004d0 ebx: ce12a980 ecx: 00000134 edx: cfd5a880
esi: c8246858 edi: 00000000 ebp: c0759b14 esp: c0759adc
ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068
Process swapper (pid: 0, ti=c0759000 task=c06d0340 task.ti=c0713000)
Stack: c0759b88 c0405867 ce12a980 c8bff838 c789c084 00000000 00000028 cfd5a880
d09f1890 000005dc 0000007b ce12a980 cfd5a880 c8bff838 c0759b88 d09bc521
000004d0 fffff96c 00000200 00000100 c0759b50 cfd5a880 00000246 c0759bd4
Call Trace:
[<c0405e1d>] show_trace_log_lvl+0x1a/0x2f
[<c0405ecd>] show_stack_log_lvl+0x9b/0xa3
[<c040608d>] show_registers+0x1b8/0x289
[<c0406271>] die+0x113/0x246
[<c0625dbc>] do_page_fault+0x4ad/0x57e
[<c0624642>] error_code+0x72/0x78
[<d09bc521>] ip6_output+0x8e5/0xab2 [ipv6]
[<d09bcec1>] ip6_xmit+0x2ea/0x3a3 [ipv6]
[<d0a3f2ca>] sctp_v6_xmit+0x248/0x253 [sctp]
[<d0a3c934>] sctp_packet_transmit+0x53f/0x5ae [sctp]
[<d0a34bf8>] sctp_outq_flush+0x555/0x587 [sctp]
[<d0a34d3c>] sctp_retransmit+0xf8/0x10f [sctp]
[<d0a3d183>] sctp_icmp_frag_needed+0x57/0x5b [sctp]
[<d0a3ece2>] sctp_v6_err+0xcd/0x148 [sctp]
[<d09cf1ce>] icmpv6_notify+0xe6/0x167 [ipv6]
[<d09d009a>] icmpv6_rcv+0x7d7/0x849 [ipv6]
[<d09be240>] ip6_input+0x1dc/0x310 [ipv6]
[<d09be965>] ipv6_rcv+0x294/0x2df [ipv6]
[<c05c3789>] netif_receive_skb+0x2d2/0x335
[<c05c5733>] process_backlog+0x7f/0xd0
[<c05c58f6>] net_rx_action+0x96/0x17e
[<c042e722>] __do_softirq+0x64/0xcd
[<c0406f37>] do_softirq+0x5c/0xac
=======================
Code: 00 00 29 ca 89 d0 2b 45 e0 89 55 ec 85 c0 7e 35 39 45 08 8b 55 e4 0f 4e 45 08 8b 75 e0 8b 7d dc 89 c1 c1 e9 02 03 b2 a0 00 00 00 <f3> a5 89 c1 83 e1 03 74 02 f3 a4 29 45 08 0f 84 7b 01 00 00 01
EIP: [<c05be62a>] skb_copy_bits+0x4f/0x1ef SS:ESP 0068:c0759adc
Kernel panic - not syncing: Fatal exception in interrupt
Following is the patch.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
--- a/net/ipv6/ip6_output.c	2007-08-14 10:36:03.000000000 -0400
+++ b/net/ipv6/ip6_output.c	2007-08-17 15:33:35.000000000 -0400
@@ -794,7 +794,7 @@ slow_path:
 		/*
 		 * Copy a block of the IP datagram.
 		 */
-		if (skb_copy_bits(skb, ptr, skb_transport_header(skb), len))
+		if (skb_copy_bits(skb, ptr, skb_transport_header(frag), len))
 			BUG();
 		left -= len;
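
The one-word change above can be illustrated with a user-space toy. This is
a sketch under stated assumptions, not kernel code: struct toy_buf and the
toy_* helpers are hypothetical stand-ins for struct sk_buff,
skb_transport_header() and skb_copy_bits(), reduced to one flat buffer. The
point it shows is that the copy destination has to come from the newly built
fragment rather than from the original buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct sk_buff: one flat buffer plus the
 * offset at which the transport header starts. */
struct toy_buf {
	unsigned char data[2048];
	size_t transport_off;	/* where the transport header begins */
	size_t len;		/* bytes of valid data in data[] */
};

/* Hypothetical analogue of skb_transport_header(). */
unsigned char *toy_transport_header(struct toy_buf *b)
{
	return b->data + b->transport_off;
}

/* Hypothetical analogue of skb_copy_bits(): copy len bytes starting at
 * offset in src to the caller-supplied destination.
 * Returns 0 on success, -1 if the range lies outside src. */
int toy_copy_bits(const struct toy_buf *src, size_t offset,
		  void *to, size_t len)
{
	if (offset > src->len || len > src->len - offset)
		return -1;
	memcpy(to, src->data + offset, len);
	return 0;
}

/* Copy one block of the original datagram into a fragment. The
 * destination pointer is derived from frag, not from src. */
int toy_fill_fragment(const struct toy_buf *src, size_t ptr,
		      struct toy_buf *frag, size_t len)
{
	assert(frag->transport_off + len <= sizeof(frag->data));
	if (toy_copy_bits(src, ptr, toy_transport_header(frag), len))
		return -1;
	frag->len = frag->transport_off + len;
	return 0;
}
```

Passing toy_transport_header(src) as the destination instead would write
into the buffer being read from and leave the fragment's payload unfilled,
which is the pattern the patch removes.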
^ permalink raw reply
* Re: [PATCH 6/7 v2] fs_enet: Be an of_platform device when CONFIG_PPC_CPM_NEW_BINDING is set.
From: Scott Wood @ 2007-08-20 1:29 UTC (permalink / raw)
To: Kumar Gala; +Cc: netdev, jgarzik, linuxppc-dev
In-Reply-To: <55FCD650-548C-40C9-8B1E-16259EF93F1D@kernel.crashing.org>
On Sat, Aug 18, 2007 at 11:36:24AM -0500, Kumar Gala wrote:
> This patch seems to mix moving to using the device tree directly w/o
> some other modifications. Can it be broken into those two changes, as
> they'd be easier to review?
The last iteration of these patches, I got complaints that I was
splitting them up too fine-grained. I don't think it's productive to
keep iterating on exactly how much is in any given patch until I hit the
right combination of granularity and whoever happens to be reading e-mail
when I submit.
In the case of this particular patch, most of what isn't directly related
to converting to using the device tree directly is fixing problems that I
encountered in doing so -- what value is there in coming up with
intermediary versions that kind-of-sort-of make sense, on a good day, if
you don't look too closely? The existing codebase is crap, and if every
logical change were its own patch, the patchset would be ten times as
long, and take ten times as long to produce. Note that I did separate
what I thought were the more relevant-to-review and/or highly independent
changes.
The major thing I see in this patch that could have been usefully
separated out was the conversion of mii_bitbang.c to use the generic code
introduced by patch 1. However, that would require retrieving and
retesting the intermediate version, and I don't think there's sufficient
damage to reviewability (apart perhaps from diff's stupidity in thinking
that a single "{" is relevant common ground between completely unrelated
chunks of code) relative to the cost in preparing such a split.
Is there anything of the actual content of the patch that you object to,
or have a question about?
-Scott
^ permalink raw reply
* Re: Marvell 88E8056 gigabit ethernet controller
From: Kevin E @ 2007-08-20 1:15 UTC (permalink / raw)
To: linux-kernel, netdev
In-Reply-To: <2456F7EB-000C-4B71-B002-64340DD17BA8@linuxmontreal.com>
Someone wrote me with a solution to try and so far
it's working. They suggested I try the driver from
Marvell's website, but to make sure I powered off the
machine completely and, when it rebooted, did not let
any of the regular kernel drivers for the Marvell
chipset load. They had found that letting the sky2
module load and then unloading it would mean the
vendor's driver wouldn't work.
So I downloaded the latest driver package they have
(10.0.5.3). At first I couldn't get it to compile
against kernel 2.6.22.3 that I was running, but I have
it compiled with the 2.6.21.5 kernel, which is what
the machine is running now. And I'm happy to say that
it's working fine so far. I've transferred about 4G
over the link and it's still working fine.
Since Marvell's driver seems to be working for the
88E8056 chipset and, from what I've seen of the code,
it's marked as GPL, could it be rolled into the kernel
so that those of us with 88E8056 chipsets have a
working driver to use?
^ permalink raw reply
* Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCPportsfrom the host TCP port space.
From: David Miller @ 2007-08-20 1:05 UTC (permalink / raw)
To: felix; +Cc: jeff, netdev, rdreier, linux-kernel, general
In-Reply-To: <8A71B368A89016469F72CD08050AD334018E20BE@maui.asicdesigners.com>
From: "Felix Marti" <felix@chelsio.com>
Date: Sun, 19 Aug 2007 17:47:59 -0700
> [Felix Marti]
Please stop using this to start your replies, thank you.
> David and Herbert, so you agree that the user<>kernel
> space memory copy overhead is a significant overhead and we want to
> enable zero-copy in both the receive and transmit path? - Yes, copy
> avoidance is mainly an API issue and unfortunately the so widely used
> (synchronous) sockets API doesn't make copy avoidance easy, which is one
> area where protocol offload can help. Yes, some apps can resort to
> sendfile() but there are many apps which seem to have trouble switching
> to that API... and what about the receive path?
On the send side none of this is an issue. You are either sending
static content, in which case using sendfile() is trivial, or you're
generating data dynamically, in which case the data copy is in the
noise or too small to do zerocopy on. If not, you can use a shared
mmap to generate your data into, and then sendfile out from that file,
to avoid the copy that way.
splice() helps a lot too.
Splice has the capability to do away with the receive side too, and
there are a few receivefile() implementations that could get cleaned
up and merged in.
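The "generate into a shared mmap, then sendfile out from that file" approach
described above could look roughly like the following on Linux. This is a
minimal sketch, not a reference implementation: generate_and_sendfile() and
fill_ramp() are hypothetical names, out_fd is assumed to be a blocking
descriptor, and EINTR/EAGAIN handling is left out for brevity.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Example generator: fills the buffer with a repeating 0..255 ramp. */
void fill_ramp(unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = (unsigned char)(i & 0xff);
}

/* Create the file at path, let fill() write len bytes directly into a
 * shared mapping of it, then hand the file to sendfile() so the data
 * is not copied through a separate user-space buffer on the way out.
 * Returns 0 on success, -1 on error. */
int generate_and_sendfile(const char *path, int out_fd, size_t len,
			  void (*fill)(unsigned char *, size_t))
{
	unsigned char *map;
	off_t off = 0;
	int fd, ret = -1;

	assert(fill != NULL && len > 0);

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) < 0)
		goto out;

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;
	fill(map, len);		/* application generates its data in place */
	if (munmap(map, len) < 0)
		goto out;

	/* sendfile() advances off by the number of bytes it sent. */
	while ((size_t)off < len) {
		ssize_t n = sendfile(out_fd, fd, &off, len - (size_t)off);
		if (n <= 0)
			goto out;
	}
	ret = 0;
out:
	close(fd);
	return ret;
}
```

A caller would pass a connected socket as out_fd, for example
generate_and_sendfile("/tmp/payload.bin", sock, 4096, fill_ramp), where the
path and size are placeholders.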
Also, the I/O bus, more than main memory bandwidth, is still the
limiting factor in all of this; it is the smallest data pipe for
communications out to and from the network. So the protocol header
avoidance gains of TSO and LRO are still very worthwhile savings.
But even if RDMA increases performance 100 fold, it still doesn't
avoid the issue that it doesn't fit in with the rest of the networking
stack and feature set.
Any monkey can change the rules around ("ok I can make it go fast as
long as you don't need firewalling, packet scheduling, classification,
and you only need to talk to specific systems that speak this same
special protocol") to make things go faster. On the other hand, well
designed solutions can give performance gains within the constraints
of the full system design and without sacrificing functionality.
^ permalink raw reply
* RE: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCPportsfrom the host TCP port space.
From: Felix Marti @ 2007-08-20 0:47 UTC (permalink / raw)
To: David Miller; +Cc: jeff, netdev, rdreier, linux-kernel, general
In-Reply-To: <20070819.174017.77241227.davem@davemloft.net>
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Sunday, August 19, 2007 5:40 PM
> To: Felix Marti
> Cc: sean.hefty@intel.com; netdev@vger.kernel.org; rdreier@cisco.com;
> general@lists.openfabrics.org; linux-kernel@vger.kernel.org;
> jeff@garzik.org
> Subject: Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate
> PS_TCPportsfrom the host TCP port space.
>
> From: "Felix Marti" <felix@chelsio.com>
> Date: Sun, 19 Aug 2007 17:32:39 -0700
>
> [ Why do you put that "[Felix Marti]" everywhere you say something?
> It's annoying and superfluous. The quoting done by your mail client
> makes clear who is saying what. ]
>
> > Hmmm, interesting... I guess it is impossible to even have
> > a discussion on the subject.
>
> Nice try, Herbert Xu gave a great explanation.
[Felix Marti] David and Herbert, so you agree that the user<>kernel
space memory copy overhead is a significant overhead and we want to
enable zero-copy in both the receive and transmit path? - Yes, copy
avoidance is mainly an API issue and unfortunately the so widely used
(synchronous) sockets API doesn't make copy avoidance easy, which is one
area where protocol offload can help. Yes, some apps can resort to
sendfile() but there are many apps which seem to have trouble switching
to that API... and what about the receive path?
^ permalink raw reply
* Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCPportsfrom the host TCP port space.
From: David Miller @ 2007-08-20 0:40 UTC (permalink / raw)
To: felix; +Cc: jeff, netdev, rdreier, linux-kernel, general
In-Reply-To: <8A71B368A89016469F72CD08050AD334018E20BC@maui.asicdesigners.com>
From: "Felix Marti" <felix@chelsio.com>
Date: Sun, 19 Aug 2007 17:32:39 -0700
[ Why do you put that "[Felix Marti]" everywhere you say something?
It's annoying and superfluous. The quoting done by your mail client
makes clear who is saying what. ]
> Hmmm, interesting... I guess it is impossible to even have
> a discussion on the subject.
Nice try, Herbert Xu gave a great explanation.
^ permalink raw reply
* RE: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCPportsfrom the host TCP port space.
From: Felix Marti @ 2007-08-20 0:32 UTC (permalink / raw)
To: David Miller; +Cc: jeff, netdev, rdreier, linux-kernel, general
In-Reply-To: <20070819.160428.76330262.davem@davemloft.net>
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Sunday, August 19, 2007 4:04 PM
> To: Felix Marti
> Cc: sean.hefty@intel.com; netdev@vger.kernel.org; rdreier@cisco.com;
> general@lists.openfabrics.org; linux-kernel@vger.kernel.org;
> jeff@garzik.org
> Subject: Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate
> PS_TCPportsfrom the host TCP port space.
>
> From: "Felix Marti" <felix@chelsio.com>
> Date: Sun, 19 Aug 2007 12:49:05 -0700
>
> > You're not at all addressing the fact that RDMA does solve the
> > memory BW problem and stateless offload doesn't.
>
> It does, I just didn't retort to your claims because they were
> so blatantly wrong.
[Felix Marti] Hmmm, interesting... I guess it is impossible to even have
a discussion on the subject.
^ permalink raw reply
* Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCPportsfrom the host TCP port space.
From: Herbert Xu @ 2007-08-20 0:18 UTC (permalink / raw)
To: Felix Marti
Cc: davem, sean.hefty, netdev, rdreier, general, linux-kernel, jeff
In-Reply-To: <8A71B368A89016469F72CD08050AD334018E208F@maui.asicdesigners.com>
Felix Marti <felix@chelsio.com> wrote:
>
> [Felix Marti] Aren't you confusing memory and bus BW here? - RDMA
> enables DMA from/to application buffers removing the user-to-kernel/
> kernel-to-user memory copy, which is a significant overhead at the
> rates we're talking about: memory copy at 20Gbps (10Gbps in and 10Gbps
> out) requires 60Gbps of BW on most common platforms. So, receiving and
> transmitting at 10Gbps with LRO and TSO requires 80Gbps of system
> memory BW (which is beyond what most systems can do) whereas RDMA can
> do with 20Gbps!
Actually this is false. TSO only requires a copy if the user
chooses to use the sendmsg interface instead of sendpage. The
same is true for RDMA really. Except that instead of having to
switch your application to sendfile/splice, you're switching it
to RDMA.
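For reference, the 80Gbps and 20Gbps figures in the quoted text can be
reproduced with a small model. This is a sketch only: mem_bw_gbps() is a
hypothetical helper, and the factor of 3 memory transfers per copied byte is
an assumption inferred from the quoted "20Gbps of copy requires 60Gbps", not
a measured value.

```c
#include <assert.h>
#include <stdbool.h>

/* Memory bandwidth (Gbps) needed under the quoted model:
 *   DMA traffic = rx + tx line rate, plus
 *   copy_factor transfers per bit on each direction that is copied
 *   between kernel and user space. */
unsigned mem_bw_gbps(unsigned rx_gbps, unsigned tx_gbps,
		     bool copy_on_rx, bool copy_on_tx,
		     unsigned copy_factor)
{
	unsigned dma, copied;

	assert(copy_factor > 0);
	dma = rx_gbps + tx_gbps;
	copied = (copy_on_rx ? rx_gbps : 0) + (copy_on_tx ? tx_gbps : 0);
	return dma + copy_factor * copied;
}
```

mem_bw_gbps(10, 10, true, true, 3) gives the quoted 80 and
mem_bw_gbps(10, 10, false, false, 3) gives the quoted 20. Dropping only the
transmit copy, as with sendfile, gives 50 under the same assumed factor.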
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply