* Re: Please pull 'upstream' branch of wireless-2.6
From: Jeff Garzik @ 2006-04-26 10:18 UTC
To: John W. Linville, netdev
In-Reply-To: <20060424204044.GC21761@tuxdriver.com>
John W. Linville wrote:
> The following changes since commit 7c241d37fe0e6442c5cf3b5d73f7f58f2dc66352:
> Michael Buesch:
> bcm43xx: make PIO mode usable
>
> are found in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream
pulled
* Re: Please pull 'upstream-fixes' branch of wireless-2.6
From: Jeff Garzik @ 2006-04-26 10:17 UTC
To: jeff, netdev
In-Reply-To: <20060424194006.GB21761@tuxdriver.com>
John W. Linville wrote:
> The following changes since commit 6b426e785cb81e53dc2fc4dcf997661472b470ef:
> Linus Torvalds:
> Merge git://git.kernel.org/.../kyle/parisc-2.6
>
> are found in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream-fixes
pulled
* Re: [PATCH] e1000: skb truesize fix
From: Jeff Garzik @ 2006-04-26 10:16 UTC
To: Kok, Auke
Cc: stable, netdev, Miller, David, Ronciak, John, Brandeburg, Jesse,
Kirsher, Jeff, Kok, Auke
In-Reply-To: <20060426061229.25966.83974.stgit@gitlost.site>
Kok, Auke wrote:
> Hi,
>
> This patch was already merged in Jeff Garzik's netdev upstream branch but
> needs to go into 2.6.16.y and 2.6.17-rc* as it fixes a critical skb buffer-size
> bug that is exposed by an earlier patch by Dave Miller and Herbert
> Xu. I'm therefore resending it:
>
> Please apply to 2.6.16.y and queue for 2.6.17-rc.
>
> These changes are available through git. Jeff, please pull from:
>
> git://lost.foo-projects.org/~ahkok/git/linux-2.6 skb_truesize
pulled, queued for 2.6.17-rc
* Re: [PATCH] netdev: hotplug napi race cleanup
From: David S. Miller @ 2006-04-26 9:43 UTC
To: shemminger; +Cc: herbert, patrakov, netdev, akpm
In-Reply-To: <20060424152341.094b72d8@localhost.localdomain>
From: Stephen Hemminger <shemminger@osdl.org>
Date: Mon, 24 Apr 2006 15:23:41 -0700
> This follows after the earlier two patches.
>
> Change the initialization of the class device portion of the net device
> to be done earlier, so that any races before registration completes are
> harmless. Add a mutex to avoid changes to netdevice during the
> class device registration.
>
> Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Once Greg KH puts in the necessary infrastructure patches, I'll put
this one in too.
* Re: [PATCH]: suspicious unlikely usage in tcp_transmit_skb()
From: David S. Miller @ 2006-04-26 9:42 UTC
To: hzhong; +Cc: netdev
In-Reply-To: <444D5E73.7020803@gmail.com>
From: Hua Zhong <hzhong@gmail.com>
Date: Mon, 24 Apr 2006 16:25:39 -0700
> Hi,
>
> I am developing a profiling tool to check if likely/unlikely usages are wise. I find that the following one is always a miss:
>
> # Hit # miss Function:Filename@Line
> ! 0 50505 tcp_transmit_skb():net/ipv4/tcp_output.c@468
>
> There is a chance that my tool is buggy, but I just want to confirm with you whether this does look suspicious and what your opinion is.
>
> Signed-off-by: Hua Zhong <hzhong@gmail.com>
Your patch is semantically correct but does not apply, because your
email client has turned all of the tab characters in the patch into
spaces. This corrupts the patch and makes it unusable.
This problem is hit by pretty much every single gmail user that tries
to send a patch for the first time. I wish gmail would not mangle
ascii text by default.
Please fix this and repost, retaining a proper changelog entry and
signed off line, thank you.
* Re: [PATCH] bridge: allow full size vlan packets (repost)
From: David S. Miller @ 2006-04-26 9:39 UTC
To: shemminger; +Cc: netdev
In-Reply-To: <20060425110812.78df807f@localhost.localdomain>
From: Stephen Hemminger <shemminger@osdl.org>
Date: Tue, 25 Apr 2006 11:08:12 -0700
> Need to allow for VLAN header when bridging.
>
> Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Applied, thanks Stephen.
* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
From: David S. Miller @ 2006-04-26 7:59 UTC
To: kelly; +Cc: netdev, rusty
In-Reply-To: <200604261147.34221.kelly@au.ibm.com>
OK, I already have comments just from glancing at the initial patch.
With the 32-bit descriptors in the channel, you indeed end up
with a fixed sized pool with a lot of hard-to-finesse sizing
and lookup problems to solve.
So what I wanted to do was finesse the entire issue by simply
side-stepping it initially: use a normal buffer with a tail
descriptor, and when you enqueue, you give a tail descriptor pointer.
Yes, it's weirder to handle this in hardware, but it's not
impossible and using real pointers means two things:
1) You can design a simple netif_receive_skb() channel that works
today, encapsulation of channel buffers into an SKB is like
15 lines of code and no funny lookups.
2) People can start porting the input path of drivers right now and
retain full functionality and test anything they want. This is
important for getting the drivers stable as fast as possible.
And it also means we can tackle the buffer pool issue of the 32-bit
descriptors later, if we actually want to do things that way; I
think we probably don't.
To be honest, I don't think using a 32-bit descriptor is so critical
even from a hardware implementation perspective. Yes, on 64-bit
you're dealing with a 64-bit quantity, so the number of entries in the
channel is halved compared to what a 32-bit arch gets.
I say this for 2 reasons:
1) We have no idea whether it's critical to have "~512" entries
in the channel which is about what a u32 queue entry type
affords you on x86 with 4096 byte page size.
2) Furthermore, it is sized by page size, and most 64-bit platforms
use an 8K base page size anyway, so the number of queue entries
ends up being the same. Yes, I know some 64-bit platforms use
a 4K page size; please see #1 :-)
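The page-size argument in point 2 is plain division. A minimal C sketch, assuming the whole page holds nothing but ring entries (no header, no producer/consumer split, so the absolute counts come out higher than the ~512 quoted), shows how an 8K page cancels the halving caused by 64-bit entries:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Upper bound on ring entries that fit in one page.  Assumes the page
 * holds only entries; a real channel also needs some header space. */
static size_t ring_entries(size_t page_size, size_t entry_size)
{
	assert(entry_size != 0);
	return page_size / entry_size;
}

/* u32 entries, 4K page (x86):    4096 / 4 = 1024
 * u64 entries, 8K page (64-bit): 8192 / 8 = 1024  (same count)
 * u64 entries, 4K page:          4096 / 8 =  512  (halved, see #1) */
```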
I really dislike the pools of buffers, partly because they are fixed
size (or dynamically sized and even more expensive to implement), but
more so because there is all of this absolutely stupid state management
you eat just to get at the real data. That's pointless; we're trying
to make this as light as possible. Just use real pointers and
describe the packet with a tail descriptor.
We can use a u64 or whatever in a hardware implementation.
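The "real pointer plus tail descriptor" layout can be sketched in userspace C: the packet lives in a plain buffer, a small descriptor sits right after the data area, and the channel would carry a pointer to that descriptor. This is similar in spirit to how struct skb_shared_info sits at skb->end. The struct fields and helper names are hypothetical, not the layout used by either VJ channel patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tail descriptor stored right after a buffer's data area. */
struct tail_desc {
	uint32_t buf_len;   /* bytes of data area before this descriptor */
	uint32_t data_off;  /* where packet data starts in the data area */
	uint32_t data_len;  /* bytes of packet data */
};

static size_t round_up(size_t n, size_t a)
{
	return (n + a - 1) / a * a;
}

/* Allocate a data area of at least 'cap' bytes plus its tail descriptor;
 * the data area is rounded up so the descriptor is properly aligned. */
static struct tail_desc *tail_buf_new(size_t cap)
{
	size_t len = round_up(cap, _Alignof(struct tail_desc));
	unsigned char *base = malloc(len + sizeof(struct tail_desc));
	struct tail_desc *td;

	if (!base)
		return NULL;
	td = (struct tail_desc *)(base + len);
	td->buf_len = (uint32_t)len;
	td->data_off = 0;
	td->data_len = 0;
	return td;
}

/* Start of the data area, recovered from the descriptor alone. */
static unsigned char *tail_base(struct tail_desc *td)
{
	assert(td != NULL);
	return (unsigned char *)td - td->buf_len;
}

static unsigned char *tail_data(struct tail_desc *td)
{
	return tail_base(td) + td->data_off;
}

/* Copy a received frame in, as a driver's receive path would.
 * Returns 0, or -1 if the frame does not fit. */
static int tail_fill(struct tail_desc *td, uint32_t off,
		     const void *src, uint32_t len)
{
	if ((uint64_t)off + len > td->buf_len)
		return -1;
	memcpy(tail_base(td) + off, src, len);
	td->data_off = off;
	td->data_len = len;
	return 0;
}

static void tail_buf_free(struct tail_desc *td)
{
	free(tail_base(td));
}
```

A driver would fill the buffer and enqueue the struct tail_desc pointer; wrapping it in an skb for a default netif_receive_skb() channel would then mainly need tail_data() and data_len, with no pool lookup.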
Next, you can't even begin to work on the protocol channels before you
do one very important piece of work: integration of all of the IPv4
and IPv6 protocol hash tables into central code. It's a total
prerequisite. Then you modify things to use a generic
inet_{,listen_}lookup() or inet6_{,listen_}lookup() that takes a
protocol number as well as saddr/daddr/sport/dport and searches
from a central table.
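One central table keyed by protocol number plus saddr/daddr/sport/dport, with an exact-match lookup that falls back to a listener-style lookup, can be sketched as a small chained hash table. The names, the hash function and the wildcard rule (remote address and port zeroed) are assumptions for illustration; this is not the kernel's inet_hashtables code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key: protocol number plus the address/port 4-tuple. */
struct flow_key {
	uint8_t  proto;
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

struct flow_entry {
	struct flow_key key;
	void *owner;              /* e.g. the socket or channel */
	struct flow_entry *next;  /* hash chain */
};

#define FLOW_HASH_SIZE 256

struct flow_table {
	struct flow_entry *buckets[FLOW_HASH_SIZE];
};

static unsigned flow_hash(const struct flow_key *k)
{
	uint32_t h = k->proto;

	h = h * 31 + k->saddr;
	h = h * 31 + k->daddr;
	h = h * 31 + ((uint32_t)k->sport << 16 | k->dport);
	return h % FLOW_HASH_SIZE;
}

static int flow_key_eq(const struct flow_key *a, const struct flow_key *b)
{
	return a->proto == b->proto && a->saddr == b->saddr &&
	       a->daddr == b->daddr && a->sport == b->sport &&
	       a->dport == b->dport;
}

static void flow_insert(struct flow_table *t, struct flow_entry *e)
{
	unsigned b;

	assert(t != NULL && e != NULL);
	b = flow_hash(&e->key);
	e->next = t->buckets[b];
	t->buckets[b] = e;
}

/* Exact match on all five fields; returns the owner or NULL. */
static void *flow_lookup(const struct flow_table *t, const struct flow_key *k)
{
	const struct flow_entry *e;

	for (e = t->buckets[flow_hash(k)]; e; e = e->next)
		if (flow_key_eq(&e->key, k))
			return e->owner;
	return NULL;
}

/* Connected flow first, then an entry registered with the remote
 * address and port zeroed (listening/unconnected). */
static void *flow_lookup_any(const struct flow_table *t, const struct flow_key *k)
{
	struct flow_key wild = *k;
	void *owner = flow_lookup(t, k);

	if (owner)
		return owner;
	wild.saddr = 0;
	wild.sport = 0;
	return flow_lookup(t, &wild);
}
```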
So I think I'll continue working on my implementation, it's more
transitional and that's how we have to do this kind of work.
* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
From: David S. Miller @ 2006-04-26 7:33 UTC
To: kelly; +Cc: netdev, rusty
In-Reply-To: <200604261147.34221.kelly@au.ibm.com>
From: Kelly Daly <kelly@au1.ibm.com>
Date: Wed, 26 Apr 2006 11:47:34 +0000
> Noting Dave's recent release of his implementation, we thought we'd
> better get this "out there" so we can do some early
> comparison/combining and come up with the best possible
> implementation.
Thanks for publishing your work.
I'm actually not that upset that I duplicated the work a little
bit, because trying to start implementing things forced me to
think in a more focused way about this stuff.
I'll look over your patches, thanks.
* Re: [PATCH] e1000: skb truesize fix
From: David S. Miller @ 2006-04-26 7:31 UTC
To: auke-jan.h.kok
Cc: stable, jgarzik, netdev, john.ronciak, jesse.brandeburg,
Jeffrey.t.kirsher, auke
In-Reply-To: <20060426061229.25966.83974.stgit@gitlost.site>
From: "Kok, Auke" <auke-jan.h.kok@intel.com>
Date: Tue, 25 Apr 2006 23:12:30 -0700
> This patch was already merged in Jeff Garzik's netdev upstream branch but
> needs to go into 2.6.16.y and 2.6.17-rc* as it fixes a critical skb buffer-size
> bug that is exposed by an earlier patch by Dave Miller and Herbert
> Xu. I'm therefore resending it:
>
> Please apply to 2.6.16.y and queue for 2.6.17-rc.
>
> These changes are available through git. Jeff, please pull from:
>
> git://lost.foo-projects.org/~ahkok/git/linux-2.6 skb_truesize
>
> these patches are against
> linux-2.6.git#master 4d5c34ec7b007cfb0771a36996b009f194acbb2f
ACK on the -stable submission. Jeff can take care of 2.6.17-x
* Re: [PATCH]: suspicious unlikely usage in tcp_transmit_skb()
From: David S. Miller @ 2006-04-26 7:26 UTC
To: shemminger; +Cc: hzhong, netdev
In-Reply-To: <20060425151635.549d3400@localhost.localdomain>
From: Stephen Hemminger <shemminger@osdl.org>
Date: Tue, 25 Apr 2006 15:16:35 -0700
> On Tue, 25 Apr 2006 14:46:49 -0700 (PDT)
> "David S. Miller" <davem@davemloft.net> wrote:
>
> > From: Stephen Hemminger <shemminger@osdl.org>
> > Date: Tue, 25 Apr 2006 10:01:49 -0700
> >
> > > > # Hit # miss Function:Filename@Line
> > > > ! 0 50505 tcp_transmit_skb():net/ipv4/tcp_output.c@468
> > ...
> > > How about just taking off the likely/unlikely in this case.
> >
> > Why remove it when we'll now get a 50505 to 0 hit rate?
>
> Depends on the data stream, but I guess if we are seeing high loss
> we really don't care about the CPU branch prediction.
I disagree, this is a hard error condition, and happens only when
af_ops->queue_xmit() cannot send the packet successfully. Under
any normal circumstances whatsoever, it will succeed.
Are you actually looking at the right piece of code? :-)
* iptables doubt
From: varun @ 2006-04-26 7:09 UTC
To: netdev
Hi all,
I've been trying to understand the iptables kernel code and
basically how it functions. In doing so I have a few questions.
In the file ip_tables.c there is a call, do_replace(), which
is used as the entry point from setsockopt().
That is, it gets called every time a user enters
policies from user space. Here the data is given to me in the form of
void __user *user.
I am copying this to kernel space and dereferencing it into
struct ipt_replace and so on. Am I right?
The first question: user space seems to send a size of 860
when trying to add the first policy. Does that mean that user space is
maintaining the offsets of the policies added?
tmp.size shows as 768, which is (4 default policies x
sizeof(struct ipt_standard)) + sizeof(struct ipt_error).
Is my understanding correct? If so, why should user
space maintain the kernel's policy offsets?
Next, I added one extra field (unsigned int uniqueId) to
struct ipt_entry_target. It is added after the unsigned
char data[0] field.
struct ipt_entry_target
{
	union {
		struct {
			u_int16_t target_size;
			/* Used by userspace */
			char name[IPT_FUNCTION_MAXNAMELEN-1];
			u_int8_t revision;
		} user;
		struct {
			u_int16_t target_size;
			/* Used inside the kernel */
			struct ipt_target *target;
		} kernel;
		/* Total length */
		u_int16_t target_size;
	} u;
	unsigned char data[0];
	unsigned int uniqueId;	/* I added this */
};
I am using this field to give a global ID from my kernel to every
policy added, excluding the default ones added by the kernel. So if someone
calls iptables -F or iptables -t filter -D ...., then this number
should not be assigned to the structure.
I want to know where the correct place is to add this value to the
structure without affecting the functionality.
I am also aware that making this change to the structure will result in
segmentation faults in user space. I'll handle that separately.
Can this be done? Please help me in this regard.
How can I tell from the kernel structures whether the policy is for -A, -D
or -F?
Varun
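The layout question can be checked in userspace. A minimal sketch with simplified, hypothetical stand-ins for struct ipt_entry_target (the union is cut down to a 32-byte name; it relies on the GCC/Clang zero-length-array extension and C11 static_assert): a zero-length array takes no space, so a member declared after data[0] starts at the same offset as data and overlaps the target's private data. A fixed-size extra field would need to go before data[0], which must stay last; either way the user/kernel ABI changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field added after data[0], as in the posted struct. */
struct target_as_posted {
	union {
		char name[32];
		uint16_t target_size;
	} u;
	unsigned char data[0];   /* target-private data starts here */
	unsigned int uniqueId;   /* lands on top of data[0] */
};

/* Field added before data[0]; the variable-length part stays last. */
struct target_reordered {
	union {
		char name[32];
		uint16_t target_size;
	} u;
	unsigned int uniqueId;
	unsigned char data[0];
};

/* Writing uniqueId in the posted layout overwrites the first bytes of
 * whatever the target stored in data. */
static_assert(offsetof(struct target_as_posted, uniqueId) ==
	      offsetof(struct target_as_posted, data),
	      "uniqueId overlaps data[0]");

static_assert(offsetof(struct target_reordered, data) ==
	      offsetof(struct target_reordered, uniqueId) + sizeof(unsigned int),
	      "data follows uniqueId");
```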
* [PATCH] e1000: Update truesize with the length of the packet for packet split
From: Kok, Auke @ 2006-04-26 6:16 UTC
To: stable, Garzik, Jeff
Cc: netdev, Miller, David, Ronciak, John, Brandeburg, Jesse,
Kirsher, Jeff, Kok, Auke
In-Reply-To: <20060426061229.25966.83974.stgit@gitlost.site>
Update skb with the real packet size.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: John Ronciak <john.ronciak@intel.com>
---
drivers/net/e1000/e1000_main.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index add8dc4..c99e878 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -3768,6 +3768,7 @@ e1000_clean_rx_irq_ps(struct e1000_adapt
ps_page->ps_page[j] = NULL;
skb->len += length;
skb->data_len += length;
+ skb->truesize += length;
}
copydone:
--
Auke Kok <auke-jan.h.kok@intel.com>
Intel Pro(R) Ethernet Driver Group
LAN Access Division / Digital Enterprise Group
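Why the one-line change matters can be sketched with a reduced, hypothetical skb that keeps only len, data_len and truesize. Receive-side socket memory accounting charges skb->truesize against the socket's receive buffer limit, so if page-fragment bytes are added to len but not to truesize, a socket can queue far more data than sk_rcvbuf is meant to allow. The admission check below is a simplification, not the kernel's sock_queue_rcv_skb().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced skb: just the three length fields involved. */
struct mini_skb {
	unsigned int len;       /* total packet bytes (linear + fragments) */
	unsigned int data_len;  /* bytes held in page fragments */
	unsigned int truesize;  /* memory charged to the receiving socket */
};

/* Attach a page fragment of 'length' bytes, mirroring the loop in
 * e1000_clean_rx_irq_ps(); 'with_fix' applies the one-line patch. */
static void add_frag(struct mini_skb *skb, unsigned int length, int with_fix)
{
	assert(skb != NULL);
	skb->len += length;
	skb->data_len += length;
	if (with_fix)
		skb->truesize += length;
}

/* Simplified receive-buffer admission: charge truesize against rcvbuf.
 * Returns 1 if the packet is queued, 0 if it would exceed the limit. */
static int charge_rcvbuf(unsigned int *rmem_alloc, unsigned int rcvbuf,
			 const struct mini_skb *skb)
{
	if (*rmem_alloc + skb->truesize > rcvbuf)
		return 0;
	*rmem_alloc += skb->truesize;
	return 1;
}
```

With three 4096-byte fragments per packet and a 65,536-byte limit, the fixed accounting stops at five packets; without the fix all ten are admitted, roughly 124,000 bytes of packet data.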
* [PATCH] e1000: skb truesize fix
From: Kok, Auke @ 2006-04-26 6:12 UTC
To: stable, Garzik, Jeff
Cc: netdev, Miller, David, Ronciak, John, Brandeburg, Jesse,
Kirsher, Jeff, Kok, Auke
Hi,
This patch was already merged in Jeff Garzik's netdev upstream branch but
needs to go into 2.6.16.y and 2.6.17-rc* as it fixes a critical skb buffer-size
bug that is exposed by an earlier patch by Dave Miller and Herbert
Xu. I'm therefore resending it:
Please apply to 2.6.16.y and queue for 2.6.17-rc.
These changes are available through git. Jeff, please pull from:
git://lost.foo-projects.org/~ahkok/git/linux-2.6 skb_truesize
these patches are against
linux-2.6.git#master 4d5c34ec7b007cfb0771a36996b009f194acbb2f
Cheers,
Auke Kok
---
drivers/net/e1000/e1000_main.c | 1 +
1 files changed, 1 insertion(+)
--
Auke Kok <auke-jan.h.kok@intel.com>
Intel Pro(R) Ethernet Driver Group
LAN Access Division / Digital Enterprise Group
* Re: Fw: Bug: PPP dropouts in >=2.6.16
From: Sven Schuster @ 2006-04-26 6:04 UTC
To: Nuri Jawad; +Cc: Andi Kleen, Jesse Brandeburg, Andrew Morton, netdev
In-Reply-To: <Pine.LNX.4.64.0604260126020.12542@pc>
[-- Attachment #1: Type: text/plain, Size: 1311 bytes --]
Hi,
On Wed, Apr 26, 2006 at 02:36:18AM +0200, Nuri Jawad told us:
> >no problems here with pppoe, kernel is 2.6.17-rc1-mm1, ppp 2.4.4-b1.
>
> Did you create a high load on the system in the manner I described?
> The bug once only appeared after about 6 hours here when line + CPU had
> been mostly idle. But that was the longest time between failures. Can you
> test with one of the 2.6.16 kernels I tried (latest was .9)? Can't say
> for sure if CPU load is a factor, load on the connection seems to be.
well, the machine is mostly idle besides downloads now and then or
software compilations (kernel mostly) or periodic mail fetching
including virus and spam scanning. This is my box at home (on
which I'm currently writing this email). I'm currently compiling
2.6.16.9 and will test with this release later on. I will get
some periodic ping running to check for connection failures and
put some load on the machine. Will come back with the results
later, but don't hold your breath waiting for me, kernel compile
takes more than two hours on my box :-)
Regards,
Sven
>
> Regards,
> Nuri
>
--
Linux zion.homelinux.com 2.6.17-rc1-mm1_31 #31 Sat Apr 8 16:18:23 CEST 2006 i686 athlon i386 GNU/Linux
07:56:45 up 3 days, 11:30, 2 users, load average: 2.79, 1.32, 0.68
[-- Attachment #2: Type: application/pgp-signature, Size: 191 bytes --]
* [PATCH 3/3] Rough VJ Channel Implementation - vj_udp.patch
From: Kelly Daly @ 2006-04-26 11:47 UTC
To: netdev; +Cc: rusty, davem
Signed-off-by: Kelly Daly <kelly@au.ibm.com>
Hacked udp.c to receive directly to VJ Channel socket.
Breaks normal UDP - sockets don't speak non-VJ anymore!
----
diff -r 47031a1f466c linux-2.6.16/include/linux/udp.h
--- linux-2.6.16/include/linux/udp.h Thu Mar 23 06:32:12 2006
+++ linux-2.6.16/include/linux/udp.h Mon Apr 24 19:50:46 2006
@@ -51,6 +51,8 @@
* when the socket is uncorked.
*/
__u16 len; /* total length of pending frames */
+ struct vj_channel *chan; /* VJ net channel */
+ int vj_reg_flag; /* is the vj channel registered */
};
static inline struct udp_sock *udp_sk(const struct sock *sk)
diff -r 47031a1f466c linux-2.6.16/net/ipv4/udp.c
--- linux-2.6.16/net/ipv4/udp.c Thu Mar 23 06:32:12 2006
+++ linux-2.6.16/net/ipv4/udp.c Mon Apr 24 19:50:46 2006
@@ -1,3 +1,4 @@
+
/*
* INET An implementation of the TCP/IP protocol suite for the LINUX
* operating system. INET is implemented using the BSD Socket
@@ -89,6 +90,7 @@
#include <linux/igmp.h>
#include <linux/in.h>
#include <linux/errno.h>
+#include <linux/err.h>
#include <linux/timer.h>
#include <linux/mm.h>
#include <linux/config.h>
@@ -109,6 +111,7 @@
#include <net/inet_common.h>
#include <net/checksum.h>
#include <net/xfrm.h>
+#include <linux/vjchan.h>
/*
* Snmp MIB for the UDP layer
@@ -127,6 +130,7 @@
struct hlist_node *node;
struct sock *sk2;
struct inet_sock *inet = inet_sk(sk);
+ struct vj_flowid flowid;
write_lock_bh(&udp_hash_lock);
if (snum == 0) {
@@ -195,6 +199,17 @@
sk_add_node(sk, h);
sock_prot_inc_use(sk->sk_prot);
}
+
+ /* copied from udp_v4_lookup_longway */
+ flowid.saddr = inet->daddr;
+ flowid.daddr = inet->rcv_saddr;
+ flowid.sport = inet->dport;
+ flowid.dport = htons(inet->num);
+ flowid.ifindex = sk->sk_bound_dev_if;
+ flowid.proto = IPPROTO_UDP;
+ vj_register_chan(udp_sk(sk)->chan, &flowid);
+ udp_sk(sk)->vj_reg_flag = 1;
+
write_unlock_bh(&udp_hash_lock);
return 0;
@@ -771,18 +786,158 @@
__udp_checksum_complete(skb);
}
+static inline unsigned short int vj_udp_csum(struct vj_buffer *buffer)
+{
+ struct iphdr *ip = (struct iphdr *)(buffer->data + buffer->header_len);
+ int udpoff = buffer->header_len + (ip->ihl * 4);
+ struct udphdr *up = (struct udphdr *)(buffer->data + udpoff);
+
+ if (up->check == 0)
+ return 0;
+
+ return csum_tcpudp_magic(ip->saddr,
+ ip->daddr,
+ (buffer->data_len - (ip->ihl * 4)),
+ IPPROTO_UDP,
+ csum_partial((buffer->data + udpoff),
+ (buffer->data_len - (ip->ihl * 4)),
+ 0));
+}
+
+/*
+ * Is a socket 'connection oriented' ?
+ */
+static inline int connection_based(struct sock *sk)
+{
+ return sk->sk_type == SOCK_SEQPACKET || sk->sk_type == SOCK_STREAM;
+}
+
+/* returns 1 if we need to keep waiting, <= 0 indicates stop waiting */
+static int wait_for_vj_buffer(struct sock *sk, long *timeo_p)
+{
+ int error;
+ wait_queue_head_t *wq = &udp_sk(sk)->chan->wq;
+ DEFINE_WAIT(wait);
+
+ prepare_to_wait(wq, &wait, TASK_INTERRUPTIBLE);
+ vj_inc_wakecnt(udp_sk(sk)->chan);
+
+ error = sock_error(sk);
+ if (error)
+ goto out;
+ if (vj_peek_next_buffer(udp_sk(sk)->chan)) {
+ error = 1;
+ goto out;
+ }
+ if (sk->sk_shutdown & RCV_SHUTDOWN) {
+ error = 0;
+ goto out;
+ }
+ if (connection_based(sk) && !(sk->sk_state == TCP_ESTABLISHED ||
+ sk->sk_state == TCP_LISTEN)) {
+ error = -ENOTCONN;
+ goto out;
+ }
+ if (signal_pending(current)) {
+ error = sock_intr_errno(*timeo_p);
+ goto out;
+ }
+
+ error = 1;
+
+ *timeo_p = schedule_timeout(*timeo_p);
+out:
+ finish_wait(wq, &wait);
+ return error;
+}
+
+/* almost a direct copy of skb_recv_datagram to get all req'd information while using a vj buffer instead of skb */
+struct vj_buffer *vj_recv_datagram(struct sock *sk, unsigned flags,
+ int noblock, int *err)
+{
+ struct vj_buffer *buffer;
+ long timeo;
+ *err = sock_error(sk);
+
+ if (*err)
+ return NULL;
+
+ timeo = sock_rcvtimeo(sk, noblock);
+ do {
+//we can just grab the buffer and return it seeing as either way will be a "peek". Then after we consume we can figure out if (flags & MSG_PEEK) and move to the next buffer at that time... we need to consume the buffer, write barrier before we move on to avoid a race condition.
+
+ buffer = vj_peek_next_buffer(udp_sk(sk)->chan);
+ if (buffer)
+ return buffer;
+
+ /* User doesn't want to wait */
+ *err = -EAGAIN;
+ if (!timeo) {
+ return NULL;
+ }
+ } while ((*err = wait_for_vj_buffer(sk, &timeo)) > 0);
+
+ return NULL;
+}
+
+static int vj_copy_datagram_iovec(struct vj_buffer *buffer, int offset,
+ struct iovec *to, int len)
+{
+// offset to be taken from buffer->header_len (which contains eth hdr + ip hdr)
+ if(memcpy_toiovec(to, buffer->data + offset, len))
+ return -EFAULT;
+ return 0;
+}
+
+/* FIXME: original code did timestamp in netif_rx */
+static __inline__ void vj_sock_recv_timestamp(struct msghdr *msg,
+ struct sock *sk)
+{
+ do_gettimeofday(&sk->sk_stamp);
+ put_cmsg(msg, SOL_SOCKET, SO_TIMESTAMP, sizeof(struct timeval), &sk->sk_stamp);
+}
+
+/* Returns offset in buffer past ip hdr, or 0 if something wrong. */
+static unsigned check_ip_packet(struct vj_buffer *buffer)
+{
+ struct iphdr *iph;
+
+ iph = (struct iphdr *)(buffer->data + buffer->header_len);
+
+ if (buffer->data_len < sizeof(*iph))
+ return 0;
+
+ if (iph->ihl < 5 || iph->version != 4)
+ return 0;
+
+ if (iph->ihl * 4 > ntohs(iph->tot_len)) //less than 0 data?
+ return 0;
+
+ if (ntohs(iph->tot_len) > buffer->data_len) { //truncated
+ return 0;
+ } else if (ntohs(iph->tot_len) < buffer->data_len) { //padded - trim it
+ buffer->data_len = ntohs(iph->tot_len);
+ }
+
+ if (ip_fast_csum((u8 *)iph, iph->ihl) != 0)
+ return 0;
+
+ return buffer->header_len + iph->ihl*4;
+}
+
/*
* This should be easy, if there is something there we
* return it, otherwise we block.
*/
-
static int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
size_t len, int noblock, int flags, int *addr_len)
{
struct inet_sock *inet = inet_sk(sk);
struct sockaddr_in *sin = (struct sockaddr_in *)msg->msg_name;
- struct sk_buff *skb;
- int copied, err;
+ struct vj_buffer *buffer;
+ struct iphdr *ip;
+ struct udphdr *udph;
+ int copied, err, udpoff;
/*
* Check any passed addresses
@@ -794,63 +949,71 @@
return ip_recv_error(sk, msg, len);
try_again:
- skb = skb_recv_datagram(sk, flags, noblock, &err);
- if (!skb)
+ buffer = vj_recv_datagram(sk, flags, noblock, &err);
+ if (!buffer)
goto out;
-
- copied = skb->len - sizeof(struct udphdr);
+
+ ip = (struct iphdr *)(buffer->data + buffer->header_len);
+ udpoff = check_ip_packet(buffer);
+ if (udpoff == 0)
+ goto bad_packet;
+
+ udph = (struct udphdr *)(buffer->data + udpoff);
+
+ buffer->data_len = ntohs(ip->tot_len);
+
+ if (((ip->ihl * 4) + ntohs(udph->len)) > buffer->data_len)
+ goto bad_packet;
+ buffer->data_len = (ip->ihl * 4) + ntohs(udph->len);
+
+ copied = buffer->data_len - ((ip->ihl * 4) + sizeof(struct udphdr));
+
if (copied > len) {
copied = len;
msg->msg_flags |= MSG_TRUNC;
}
- if (skb->ip_summed==CHECKSUM_UNNECESSARY) {
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
- copied);
- } else if (msg->msg_flags&MSG_TRUNC) {
- if (__udp_checksum_complete(skb))
- goto csum_copy_err;
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
- copied);
- } else {
- err = skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov);
-
- if (err == -EINVAL)
- goto csum_copy_err;
- }
-
- if (err)
- goto out_free;
-
- sock_recv_timestamp(msg, sk, skb);
+/* FIXME: if card is calculating csum, should be using that rather
+ * than calculating here */
+ if (vj_udp_csum(buffer) != 0) //bad checksum
+ goto bad_packet;
+
+ err = vj_copy_datagram_iovec(buffer, udpoff + sizeof(struct udphdr), msg->msg_iov, copied);
+
+ if (err) {
+ vj_done_with_buffer(udp_sk(sk)->chan);
+ return err;
+ }
+
+ vj_sock_recv_timestamp(msg, sk);
/* Copy the address. */
if (sin)
{
sin->sin_family = AF_INET;
- sin->sin_port = skb->h.uh->source;
- sin->sin_addr.s_addr = skb->nh.iph->saddr;
+ sin->sin_port = udph->source;
+ sin->sin_addr.s_addr = ip->saddr;
memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
}
+
+#if 0 /* FIXME: implement this! */
if (inet->cmsg_flags)
ip_cmsg_recv(msg, skb);
+#endif
err = copied;
if (flags & MSG_TRUNC)
- err = skb->len - sizeof(struct udphdr);
+ err = buffer->data_len - (ip->ihl * 4) - sizeof(struct udphdr);
+ if (!(flags & MSG_PEEK))
+ vj_done_with_buffer(udp_sk(sk)->chan);
-out_free:
- skb_free_datagram(sk, skb);
out:
return err;
-csum_copy_err:
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
-
- skb_kill_datagram(sk, skb, flags);
-
- if (noblock)
- return -EAGAIN;
+bad_packet:
+ vj_done_with_buffer(udp_sk(sk)->chan);
+ if(noblock)
+ return -EAGAIN;
goto try_again;
}
@@ -858,10 +1021,15 @@
int udp_disconnect(struct sock *sk, int flags)
{
struct inet_sock *inet = inet_sk(sk);
+ struct udp_sock *up = udp_sk(sk);
/*
* 1003.1g - break association.
*/
-
+ if (up->vj_reg_flag) {
+ vj_unregister_chan(up->chan);
+ up->vj_reg_flag = 0;
+ }
+
sk->sk_state = TCP_CLOSE;
inet->daddr = 0;
inet->dport = 0;
@@ -879,6 +1047,14 @@
static void udp_close(struct sock *sk, long timeout)
{
+ struct udp_sock *up = udp_sk(sk);
+
+ if (up->vj_reg_flag) {
+ vj_unregister_chan(up->chan);
+ up->vj_reg_flag = 0;
+ }
+ vj_free_chan(up->chan);
+
sk_common_release(sk);
}
@@ -1293,6 +1469,46 @@
return 0;
}
+unsigned int vj_datagram_poll(struct file *file, struct socket *sock, poll_table *wait)
+{
+ struct sock *sk = sock->sk;
+ unsigned int mask;
+
+ poll_wait(file, &udp_sk(sk)->chan->wq, wait);
+ vj_inc_wakecnt(udp_sk(sk)->chan);
+
+ mask = 0;
+
+ /* exceptional events? */
+ if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
+ mask |= POLLERR;
+ if (sk->sk_shutdown == SHUTDOWN_MASK)
+ mask |= POLLHUP;
+
+
+ /* readable? */
+ if (vj_peek_next_buffer(udp_sk(sk)->chan) ||
+ (sk->sk_shutdown & RCV_SHUTDOWN))
+ mask |= POLLIN | POLLRDNORM;
+
+ /* Connection-based need to check for termination and startup */
+ if (connection_based(sk)) {
+ if (sk->sk_state == TCP_CLOSE)
+ mask |= POLLHUP;
+ /* connection hasn't started yet? */
+ if (sk->sk_state == TCP_SYN_SENT)
+ return mask;
+ }
+
+ /* writable? */
+ if (sock_writeable(sk))
+ mask |= POLLOUT | POLLWRNORM | POLLWRBAND;
+ else
+ set_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags);
+
+ return mask;
+}
+
/**
* udp_poll - wait for a UDP event.
* @file - file struct
@@ -1308,41 +1524,47 @@
*/
unsigned int udp_poll(struct file *file, struct socket *sock, poll_table *wait)
{
- unsigned int mask = datagram_poll(file, sock, wait);
+ unsigned int mask = vj_datagram_poll(file, sock, wait);
struct sock *sk = sock->sk;
/* Check for false positives due to checksum errors */
if ( (mask & POLLRDNORM) &&
!(file->f_flags & O_NONBLOCK) &&
!(sk->sk_shutdown & RCV_SHUTDOWN)){
- struct sk_buff_head *rcvq = &sk->sk_receive_queue;
- struct sk_buff *skb;
-
- spin_lock_bh(&rcvq->lock);
- while ((skb = skb_peek(rcvq)) != NULL) {
- if (udp_checksum_complete(skb)) {
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
- __skb_unlink(skb, rcvq);
- kfree_skb(skb);
- } else {
- skb->ip_summed = CHECKSUM_UNNECESSARY;
+ struct vj_buffer *buffer;
+
+ while ((buffer = vj_peek_next_buffer(udp_sk(sk)->chan)) != NULL) {
+//test that this fixes the csum
+ check_ip_packet(buffer);
+ if (vj_udp_csum(buffer) == 0)
break;
- }
- }
- spin_unlock_bh(&rcvq->lock);
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS);
+ vj_done_with_buffer(udp_sk(sk)->chan);
+ }
/* nothing to see, move along */
- if (skb == NULL)
+ if (buffer == NULL)
mask &= ~(POLLIN | POLLRDNORM);
}
return mask;
}
+
+static int udp_init(struct sock *sk)
+{
+ udp_sk(sk)->chan = vj_alloc_chan(0);
+ udp_sk(sk)->vj_reg_flag = 0;
+ if (!udp_sk(sk)->chan)
+ return -ENOMEM;
+ return 0;
+}
+
struct proto udp_prot = {
.name = "UDP",
.owner = THIS_MODULE,
+ .init = udp_init,
.close = udp_close,
.connect = ip4_datagram_connect,
.disconnect = udp_disconnect,
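The two checks the patch performs by hand, the IPv4 header checksum (ip_fast_csum() in check_ip_packet()) and the UDP checksum over the pseudo-header (vj_udp_csum()), can be sketched portably. This is a byte-wise RFC 1071 ones'-complement sum, not the kernel's optimized csum_partial()/ip_fast_csum(); the helper names are hypothetical, and callers are assumed to have already checked that the buffers are long enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement sum of big-endian 16-bit words (RFC 1071). */
static uint32_t csum_add_bytes(uint32_t sum, const uint8_t *p, size_t len)
{
	while (len > 1) {
		sum += (uint32_t)p[0] << 8 | p[1];
		p += 2;
		len -= 2;
	}
	if (len)
		sum += (uint32_t)p[0] << 8;   /* odd byte, padded with zero */
	return sum;
}

/* Fold carries back in and complement. */
static uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Summing a header that includes a correct checksum folds to 0.
 * Assumes at least ihl*4 bytes are readable at iph. */
static int ipv4_hdr_ok(const uint8_t *iph)
{
	size_t ihl = (size_t)(iph[0] & 0x0f) * 4;

	assert(iph != NULL);
	if ((iph[0] >> 4) != 4 || ihl < 20)
		return 0;
	return csum_fold(csum_add_bytes(0, iph, ihl)) == 0;
}

/* UDP checksum over the pseudo-header (saddr, daddr, zero, protocol 17,
 * UDP length) plus the datagram.  A stored checksum of 0 means "not
 * computed" and is accepted, as vj_udp_csum() does. */
static int udp4_csum_ok(const uint8_t *iph, const uint8_t *udp, uint16_t udp_len)
{
	uint32_t sum;

	if (udp[6] == 0 && udp[7] == 0)
		return 1;
	sum = csum_add_bytes(0, iph + 12, 8);   /* saddr + daddr */
	sum += 17;                              /* zero byte + IPPROTO_UDP */
	sum += udp_len;
	sum = csum_add_bytes(sum, udp, udp_len);
	return csum_fold(sum) == 0;
}
```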
* [PATCH 2/3] Rough VJ Channel Implementation - vj_ne2k.patch
From: Kelly Daly @ 2006-04-26 11:47 UTC
To: netdev; +Cc: rusty, davem
Signed-off-by: Kelly Daly <kelly@au.ibm.com>
Hacked NE2K driver using VJ Channels on receive.
Takes packet data and dumps it into a VJ buffer instead of skb.
Not implemented on transmit.
Useful for testing under QEMU.
-----
diff -r 47031a1f466c linux-2.6.16/drivers/net/8390.c
--- linux-2.6.16/drivers/net/8390.c Thu Mar 23 06:32:12 2006
+++ linux-2.6.16/drivers/net/8390.c Mon Apr 24 19:50:46 2006
@@ -74,6 +74,8 @@
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
+
+#include <linux/vjchan.h>
#define NS8390_CORE
#include "8390.h"
@@ -718,31 +720,30 @@
}
else if ((pkt_stat & 0x0F) == ENRSR_RXOK)
{
- struct sk_buff *skb;
-
- skb = dev_alloc_skb(pkt_len+2);
- if (skb == NULL)
- {
+
+//NOT make skb - make a buffer!
+ struct vj_buffer *vjbuffer;
+ int desc_num;
+
+ vjbuffer = vj_get_buffer(&desc_num);
+ if (vjbuffer == NULL) {
+//fail
if (ei_debug > 1)
- printk(KERN_DEBUG "%s: Couldn't allocate a sk_buff of size %d.\n",
- dev->name, pkt_len);
+ printk(KERN_DEBUG "%s: Couldn't allocate a vj buffer.\n",
+ dev->name);
ei_local->stat.rx_dropped++;
break;
}
- else
- {
- skb_reserve(skb,2); /* IP headers on 16 byte boundaries */
- skb->dev = dev;
- skb_put(skb, pkt_len); /* Make room */
- ei_block_input(dev, pkt_len, skb, current_offset + sizeof(rx_frame));
- skb->protocol=eth_type_trans(skb,dev);
- netif_rx(skb);
- dev->last_rx = jiffies;
- ei_local->stat.rx_packets++;
- ei_local->stat.rx_bytes += pkt_len;
- if (pkt_stat & ENRSR_PHY)
- ei_local->stat.multicast++;
- }
+ vjbuffer->data_len = pkt_len;
+ vjbuffer->ifindex = dev->ifindex;
+ ei_block_input(dev, pkt_len, vjbuffer->data, current_offset + sizeof(rx_frame));
+ vj_netif_rx(vjbuffer, desc_num, eth_vj_type_trans(vjbuffer));
+ dev->last_rx = jiffies;
+ ei_local->stat.rx_packets++;
+ ei_local->stat.rx_bytes += pkt_len;
+ if (pkt_stat & ENRSR_PHY)
+ ei_local->stat.multicast++;
+
}
else
{
diff -r 47031a1f466c linux-2.6.16/drivers/net/8390.h
--- linux-2.6.16/drivers/net/8390.h Thu Mar 23 06:32:12 2006
+++ linux-2.6.16/drivers/net/8390.h Mon Apr 24 19:50:46 2006
@@ -10,7 +10,6 @@
#include <linux/config.h>
#include <linux/if_ether.h>
#include <linux/ioport.h>
-#include <linux/skbuff.h>
#define TX_PAGES 12 /* Two Tx slots */
@@ -49,7 +48,7 @@
void (*reset_8390)(struct net_device *);
void (*get_8390_hdr)(struct net_device *, struct e8390_pkt_hdr *, int);
void (*block_output)(struct net_device *, int, const unsigned char *, int);
- void (*block_input)(struct net_device *, int, struct sk_buff *, int);
+ void (*block_input)(struct net_device *, int, char *, int);
unsigned long rmem_start;
unsigned long rmem_end;
void __iomem *mem;
diff -r 47031a1f466c linux-2.6.16/drivers/net/ne2k-pci.c
--- linux-2.6.16/drivers/net/ne2k-pci.c Thu Mar 23 06:32:12 2006
+++ linux-2.6.16/drivers/net/ne2k-pci.c Mon Apr 24 19:50:46 2006
@@ -172,7 +172,7 @@
static void ne2k_pci_get_8390_hdr(struct net_device *dev, struct e8390_pkt_hdr *hdr,
int ring_page);
static void ne2k_pci_block_input(struct net_device *dev, int count,
- struct sk_buff *skb, int ring_offset);
+ char *data, int ring_offset);
static void ne2k_pci_block_output(struct net_device *dev, const int count,
const unsigned char *buf, const int start_page);
static struct ethtool_ops ne2k_pci_ethtool_ops;
@@ -503,10 +503,9 @@
the packet out through the "remote DMA" dataport using outb. */
static void ne2k_pci_block_input(struct net_device *dev, int count,
- struct sk_buff *skb, int ring_offset)
+ char *buf, int ring_offset)
{
long nic_base = dev->base_addr;
- char *buf = skb->data;
/* This *shouldn't* happen. If it does, it's the last thing you'll see */
if (ei_status.dmaing) {
* [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
From: Kelly Daly @ 2006-04-26 11:47 UTC
To: netdev; +Cc: rusty, davem
Hey guys... I've been working with Rusty on a VJ Channel implementation.
Noting Dave's recent release of his implementation, we thought we'd better
get this "out there" so we can do some early comparison/combining and
come up with the best possible implementation.
There are three patches in total:
1) vj_core.patch - core files for VJ to userspace
2) vj_udp.patch - badly hacked up UDP receive implementation - basically just to test what logic may be like!
3) vj_ne2k.patch - modified NE2K and 8390 used for testing on QEMU
Notes:
* Channels can have global or local buffers (local buffers are for userspace, and could be used directly by an intelligent NIC).
* UDP receive breaks real UDP: it doesn't talk anything except VJ Channels anymore. Needs integration with normal sources.
* The userspace test app (below) uses the VJ protocol family to mmap space for local buffers; if it receives a buffer that lives in kernel space, it asks for that buffer to be copied into a local buffer.
* The default channel converts buffers to skbs and feeds them through the normal receive path.
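The global/local split in the notes above is carried in the ring entries themselves: the patch's vjchan.h reserves VJ_HIGH_BIT (0x80000000) to mark a descriptor that refers to a global (kernel-side) buffer. A minimal userspace sketch of that encoding follows; desc_make(), desc_is_global() and desc_index() are hypothetical helper names, not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors VJ_HIGH_BIT from the patch's vjchan.h. */
#define VJ_HIGH_BIT 0x80000000u

/* Hypothetical helper: turn a buffer index into a ring descriptor,
 * flagging it when the buffer belongs to the global (kernel) pool. */
static uint32_t desc_make(uint32_t index, bool global)
{
	return global ? (index | VJ_HIGH_BIT) : index;
}

/* True if the descriptor names a global buffer that userspace
 * cannot reach through its mmap of local buffers. */
static bool desc_is_global(uint32_t desc)
{
	return (desc & VJ_HIGH_BIT) != 0;
}

/* Strip the flag to recover the index into the owning pool. */
static uint32_t desc_index(uint32_t desc)
{
	return desc & ~VJ_HIGH_BIT;
}
```

Note that the patch's kernel-side get_buffer() also treats every descriptor as global when the channel has no local buffers at all, so the flag alone is not the whole rule.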
TODO:
* send not yet implemented
* integrate with the non-VJ path
* LOTS of fixmes
Cheers,
Kelly
Test userspace app:
/* Van Jacobson net channels implementation for Linux
Copyright (C) 2006 Kelly Daly <kdaly@au.ibm.com> IBM Corporation
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/mman.h>
#include <sys/poll.h>
#include <netinet/in.h>
#include "linux-2.6.16/include/linux/types.h"
#include "linux-2.6.16/include/linux/vjchan.h"
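/* flow ID to bind to: a zero field is a wildcard; ports are host order here */
#define SADDR 0
#define DADDR 0
#define SPORT 0
#define DPORT 60000
#define IFINDEX 0
/* must match AF_VJCHAN in the patched include/linux/socket.h */
#define PF_VJCHAN 27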
static struct vj_buffer *get_buffer(struct vj_channel_ring *ring, int desc_num)
{
printf("desc_num %i\n", desc_num);
return (void *)ring + (desc_num + 1) * getpagesize();
}
/* return the next buffer, but do not move on */
static struct vj_buffer *vj_peek_next_buffer(struct vj_channel_ring *ring)
{
if (ring->c.head == ring->p.tail)
return NULL;
return get_buffer(ring, ring->q[ring->c.head]);
}
/* move on to next buffer */
static void vj_done_with_buffer(struct vj_channel_ring *ring)
{
ring->c.head = (ring->c.head+1)%VJ_NET_CHANNEL_ENTRIES;
printf("done_with_buffer\n\n");
}
int main(int argc, char *argv[])
{
int sk, cls, bnd, pll;
struct vj_flowid flowid;
struct vj_channel_ring *ring;
struct vj_buffer *buf;
struct pollfd pfd;
printf("\nstart of vjchannel socket test app\n");
sk = socket(PF_VJCHAN, SOCK_DGRAM, IPPROTO_UDP);
if (sk == -1) {
perror("Unable to open socket!");
return -1;
}
printf("socket open with ret code %i\n\n", sk);
/* build the flow ID; only matching packets are delivered to this channel */
flowid.saddr = SADDR;
flowid.daddr = DADDR;
flowid.sport = SPORT;
flowid.dport = htons(DPORT);
flowid.ifindex = IFINDEX;
flowid.proto = IPPROTO_UDP;
printf("flowid created\n");
bnd = bind(sk, (struct sockaddr *)&flowid, sizeof(struct vj_flowid));
if (bnd == -1) {
perror("Unable to bind socket!");
return -1;
}
printf("socket bound with ret code %i\n\n", bnd);
/* one page for the ring, then one page per local buffer */
ring = mmap(0, (getpagesize() * (VJ_NET_CHANNEL_ENTRIES+1)), PROT_READ|PROT_WRITE, MAP_SHARED, sk, 0);
if (ring == MAP_FAILED) {
perror("Unable to mmap socket!");
return -1;
}
printf("socket mmapped to address %p\n\n", (void *)ring);
pfd.fd = sk;
pfd.events = POLLIN;
for (;;) {
pll = poll(&pfd, 1, -1);
if (pll < 0) {
perror("polling failed!");
return -1;
}
/* consume the next buffer, if any */
buf = vj_peek_next_buffer(ring);
if (buf == NULL)
continue;
printf("buf %p\n", (void *)buf);
/* print the payload only: skip the 20-byte IPv4 and 8-byte UDP headers */
printf(" Buffer Length = %u\n", buf->data_len);
printf(" Header Length = %u\n", buf->header_len);
if (buf->data_len >= 28)
printf(" Buffer Data: '%.*s'\n", (int)(buf->data_len - 28), buf->data + buf->header_len + 28);
vj_done_with_buffer(ring);
}
cls = close(sk);
if (cls != 0) {
perror("Unable to close socket!");
return -2;
}
printf("socket closed with ret code %i\n\n", cls);
return 0;
}
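The test app's get_buffer() depends on the mmap layout set up by vjchan_mmap(): page 0 holds the ring, and local buffer n lives in page n + 1. The sketch below assumes userspace copies of the ring structs using <stdint.h> types in place of __u8/__u16/__u32; it checks the "consumer is at offset 1024" comment from vjchan.h at compile time and spells out the offset arithmetic. vj_map_len() and vj_buffer_offset() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define VJ_NET_CHANNEL_ENTRIES 254

/* Userspace copies of the ring structures from the patch's vjchan.h. */
struct vj_producer {
	uint16_t tail;
	uint8_t  wakecnt;
	uint8_t  pad;
	uint16_t old_head;
	uint16_t pad2;
};

struct vj_consumer {
	uint16_t head;
	uint8_t  wakecnt;
};

struct vj_channel_ring {
	struct vj_producer p;
	uint32_t q[VJ_NET_CHANNEL_ENTRIES];
	struct vj_consumer c;
};

/* 8-byte producer header + 254 * 4-byte entries = 1024 bytes, so the
 * consumer's fields start exactly 1024 bytes into the ring page. */
_Static_assert(offsetof(struct vj_channel_ring, c) == 1024,
	       "consumer header should sit at offset 1024");

/* Length the test app passes to mmap(): one ring page plus one page
 * per local buffer (the kernel side allocates 254 of them). */
static size_t vj_map_len(size_t page_size)
{
	return page_size * (VJ_NET_CHANNEL_ENTRIES + 1);
}

/* Byte offset of local buffer desc_num within that mapping. */
static size_t vj_buffer_offset(uint32_t desc_num, size_t page_size)
{
	return (size_t)(desc_num + 1) * page_size;
}
```

vjchan_mmap() rejects any mapping whose length is not exactly (1 + num_local_buffers) pages, which is why the length and the per-buffer offset have to agree.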
-------------------------
Signed-off-by: Kelly Daly <kelly@au.ibm.com>
Basic infrastructure for Van Jacobson net channels: a lockless ring buffer for buffer transport. Entries in the ring buffer are descriptors for global or local buffers; the ring and the local buffers are mmapped into userspace.
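The lockless ring described above is a single-producer/single-consumer queue: only the producer writes the tail, only the consumer writes the head, and one slot is always left empty so that head == tail means "empty" (as in enqueue_buffer()). Below is a minimal userspace sketch that swaps the kernel's smp_wmb()/smp_rmb() for C11 release/acquire atomics. spsc_ring, ring_push() and ring_pop() are hypothetical names, and the wakecnt wakeup logic is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 254  /* VJ_NET_CHANNEL_ENTRIES in the patch */

/* Producer owns tail, consumer owns head: no lock needed. */
struct spsc_ring {
	_Atomic uint16_t tail;   /* next slot the producer fills */
	_Atomic uint16_t head;   /* next slot the consumer reads */
	uint32_t q[RING_ENTRIES];
};

/* Producer side: returns false when the ring is full. */
static bool ring_push(struct spsc_ring *r, uint32_t desc)
{
	uint16_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	uint16_t nxt = (uint16_t)((tail + 1) % RING_ENTRIES);

	if (nxt == atomic_load_explicit(&r->head, memory_order_acquire))
		return false;
	r->q[tail] = desc;
	/* Publish the entry before the new tail (smp_wmb() in the patch). */
	atomic_store_explicit(&r->tail, nxt, memory_order_release);
	return true;
}

/* Consumer side: returns false when the ring is empty. */
static bool ring_pop(struct spsc_ring *r, uint32_t *desc)
{
	uint16_t head = atomic_load_explicit(&r->head, memory_order_relaxed);

	if (head == atomic_load_explicit(&r->tail, memory_order_acquire))
		return false;
	*desc = r->q[head];
	/* Finish reading the slot before handing it back to the producer. */
	atomic_store_explicit(&r->head, (uint16_t)((head + 1) % RING_ENTRIES),
			      memory_order_release);
	return true;
}
```

Keeping one slot empty limits capacity to RING_ENTRIES - 1 descriptors, but it avoids a shared count field that both sides would have to write.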
Channels are registered with the core by flowid, and a thread services the default channel for any non-matching packets. Drivers get (global) buffers from vj_get_buffer, and dispatch them through vj_netif_rx.
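In the patch, find_channel() treats any zero field of a registered flowid as a wildcard and falls back to the default channel when nothing matches. The sketch below isolates that matching rule. It assumes a plain array instead of the patch's list; find_flow() returning -1 stands in for "use the default channel", and all names are hypothetical. Addresses and ports are compared exactly as stored, which is network byte order in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Same fields as struct vj_flowid in the patch (IPv4 only). */
struct flowid {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint32_t ifindex;
	uint16_t proto;
};

/* A field matches if the registered value is 0 (wildcard) or equal. */
#define FIELD_OK(reg, pkt, f) ((reg)->f == 0 || (reg)->f == (pkt)->f)

static int flow_matches(const struct flowid *reg, const struct flowid *pkt)
{
	return FIELD_OK(reg, pkt, saddr) && FIELD_OK(reg, pkt, daddr) &&
	       FIELD_OK(reg, pkt, proto) && FIELD_OK(reg, pkt, sport) &&
	       FIELD_OK(reg, pkt, dport) && FIELD_OK(reg, pkt, ifindex);
}

/* Index of the first registered flow matching pkt, or -1 for the
 * default channel. The first match wins, in array order. */
static int find_flow(const struct flowid *regs, size_t n,
		     const struct flowid *pkt)
{
	for (size_t i = 0; i < n; i++)
		if (flow_matches(&regs[i], pkt))
			return (int)i;
	return -1;
}
```

Because the first match wins, overlapping wildcard registrations depend on order. The patch's list_add() puts the newest channel first, which is the ordering issue its find_channel() FIXME mentions.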
As userspace mmap cannot reach global buffers, select() copies global buffers into local buffers if required.
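The copy described above is done by vj_need_local_buffer(), which vjchan_poll() calls. If the head entry carries VJ_HIGH_BIT, the function claims a free local buffer, copies the data across, releases the global buffer and rewrites the ring entry. The sketch below is simplified and rests on stated assumptions: tiny fixed-size pools tracked by an 8-bit bitmap (NBUF and BUF_SIZE are hypothetical), whole-buffer copies, and no attempt to reclaim consumed descriptors before giving up, which the patch's get_free_descriptor() does attempt.

```c
#include <stdint.h>
#include <string.h>

#define VJ_HIGH_BIT 0x80000000u
#define NBUF 8          /* hypothetical pool size (fits the 8-bit bitmap) */
#define BUF_SIZE 64     /* hypothetical buffer size */

struct pool {
	uint8_t used;               /* bit i set => buffer i in use */
	char buf[NBUF][BUF_SIZE];
};

/* Claim the first free buffer in a pool; -1 if none is free. */
static int pool_get(struct pool *p)
{
	for (int i = 0; i < NBUF; i++) {
		if (!(p->used & (1u << i))) {
			p->used |= (uint8_t)(1u << i);
			return i;
		}
	}
	return -1;
}

/* If *entry names a global buffer, move its contents into a local
 * buffer, free the global one and rewrite *entry to the local index.
 * Returns 0 on success or if already local, -1 if no local buffer. */
static int make_local(uint32_t *entry, struct pool *global, struct pool *local)
{
	uint32_t g;
	int l;

	if (!(*entry & VJ_HIGH_BIT))
		return 0;                  /* already userspace-visible */
	g = *entry & ~VJ_HIGH_BIT;
	l = pool_get(local);
	if (l < 0)
		return -1;                 /* the patch returns -ENOBUFS */
	memcpy(local->buf[l], global->buf[g], BUF_SIZE);
	global->used &= (uint8_t)~(1u << g);
	*entry = (uint32_t)l;
	return 0;
}
```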
diff -r 47031a1f466c linux-2.6.16/include/linux/socket.h
--- linux-2.6.16/include/linux/socket.h Thu Mar 23 06:32:12 2006
+++ linux-2.6.16/include/linux/socket.h Mon Apr 24 19:50:46 2006
@@ -186,6 +187,7 @@
#define AF_PPPOX 24 /* PPPoX sockets */
#define AF_WANPIPE 25 /* Wanpipe API Sockets */
#define AF_LLC 26 /* Linux LLC */
+#define AF_VJCHAN 27 /* VJ Channel */
#define AF_TIPC 30 /* TIPC sockets */
#define AF_BLUETOOTH 31 /* Bluetooth sockets */
#define AF_MAX 32 /* For now.. */
@@ -219,7 +221,8 @@
#define PF_PPPOX AF_PPPOX
#define PF_WANPIPE AF_WANPIPE
#define PF_LLC AF_LLC
+#define PF_VJCHAN AF_VJCHAN
#define PF_TIPC AF_TIPC
#define PF_BLUETOOTH AF_BLUETOOTH
#define PF_MAX AF_MAX
diff -r 47031a1f466c linux-2.6.16/net/Kconfig
--- linux-2.6.16/net/Kconfig Thu Mar 23 06:32:12 2006
+++ linux-2.6.16/net/Kconfig Mon Apr 24 19:50:46 2006
@@ -65,6 +65,12 @@
source "net/ipv6/Kconfig"
endif # if INET
+
+config VJCHAN
+ bool "Van Jacobson Net Channel Support (EXPERIMENTAL)"
+ depends on EXPERIMENTAL
+ ---help---
+ This adds a userspace-accessible packet receive interface. Say N.
menuconfig NETFILTER
bool "Network packet filtering (replaces ipchains)"
diff -r 47031a1f466c linux-2.6.16/net/Makefile
--- linux-2.6.16/net/Makefile Thu Mar 23 06:32:12 2006
+++ linux-2.6.16/net/Makefile Mon Apr 24 19:50:46 2006
@@ -46,6 +46,7 @@
obj-$(CONFIG_IP_SCTP) += sctp/
obj-$(CONFIG_IEEE80211) += ieee80211/
obj-$(CONFIG_TIPC) += tipc/
+obj-$(CONFIG_VJCHAN) += vjchan/
ifeq ($(CONFIG_NET),y)
obj-$(CONFIG_SYSCTL) += sysctl_net.o
diff -r 47031a1f466c linux-2.6.16/include/linux/vjchan.h
--- /dev/null Thu Mar 23 06:32:12 2006
+++ linux-2.6.16/include/linux/vjchan.h Mon Apr 24 19:50:46 2006
@@ -0,0 +1,79 @@
+#ifndef _LINUX_VJCHAN_H
+#define _LINUX_VJCHAN_H
+
+/* num entries in channel q: set so consumer is at offset 1024. */
+#define VJ_NET_CHANNEL_ENTRIES 254
+/* identifies non-local buffers (ie. need kernel to copy to a local) */
+#define VJ_HIGH_BIT 0x80000000
+
+struct vj_producer {
+ __u16 tail; /* next element to add */
+ __u8 wakecnt; /* do wakeup if != consumer wakecnt */
+ __u8 pad;
+ __u16 old_head; /* last cleared buffer posn +1 */
+ __u16 pad2;
+};
+
+struct vj_consumer {
+ __u16 head; /* next element to remove */
+ __u8 wakecnt; /* increment to request wakeup */
+};
+
+/* mmap returns one of these, followed by 254 pages with a buffer each */
+struct vj_channel_ring {
+ struct vj_producer p; /* producer's header */
+ __u32 q[VJ_NET_CHANNEL_ENTRIES];
+ struct vj_consumer c; /* consumer's header */
+};
+
+struct vj_buffer {
+ __u32 data_len; /* length of actual data in buffer */
+ __u32 header_len; /* offset eth + ip header (true for now) */
+ __u32 ifindex; /* interface the packet came in on. */
+ char data[0];
+};
+
+/* Currently assumed IPv4 */
+struct vj_flowid
+{
+ __u32 saddr, daddr;
+ __u16 sport, dport;
+ __u32 ifindex;
+ __u16 proto;
+};
+
+#ifdef __KERNEL__
+struct net_device;
+struct sk_buff;
+
+struct vj_descriptor {
+ unsigned long address; /* address of net_channel_buffer */
+ unsigned long buffer_len; /* max length including header */
+};
+
+/* Everything about a vj_channel */
+struct vj_channel
+{
+ struct vj_channel_ring *ring;
+ wait_queue_head_t wq;
+ struct list_head list;
+ struct vj_flowid flowid;
+ int num_local_buffers;
+ struct vj_descriptor *descs;
+ unsigned long * used_descs;
+};
+
+void vj_inc_wakecnt(struct vj_channel *chan);
+struct vj_buffer *vj_get_buffer(int *desc_num);
+void vj_netif_rx(struct vj_buffer *buffer, int desc_num, unsigned short proto);
+int vj_xmit(struct sk_buff *skb, struct net_device *dev);
+struct vj_channel *vj_alloc_chan(int num_buffers);
+void vj_register_chan(struct vj_channel *chan, const struct vj_flowid *flowid);
+void vj_unregister_chan(struct vj_channel *chan);
+void vj_free_chan(struct vj_channel *chan);
+struct vj_buffer *vj_peek_next_buffer(struct vj_channel *chan);
+void vj_done_with_buffer(struct vj_channel *chan);
+unsigned short eth_vj_type_trans(struct vj_buffer *buffer);
+int vj_need_local_buffer(struct vj_channel *chan);
+#endif
+#endif /* _LINUX_VJCHAN_H */
diff -r 47031a1f466c linux-2.6.16/net/vjchan/Makefile
--- /dev/null Thu Mar 23 06:32:12 2006
+++ linux-2.6.16/net/vjchan/Makefile Mon Apr 24 19:50:46 2006
@@ -0,0 +1,3 @@
+#obj-m += vjtest.o
+obj-y += vjnet.o
+obj-y += af_vjchan.o
diff -r 47031a1f466c linux-2.6.16/net/vjchan/af_vjchan.c
--- /dev/null Thu Mar 23 06:32:12 2006
+++ linux-2.6.16/net/vjchan/af_vjchan.c Mon Apr 24 19:50:46 2006
@@ -0,0 +1,198 @@
+/* Van Jacobson net channels implementation for Linux
+ Copyright (C) 2006 Kelly Daly <kdaly@au.ibm.com> IBM Corporation
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+*/
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/socket.h>
+#include <linux/vjchan.h>
+#include <net/sock.h>
+
+struct vjchan_sock
+{
+ struct sock sk;
+ struct vj_channel *chan;
+ int vj_reg_flag;
+};
+
+static inline struct vjchan_sock *vj_sk(struct sock *sk)
+{
+ return (struct vjchan_sock *)sk;
+}
+
+static struct proto vjchan_proto = {
+ .name = "VJCHAN",
+ .owner = THIS_MODULE,
+ .obj_size = sizeof(struct vjchan_sock),
+};
+
+int vjchan_release(struct socket *sock)
+{
+ struct sock *sk = sock->sk;
+
+ sock_orphan(sk);
+ sock->sk = NULL;
+ sock_put(sk);
+ return 0;
+}
+
+int vjchan_bind(struct socket *sock, struct sockaddr *addr, int sockaddr_len)
+{
+ struct sock *sk = sock->sk;
+ struct vjchan_sock *vjsk;
+ struct vj_flowid *flowid = (struct vj_flowid *)addr;
+
+ /* FIXME: avoid clashing with normal sockets, replace zeroes. */
+ vjsk = vj_sk(sk);
+ vj_register_chan(vjsk->chan, flowid);
+ vjsk->vj_reg_flag = 1;
+
+ return 0;
+}
+
+int vjchan_getname(struct socket *sock, struct sockaddr *addr,
+ int *sockaddr_len, int peer)
+{
+ /* FIXME: Implement */
+ return 0;
+}
+
+unsigned int vjchan_poll(struct file *file, struct socket *sock,
+ struct poll_table_struct *wait)
+{
+ struct sock *sk = sock->sk;
+ struct vj_channel *chan = vj_sk(sk)->chan;
+
+ poll_wait(file, &chan->wq, wait);
+ vj_inc_wakecnt(chan);
+
+ if (vj_peek_next_buffer(chan) && vj_need_local_buffer(chan) == 0)
+ return POLLIN | POLLRDNORM;
+
+ return 0;
+}
+
+/* We map the ring first, then one page per buffer. */
+int vjchan_mmap(struct file *file, struct socket *sock,
+ struct vm_area_struct *vma)
+{
+ struct sock *sk = sock->sk;
+ struct vj_channel *chan = vj_sk(sk)->chan;
+ int i, vip;
+ unsigned long pos;
+
+ if (vma->vm_end - vma->vm_start !=
+ (1 + chan->num_local_buffers)*PAGE_SIZE)
+ return -EINVAL;
+
+ pos = vma->vm_start;
+ vip = vm_insert_page(vma, pos, virt_to_page(chan->ring));
+ pos += PAGE_SIZE;
+ for (i = 0; i < chan->num_local_buffers; i++) {
+ vip = vm_insert_page(vma, pos, virt_to_page(chan->descs[i].address));
+ pos += PAGE_SIZE;
+ }
+ return 0;
+}
+
+const struct proto_ops vjchan_ops = {
+ .family = PF_VJCHAN,
+ .owner = THIS_MODULE,
+ .release = vjchan_release,
+ .bind = vjchan_bind,
+ .socketpair = sock_no_socketpair,
+ .accept = sock_no_accept,
+ .getname = vjchan_getname,
+ .poll = vjchan_poll,
+ .ioctl = sock_no_ioctl,
+ .shutdown = sock_no_shutdown,
+ .setsockopt = sock_common_setsockopt,
+ .getsockopt = sock_common_getsockopt,
+ .sendmsg = sock_no_sendmsg,
+ .recvmsg = sock_no_recvmsg,
+ .mmap = vjchan_mmap,
+ .sendpage = sock_no_sendpage
+};
+
+static void vjchan_destruct(struct sock *sk)
+{
+ struct vjchan_sock *vjsk;
+
+ vjsk = vj_sk(sk);
+ if (vjsk->vj_reg_flag) {
+ vj_unregister_chan(vjsk->chan);
+ vjsk->vj_reg_flag = 0;
+ }
+ vj_free_chan(vjsk->chan);
+
+}
+
+static int vjchan_create(struct socket *sock, int protocol)
+{
+ struct sock *sk;
+ struct vjchan_sock *vjsk;
+ int err;
+
+ if (!capable(CAP_NET_RAW))
+ return -EPERM;
+ if (sock->type != SOCK_DGRAM
+ && sock->type != SOCK_RAW
+ && sock->type != SOCK_PACKET)
+ return -ESOCKTNOSUPPORT;
+
+ sock->state = SS_UNCONNECTED;
+
+ err = -ENOBUFS;
+ sk = sk_alloc(PF_VJCHAN, GFP_KERNEL, &vjchan_proto, 1);
+ if (sk == NULL)
+ goto out;
+
+ sock->ops = &vjchan_ops;
+
+ sock_init_data(sock, sk);
+ sk->sk_family = PF_VJCHAN;
+ sk->sk_destruct = vjchan_destruct;
+
+ vjsk = vj_sk(sk);
+ vjsk->chan = vj_alloc_chan(VJ_NET_CHANNEL_ENTRIES);
+ vjsk->vj_reg_flag = 0;
+ if (!vjsk->chan)
+ return -ENOMEM;
+ return 0;
+out:
+ return err;
+}
+
+static struct net_proto_family vjchan_family_ops = {
+ .family = PF_VJCHAN,
+ .create = vjchan_create,
+ .owner = THIS_MODULE,
+};
+
+static void __exit vjchan_exit(void)
+{
+ sock_unregister(PF_VJCHAN);
+}
+
+static int __init vjchan_init(void)
+{
+ return sock_register(&vjchan_family_ops);
+}
+
+module_init(vjchan_init);
+module_exit(vjchan_exit);
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_NETPROTO(PF_VJCHAN);
diff -r 47031a1f466c linux-2.6.16/net/vjchan/vjnet.c
--- /dev/null Thu Mar 23 06:32:12 2006
+++ linux-2.6.16/net/vjchan/vjnet.c Mon Apr 24 19:50:46 2006
@@ -0,0 +1,550 @@
+/* Van Jacobson net channels implementation for Linux
+ Copyright (C) 2006 Kelly Daly <kdaly@au.ibm.com> IBM Corporation
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+*/
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/slab.h>
+#include <linux/kthread.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <linux/etherdevice.h>
+#include <linux/spinlock.h>
+#include <linux/ip.h>
+#include <linux/udp.h>
+#include <linux/vjchan.h>
+
+#define BUFFER_DATA_LEN 2048
+#define NUM_GLOBAL_DESCRIPTORS 1024
+
+/* All our channels. FIXME: Lockless funky hash structure please... */
+static LIST_HEAD(channels);
+static spinlock_t chan_lock = SPIN_LOCK_UNLOCKED;
+
+/* Default channel, also holds global buffers (userspace-mapped
+ * channels have local buffers, which they prefer to use). */
+static struct vj_channel *default_chan;
+
+/* need to increment for wake in udp.c wait_for_vj_buffer */
+void vj_inc_wakecnt(struct vj_channel *chan)
+{
+ chan->ring->c.wakecnt++;
+ pr_debug("*** incremented wakecnt - should allow wake up\n");
+}
+EXPORT_SYMBOL(vj_inc_wakecnt);
+
+static int is_empty(struct vj_channel_ring *ring)
+{
+ if (ring->c.head == ring->p.tail)
+ return 1;
+ return 0;
+}
+
+static struct vj_buffer *get_buffer(unsigned int desc_num,
+ struct vj_channel *chan)
+{
+ struct vj_buffer *buf;
+
+ if ((desc_num & VJ_HIGH_BIT) || (chan->num_local_buffers == 0)) {
+ desc_num &= ~VJ_HIGH_BIT;
+ BUG_ON(desc_num >= default_chan->num_local_buffers);
+ buf = (struct vj_buffer*)default_chan->descs[desc_num].address;
+ } else {
+ BUG_ON(desc_num >= chan->num_local_buffers);
+ buf = (struct vj_buffer *)chan->descs[desc_num].address;
+ }
+
+ pr_debug(" received desc_num is %i\n", desc_num);
+ pr_debug("get_buffer %p (%s) %i: %p (len=%li ifind=%i hlen=%li) %#02X %#02X %#02X %#02X %#02X %#02X %#02X %#02X\n",
+ current, current->comm, desc_num, buf, buf->data_len, buf->ifindex, buf->header_len + (sizeof(struct iphdr *) * 4),
+ buf->data[0], buf->data[1], buf->data[2], buf->data[3], buf->data[4], buf->data[5], buf->data[6], buf->data[7]);
+
+ return buf;
+}
+
+static void release_buffer(struct vj_channel *chan, unsigned int descnum)
+{
+ if (descnum & VJ_HIGH_BIT) {
+ BUG_ON(test_bit(descnum & ~VJ_HIGH_BIT,
+ default_chan->used_descs) == 0);
+ clear_bit(descnum & ~VJ_HIGH_BIT, default_chan->used_descs);
+ } else {
+ BUG_ON(test_bit(descnum, chan->used_descs) == 0);
+ clear_bit(descnum, chan->used_descs);
+ }
+}
+
+/* Free all descriptors for the current channel between where we last
+ * freed to and where the consumer has not yet consumed. chan->c.head
+ * is not cleared because it may not have been consumed, therefore
+ * chan->p.old_head is not cleared. If chan->p.old_head ==
+ * chan->c.head then nothing more has been consumed since we last
+ * freed the descriptors.
+ *
+ * Because we're using local and global channels we need to select the
+ * bitmap according to the channel. Local channels may be pointing to
+ * local or global buffers, so we need to select the bitmap according
+ * to the buffer type */
+
+/* Free descriptors consumer has consumed since last free */
+static void free_descs_for_channel(struct vj_channel *chan)
+{
+ struct vj_channel_ring *ring = chan->ring;
+ int desc_num;
+
+ while (ring->p.old_head != ring->c.head) {
+ pr_debug("ring->p.old_head %i, ring->c.head %i\n", ring->p.old_head, ring->c.head);
+ desc_num = ring->q[ring->p.old_head];
+
+ pr_debug("desc_num %i\n", desc_num);
+
+ /* FIXME: Security concerns: make sure this descriptor
+ * really used by this vjchannel. Userspace could
+ * have changed it. */
+ release_buffer(chan, desc_num);
+ ring->p.old_head = (ring->p.old_head + 1) % VJ_NET_CHANNEL_ENTRIES;
+ pr_debug("ring->p.old_head %i, ring->c.head %i\n\n", ring->p.old_head, ring->c.head);
+ }
+}
+
+/* return -1 if no descriptor found and none can be freed */
+static int get_free_descriptor(struct vj_channel *chan)
+{
+ int free_desc, bitval;
+
+ BUG_ON(chan->num_local_buffers == 0);
+ do {
+ free_desc = find_first_zero_bit(chan->used_descs,
+ chan->num_local_buffers);
+ pr_debug("free_desc = %i\n", free_desc);
+ if (free_desc >= chan->num_local_buffers) {
+ /* no descriptors, refresh bitmap and try again! */
+ free_descs_for_channel(chan);
+ free_desc = find_first_zero_bit(chan->used_descs,
+ chan->num_local_buffers);
+ if (free_desc >= chan->num_local_buffers)
+ /* still no descriptors */
+ return -1;
+ }
+ bitval = test_and_set_bit(free_desc, chan->used_descs);
+ pr_debug("bitval = %i\n", bitval);
+ } while (bitval == 1); //keep going until we get a FREE free bit!
+
+ /* We set high bit to indicate a global channel. */
+ if (chan == default_chan)
+ free_desc |= VJ_HIGH_BIT;
+ return free_desc;
+}
+
+/* This function puts a buffer into a local address space for a
+ * channel that is unable to use a kernel address space. If address
+ * high bit is set then the buffer is in kernel space - get a free
+ * local buffer and copy it across. Set local buf to used (done when
+ * finding free buffer), kernel buf to unused. */
+/* FIXME: Loop, do as many as possible at once. */
+int vj_need_local_buffer(struct vj_channel *chan)
+{
+ struct vj_channel_ring *ring = chan->ring;
+ u32 new_desc, k_desc;
+
+ k_desc = ring->q[ring->c.head];
+
+ if (ring->q[ring->c.head] & VJ_HIGH_BIT) {
+ struct vj_buffer *buf, *kbuf;
+
+ kbuf = get_buffer(k_desc, chan);
+ new_desc = get_free_descriptor(chan);
+ if (new_desc == -1)
+ return -ENOBUFS;
+ buf = get_buffer(new_desc, chan);
+ memcpy (buf, kbuf, sizeof(struct vj_buffer)
+ + kbuf->data_len + kbuf->header_len);
+/* clear the old descriptor and set q to new one */
+ k_desc &= ~VJ_HIGH_BIT;
+ clear_bit(k_desc, default_chan->used_descs);
+ ring->q[ring->c.head] = new_desc;
+ }
+ return 0;
+}
+EXPORT_SYMBOL(vj_need_local_buffer);
+
+struct vj_buffer *vj_get_buffer(int *desc_num)
+{
+ *desc_num = get_free_descriptor(default_chan);
+
+ if (*desc_num == -1) {
+ printk("no free bits!\n");
+ return NULL;
+ }
+
+ return get_buffer(*desc_num, default_chan);
+}
+EXPORT_SYMBOL(vj_get_buffer);
+
+static void enqueue_buffer(struct vj_channel *chan, struct vj_buffer *buffer, int desc_num)
+{
+ u16 tail, nxt;
+ int i;
+
+ pr_debug("*** in enqueue buffer\n");
+ pr_debug(" desc_num = %i\n", desc_num);
+ pr_debug(" Buffer Data Length = %lu\n", buffer->data_len);
+ pr_debug(" Buffer Header Length = %lu\n", buffer->header_len);
+ pr_debug(" Buffer Data:\n");
+ for (i = 0; i < buffer->data_len; i++) {
+ pr_debug("%i ", buffer->data[i]);
+ if (i % 20 == 0)
+ pr_debug("\n");
+ }
+ pr_debug("\n");
+
+ tail = chan->ring->p.tail;
+ nxt = (tail + 1) % VJ_NET_CHANNEL_ENTRIES;
+
+ pr_debug("nxt = %i and chan->c.head = %i\n", nxt, chan->ring->c.head);
+ if (nxt != chan->ring->c.head) {
+ chan->ring->q[tail] = desc_num;
+
+ smp_wmb();
+ chan->ring->p.tail=nxt;
+ pr_debug("chan->p.wakecnt = %i and chan->c.wakecnt = %i\n", chan->ring->p.wakecnt, chan->ring->c.wakecnt);
+ free_descs_for_channel(chan);
+ if (chan->ring->p.wakecnt != chan->ring->c.wakecnt) {
+ ++chan->ring->p.wakecnt;
+ /* consume whatever is available */
+ pr_debug("WAKE UP, CONSUMER!!!\n\n");
+ wake_up(&chan->wq);
+ }
+ } else //if can't add it to chan, may as well allow it to be reused
+ release_buffer(chan, desc_num);
+}
+
+/* FIXME: If we're going to do wildcards here, we need to do ordering between different partial matches... */
+static struct vj_channel *find_channel(u32 saddr, u32 daddr, u16 proto, u16 sport, u16 dport, u32 ifindex)
+{
+ struct vj_channel *i;
+
+ pr_debug("args saddr %u, daddr %u, sport %u, dport %u, ifindex %u, proto %u\n", saddr, daddr, sport, dport, ifindex, proto);
+
+ list_for_each_entry(i, &channels, list) {
+ pr_debug("saddr %u, daddr %u, sport %u, dport %u, ifindex %u, proto %u\n", i->flowid.saddr, i->flowid.daddr, i->flowid.sport, i->flowid.dport, i->flowid.ifindex, i->flowid.proto);
+
+ if ((!i->flowid.saddr || i->flowid.saddr == saddr) &&
+ (!i->flowid.daddr || i->flowid.daddr == daddr) &&
+ (!i->flowid.proto || i->flowid.proto == proto) &&
+ (!i->flowid.sport || i->flowid.sport == sport) &&
+ (!i->flowid.dport || i->flowid.dport == dport) &&
+ (!i->flowid.ifindex || i->flowid.ifindex == ifindex)) {
+ pr_debug("Found channel %p\n", i);
+ return i;
+ }
+ }
+ pr_debug("using default channel %p\n", default_chan);
+ return default_chan;
+}
+
+void vj_netif_rx(struct vj_buffer *buffer, int desc_num,
+ unsigned short proto)
+{
+ struct vj_channel *chan;
+ struct iphdr *ip;
+ int iphl, offset, real_data_len;
+ u16 *ports;
+ unsigned long flags;
+
+ offset = sizeof(struct iphdr) + sizeof(struct udphdr);
+ real_data_len = buffer->data_len - offset;
+
+
+ pr_debug("data_len = %lu, offset = %i, real data? = %i\n\n\n", buffer->data_len, offset, real_data_len);
+ /* this is always 18 when there's 18 or less characters in buffer->data */
+
+ pr_debug("rx) desc_num = %i\n\n", desc_num);
+
+ spin_lock_irqsave(&chan_lock, flags);
+ if (proto == __constant_htons(ETH_P_IP)) {
+
+ ip = (struct iphdr *)(buffer->data + buffer->header_len);
+ ports = (u16 *)(ip + 1);
+ iphl = ip->ihl * 4;
+
+ if ((buffer->data_len < (iphl + 4)) ||
+ (iphl != sizeof(struct iphdr))) {
+ pr_debug("Bad data, default chan\n");
+ pr_debug("buffer data_len = %li, header len = %li, ip->ihl = %i\n", buffer->data_len, buffer->header_len, ip->ihl);
+ chan = default_chan;
+ } else {
+ chan = find_channel(ip->saddr, ip->daddr,
+ ip->protocol, ports[0],
+ ports[1], buffer->ifindex);
+
+ }
+ } else
+ chan = default_chan;
+ enqueue_buffer(chan, buffer, desc_num);
+
+ spin_unlock_irqrestore(&chan_lock, flags);
+}
+EXPORT_SYMBOL(vj_netif_rx);
+
+/*
+ * Determine the packet's protocol ID. The rule here is that we
+ * assume 802.3 if the type field is short enough to be a length.
+ * This is normal practice and works for any 'now in use' protocol.
+ */
+
+unsigned short eth_vj_type_trans(struct vj_buffer *buffer)
+{
+ struct ethhdr *eth;
+ unsigned char *rawp;
+
+ eth = (struct ethhdr *)buffer->data;
+ buffer->header_len = ETH_HLEN;
+
+ BUG_ON(buffer->header_len > buffer->data_len);
+
+ buffer->data_len -= buffer->header_len;
+ if (ntohs(eth->h_proto) >= 1536)
+ return eth->h_proto;
+
+ rawp = buffer->data;
+
+ /*
+ * This is a magic hack to spot IPX packets. Older Novell breaks
+ * the protocol design and runs IPX over 802.3 without an 802.2 LLC
+ * layer. We look for FFFF which isn't a used 802.2 SSAP/DSAP. This
+ * won't work for fault tolerant netware but does for the rest.
+ */
+ if (*(unsigned short *)rawp == 0xFFFF)
+ return htons(ETH_P_802_3);
+
+ /*
+ * Real 802.2 LLC
+ */
+ return htons(ETH_P_802_2);
+}
+EXPORT_SYMBOL(eth_vj_type_trans);
+
+static void send_to_netif_rx(struct vj_buffer *buffer)
+{
+ struct sk_buff *skb;
+ struct net_device *dev;
+ int i;
+
+ dev = dev_get_by_index(buffer->ifindex);
+ if (!dev)
+ return;
+ skb = dev_alloc_skb(buffer->data_len + 2);
+ if (skb == NULL) {
+ dev_put(dev);
+ return;
+ }
+
+ skb_reserve(skb, 2);
+ skb->dev = dev;
+
+ skb_put(skb, buffer->data_len);
+ memcpy(skb->data, buffer->data, buffer->data_len);
+
+ pr_debug(" *** C buffer data_len = %lu and skb->len = %i\n", buffer->data_len, skb->len);
+ for (i = 0; i < 10; i++)
+ pr_debug("%i\n", skb->data[i]);
+
+ skb->protocol = eth_type_trans(skb, skb->dev);
+
+ netif_receive_skb(skb);
+}
+
+/* handles default_chan (buffers that nobody else wants) */
+static int default_thread(void *unused)
+{
+ int consumed = 0;
+ int woken = 0;
+ struct vj_buffer *buffer;
+ wait_queue_t wait;
+
+ /* When we get woken up, don't want to be removed from waitqueue! */
+/* wait_queue_t has no 'task' member any more; the task goes in 'private' */
+ wait.private = current;
+ wait.func = default_wake_function;
+ INIT_LIST_HEAD(&wait.task_list);
+
+ add_wait_queue(&default_chan->wq, &wait);
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ while (!kthread_should_stop()) {
+ /* FIXME: if we do this before prepare_to_wait, avoids wmb */
+ default_chan->ring->c.wakecnt++;
+ smp_wmb();
+
+ while (!is_empty(default_chan->ring)) {
+ smp_read_barrier_depends();
+ buffer = get_buffer(default_chan->ring->q[default_chan->ring->c.head], default_chan);
+ pr_debug("calling send_to_netif_rx\n");
+ send_to_netif_rx(buffer);
+ smp_rmb();
+ default_chan->ring->c.head = (default_chan->ring->c.head+1)%VJ_NET_CHANNEL_ENTRIES;
+ consumed++;
+ }
+
+ schedule();
+ woken++;
+ set_current_state(TASK_INTERRUPTIBLE);
+ }
+ remove_wait_queue(&default_chan->wq, &wait);
+
+ __set_current_state(TASK_RUNNING);
+
+ pr_debug("consumer finished! consumed %i and woke %i\n", consumed, woken);
+ return 0;
+}
+
+/* return the next buffer, but do not move on */
+struct vj_buffer *vj_peek_next_buffer(struct vj_channel *chan)
+{
+ struct vj_channel_ring *ring = chan->ring;
+
+ if (is_empty(ring))
+ return NULL;
+ return get_buffer(ring->q[ring->c.head], chan);
+}
+EXPORT_SYMBOL(vj_peek_next_buffer);
+
+/* move on to next buffer */
+void vj_done_with_buffer(struct vj_channel *chan)
+{
+ struct vj_channel_ring *ring = chan->ring;
+
+ ring->c.head = (ring->c.head+1)%VJ_NET_CHANNEL_ENTRIES;
+
+ pr_debug("done_with_buffer\n\n");
+}
+EXPORT_SYMBOL(vj_done_with_buffer);
+
+struct vj_channel *vj_alloc_chan(int num_buffers)
+{
+ int i;
+ struct vj_channel *chan = kmalloc(sizeof(*chan), GFP_KERNEL);
+
+ if (!chan)
+ return NULL;
+
+ chan->ring = (void *)get_zeroed_page(GFP_KERNEL);
+ if (chan->ring == NULL)
+ goto free_chan;
+
+ init_waitqueue_head(&chan->wq);
+ chan->ring->p.tail = chan->ring->p.wakecnt = chan->ring->p.old_head = chan->ring->c.head = chan->ring->c.wakecnt = 0;
+
+ chan->num_local_buffers = num_buffers;
+ if (chan->num_local_buffers == 0)
+ return chan;
+
+ chan->used_descs = kzalloc(BITS_TO_LONGS(chan->num_local_buffers)
+ * sizeof(long), GFP_KERNEL);
+ if (chan->used_descs == NULL)
+ goto free_ring;
+ chan->descs = kmalloc(sizeof(*chan->descs)*num_buffers, GFP_KERNEL);
+ if (chan->descs == NULL)
+ goto free_used_descs;
+ for (i = 0; i < chan->num_local_buffers; i++) {
+ chan->descs[i].buffer_len = PAGE_SIZE;
+ chan->descs[i].address = get_zeroed_page(GFP_KERNEL);
+ if (chan->descs[i].address == 0)
+ goto free_descs;
+ }
+
+ return chan;
+
+free_descs:
+ for (--i; i >= 0; i--)
+ free_page(chan->descs[i].address);
+ kfree(chan->descs);
+free_used_descs:
+ kfree(chan->used_descs);
+free_ring:
+ free_page((unsigned long)chan->ring);
+free_chan:
+ kfree(chan);
+ return NULL;
+}
+EXPORT_SYMBOL(vj_alloc_chan);
+
+void vj_register_chan(struct vj_channel *chan, const struct vj_flowid *flowid)
+{
+ pr_debug("%p %s: registering channel %p\n",
+ current, current->comm, chan);
+ chan->flowid = *flowid;
+ spin_lock_irq(&chan_lock);
+ list_add(&chan->list, &channels);
+ spin_unlock_irq(&chan_lock);
+}
+EXPORT_SYMBOL(vj_register_chan);
+
+void vj_unregister_chan(struct vj_channel *chan)
+{
+ pr_debug("%p %s: unregistering channel %p\n",
+ current, current->comm, chan);
+ spin_lock_irq(&chan_lock);
+ list_del(&chan->list);
+ spin_unlock_irq(&chan_lock);
+}
+EXPORT_SYMBOL(vj_unregister_chan);
+
+void vj_free_chan(struct vj_channel *chan)
+{
+ pr_debug("%p %s: freeing channel %p\n",
+ current, current->comm, chan);
+ /* FIXME: Mark any buffer still in channel as free! */
+ kfree(chan);
+}
+EXPORT_SYMBOL(vj_free_chan);
+
+
+
+/* not using at the mo - working on rx, not tx */
+int vj_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+ struct vj_buffer *buffer;
+ /* first element in dev priv data must be addr of net_channel */
+// struct net_channel *chan = *(struct net_channel **) netdev_priv(dev) + 1;
+ int desc_num;
+
+ buffer = vj_get_buffer(&desc_num);
+ buffer->data_len = skb->len;
+ memcpy(buffer->data, skb->data, buffer->data_len);
+// enqueue_buffer(chan, buffer, desc_num);
+
+ kfree_skb(skb);
+ return 0;
+}
+EXPORT_SYMBOL(vj_xmit);
+
+static int __init init(void)
+{
+ default_chan = vj_alloc_chan(NUM_GLOBAL_DESCRIPTORS);
+ if (!default_chan)
+ return -ENOMEM;
+
+ kthread_run(default_thread, NULL, "kvj_net");
+ return 0;
+}
+
+module_init(init);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("VJ Channel Networking Module.");
+MODULE_AUTHOR("Kelly Daly <kelly@au1.ibm.com>");
^ permalink raw reply
* Re: Fw: Bug: PPP dropouts in >=2.6.16
From: Nuri Jawad @ 2006-04-26 0:36 UTC (permalink / raw)
To: Sven Schuster; +Cc: Andi Kleen, Jesse Brandeburg, Andrew Morton, netdev
In-Reply-To: <20060424074148.GB23340@zion.homelinux.com>
> no problems here with pppoe, kernel is 2.6.17-rc1-mm1, ppp 2.4.4-b1.
Did you create a high load on the system in the manner I described?
The bug once only appeared after about 6 hours here when line + CPU had
been mostly idle. But that was the longest time between failures. Can you
test with one of the 2.6.16 kernels I tried (latest was .9)? Can't say
for sure if CPU load is a factor, load on the connection seems to be.
After running 2.6.15.7 for another 5 days with some more stress
testing, I can confirm that 2.6.15 definitely does not produce any
dropouts on this machine.
For now I'll try to reproduce the effects on my second box (AMD64/nf4).
I'd be happy if someone could give me some hints on which patches I could
try to revert, as the changes to ppp between the two versions look fairly
harmless. For the first time in 8.5 years, I cannot use a 'stable' kernel
release, and there is really nothing special about this system.
Regards,
Nuri
* sky2 driver problems in 2.6.17-rc2-git6 (was: Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1)
From: Guenther Thomsen @ 2006-04-26 0:06 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: John W. Linville, netdev
In-Reply-To: <20060417111846.5a5deccc@localhost.localdomain>
On Monday 17 April 2006 11:18, Stephen Hemminger wrote:
> I don't know what you are doing different, but my 2 port SysKonnect
> card is working fine. Running SMP AMD64 and 2.6.17 latest.
>
> Showing full speed on both ports.
I missed that e-mail, sorry.
I just gave it another try, this time with 2.6.16.11. One port works
fine (so far; I have only done very limited testing with ttcp). The second
port does get an IP address via DHCP, but the packets it receives
seem to be garbled:
--8<--
0x0000: 0000 6175 6469 7428 3131 3435 3939 3430 ..audit(11459940
0x0010: 3031 2e39 3738 3a33 3829 3a20 7573 6572 01.978:38):.user
0x0020: 2070 6964 3d33 3230 3920 7569 643d .pid=3209.uid=
12:56:23.725090 00:00:00:00:00:00 > 30:6e:6d:00:00:00 null I (s=32,r=55,P) len=42
12:56:24.603274 00:00:21:00:00:00 > 00:00:00:00:00:00 null disc/C len=43
12:56:26.619326 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:28.635346 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:29.734046 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:29.865239 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:30.651371 00:00:00:00:00:00 > a6:00:00:00:4d:04, ethertype Unknown (0xe20c), length 60:
0x0000: 0000 6175 6469 7428 3131 3435 3939 3436 ..audit(11459946
0x0010: 3031 2e33 3639 3a34 3729 3a20 7573 6572 01.369:47):.user
0x0020: 2070 6964 3d33 3239 3820 7569 643d .pid=3298.uid=
12:56:30.916718 00:00:f0:71:61:00 > 28:37:03:5b:3a:00 null I (s=16,r=0,C) len=42
12:56:30.923558 00:00:21:00:00:00 > 00:00:00:00:00:00 null rnr (r=55,C) len=42
12:56:32.667413 00:00:d0:2e:30:42 > 10:60:61:00:00:00, ethertype Unknown (0x572b), length 60:
0x0000: 0000 d675 0d00 0000 0000 0200 0000 0000 ...u............
0x0010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0020: 0000 ffff ffff 0000 0000 1300 0000 ..............
12:56:33.296384 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:33.303222 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
[..]
13:00:44.340062 00:00:00:00:00:00 > 5f:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:44.672350 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:44.868724 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:45.340123 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:46.340173 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:46.688433 IP truncated-ip - 1454 bytes missing! 192.168.65.66.40313 > 192.168.65.65.5001: . 1426488980:1426490428(1448) ack 1790562292 win 1460 <nop,nop,timestamp[|tcp]>
13:00:48.704431 00:00:21:00:00:00 > 00:00:00:00:00:00 null I (s=17,r=18,C) len=42
13:00:48.886426 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:50.720463 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:52.736496 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:54.752522 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:54.927556 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:54.934394 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
-->8--
On a different host connected to the same switch, traffic looks more like:
--8<--
2:01:49.388992 IP 192.168.64.1.ntp > 255.255.255.255.ntp: NTPv3, Broadcast, length 48
12:01:50.176550 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:01:51.235034 arp reply 192.168.64.32 is-at 00:0a:49:00:5e:8a
12:01:51.241857 arp reply 192.168.64.33 is-at 00:0a:49:00:5e:8b
12:01:51.891193 00:00:01:02:c8:58 > 45:c0:00:1c:00:20, ethertype Unknown (0xe000), length 60:
0x0000: 0001 1164 ee9b 0000 0000 0000 0000 0000 ...d............
0x0010: 0000 0000 0000 0000 0000 0000 2f6b 8c87 ............/k..
0x0020: 0000 0000 0000 0000 0000 0000 0000 ..............
12:01:52.192552 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:01:52.801392 arp reply 192.168.64.34 is-at 00:0a:49:00:5e:8c
12:01:52.808240 arp reply 192.168.64.35 is-at 00:0a:49:00:5e:8d
12:01:54.208495 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:01:56.224453 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:01:58.240464 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:02:00.029320 arp reply 192.168.64.39 is-at 00:0a:49:00:5e:ff
12:02:00.256420 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
-->8--
I noticed that the interrupt count is very low too (the interrupt count
as shown in /proc/interrupts is much higher):
--8<--
[root@penguin1 ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:A0:D1:E1:F2:D8
inet addr:192.168.65.65 Bcast:192.168.65.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4559786 errors:0 dropped:0 overruns:0 frame:0
TX packets:4071967 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:4680823977 (4.3 GiB) TX bytes:4332319475 (4.0 GiB)
Interrupt:169
eth1 Link encap:Ethernet HWaddr 00:A0:D1:E1:F2:D9
inet addr:192.168.64.199 Bcast:192.168.64.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2193 errors:0 dropped:0 overruns:0 frame:0
TX packets:29 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:180137 (175.9 KiB) TX bytes:1856 (1.8 KiB)
Interrupt:169
-->8--
I then tried 2.6.17-rc2-git6. At first it looked OK: the second Ethernet
device was configured properly and I got some traffic through. But once
I started copying large files over NFS from a (very) fast NFS server
(some 5GB were copied successfully), traffic received by eth1 got
corrupted again:
--8<--
[root@penguin1 ~]# tcpdump -n -i eth1 -s 0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
14:23:14.049450 arp who-has 192.168.64.199 tell 192.168.64.202
14:23:14.049519 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
14:23:14.745075 arp who-has 192.168.64.199 tell 192.168.64.202
14:23:14.745082 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
14:23:14.852108 IP truncated-ip - 1454 bytes missing! 192.168.64.110.nfs > 192.168.64.199.1021: . 159991419:159992879(1460) ack 3444328765 win 64240
14:23:14.944489 00:00:00:00:00:00 > a3:00:00:00:50:04, ethertype Unknown (0x210d), length 98:
0x0000: 0000 6175 6469 7428 3131 3436 3030 3030 ..audit(11460000
0x0010: 3032 2e31 3836 3a36 3329 3a20 7573 6572 02.186:63):.user
0x0020: 2070 6964 3d33 3336 3120 7569 643d 3020 .pid=3361.uid=0.
0x0030: 6175 6964 3d34 3239 3439 3637 3239 3520 auid=4294967295.
0x0040: 6d73 673d 2750 414d 2073 6574 6372 6564 msg='PAM.setcred
0x0050: 3a20 7573 :.us
14:23:15.944703 arp who-has 192.168.64.253 tell 192.168.79.254
14:23:16.868291 arp who-has 192.168.64.199 tell 192.168.64.202
14:23:16.868301 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
14:23:16.944907 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
14:23:17.945113 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
14:23:18.884430 arp who-has 192.168.64.199 tell 192.168.64.202
14:23:18.884441 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
14:23:18.945318 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
-->8--
The ".audit ... PAM.setcred" string is interesting. This is most likely
not traffic from the net, but text from the host's own RAM. Did some
pointer get mangled?
I recompiled the kernel, this time with RHFC4's gcc32. The result is similar
(the second interface goes bad only after some data has been copied
over NFS):
--8<--
[root@penguin1 ~]# tcpdump -n -s 0 -i eth1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
15:48:02.306927 IP 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8801
15:48:02.316088 arp who-has 192.168.64.202 tell 192.168.64.199
15:48:02.316329 arp who-has 192.168.64.199 tell 192.168.79.254
15:48:02.316335 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
15:48:02.316338 802.1d config 8000.00:a0:d1:e1:b4:78.8025 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
15:48:03.307095 IP 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8802
15:48:03.307289 IP truncated-ip - 38 bytes missing! 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8803
15:48:03.316166 arp who-has 192.168.64.202 tell 192.168.64.199
15:48:03.316397 arp who-has 192.168.64.199 tell 192.168.79.254
15:48:03.316401 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
15:48:03.316404 802.1d config 8000.00:a0:d1:e1:b4:78.8025 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
15:48:03.784698 IP truncated-ip - 38 bytes missing! 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8804
12 packets captured
12 packets received by filter
0 packets dropped by kernel
-->8--
No suspicious text and no zero-filled packets this time, only truncated ones,
but that is bad enough to stall NFS and cause heavy packet loss:
--8<--
64 bytes from 192.168.64.199: icmp_seq=83 ttl=64 time=147073 ms
64 bytes from 192.168.64.199: icmp_seq=84 ttl=64 time=149073 ms
64 bytes from 192.168.64.199: icmp_seq=85 ttl=64 time=149073 ms
64 bytes from 192.168.64.199: icmp_seq=87 ttl=64 time=149073 ms
64 bytes from 192.168.64.199: icmp_seq=88 ttl=64 time=149073 ms
64 bytes from 192.168.64.199: icmp_seq=233 ttl=64 time=82023 ms
64 bytes from 192.168.64.199: icmp_seq=236 ttl=64 time=80018 ms
64 bytes from 192.168.64.199: icmp_seq=241 ttl=64 time=81018 ms
64 bytes from 192.168.64.199: icmp_seq=243 ttl=64 time=81018 ms
64 bytes from 192.168.64.199: icmp_seq=253 ttl=64 time=85018 ms
64 bytes from 192.168.64.199: icmp_seq=255 ttl=64 time=85018 ms
64 bytes from 192.168.64.199: icmp_seq=256 ttl=64 time=85629 ms
64 bytes from 192.168.64.199: icmp_seq=257 ttl=64 time=87023 ms
--- 192.168.64.199 ping statistics ---
346 packets transmitted, 63 received, +3 errors, 81% packet loss, time 345136ms
rtt min/avg/max/mdev = 80018.748/119940.275/149073.885/21090.211 ms, pipe 151
-->8--
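tcpdump prints `truncated-ip - N bytes missing!` when the Total Length field in the IPv4 header claims more bytes than the frame actually delivered. Below is a minimal sketch of that check. It assumes the buffer starts at the IPv4 header, with the Ethernet header already stripped. The helper name `ipv4_missing_bytes` is hypothetical and is not tcpdump's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns how many bytes the IPv4 Total Length field claims beyond the
 * bytes actually received, 0 if the packet is complete, or -1 if the
 * buffer is too short (or not version 4) to check. */
static long ipv4_missing_bytes(const uint8_t *pkt, size_t received)
{
	size_t total_len;

	if (received < 20 || (pkt[0] >> 4) != 4)
		return -1;
	/* Total Length is bytes 2-3 of the header, in network (big-endian) order. */
	total_len = ((size_t)pkt[2] << 8) | pkt[3];
	return total_len > received ? (long)(total_len - received) : 0;
}
```

As an arithmetic example, a 60-byte minimum Ethernet frame leaves 46 bytes after the 14-byte Ethernet header. If the IP header in those bytes claims 1500, then 1500 - 46 = 1454, the same figure as in the captures above. That would fit a full-size packet whose header arrived but whose payload did not, though the captures alone do not prove it.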
Given the recent NFS changes, I tried to get the system into this
state using just ttcp. With some determination, three more hosts and
a few million packets, I succeeded. This time it was eth0 that truncated
packets, and traffic slowed to a crawl (~1 good packet every 2s).
Some progress has been made, but it's not quite solid yet.
best regards
Guenther
* Re: [PATCH 2/5] sky2: add fake idle irq timer
From: Stephen Hemminger @ 2006-04-25 22:45 UTC (permalink / raw)
To: Francois Romieu; +Cc: Jeff Garzik, netdev
In-Reply-To: <20060425223900.GB18035@electric-eye.fr.zoreil.com>
On Wed, 26 Apr 2006 00:39:00 +0200
Francois Romieu <romieu@fr.zoreil.com> wrote:
> Stephen Hemminger <shemminger@osdl.org> :
> [...]
> > > Any objection against moving mod_timer() from sky2_poll() to sky2_idle()
> > > so as to keep poll() path unmodified ?
> > >
> >
> > If traffic is moving, then I want the timer to keep getting rescheduled
> > farther out.
>
> If my version of the driver is not stale, the timer will not be
> rescheduled when work_done >= work_limit.
I am trying to work around possible lost IRQs, not netdev scheduler
screw-ups. If work_done >= work_limit, then it will already be
called back later when it returns 1.
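As described in this exchange, the 2.6-era NAPI `poll()` method returns 1 when it stops at its work limit and wants to be polled again. It returns 0 after calling `__netif_rx_complete()`. The userspace model below assumes a fixed per-poll work limit and a count of pending packets. It shows why a `mod_timer()` placed just before completion is skipped while the device is still busy. `fake_dev`, `fake_poll` and `run_until_complete` are hypothetical, and no kernel API is used.

```c
/* Hypothetical model of a device with packets waiting in its RX ring. */
struct fake_dev {
	int pending;   /* packets still waiting */
	int polls;     /* how many times poll() ran */
	int rearms;    /* how many times the idle timer would be rearmed */
};

/* Shaped like the patched poll routine: the timer is only rearmed
 * on the path that completes polling. */
static int fake_poll(struct fake_dev *d, int work_limit)
{
	int work_done = d->pending < work_limit ? d->pending : work_limit;

	d->pending -= work_done;
	d->polls++;
	if (work_done >= work_limit)
		return 1;      /* still busy: the core will call poll() again */
	d->rearms++;       /* stands in for mod_timer(&hw->idle_timer, jiffies + HZ) */
	return 0;          /* stands in for __netif_rx_complete(dev) */
}

/* Drive poll() the way the core does: keep calling while it returns 1. */
static void run_until_complete(struct fake_dev *d, int work_limit)
{
	while (fake_poll(d, work_limit))
		;
}
```

With 250 pending packets and a limit of 64, the model runs four polls (64, 64, 64, 58) and rearms the timer only once, on the final poll that completes.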
* Re: [PATCH 2/5] sky2: add fake idle irq timer
From: Francois Romieu @ 2006-04-25 22:39 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Jeff Garzik, netdev
In-Reply-To: <20060425143042.29d636a8@localhost.localdomain>
Stephen Hemminger <shemminger@osdl.org> :
[...]
> > Any objection against moving mod_timer() from sky2_poll() to sky2_idle()
> > so as to keep poll() path unmodified ?
> >
>
> If traffic is moving, then I want the timer to keep getting rescheduled
> farther out.
If my version of the driver is not stale, the timer will not be
rescheduled when work_done >= work_limit. In other words, the optimization
tends to vanish as the load goes up. Before this point is reached,
the long path of mod_timer() can be taken up to HZ times per second.
I have no idea which balance works best.
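The two timer placements debated in this thread can be compared with a tick-based userspace model. In strategy A, the idle timer is rearmed every time a poll completes, as in the posted patch. In strategy B, the idle handler rearms itself and `sky2_poll()` is left untouched, as suggested in this thread. `SIM_HZ`, `busy_fn`, the traffic helpers and both strategy functions are hypothetical. The model also assumes at most one completed poll per tick.

```c
#define SIM_HZ 100   /* hypothetical ticks per second */

struct timer_stats {
	long mod_timer_calls;
	long timer_fires;
};

/* busy(t) returns nonzero if a NAPI poll runs to completion during tick t. */
typedef int (*busy_fn)(long t);

static int always_busy(long t) { (void)t; return 1; }
static int never_busy(long t)  { (void)t; return 0; }

/* Strategy A: every completed poll pushes the timer out by SIM_HZ. An expiry
 * schedules a fake poll, which finds no work, completes, and rearms. */
static struct timer_stats rearm_in_poll(long ticks, busy_fn busy)
{
	struct timer_stats s = {0, 0};
	long expires = SIM_HZ;   /* timer armed at device open */

	for (long t = 0; t < ticks; t++) {
		int poll_completes = busy(t);

		if (t == expires) {
			s.timer_fires++;
			poll_completes = 1;
		}
		if (poll_completes) {
			expires = t + SIM_HZ;
			s.mod_timer_calls++;
		}
	}
	return s;
}

/* Strategy B: poll() never touches the timer. The idle handler rearms
 * itself, so it fires once per SIM_HZ ticks regardless of traffic. */
static struct timer_stats rearm_in_idle(long ticks, busy_fn busy)
{
	struct timer_stats s = {0, 0};
	long expires = SIM_HZ;

	(void)busy;              /* traffic does not affect this strategy */
	for (long t = 0; t < ticks; t++) {
		if (t == expires) {
			s.timer_fires++;
			expires = t + SIM_HZ;
			s.mod_timer_calls++;
		}
	}
	return s;
}
```

With constant traffic over 1000 ticks, strategy A never lets the timer fire but makes 1000 `mod_timer()` calls. Strategy B makes 9 calls and 9 expiries in the same period, whatever the load. The model does not cover polls that stop at the work limit. In the posted patch, such polls never reach the `mod_timer()` call.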
--
Ueimor
* Re: [PATCH]: suspicious unlikely usage in tcp_transmit_skb()
From: Stephen Hemminger @ 2006-04-25 22:16 UTC (permalink / raw)
To: David S. Miller; +Cc: hzhong, netdev
In-Reply-To: <20060425.144649.129407913.davem@davemloft.net>
On Tue, 25 Apr 2006 14:46:49 -0700 (PDT)
"David S. Miller" <davem@davemloft.net> wrote:
> From: Stephen Hemminger <shemminger@osdl.org>
> Date: Tue, 25 Apr 2006 10:01:49 -0700
>
> > > # Hit # miss Function:Filename@Line
> > > ! 0 50505 tcp_transmit_skb():net/ipv4/tcp_output.c@468
> ...
> > How about just taking off the likely/unlikely in this case.
>
> Why remove it when we'll now get a 50505 to 0 hit rate?
Depends on the data stream, but I guess if we are seeing high loss
we really don't care about the CPU branch prediction.
* Re: [PATCH]: suspicious unlikely usage in tcp_transmit_skb()
From: David S. Miller @ 2006-04-25 21:46 UTC (permalink / raw)
To: shemminger; +Cc: hzhong, netdev
In-Reply-To: <20060425100149.636d6a1d@localhost.localdomain>
From: Stephen Hemminger <shemminger@osdl.org>
Date: Tue, 25 Apr 2006 10:01:49 -0700
> > # Hit # miss Function:Filename@Line
> > ! 0 50505 tcp_transmit_skb():net/ipv4/tcp_output.c@468
...
> How about just taking off the likely/unlikely in this case.
Why remove it when we'll now get a 50505 to 0 hit rate?
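In the kernel, `likely(x)` and `unlikely(x)` are thin wrappers around GCC's `__builtin_expect()`. The quoted table counts how often an annotated branch went the way its hint predicted. Below is a minimal userspace sketch of that counting idea, assuming GCC or Clang. `struct branch_stats` and `counted_unlikely` are hypothetical and are not the instrumentation that produced the table.

```c
/* The kernel's basic definitions from include/linux/compiler.h. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

struct branch_stats {
	unsigned long hit;    /* hint was right: the condition was false */
	unsigned long miss;   /* hint was wrong: the condition was true */
};

/* Record whether an unlikely() hint on cond would have been correct,
 * then return the condition so it can still drive an if statement. */
static inline int counted_unlikely(struct branch_stats *s, int cond)
{
	if (cond)
		s->miss++;
	else
		s->hit++;
	return unlikely(cond);
}
```

A `0 / 50505` row means the condition marked `unlikely()` was true every time, so the hint was always wrong. Inverting the hint would turn every one of those misses into a hit, which is the case for keeping a hint instead of removing it.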
* Re: [PATCH 2/5] sky2: add fake idle irq timer
From: Stephen Hemminger @ 2006-04-25 21:30 UTC (permalink / raw)
To: Francois Romieu; +Cc: Jeff Garzik, netdev
In-Reply-To: <20060425212329.GA18035@electric-eye.fr.zoreil.com>
On Tue, 25 Apr 2006 23:23:29 +0200
Francois Romieu <romieu@fr.zoreil.com> wrote:
> Stephen Hemminger <shemminger@osdl.org> :
> [...]
> > --- sky2-2.6.17.orig/drivers/net/sky2.c 2006-04-25 10:48:47.000000000 -0700
> > +++ sky2-2.6.17/drivers/net/sky2.c 2006-04-25 10:53:32.000000000 -0700
> > @@ -2086,6 +2086,20 @@
> > }
> > }
> >
> > +/* If idle then force a fake soft NAPI poll once a second
> > + * to work around cases where sharing an edge triggered interrupt.
> > + */
> > +static void sky2_idle(unsigned long arg)
> > +{
> > + struct net_device *dev = (struct net_device *) arg;
> > +
> > + local_irq_disable();
> > + if (__netif_rx_schedule_prep(dev))
> > + __netif_rx_schedule(dev);
> > + local_irq_enable();
> > +}
> > +
> > +
> > static int sky2_poll(struct net_device *dev0, int *budget)
> > {
> > struct sky2_hw *hw = ((struct sky2_port *) netdev_priv(dev0))->hw;
> > @@ -2134,6 +2148,8 @@
> > sky2_write32(hw, STAT_CTRL, SC_STAT_CLR_IRQ);
> > }
> >
> > + mod_timer(&hw->idle_timer, jiffies + HZ);
> > +
> > local_irq_disable();
> > __netif_rx_complete(dev0);
>
>
> Any objection against moving mod_timer() from sky2_poll() to sky2_idle()
> so as to keep poll() path unmodified ?
>
If traffic is moving, then I want the timer to keep getting rescheduled
farther out.
* Re: [PATCH 2/5] sky2: add fake idle irq timer
From: Francois Romieu @ 2006-04-25 21:23 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Jeff Garzik, netdev
In-Reply-To: <20060425175951.444629000@localhost.localdomain>
Stephen Hemminger <shemminger@osdl.org> :
[...]
> --- sky2-2.6.17.orig/drivers/net/sky2.c 2006-04-25 10:48:47.000000000 -0700
> +++ sky2-2.6.17/drivers/net/sky2.c 2006-04-25 10:53:32.000000000 -0700
> @@ -2086,6 +2086,20 @@
> }
> }
>
> +/* If idle then force a fake soft NAPI poll once a second
> + * to work around cases where sharing an edge triggered interrupt.
> + */
> +static void sky2_idle(unsigned long arg)
> +{
> + struct net_device *dev = (struct net_device *) arg;
> +
> + local_irq_disable();
> + if (__netif_rx_schedule_prep(dev))
> + __netif_rx_schedule(dev);
> + local_irq_enable();
> +}
> +
> +
> static int sky2_poll(struct net_device *dev0, int *budget)
> {
> struct sky2_hw *hw = ((struct sky2_port *) netdev_priv(dev0))->hw;
> @@ -2134,6 +2148,8 @@
> sky2_write32(hw, STAT_CTRL, SC_STAT_CLR_IRQ);
> }
>
> + mod_timer(&hw->idle_timer, jiffies + HZ);
> +
> local_irq_disable();
> __netif_rx_complete(dev0);
Any objection against moving mod_timer() from sky2_poll() to sky2_idle()
so as to keep poll() path unmodified ?
--
Ueimor