* Re: problem with flowi structure
From: David Miller @ 2010-10-02 20:18 UTC (permalink / raw)
To: nicola.padovano
Cc: jengelh, eric.dumazet, aijazbaig1, netfilter-devel, netdev
In-Reply-To: <AANLkTinObV+176prSdCL0RoVx7a74quUM5H58xR2_WLG@mail.gmail.com>
From: Nicola Padovano <nicola.padovano@gmail.com>
Date: Sat, 2 Oct 2010 10:08:56 +0200
> i.e. like a wildcard?
Please do not top-post, because when you top-post people need
to scroll down and read the context you're replying to, then
scroll back up to read your reply.
If, instead of top-posting, you reply after the context you're
replying to, people need to do less work to read your email.
Thanks.
^ permalink raw reply
* Re: [PATCH 1/2] net: Fix the condition passed to sk_wait_event()
From: David Miller @ 2010-10-02 20:26 UTC (permalink / raw)
To: tomer_iisc; +Cc: netdev, linux-kernel
In-Reply-To: <590816.4455.qm@web53707.mail.re2.yahoo.com>
Your patch is still corrupted, your email client is splitting up
long lines.
Please, save us a lot of time by test emailing the patch to yourself,
and then trying to apply the patch as you receive it. Do this until
you've fixed all of the formatting problems and then you can send it
here.
Do not resend the patch by simply replying again to this thread,
send a fresh posting so that "Re: " doesn't show up in the subject
and this way I can apply it directly without any editing.
Thank you.
^ permalink raw reply
* Re: [PATCH 2.6.35.7] net: Fix the condition passed to sk_wait_event()
From: David Miller @ 2010-10-02 20:27 UTC (permalink / raw)
To: tomer_iisc; +Cc: netdev, linux-kernel
In-Reply-To: <667106.4951.qm@web53706.mail.re2.yahoo.com>
From: Nagendra Tomar <tomer_iisc@yahoo.com>
Date: Sat, 2 Oct 2010 01:22:16 -0700 (PDT)
> Resending ...
It's still corrupted, see my other reply.
^ permalink raw reply
* Re: [PATCH 8/8] net: Implement socketat.
From: Daniel Lezcano @ 2010-10-02 21:13 UTC (permalink / raw)
To: Pavel Emelyanov
Cc: hadi, Eric W. Biederman, linux-kernel, Linux Containers, netdev,
netfilter-devel, linux-fsdevel, Linus Torvalds, Michael Kerrisk,
Ulrich Drepper, Al Viro, David Miller, Serge E. Hallyn,
Pavel Emelyanov, Ben Greear, Matt Helsley, Jonathan Corbet,
Sukadev Bhattiprolu, Jan Engelhardt, Patrick McHardy
In-Reply-To: <4C9B3F9C.8080506@parallels.com>
On 09/23/2010 01:53 PM, Pavel Emelyanov wrote:
> On 09/23/2010 03:40 PM, jamal wrote:
>
>> On Thu, 2010-09-23 at 15:33 +0400, Pavel Emelyanov wrote:
>>
>>
>>> This particular usecase is unneeded once you have the "enter" ability.
>>>
>> Is that cheaper from a syscall count/cost?
>>
> Why does it matter? You said that the usage scenario was to
> add routes to a container. If I do 2 syscalls instead of 1, is
> it THAT much worse?
>
>
>> i.e do I have to enter every time i want to write/read this fd?
>>
> No - you enter once, create a socket and do whatever you need
> within the entered namespace.
>
Just to clarify this point. You enter the namespace, create the socket
and go back to the initial namespace (or create a new one). Further
operations can be made against this fd because it is the network
namespace stored in the sock struct which is used, not the current
process network namespace which is used at the socket creation only.
We can actually already do that by unsharing and then create a socket.
This socket will pin the namespace and can be used as a control socket
for the namespace (assuming the socket domain will be ok for all the
operations).
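As a rough illustration of the unshare-then-socket approach just described
(a minimal sketch, not code from this thread; the function name
netns_control_socket is made up for the example, AF_INET/SOCK_DGRAM is an
arbitrary choice, and unshare(CLONE_NEWNET) needs CAP_SYS_ADMIN):

```c
#define _GNU_SOURCE
#include <errno.h>      /* errno, for callers checking the failure reason */
#include <sched.h>      /* unshare(), CLONE_NEWNET */
#include <sys/socket.h> /* socket() */

/*
 * Hypothetical helper: move the calling process into a new network
 * namespace, then open a socket inside it. The socket records the
 * namespace it was created in, so it keeps that namespace alive and
 * can act as a control handle for it.
 *
 * Returns the socket fd, or -1 with errno set (typically EPERM when
 * the caller lacks CAP_SYS_ADMIN).
 */
int netns_control_socket(void)
{
    if (unshare(CLONE_NEWNET) == -1)
        return -1;

    /* The socket domain/type is an assumption made for illustration. */
    return socket(AF_INET, SOCK_DGRAM, 0);
}
```

Note that the caller stays in the new namespace after this returns; going
back to the initial one needs something like the "enter" ability discussed
in this thread.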
Jamal, I don't know what kind of application you want to use but if I
assume you want to create a process controlling 1024 netns, let's try to
identify what happens with setns and with socketat:
With setns:
* open /proc/self/ns/net (1)
* unshare the netns
* open /proc/self/ns/net (2)
* setns (1)
* create a virtual network device
* move the virtual device to (2) (using the set netns by fd)
* unshare the netns
...
With socketat:
* open a socket (1)
* unshare the netns
* open a netlink with socketat(1) => (2)
* create a virtual device using (2) (at this point it is init_net_ns)
* move the virtual device to the current netns (using the set netns
by pid)
* open a socket (3)
* unshare the netns
...
We keep the same number of file descriptors open. However, with
setns we can bind mount the directory somewhere, which will pin the
namespace, and then we can close the /proc/self/ns/net file descriptors
and reopen them later.
If your application has to do a lot of specific network processing
in different namespaces during its life cycle, the socketat syscall
will be better because it will reduce the number of syscalls, but at the
cost of keeping the file descriptors open (potentially a big number).
Otherwise, setns should fit your needs.
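The first steps of the setns sequence listed above can be sketched as
follows. This is only an illustration under stated assumptions: it uses
setns(2) and /proc/self/ns/net as found in later kernels and glibc, which
may differ from the interface proposed in this patch series; the function
name create_netns_and_return is made up; and the virtual device steps
(netlink) are left out. It needs CAP_SYS_ADMIN to succeed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>   /* open(), O_RDONLY */
#include <sched.h>   /* unshare(), setns(), CLONE_NEWNET */
#include <unistd.h>  /* close() */

/*
 * Hypothetical helper: create a new network namespace and return to
 * the original one. On success returns 0 and stores an fd referring
 * to the new namespace in *new_ns. On failure returns -1 with errno
 * set.
 */
int create_netns_and_return(int *new_ns)
{
    int saved_errno;
    int fd = -1;
    int orig_ns = open("/proc/self/ns/net", O_RDONLY);  /* open (1) */

    if (orig_ns == -1)
        return -1;

    if (unshare(CLONE_NEWNET) == -1)                    /* unshare the netns */
        goto fail;

    fd = open("/proc/self/ns/net", O_RDONLY);           /* open (2) */
    if (fd == -1) {
        saved_errno = errno;
        setns(orig_ns, CLONE_NEWNET);  /* best effort: go back to (1) */
        errno = saved_errno;
        goto fail;
    }

    if (setns(orig_ns, CLONE_NEWNET) == -1)             /* setns (1) */
        goto fail;

    close(orig_ns);
    *new_ns = fd;
    return 0;

fail:
    saved_errno = errno;
    if (fd != -1)
        close(fd);
    close(orig_ns);
    errno = saved_errno;
    return -1;
}
```

A controller handling many namespaces would call this once per namespace
and keep (or bind mount and close) the returned descriptors.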
>> How does poll/select work in that enter scenario?
>>
> Just like it used to before the enter.
>
>
>> cheers,
>> jamal
>>
>>
>>
^ permalink raw reply
* [PATCH 1/2] net: Fix the condition passed to sk_wait_event()
From: Nagendra Tomar @ 2010-10-02 23:49 UTC (permalink / raw)
To: netdev; +Cc: linux-kernel, davem
This patch fixes the sk_wait_event() condition in the sk_stream_wait_connect()
function. With this change, we correctly check for the TCPF_ESTABLISHED and
TCPF_CLOSE_WAIT states and avoid potentially returning success when there
might be an error on the socket.
Signed-off-by: Nagendra Singh Tomar <tomer_iisc@xxxxxxxxx>
---
--- linux-2.6.35.7/net/core/stream.c.orig 2010-03-24 09:30:00.000000000 +0530
+++ linux-2.6.35.7/net/core/stream.c 2010-03-24 09:30:17.000000000 +0530
@@ -73,9 +73,8 @@ int sk_stream_wait_connect(struct sock *
prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
sk->sk_write_pending++;
done = sk_wait_event(sk, timeo_p,
- !sk->sk_err &&
- !((1 << sk->sk_state) &
- ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)));
+ ((1 << sk->sk_state) &
+ (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)));
finish_wait(sk_sleep(sk), &wait);
sk->sk_write_pending--;
} while (!done);
---
^ permalink raw reply
* [PATCH 2/2] net: Fix the condition passed to sk_wait_event()
From: Nagendra Tomar @ 2010-10-02 23:51 UTC (permalink / raw)
To: netdev; +Cc: linux-kernel, davem
This patch fixes the condition (3rd arg) passed to sk_wait_event() in
sk_stream_wait_memory(). The incorrect check in sk_stream_wait_memory()
causes the following soft lockup in tcp_sendmsg() when the global tcp
memory pool is exhausted.
>>> snip <<<
localhost kernel: BUG: soft lockup - CPU#3 stuck for 11s! [sshd:6429]
localhost kernel: CPU 3:
localhost kernel: RIP: 0010:[sk_stream_wait_memory+0xcd/0x200] [sk_stream_wait_memory+0xcd/0x200] sk_stream_wait_memory+0xcd/0x200
localhost kernel:
localhost kernel: Call Trace:
localhost kernel: [sk_stream_wait_memory+0x1b1/0x200] sk_stream_wait_memory+0x1b1/0x200
localhost kernel: [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
localhost kernel: [ipv6:tcp_sendmsg+0x6e6/0xe90] tcp_sendmsg+0x6e6/0xce0
localhost kernel: [sock_aio_write+0x126/0x140] sock_aio_write+0x126/0x140
localhost kernel: [xfs:do_sync_write+0xf1/0x130] do_sync_write+0xf1/0x130
localhost kernel: [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
localhost kernel: [hrtimer_start+0xe3/0x170] hrtimer_start+0xe3/0x170
localhost kernel: [vfs_write+0x185/0x190] vfs_write+0x185/0x190
localhost kernel: [sys_write+0x50/0x90] sys_write+0x50/0x90
localhost kernel: [system_call+0x7e/0x83] system_call+0x7e/0x83
>>> snip <<<
What is happening is that the sk_wait_event() condition passed from
sk_stream_wait_memory() evaluates to true in the case of TCP global memory
exhaustion. This is because both sk_stream_memory_free() and vm_wait are true,
which causes sk_wait_event() to *not* call schedule_timeout().
Hence sk_stream_wait_memory() returns immediately to the caller without
sleeping. This causes the caller to try the allocation again, which fails again
and calls sk_stream_wait_memory() again, and so on.
Signed-off-by: Nagendra Singh Tomar <tomer_iisc@xxxxxxxxx>
---
--- linux-2.6.35.7/net/core/stream.c.orig 2010-03-24 09:31:00.000000000 +0530
+++ linux-2.6.35.7/net/core/stream.c 2010-03-24 09:31:08.000000000 +0530
@@ -143,10 +143,9 @@ int sk_stream_wait_memory(struct sock *s
set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
sk->sk_write_pending++;
- sk_wait_event(sk, &current_timeo, !sk->sk_err &&
- !(sk->sk_shutdown & SEND_SHUTDOWN) &&
- sk_stream_memory_free(sk) &&
- vm_wait);
+ sk_wait_event(sk, &current_timeo, sk->sk_err ||
+ (sk->sk_shutdown & SEND_SHUTDOWN) ||
+ (sk_stream_memory_free(sk) && !vm_wait));
sk->sk_write_pending--;
if (vm_wait) {
---
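To see why the old condition in sk_stream_wait_memory() never sleeps under
global memory exhaustion, the two conditions from the patch above can be
modelled in user space. This is only a sketch: struct wait_state and the
function names are made up for illustration, and the booleans stand in for
the kernel fields and helpers named in the comments.

```c
#include <stdbool.h>

/* User-space stand-ins for the values the kernel condition reads. */
struct wait_state {
    bool sk_err;        /* sk->sk_err != 0 */
    bool send_shutdown; /* sk->sk_shutdown & SEND_SHUTDOWN */
    bool memory_free;   /* sk_stream_memory_free(sk) */
    bool vm_wait;       /* vm_wait != 0 */
};

/* Condition before the patch. */
bool wait_memory_cond_old(const struct wait_state *s)
{
    return !s->sk_err && !s->send_shutdown && s->memory_free && s->vm_wait;
}

/* Condition after the patch. */
bool wait_memory_cond_new(const struct wait_state *s)
{
    return s->sk_err || s->send_shutdown ||
           (s->memory_free && !s->vm_wait);
}

/* sk_wait_event() only sleeps when the condition is false on entry. */
bool would_sleep(bool condition)
{
    return !condition;
}
```

With no error, no shutdown, memory_free true and vm_wait true (the global
exhaustion case described above), the old condition is true, so there is no
sleep; the new condition is false, so the task sleeps as intended.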
^ permalink raw reply
* Re: [PATCH 1/2] net: Fix the condition passed to sk_wait_event()
From: Nagendra Tomar @ 2010-10-02 23:54 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-kernel
In-Reply-To: <20101002.132659.71124680.davem@davemloft.net>
Dave,
I had done the exercise of sending the patch to myself and applying it
(copy-pasting just the patch). One thing that I see is the long line in
the description. If you are referring to that, I've fixed it and submitted
it again. If not this, I'm at a loss.
Thanks,
Tomar
--- On Sun, 3/10/10, David Miller <davem@davemloft.net> wrote:
> From: David Miller <davem@davemloft.net>
> Subject: Re: [PATCH 1/2] net: Fix the condition passed to sk_wait_event()
> To: tomer_iisc@yahoo.com
> Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
> Date: Sunday, 3 October, 2010, 1:56
>
> Your patch is still corrupted, your email client is splitting up
> long lines.
>
> Please, save us a lot of time by test emailing the patch to yourself,
> and then trying to apply the patch as you receive it. Do this until
> you've fixed all of the formatting problems and then you can send it
> here.
>
> Do not resend the patch by simply replying again to this thread,
> send a fresh posting so that "Re: " doesn't show up in the subject
> and this way I can apply it directly without any editing.
>
> Thank you.
^ permalink raw reply
* Re: [PATCH 1/2] net: Fix the condition passed to sk_wait_event()
From: David Miller @ 2010-10-03 0:06 UTC (permalink / raw)
To: tomer_iisc; +Cc: netdev, linux-kernel
In-Reply-To: <970819.25375.qm@web53706.mail.re2.yahoo.com>
From: Nagendra Tomar <tomer_iisc@yahoo.com>
Date: Sat, 2 Oct 2010 16:54:23 -0700 (PDT)
> I had done the exercise of sending the patch to myself and
> applying it (copy-pasting just the patch). One thing that I see
> is the long line in the description. If you are referring to
> that, I've fixed it and submitted it again. If not this, I'm at
> loss.
This new submission looks good, thank you.
^ permalink raw reply
* Re: [PATCH 1/2] net: Fix the condition passed to sk_wait_event()
From: Nagendra Tomar @ 2010-10-03 1:30 UTC (permalink / raw)
To: netdev; +Cc: linux-kernel, davem
Dave,
Thinking more about it, we need to check for sk->sk_err, thus the
existing code behaves fine. Just that we might incur an additional sleep
even while we know that the socket already has an error, but that should
be ok.
We only need the other patch. Please ignore this one, and sorry for the confusion.
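For reference, the two sk_stream_wait_connect() conditions can be compared
with a small user-space model. It is a sketch only: the function names are
made up, and the state constants are redefined locally with the values the
kernel's TCP state enum uses.

```c
#include <stdbool.h>

/* Local copies of the kernel TCP state values, for illustration. */
enum { TCP_ESTABLISHED = 1, TCP_SYN_SENT = 2, TCP_CLOSE_WAIT = 8 };
#define TCPF_ESTABLISHED (1 << TCP_ESTABLISHED)
#define TCPF_CLOSE_WAIT  (1 << TCP_CLOSE_WAIT)

/* Existing condition: no error AND state is ESTABLISHED or CLOSE_WAIT. */
bool connect_cond_existing(int sk_err, int sk_state)
{
    return !sk_err &&
           !((1 << sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT));
}

/* Condition from the withdrawn patch: sk_err is not consulted. */
bool connect_cond_patched(int sk_err, int sk_state)
{
    (void)sk_err; /* intentionally unused, mirroring the patch */
    return ((1 << sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) != 0;
}
```

The two agree whenever sk_err is zero. They differ only when an error is
set while the state is ESTABLISHED or CLOSE_WAIT: the existing condition is
false there, while the patched one is true.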
Thanks,
Tomar
--- On Sun, 3/10/10, Nagendra Tomar <tomer_iisc@yahoo.com> wrote:
> From: Nagendra Tomar <tomer_iisc@yahoo.com>
> Subject: [PATCH 1/2] net: Fix the condition passed to sk_wait_event()
> To: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org, davem@davemloft.net
> Date: Sunday, 3 October, 2010, 5:19
> This patch fixes the sk_wait_event() condition in the sk_stream_wait_connect()
> function. With this change, we correctly check for the TCPF_ESTABLISHED and
> TCPF_CLOSE_WAIT states and avoid potentially returning success when there
> might be an error on the socket.
>
> Signed-off-by: Nagendra Singh Tomar <tomer_iisc@xxxxxxxxx>
> ---
> --- linux-2.6.35.7/net/core/stream.c.orig 2010-03-24 09:30:00.000000000 +0530
> +++ linux-2.6.35.7/net/core/stream.c 2010-03-24 09:30:17.000000000 +0530
> @@ -73,9 +73,8 @@ int sk_stream_wait_connect(struct sock *
> prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
> sk->sk_write_pending++;
> done = sk_wait_event(sk, timeo_p,
> - !sk->sk_err &&
> - !((1 << sk->sk_state) &
> - ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)));
> + ((1 << sk->sk_state) &
> + (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)));
> finish_wait(sk_sleep(sk), &wait);
> sk->sk_write_pending--;
> } while (!done);
>
> ---
^ permalink raw reply
* Re: Ask For Comment: add routines for exchanging data between sock buffer and scatter list
From: Hillf Danton @ 2010-10-03 3:10 UTC (permalink / raw)
To: David Miller, netdev; +Cc: linux-kernel, axboe, robert.w.love, James.Bottomley
In-Reply-To: <20101002.132056.226767861.davem@davemloft.net>
There seem to be no routines for exchanging data
directly between sock buffer and scatter list in
either scatterlist.c or skbuff.c, hence this work.
It is hard to determine which file these routines
should be added to, so a header file is added instead.
Signed-off-by: Hillf Danton <dhillf@gmail.com>
---
diff -Npur o/linux-2.6.36-rc4/include/skb_sg.h m/linux-2.6.36-rc4/include/skb_sg.h
--- o/linux-2.6.36-rc4/include/skb_sg.h 1970-01-01 08:00:00.000000000 +0800
+++ m/linux-2.6.36-rc4/include/skb_sg.h 2010-10-03 10:03:54.000000000 +0800
@@ -0,0 +1,113 @@
+/*
+ Definition for exchanging data between sock buffer and scatter list
+
+ Copyright (C) Oct 2010 Hillf Danton <dhillf@gmail.com>
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
+ MA 02110-1301 USA
+*/
+
+#ifndef __LINUX_SKB_SG_H
+#define __LINUX_SKB_SG_H
+
+#include <linux/scatterlist.h>
+#include <linux/skbuff.h>
+
+/*
+ * sg_fill_skb_page_desc - fill skb frags with info in sg list
+ * @index the start index to fill
+ *
+ * return the number of filled frags
+ */
+
+static int sg_fill_skb_page_desc(struct sk_buff *skb, int index,
+ struct scatterlist *sg)
+{
+ int old = index;
+ struct page *page;
+
+ for (; sg && index < MAX_SKB_FRAGS; index++) {
+ page = sg_page(sg);
+ get_page(page);
+ skb_add_rx_frag(skb, index, page, sg->offset, sg->length);
+ sg = sg_next(sg);
+ }
+
+ return index - old;
+}
+
+/*
+ * skb_copy_bits_to_sg - copy data from skb to sg list
+ * @len length of data to be copied
+ *
+ * return the number of copied bytes
+ */
+
+static int skb_copy_bits_to_sg(struct sk_buff *skb, int offset_in_skb,
+ struct scatterlist *sg, int offset_in_sg,
+ int len)
+{
+ int old = len;
+ struct sg_mapping_iter miter;
+
+ if (offset_in_skb >= skb->len)
+ return 0;
+
+ if (len > skb->len - offset_in_skb)
+ old = len = skb->len - offset_in_skb;
+
+ /* skip offset in sg */
+ while (sg && offset_in_sg >= sg->length) {
+ offset_in_sg -= sg->length;
+ sg = sg_next(sg);
+ }
+ if (! sg)
+ return 0;
+
+ /* and go thru sg list */
+ while (len > 0 && sg) {
+ int this_len;
+ int err;
+
+ sg_miter_start(&miter, sg, 1, SG_MITER_ATOMIC|SG_MITER_TO_SG);
+
+ if (offset_in_sg) {
+ /* we have to count this residual */
+ miter.__offset = offset_in_sg;
+ offset_in_sg = 0;
+ }
+
+ if (! sg_miter_next(&miter))
+ break;
+
+ this_len = min(miter.length, len);
+
+ err = skb_copy_bits(skb, offset_in_skb, miter.addr, this_len);
+
+ sg_miter_stop(&miter);
+
+ if (err)
+ break;
+
+ offset_in_skb += this_len;
+ len -= this_len;
+
+ sg = sg_next(sg);
+ }
+
+ return old - len;
+}
+
+#endif /* __LINUX_SKB_SG_H */
^ permalink raw reply
* Re: [PATCH 1/2] net: Fix the condition passed to sk_wait_event()
From: Eric Dumazet @ 2010-10-03 4:53 UTC (permalink / raw)
To: David Miller; +Cc: tomer_iisc, netdev, linux-kernel
In-Reply-To: <20101002.170654.193719563.davem@davemloft.net>
On Saturday, 02 October 2010 at 17:06 -0700, David Miller wrote:
> From: Nagendra Tomar <tomer_iisc@yahoo.com>
> Date: Sat, 2 Oct 2010 16:54:23 -0700 (PDT)
>
> > I had done the exercise of sending the patch to myself and
> > applying it (copy-pasting just the patch). One thing that I see
> > is the long line in the description. If you are referring to
> > that, I've fixed it and submitted it again. If not this, I'm at
> > loss.
>
> This new submission looks good, thank you.
Yes, but the email address in the "Signed-off-by: ...." is mangled :(
^ permalink raw reply
* Re: [patch v3 00/12] IPVS: SIP Persistence Engine
From: Simon Horman @ 2010-10-03 5:07 UTC (permalink / raw)
To: Julian Anastasov
Cc: lvs-devel, netdev, netfilter, netfilter-devel, Jan Engelhardt,
Stephen Hemminger, Wensong Zhang, Patrick McHardy
In-Reply-To: <alpine.LFD.2.00.1010021155520.2055@ja.ssi.bg>
On Sat, Oct 02, 2010 at 12:00:14PM +0300, Julian Anastasov wrote:
>
> Hello,
>
> On Sat, 2 Oct 2010, Simon Horman wrote:
>
> >This patch series adds load-balancing of UDP SIP based on Call-ID to
> >IPVS as well as a frame-work for extending IPVS to handle alternate
> >persistence requirements.
> >
> >REVISIONS
> >
> >This is v3 of the patch series which addresses several problems
> >raised by Julian Anastasov on the lvs-devel mailing list.
>
> No other obvious problems in v3 1-12. So,
> Acked-by: Julian Anastasov <ja@ssi.bg>
Thanks.
> Maybe in the next few days I'll test my changes on top of PE v3
I think that could save a bit of porting work later
and perhaps you might notice some more bugs too :-)
^ permalink raw reply
* [PROBLEM] linux-2.6.36-rc5 crash with gianfar ethernet at full line rate traffic
From: emin ak @ 2010-10-03 6:20 UTC (permalink / raw)
To: netdev; +Cc: David Miller, Kumar Gala
In-Reply-To: <AANLkTi=Kvi3u5bRp5DtRH-Pr6ALew60cPgeVEZ8V-Dnu@mail.gmail.com>
Hi all,
My problem is a kernel crash under full line rate IP network traffic
with random packet lengths.
I am using the default unmodified kernel and the default SMP kernel
configuration, an MPC8572DS development board, and a hardware
packet generator.
My test is IP forwarding between eth0 and eth1. The hardware packet
generator produces full duplex, full line rate traffic with random
packet length and random payload. After a few million packets have
passed, the kernel produces the two different crash messages below. I
have retried this scenario many times; the crash sometimes occurs in
skb_put, but mostly in the ip_rcv function. I have applied the same
test to the latest stable Linux 2.6.35.6 kernel, and it produced the
same errors.
Any comments and help are appreciated.
Here are the crash logs:
Thanks.
Emin
First type of crash:
root@mpc8572ds:~# skb_over_panic: text:c0226280 len:1171 put:1171
head:eed6d000 data:eed63040 tail:0xeed6d4d3 end:0xeed63660 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:127!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2 MPC8572 DS
last sysfs file: /sys/devices/pci0002:03/0002:03:00.0/subsystem_device
Modules linked in:
NIP: c023bdcc LR: c023bdcc CTR: c01f3ff8
REGS: effe7d70 TRAP: 0700 Not tainted (2.6.36-rc5)
MSR: 00029000 <EE,ME,CE> CR: 22028024 XER: 20000000
TASK = ef83e9a0[9] 'ksoftirqd/1' THREAD: ef856000 CPU: 1
GPR00: c023bdcc effe7e20 ef83e9a0 0000007c 00021000 ffffffff c01f7b98 c03ccf1c
GPR08: c03c69d4 c03f94b4 00c4e000 00000004 20028048 1001a108 ef211000 efb52d90
GPR16: efb52e38 efb52870 00000000 ef211800 00000008 00000009 efb52800 00000037
GPR24: ef24e180 ef2be040 00000000 ef211948 efb52b80 00000493 ef015940 ef386600
NIP [c023bdcc] skb_put+0x8c/0x94
LR [c023bdcc] skb_put+0x8c/0x94
Call Trace:
[effe7e20] [c023bdcc] skb_put+0x8c/0x94 (unreliable)
[effe7e30] [c0226280] gfar_clean_rx_ring+0x104/0x4b8
[effe7e90] [c02269dc] gfar_poll+0x3a8/0x60c
[effe7f60] [c024928c] net_rx_action+0xf8/0x1a4
[effe7fa0] [c0042524] __do_softirq+0xe0/0x178
[effe7ff0] [c000e59c] call_do_softirq+0x14/0x24
[ef857f50] [c0004840] do_softirq+0x90/0xa0
[ef857f70] [c00430e4] run_ksoftirqd+0xb4/0x164
[ef857fb0] [c00586b4] kthread+0x7c/0x80
[ef857ff0] [c000e9a8] kernel_thread+0x4c/0x68
Instruction dump:
81030098 2f800000 409e000c 3d20c037 3809a19c 3c60c037 7c8802a6 7d695b78
3863b010 90010008 4cc63182 4be016c5 <0fe00000> 48000000 9421fff0 7c0802a6
Kernel panic - not syncing: Fatal exception in interrupt
---------------
Second type of crash:
Faulting instruction address: 0xc026c1dc
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2 MPC8572 DS
last sysfs file: /sys/devices/pci0002:03/0002:03:00.0/subsystem_device
Modules linked in:
NIP: c026c1dc LR: c026bfac CTR: 00000000
REGS: effebd00 TRAP: 0300 Not tainted (2.6.36-rc5)
MSR: 00029000 <EE,ME,CE> CR: 42028042 XER: 00000000
DEAR: 0000cad8, ESR: 00000000
TASK = ef83cde0[3] 'ksoftirqd/0' THREAD: ef84a000 CPU: 0
GPR00: 00000005 effebdb0 ef83cde0 00000000 000001b9 00000000 c1008060 00000000
GPR08: 02c3f605 0000ca00 000005b9 0000ca00 b653a6c7 7af823f0 ef217000 efbab590
GPR16: efbab638 efbab070 00000000 ef217800 00000008 00000018 efbab000 00000028
GPR24: c03f971c c0410000 c0400000 c03f94b4 effea000 ef316e40 00000000 eecb685e
NIP [c026c1dc] ip_rcv+0x3f8/0x808
LR [c026bfac] ip_rcv+0x1c8/0x808
Call Trace:
[effebdb0] [c026c204] ip_rcv+0x420/0x808 (unreliable)
[effebde0] [c02482dc] __netif_receive_skb+0x2f8/0x324
[effebe10] [c02483a4] netif_receive_skb+0x9c/0xb0
[effebe30] [c0226308] gfar_clean_rx_ring+0x18c/0x4b8
[effebe90] [c02269dc] gfar_poll+0x3a8/0x60c
[effebf60] [c024928c] net_rx_action+0xf8/0x1a4
[effebfa0] [c0042524] __do_softirq+0xe0/0x178
[effebff0] [c000e59c] call_do_softirq+0x14/0x24
[ef84bf50] [c0004840] do_softirq+0x90/0xa0
[ef84bf70] [c00430e4] run_ksoftirqd+0xb4/0x164
[ef84bfb0] [c00586b4] kthread+0x7c/0x80
[ef84bff0] [c000e9a8] kernel_thread+0x4c/0x68
Instruction dump:
8148003c 318a0001 7d690194 91680038 9188003c 4bfffd78 7fa3eb78 48002a29
2f830000 40beff50 817d0048 5569003c <a00900d8> 2f800005 419e0034 2f800003
Kernel panic - not syncing: Fatal exception in interrupt
^ permalink raw reply
* Re: To GRO or not to GRO...
From: "Oleg A. Arkhangelsky" @ 2010-10-03 7:18 UTC (permalink / raw)
To: Richard Scobie; +Cc: netdev
In-Reply-To: <4CA7BD60.3010501@sauce.co.nz>
03.10.2010, 03:17, "Richard Scobie" <richard@sauce.co.nz>:
> In the README for ixgbe-2.1.4, it says:
>
> " Disable GRO when routing/bridging
> ---------------------------------
> Due to a known kernel issue, GRO must be turned off when
> routing/bridging.
> GRO can be turned off via ethtool."
>
This information is true for LRO, not GRO.
I believe that this appeared when ixgbe was converted from LRO
to GRO and all occurrences of LRO were blindly replaced by GRO
everywhere.
--
wbr, Oleg.
^ permalink raw reply
* Re: To GRO or not to GRO...
From: David Miller @ 2010-10-03 7:31 UTC (permalink / raw)
To: sysoleg; +Cc: richard, netdev
In-Reply-To: <152341286090326@web67.yandex.ru>
From: "\"Oleg A. Arkhangelsky\"" <sysoleg@yandex.ru>
Date: Sun, 03 Oct 2010 11:18:46 +0400
>
>
> 03.10.2010, 03:17, "Richard Scobie" <richard@sauce.co.nz>:
>
>> In the README for ixgbe-2.1.4, it says:
>>
>> " Disable GRO when routing/bridging
>> ---------------------------------
>> Due to a known kernel issue, GRO must be turned off when
>> routing/bridging.
>> GRO can be turned off via ethtool."
>>
>
> This information is true for LRO, not GRO.
Right.
> I believe that this appeared when ixgbe was converted from LRO
> to GRO and all occurrences of LRO were blindly replaced by GRO
> everywhere.
Someone please submit a patch to fix this, thanks.
^ permalink raw reply
* Re: sysctl_{tcp,udp,sctp}_mem overflow on 16TB system.
From: Maciej Żenczykowski @ 2010-10-03 8:20 UTC (permalink / raw)
To: Willy Tarreau
Cc: Robin Holt, David S. Miller, Alexey Kuznetsov,
Pekka Savola (ipv6), James Morris, Hideaki YOSHIFUJI,
Patrick McHardy, Vlad Yasevich, Sridhar Samudrala, linux-kernel,
netdev, linux-decnet-user, linux-sctp
In-Reply-To: <20101001203022.GA28486@1wt.eu>
Isn't INT_MAX/2 just 1GB, which is only ~0.9 seconds at 10 Gbps?
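The arithmetic behind that estimate can be checked with a few lines of C
(a sketch that assumes a 32-bit int and treats INT_MAX/2 as a byte count;
LIMIT_BYTES and drain_seconds are names made up for the example):

```c
#include <limits.h>

/* INT_MAX / 2 with a 32-bit int is 1073741823, just under 1 GiB. */
#define LIMIT_BYTES (INT_MAX / 2)

/* Seconds needed to move `bytes` over a link of `bits_per_second`. */
double drain_seconds(double bytes, double bits_per_second)
{
    return bytes * 8.0 / bits_per_second;
}
```

drain_seconds(LIMIT_BYTES, 10e9) comes out at about 0.86 seconds, in line
with the ~0.9 second figure above.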
^ permalink raw reply
* Fwd: [multipathtcp] Call for contribution to middlebox survey
From: Alexander Zimmermann @ 2010-10-03 9:28 UTC (permalink / raw)
To: Netdev
In-Reply-To: <985BFFF5-B9DB-4F68-8837-24E434FD08AD@sfc.wide.ad.jp>
[-- Attachment #1: Type: text/plain, Size: 5194 bytes --]
Hi folks,
Michio Honda from the IETF Multipath TCP WG needs some help...
Alex
Begin forwarded message:
> From: Michio Honda <micchie@sfc.wide.ad.jp>
> Date: 3 October 2010 01:30:57 CEST
> To: Multipath TCP Mailing List <multipathtcp@ietf.org>, <tcpm@ietf.org>
> Cc: Mark Handley <m.handley@cs.ucl.ac.uk>
> Subject: [multipathtcp] Call for contribution to middlebox survey
>
> Hi,
>
> We are surveying middleboxes affecting TCP in the Internet, and we'd like you to contribute to this work by running one Python script on the networks available to you, because we want data from as many paths as possible.
> This script generates test TCP traffic to a server node and detects various middlebox behaviors; for example, it detects how unknown TCP options are treated and whether sequence numbers are rewritten.
>
> - Overview of script
> This generates test TCP traffic by using a raw socket or pcap.
> Destinations of the test traffic are ports 80, 443 and 34343 on vinson3.sfc.wide.ad.jp, which is located in Japan.
> The total amount of test traffic is approximately 90 connections (not parallel), and each of them uses at most approximately 2048 bytes.
>
> - System requirement
> Our script works on Mac OSX 10.5 or 10.6, Linux (kernel 2.6) and FreeBSD (7.0 or higher). It also requires Python 2.5 or higher, and libpcap.
> NOTE: if you try it in a virtual machine on Windows, please connect the guest OS via bridge, not NAT.
>
> How to run the experiment is described below for each OS.
>
> After the experiment, you will find 3 log files (logxxxxxxxxx.txt) in the same directory as the experiment.
> Please send them to us (micchie@sfc.wide.ad.jp) and tell us as much as you know about your network (e.g., product name of the broadband router, ISP name, product name of firewall appliance, etc.)
> In addition, let us know if you are hesitant to have this information made public.
> This experiment doesn't collect traffic information other than what our script generates.
>
> ***** How to run the experiment (Mac OSX) *****
>
> 1. Filtering RST TCP segment from OS
> Execute the following command as root:
> ipfw add 101 deny tcp from any to vinson3.sfc.wide.ad.jp dst-port 34343,80,443 tcpflags rst
>
> NOTE: if you are already running ipfw, please add equivalent rules
> After the experiment, you can revert by "ipfw delete 101"
>
> 2. Executing script
> Download the script from http://www.micchie.net/software/tcpexposure/for_distrib.tar.gz, and decompress it anywhere you like (e.g., tar xzf for_distrib.tar.gz on the command line)
>
> In the for_distrib directory, execute the following command as root:
> sh run-bsd2.sh
> (This will take approximately 30 min.)
>
>
> ***** How to run the experiment (Linux) *****
>
> 1. Filtering RST TCP segment from OS
> Execute the following command as root:
> /sbin/iptables -A OUTPUT -p tcp -d vinson3.sfc.wide.ad.jp --tcp-flags RST RST -m multiport --dports 34343,80,443 -j DROP
>
> NOTE: if you are already running iptables, please add equivalent rules
> After the experiment, you can revert with the opposite command, using -D instead of -A
>
> 2. Executing script
> Download the script from http://www.micchie.net/software/tcpexposure/for_distrib.tar.gz, and decompress it anywhere you like (e.g., tar xzf for_distrib.tar.gz)
>
> In the for_distrib directory, execute the following command as root:
> sh run-linux2.sh
> (This will take approximately 30 min.)
>
>
> ***** How to run the script (FreeBSD) *****
>
> 1. Filtering RST TCP segment from OS
> If you are using neither ipfw nor pf:
> Load the pf kernel module with the following command as root:
> kldload /boot/kernel/pf.ko
>
> Add the following 2 lines to /etc/pf.conf (please replace IFNAME with your outgoing interface name, e.g., em0):
> pass out all
> block out quick on IFNAME proto tcp to vinson3.sfc.wide.ad.jp port {34343,80,443} flags R/R
>
> Execute the following command as root:
> pfctl -e -f /etc/pf.conf
>
> If you are already running pf, please add equivalent rules
> After the experiment, you can revert the settings by cleaning up /etc/pf.conf and executing "pfctl -d" as root
>
> If you are already using ipfw:
> Please add the following rule to the ipfw configuration:
> deny tcp from any to vinson3.sfc.wide.ad.jp dst-port 34343,80,443 tcpflags rst
>
> 2. Executing script
> Download the script from http://www.micchie.net/software/tcpexposure/for_distrib.tar.gz, and decompress it anywhere you like (e.g., tar xzf for_distrib.tar.gz)
>
> In the for_distrib directory, execute the following command as root:
> sh run-bsd2.sh
> (This will take approximately 30 min.)
>
>
> Best regards,
> - Michio
>
> _______________________________________________
> multipathtcp mailing list
> multipathtcp@ietf.org
> https://www.ietf.org/mailman/listinfo/multipathtcp
//
// Dipl.-Inform. Alexander Zimmermann
// Department of Computer Science, Informatik 4
// RWTH Aachen University
// Ahornstr. 55, 52056 Aachen, Germany
// phone: (49-241) 80-21422, fax: (49-241) 80-22222
// email: zimmermann@cs.rwth-aachen.de
// web: http://www.umic-mesh.net
//
[-- Attachment #2: Signierter Teil der Nachricht --]
[-- Type: application/pgp-signature, Size: 195 bytes --]
^ permalink raw reply
* Re: [PATCH net-next V3] net: dynamic ingress_queue allocation
From: Jarek Poplawski @ 2010-10-03 9:42 UTC (permalink / raw)
To: Eric Dumazet; +Cc: hadi, David Miller, netdev
In-Reply-To: <1286035915.2582.2472.camel@edumazet-laptop>
On Sat, Oct 02, 2010 at 06:11:55PM +0200, Eric Dumazet wrote:
> On Saturday, 02 October 2010 at 11:32 +0200, Jarek Poplawski wrote:
> > On Fri, Oct 01, 2010 at 03:56:28PM +0200, Eric Dumazet wrote:
...
> > > diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> > > index b802078..8635110 100644
> > > --- a/net/sched/sch_api.c
> > > +++ b/net/sched/sch_api.c
...
> > > @@ -690,6 +693,8 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
> > > (new && new->flags & TCQ_F_INGRESS)) {
> > > num_q = 1;
> > > ingress = 1;
> > > + if (!dev_ingress_queue(dev))
> > > + return -ENOENT;
> >
> > Is this test really needed here?
>
> To avoid a NULL dereference some lines later.
> Do I have a guarantee it's not NULL here?
Do you have any scenario for NULL here? ;-)
Of course, it's your patch and responsibility, and I'll not guarantee,
but you could at least add a TODO comment, to check it later.
> > > @@ -1044,7 +1050,8 @@ replay:
> > > return -ENOENT;
> > > q = qdisc_leaf(p, clid);
> > > } else { /*ingress */
> > > - q = dev->ingress_queue.qdisc_sleeping;
> > > + if (dev_ingress_queue_create(dev))
> > > + q = dev_ingress_queue(dev)->qdisc_sleeping;
> >
> > I wonder if doing dev_ingress_queue_create() just before qdisc_create()
> > (and the test here) isn't more readable.
>
> Sorry, I don't understand. I want to create ingress_queue only if user
> wants it. If we setup (egress) traffic shaping, no need to setup
> ingress_queue.
I mean doing both creates in one place:
> @@ -1123,11 +1130,14 @@ replay:
> create_n_graft:
...
> + if (clid == TC_H_INGRESS) {
+ if (dev_ingress_queue_create(dev))
> + q = qdisc_create(dev, dev_ingress_queue(dev), p,
> + tcm->tcm_parent, tcm->tcm_parent,
> + tca, &err);
> + else
> + err = -ENOENT;
> + } else {
> struct netdev_queue *dev_queue;
...
> Here is the V3 then.
>
> [PATCH net-next V3] net: dynamic ingress_queue allocation
>
> ingress being not used very much, and net_device->ingress_queue being
> quite a big object (128 or 256 bytes), use a dynamic allocation if
> needed (tc qdisc add dev eth0 ingress ...)
>
> dev_ingress_queue(dev) helper should be used only with RTNL taken.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> V3: add rcu notations & address Jarek comments
> include/linux/netdevice.h | 2 -
> include/linux/rtnetlink.h | 8 ++++++
> net/core/dev.c | 34 ++++++++++++++++++++++-------
> net/sched/sch_api.c | 42 ++++++++++++++++++++++++------------
> net/sched/sch_generic.c | 12 ++++++----
> 5 files changed, 71 insertions(+), 27 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index ceed347..92d81ed 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -986,7 +986,7 @@ struct net_device {
> rx_handler_func_t *rx_handler;
> void *rx_handler_data;
>
> - struct netdev_queue ingress_queue; /* use two cache lines */
> + struct netdev_queue __rcu *ingress_queue;
>
> /*
> * Cache lines mostly used on transmit path
> diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
> index 68c436b..0bb7b48 100644
> --- a/include/linux/rtnetlink.h
> +++ b/include/linux/rtnetlink.h
> @@ -6,6 +6,7 @@
> #include <linux/if_link.h>
> #include <linux/if_addr.h>
> #include <linux/neighbour.h>
> +#include <linux/netdevice.h>
>
> /* rtnetlink families. Values up to 127 are reserved for real address
> * families, values above 128 may be used arbitrarily.
> @@ -769,6 +770,13 @@ extern int lockdep_rtnl_is_held(void);
> #define rtnl_dereference(p) \
> rcu_dereference_check(p, lockdep_rtnl_is_held())
>
> +static inline struct netdev_queue *dev_ingress_queue(struct net_device *dev)
> +{
> + return rtnl_dereference(dev->ingress_queue);
I'd consider rcu_dereference_rtnl(). Btw, technically qdisc_lookup()
doesn't require rtnl, and there was time it was used without it
(on xmit path).
I think you should also add a comment here explaining why RCU is used,
and that the pointer changes only once in the dev's lifetime.
Jarek P.
PS: checkpatched or not checkpatched, that is the question... ;-)
^ permalink raw reply
* [PATCH] net: Fix the condition passed to sk_wait_event()
From: Nagendra Tomar @ 2010-10-03 9:45 UTC (permalink / raw)
To: netdev; +Cc: linux-kernel, davem
Resending, since this is the only patch now. Thanks.
---
This patch fixes the condition (3rd arg) passed to sk_wait_event() in
sk_stream_wait_memory(). The incorrect check in sk_stream_wait_memory()
causes the following soft lockup in tcp_sendmsg() when the global tcp
memory pool is exhausted.
>>> snip <<<
localhost kernel: BUG: soft lockup - CPU#3 stuck for 11s! [sshd:6429]
localhost kernel: CPU 3:
localhost kernel: RIP: 0010:[sk_stream_wait_memory+0xcd/0x200] [sk_stream_wait_memory+0xcd/0x200] sk_stream_wait_memory+0xcd/0x200
localhost kernel:
localhost kernel: Call Trace:
localhost kernel: [sk_stream_wait_memory+0x1b1/0x200] sk_stream_wait_memory+0x1b1/0x200
localhost kernel: [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
localhost kernel: [ipv6:tcp_sendmsg+0x6e6/0xe90] tcp_sendmsg+0x6e6/0xce0
localhost kernel: [sock_aio_write+0x126/0x140] sock_aio_write+0x126/0x140
localhost kernel: [xfs:do_sync_write+0xf1/0x130] do_sync_write+0xf1/0x130
localhost kernel: [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
localhost kernel: [hrtimer_start+0xe3/0x170] hrtimer_start+0xe3/0x170
localhost kernel: [vfs_write+0x185/0x190] vfs_write+0x185/0x190
localhost kernel: [sys_write+0x50/0x90] sys_write+0x50/0x90
localhost kernel: [system_call+0x7e/0x83] system_call+0x7e/0x83
>>> snip <<<
The sk_wait_event() condition passed from sk_stream_wait_memory()
evaluates to true when the global tcp memory pool is exhausted, because
both sk_stream_memory_free() and vm_wait are true. A true condition makes
sk_wait_event() *not* call schedule_timeout(), so sk_stream_wait_memory()
returns immediately to the caller without sleeping. The caller then
retries the allocation, which fails again, calls sk_stream_wait_memory()
again, and so on.
Signed-off-by: Nagendra Singh Tomar <tomer_iisc@yahoo.com>
---
--- linux-2.6.35.7/net/core/stream.c.orig 2010-03-25 07:37:58.000000000 +0530
+++ linux-2.6.35.7/net/core/stream.c 2010-03-25 07:42:16.000000000 +0530
@@ -144,10 +144,10 @@ int sk_stream_wait_memory(struct sock *s
set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
sk->sk_write_pending++;
- sk_wait_event(sk, &current_timeo, !sk->sk_err &&
- !(sk->sk_shutdown & SEND_SHUTDOWN) &&
- sk_stream_memory_free(sk) &&
- vm_wait);
+ sk_wait_event(sk, &current_timeo, sk->sk_err ||
+ (sk->sk_shutdown & SEND_SHUTDOWN) ||
+ (sk_stream_memory_free(sk) &&
+ !vm_wait));
sk->sk_write_pending--;
if (vm_wait) {
---
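
The condition change in the patch above can be modeled in userspace. What
follows is only a minimal sketch: struct sock_state, old_condition(),
new_condition() and would_sleep() are hypothetical names invented for
illustration, and the booleans merely stand in for the kernel expressions
named in the comments. It assumes, as the changelog describes, that
sk_wait_event() sleeps only when its condition is false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the state the condition looks at. */
struct sock_state {
	bool err;           /* models sk->sk_err != 0 */
	bool send_shutdown; /* models sk->sk_shutdown & SEND_SHUTDOWN */
	bool memory_free;   /* models sk_stream_memory_free(sk) */
	bool vm_wait;       /* models vm_wait != 0 */
};

/* Condition as written before the patch. */
static bool old_condition(const struct sock_state *s)
{
	assert(s != NULL);
	return !s->err && !s->send_shutdown && s->memory_free && s->vm_wait;
}

/* Condition as written after the patch. */
static bool new_condition(const struct sock_state *s)
{
	assert(s != NULL);
	return s->err || s->send_shutdown || (s->memory_free && !s->vm_wait);
}

/* Assumed behaviour of the wait: sleep only if the condition is false. */
static bool would_sleep(bool condition)
{
	return !condition;
}
```

With memory_free and vm_wait both true (the global exhaustion case), the
old condition is true and the model never sleeps, while the new condition
is false and the model sleeps until the timeout or a wakeup.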
^ permalink raw reply
* Re: [Patch] Limit sysctl_tcp_mem and sysctl_udp_mem initializers to prevent integer overflows.
From: Robin Holt @ 2010-10-03 11:16 UTC (permalink / raw)
To: Eric Dumazet
Cc: Robin Holt, Andrew Morton, Willy Tarreau, linux-kernel, netdev,
David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
James Morris, Hideaki YOSHIFUJI, Patrick McHardy
In-Reply-To: <1286025736.2582.1827.camel@edumazet-laptop>
On Sat, Oct 02, 2010 at 03:22:16PM +0200, Eric Dumazet wrote:
> On Saturday, 02 October 2010 at 06:24 -0500, Robin Holt wrote:
...
> Strange, you mention sctp in changelog but I cant see the patch.
After looking at the patch, I realized it really belonged in a separate
change and sent that to the sctp mailing list without noticing I forgot
to Cc: lkml.
> We can switch the infrastructure to use "long" instead of "int", now
> that atomic_long_t primitives are available for free.
>
> Reported-by: Robin Holt <holt@sgi.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Robin Holt <holt@sgi.com>
^ permalink raw reply
* Re: sysctl_{tcp,udp,sctp}_mem overflow on 16TB system.
From: Robin Holt @ 2010-10-03 11:54 UTC (permalink / raw)
To: Maciej Żenczykowski
Cc: Willy Tarreau, Robin Holt, David S. Miller, Alexey Kuznetsov,
Pekka Savola (ipv6), James Morris, Hideaki YOSHIFUJI,
Patrick McHardy, Vlad Yasevich, Sridhar Samudrala, linux-kernel,
netdev, linux-decnet-user, linux-sctp
In-Reply-To: <AANLkTin5wPvFQDFrupqGs_Jbh1rgrTjMupbPUFnvrBrv@mail.gmail.com>
On Sun, Oct 03, 2010 at 01:20:32AM -0700, Maciej Żenczykowski wrote:
> Isn't INT_MAX/2 just 1GB, which is only ~0.9 seconds at 10 Gbps?
Units matter. The limit is counted in pages, so INT_MAX/2 is 1G pages,
not 1GB. We can limit to 2G pages, or 8TB.
Robin
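
The unit conversion behind those figures can be checked with a few lines
of C. This is only a sketch: ASSUMED_PAGE_SIZE (4096 bytes) is an
assumption made here, not something stated in the thread, and
pages_to_bytes() and bytes_to_tib_rounded() are hypothetical helper names.
It also assumes a 32-bit int, so INT_MAX is 2^31 - 1.

```c
#include <limits.h>
#include <stdint.h>

/* Assumed page size for illustration; other page sizes change the result. */
#define ASSUMED_PAGE_SIZE 4096ULL

/* Convert a limit expressed in pages into bytes. */
static uint64_t pages_to_bytes(uint64_t pages)
{
	return pages * ASSUMED_PAGE_SIZE;
}

/* Convert bytes to tebibytes (2^40 bytes), rounded to the nearest unit. */
static uint64_t bytes_to_tib_rounded(uint64_t bytes)
{
	return (bytes + (1ULL << 39)) >> 40;
}
```

Under these assumptions, INT_MAX/2 pages is about 4TB and INT_MAX pages is
about 8TB, which is why a limit read as bytes looks far smaller than it is.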
^ permalink raw reply
* Re: [PATCH net-next V3] net: dynamic ingress_queue allocation
From: jamal @ 2010-10-03 13:10 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Eric Dumazet, David Miller, netdev
In-Reply-To: <20101003094221.GA2028@del.dom.local>
On Sun, 2010-10-03 at 11:42 +0200, Jarek Poplawski wrote:
> >
> > To avoid a NULL dereference some lines later.
> > Do I have a guarantee its not NULL here ?
>
> Do you have any scenario for NULL here? ;-)
This is why i called this part clever earlier ;-> It is
clever. There are several scenarios (i attempted to represent them
in the tests that Eric ran):
1) ingress qdisc has been compiled in
   flags & TCQ_F_INGRESS is true
   a) user trying to add ingress qdisc first time
      then q is null, new is not null and this would work
   b) user trying to delete already added qdisc
      then q is not null, new is null
2) ingress qdisc not compiled in
   Repeat #1a above, and Eric's check will bail out ..
The one thing that may have been useful is to also try
a "replace" after #1a and maybe after #2
cheers,
jamal
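
The pattern under discussion (an ingress queue that does not exist until
something asks for it, plus a graft path that bails out when it is
missing) can be sketched in plain userspace C. Everything below is a
hypothetical model: struct model_dev, struct model_queue and the model_*()
functions are invented for illustration and are not the kernel's
definitions, and no locking or RCU is modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures. */
struct model_queue {
	int qdisc_sleeping; /* stands in for the attached qdisc */
};

struct model_dev {
	struct model_queue *ingress_queue; /* NULL until first use */
};

/* Return the ingress queue, or NULL if it was never created. */
static struct model_queue *model_ingress_queue(struct model_dev *dev)
{
	assert(dev != NULL);
	return dev->ingress_queue;
}

/* Create the queue on first use; later calls return the same queue. */
static struct model_queue *model_ingress_queue_create(struct model_dev *dev)
{
	assert(dev != NULL);
	if (!dev->ingress_queue)
		dev->ingress_queue = calloc(1, sizeof(*dev->ingress_queue));
	return dev->ingress_queue; /* NULL only if allocation failed */
}

/* Graft path: refuse with -ENOENT instead of dereferencing NULL. */
static int model_graft_ingress(struct model_dev *dev, int new_qdisc)
{
	struct model_queue *q = model_ingress_queue(dev);

	if (!q)
		return -ENOENT;
	q->qdisc_sleeping = new_qdisc;
	return 0;
}
```

In this model a graft before any create returns -ENOENT, and a graft after
a create succeeds, which is the kind of add/delete/replace ordering the
scenarios above are meant to exercise.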
^ permalink raw reply
* Re: [PATCH v12 12/17] Add mp(mediate passthru) device.
From: Michael S. Tsirkin @ 2010-10-03 13:12 UTC (permalink / raw)
To: xiaohui.xin; +Cc: netdev, kvm, linux-kernel, mingo, davem, herbert, jdike
In-Reply-To: <c898c79a9a73f531d790d9983bf01b9aa05752b1.1285853725.git.xiaohui.xin@intel.com>
On Thu, Sep 30, 2010 at 10:04:30PM +0800, xiaohui.xin@intel.com wrote:
> From: Xin Xiaohui <xiaohui.xin@intel.com>
>
> This patch adds the mp (mediate passthru) device, which is currently
> based on the vhost-net backend driver and provides proto_ops
> to send/receive guest buffer data from/to the guest virtio-net
> driver.
>
> Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
> Signed-off-by: Zhao Yu <yzhao81new@gmail.com>
> Reviewed-by: Jeff Dike <jdike@linux.intel.com>
So you plan to rewrite all this to make this code part of macvtap?
> ---
> drivers/vhost/mpassthru.c | 1380 +++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 1380 insertions(+), 0 deletions(-)
> create mode 100644 drivers/vhost/mpassthru.c
>
> diff --git a/drivers/vhost/mpassthru.c b/drivers/vhost/mpassthru.c
> new file mode 100644
> index 0000000..1a114d1
> --- /dev/null
> +++ b/drivers/vhost/mpassthru.c
> @@ -0,0 +1,1380 @@
> +/*
> + * MPASSTHRU - Mediate passthrough device.
> + * Copyright (C) 2009 ZhaoYu, XinXiaohui, Dike, Jeffery G
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define DRV_NAME "mpassthru"
> +#define DRV_DESCRIPTION "Mediate passthru device driver"
> +#define DRV_COPYRIGHT "(C) 2009 ZhaoYu, XinXiaohui, Dike, Jeffery G"
> +
> +#include <linux/compat.h>
> +#include <linux/module.h>
> +#include <linux/errno.h>
> +#include <linux/kernel.h>
> +#include <linux/major.h>
> +#include <linux/slab.h>
> +#include <linux/smp_lock.h>
> +#include <linux/poll.h>
> +#include <linux/fcntl.h>
> +#include <linux/init.h>
> +#include <linux/aio.h>
> +
> +#include <linux/skbuff.h>
> +#include <linux/netdevice.h>
> +#include <linux/etherdevice.h>
> +#include <linux/miscdevice.h>
> +#include <linux/ethtool.h>
> +#include <linux/rtnetlink.h>
> +#include <linux/if.h>
> +#include <linux/if_arp.h>
> +#include <linux/if_ether.h>
> +#include <linux/crc32.h>
> +#include <linux/nsproxy.h>
> +#include <linux/uaccess.h>
> +#include <linux/virtio_net.h>
> +#include <linux/mpassthru.h>
> +#include <net/net_namespace.h>
> +#include <net/netns/generic.h>
> +#include <net/rtnetlink.h>
> +#include <net/sock.h>
> +
> +#include <asm/system.h>
> +
> +#define COPY_THRESHOLD (L1_CACHE_BYTES * 4)
> +#define COPY_HDR_LEN (L1_CACHE_BYTES < 64 ? 64 : L1_CACHE_BYTES)
> +
> +struct frag {
> + u16 offset;
> + u16 size;
> +};
> +
> +#define HASH_BUCKETS (8192*2)
> +
> +struct page_info {
> + struct list_head list;
> + struct page_info *next;
> + struct page_info *prev;
> + struct page *pages[MAX_SKB_FRAGS];
> + struct sk_buff *skb;
> + struct page_pool *pool;
> +
> + /* The pointer relayed to skb, to indicate
> + * it's a external allocated skb or kernel
> + */
> + struct skb_ext_page ext_page;
> + /* flag to indicate read or write */
> +#define INFO_READ 0
> +#define INFO_WRITE 1
> + unsigned flags;
> + /* exact number of locked pages */
> + unsigned pnum;
> +
> + /* The fields after that is for backend
> + * driver, now for vhost-net.
> + */
> + /* the kiocb structure related to */
> + struct kiocb *iocb;
> + /* the ring descriptor index */
> + unsigned int desc_pos;
> + /* the iovec coming from backend, we only
> + * need few of them */
> + struct iovec hdr[2];
> + struct iovec iov[2];
> +};
> +
> +static struct kmem_cache *ext_page_info_cache;
> +
> +struct page_pool {
> + /* the queue for rx side */
> + struct list_head readq;
> + /* the lock to protect readq */
> + spinlock_t read_lock;
> + /* record the orignal rlimit */
> + struct rlimit o_rlim;
> + /* record the locked pages */
> + int lock_pages;
> + /* the device according to */
> + struct net_device *dev;
> + /* the mp_port according to dev */
> + struct mp_port port;
> + /* the hash_table list to find each locked page */
> + struct page_info **hash_table;
> +};
> +
> +struct mp_struct {
> + struct mp_file *mfile;
> + struct net_device *dev;
> + struct page_pool *pool;
> + struct socket socket;
> +};
> +
> +struct mp_file {
> + atomic_t count;
> + struct mp_struct *mp;
> + struct net *net;
> +};
> +
> +struct mp_sock {
> + struct sock sk;
> + struct mp_struct *mp;
> +};
> +
> +/* The main function to allocate external buffers */
> +static struct skb_ext_page *page_ctor(struct mp_port *port,
> + struct sk_buff *skb,
> + int npages)
> +{
> + int i;
> + unsigned long flags;
> + struct page_pool *pool;
> + struct page_info *info = NULL;
> +
> + if (npages != 1)
> + BUG();
> + pool = container_of(port, struct page_pool, port);
> +
> + spin_lock_irqsave(&pool->read_lock, flags);
> + if (!list_empty(&pool->readq)) {
> + info = list_first_entry(&pool->readq, struct page_info, list);
> + list_del(&info->list);
> + }
> + spin_unlock_irqrestore(&pool->read_lock, flags);
> + if (!info)
> + return NULL;
> +
> + for (i = 0; i < info->pnum; i++)
> + get_page(info->pages[i]);
> + info->skb = skb;
> + return &info->ext_page;
> +}
> +
> +static struct page_info *mp_hash_lookup(struct page_pool *pool,
> + struct page *page);
> +static struct page_info *mp_hash_delete(struct page_pool *pool,
> + struct page_info *info);
> +
> +static struct skb_ext_page *mp_lookup(struct net_device *dev,
> + struct page *page)
> +{
> + struct mp_struct *mp =
> + container_of(dev->mp_port->sock->sk, struct mp_sock, sk)->mp;
> + struct page_pool *pool = mp->pool;
> + struct page_info *info;
> +
> + info = mp_hash_lookup(pool, page);
> + if (!info)
> + return NULL;
> + return &info->ext_page;
> +}
> +
> +static int page_pool_attach(struct mp_struct *mp)
> +{
> + int rc;
> + struct page_pool *pool;
> + struct net_device *dev = mp->dev;
> +
> + /* locked by mp_mutex */
> + if (mp->pool)
> + return -EBUSY;
> +
> + pool = kzalloc(sizeof(*pool), GFP_KERNEL);
> + if (!pool)
> + return -ENOMEM;
> + rc = netdev_mp_port_prep(dev, &pool->port);
> + if (rc)
> + goto fail;
> +
> + INIT_LIST_HEAD(&pool->readq);
> + spin_lock_init(&pool->read_lock);
> + pool->hash_table = kzalloc(sizeof(struct page_info *) * HASH_BUCKETS,
> + GFP_KERNEL);
> + if (!pool->hash_table)
> + goto fail;
> +
> + dev_hold(dev);
> + pool->dev = dev;
> + pool->port.ctor = page_ctor;
> + pool->port.sock = &mp->socket;
> + pool->port.hash = mp_lookup;
> + pool->lock_pages = 0;
> +
> + /* locked by mp_mutex */
> + dev->mp_port = &pool->port;
> + mp->pool = pool;
> +
> + return 0;
> +
> +fail:
> + kfree(pool);
> + dev_put(dev);
> +
> + return rc;
> +}
> +
> +struct page_info *info_dequeue(struct page_pool *pool)
> +{
> + unsigned long flags;
> + struct page_info *info = NULL;
> + spin_lock_irqsave(&pool->read_lock, flags);
> + if (!list_empty(&pool->readq)) {
> + info = list_first_entry(&pool->readq,
> + struct page_info, list);
> + list_del(&info->list);
> + }
> + spin_unlock_irqrestore(&pool->read_lock, flags);
> + return info;
> +}
> +
> +static int set_memlock_rlimit(struct page_pool *pool, int resource,
> + unsigned long cur, unsigned long max)
> +{
> + struct rlimit new_rlim, *old_rlim;
> + int retval;
> +
> + if (resource != RLIMIT_MEMLOCK)
> + return -EINVAL;
> + new_rlim.rlim_cur = cur;
> + new_rlim.rlim_max = max;
> +
> + old_rlim = current->signal->rlim + resource;
> +
> + /* remember the old rlimit value when backend enabled */
> + pool->o_rlim.rlim_cur = old_rlim->rlim_cur;
> + pool->o_rlim.rlim_max = old_rlim->rlim_max;
> +
> + if ((new_rlim.rlim_max > old_rlim->rlim_max) &&
> + !capable(CAP_SYS_RESOURCE))
> + return -EPERM;
> +
> + retval = security_task_setrlimit(resource, &new_rlim);
> + if (retval)
> + return retval;
> +
> + task_lock(current->group_leader);
> + *old_rlim = new_rlim;
> + task_unlock(current->group_leader);
> + return 0;
> +}
> +
> +static void mp_ki_dtor(struct kiocb *iocb)
> +{
> + struct page_info *info = (struct page_info *)(iocb->private);
> + int i;
> +
> + if (info->flags == INFO_READ) {
> + for (i = 0; i < info->pnum; i++) {
> + if (info->pages[i]) {
> + set_page_dirty_lock(info->pages[i]);
> + put_page(info->pages[i]);
> + }
> + }
> + mp_hash_delete(info->pool, info);
> + if (info->skb) {
> + info->skb->destructor = NULL;
> + kfree_skb(info->skb);
> + }
> + }
> + /* Decrement the number of locked pages */
> + info->pool->lock_pages -= info->pnum;
> + kmem_cache_free(ext_page_info_cache, info);
> +
> + return;
> +}
> +
> +static struct kiocb *create_iocb(struct page_info *info, int size)
> +{
> + struct kiocb *iocb = NULL;
> +
> + iocb = info->iocb;
> + if (!iocb)
> + return iocb;
> + iocb->ki_flags = 0;
> + iocb->ki_users = 1;
> + iocb->ki_key = 0;
> + iocb->ki_ctx = NULL;
> + iocb->ki_cancel = NULL;
> + iocb->ki_retry = NULL;
> + iocb->ki_eventfd = NULL;
> + iocb->ki_pos = info->desc_pos;
> + iocb->ki_nbytes = size;
> + iocb->ki_dtor(iocb);
> + iocb->private = (void *)info;
> + iocb->ki_dtor = mp_ki_dtor;
> +
> + return iocb;
> +}
> +
> +static int page_pool_detach(struct mp_struct *mp)
> +{
> + struct page_pool *pool;
> + struct page_info *info;
> + int i;
> +
> + /* locked by mp_mutex */
> + pool = mp->pool;
> + if (!pool)
> + return -ENODEV;
> +
> + while ((info = info_dequeue(pool))) {
> + for (i = 0; i < info->pnum; i++)
> + if (info->pages[i])
> + put_page(info->pages[i]);
> + create_iocb(info, 0);
> + kmem_cache_free(ext_page_info_cache, info);
> + }
> +
> + set_memlock_rlimit(pool, RLIMIT_MEMLOCK,
> + pool->o_rlim.rlim_cur,
> + pool->o_rlim.rlim_max);
> +
> + /* locked by mp_mutex */
> + pool->dev->mp_port = NULL;
> + dev_put(pool->dev);
> +
> + mp->pool = NULL;
> + kfree(pool->hash_table);
> + kfree(pool);
> + return 0;
> +}
> +
> +static void __mp_detach(struct mp_struct *mp)
> +{
> + mp->mfile = NULL;
> +
> + dev_change_flags(mp->dev, mp->dev->flags & ~IFF_UP);
> + page_pool_detach(mp);
> + dev_change_flags(mp->dev, mp->dev->flags | IFF_UP);
> +
> + /* Drop the extra count on the net device */
> + dev_put(mp->dev);
> +}
> +
> +static DEFINE_MUTEX(mp_mutex);
> +
> +static void mp_detach(struct mp_struct *mp)
> +{
> + mutex_lock(&mp_mutex);
> + __mp_detach(mp);
> + mutex_unlock(&mp_mutex);
> +}
> +
> +static struct mp_struct *mp_get(struct mp_file *mfile)
> +{
> + struct mp_struct *mp = NULL;
> + if (atomic_inc_not_zero(&mfile->count))
> + mp = mfile->mp;
> +
> + return mp;
> +}
> +
> +static void mp_put(struct mp_file *mfile)
> +{
> + if (atomic_dec_and_test(&mfile->count)) {
> + if (!rtnl_is_locked()) {
> + rtnl_lock();
> + mp_detach(mfile->mp);
> + rtnl_unlock();
> + } else
> + mp_detach(mfile->mp);
> + }
> +}
> +
> +static void iocb_tag(struct kiocb *iocb)
> +{
> + iocb->ki_flags = 1;
> +}
> +
> +/* The callback to destruct the external buffers or skb */
> +static void page_dtor(struct skb_ext_page *ext_page)
> +{
> + struct page_info *info;
> + struct page_pool *pool;
> + struct sock *sk;
> + struct sk_buff *skb;
> +
> + if (!ext_page)
> + return;
> + info = container_of(ext_page, struct page_info, ext_page);
> + if (!info)
> + return;
> + pool = info->pool;
> + skb = info->skb;
> +
> + if (info->flags == INFO_READ) {
> + create_iocb(info, 0);
> + return;
> + }
> +
> + /* For transmit, we should wait for the DMA finish by hardware.
> + * Queue the notifier to wake up the backend driver
> + */
> +
> + iocb_tag(info->iocb);
> + sk = pool->port.sock->sk;
> + sk->sk_write_space(sk);
> +
> + return;
> +}
> +
> +/* For small exteranl buffers transmit, we don't need to call
> + * get_user_pages().
> + */
> +static struct page_info *alloc_small_page_info(struct page_pool *pool,
> + struct kiocb *iocb, int total)
> +{
> + struct page_info *info =
> + kmem_cache_alloc(ext_page_info_cache, GFP_KERNEL);
> +
> + if (!info)
> + return NULL;
> + info->ext_page.dtor = page_dtor;
> + info->pool = pool;
> + info->flags = INFO_WRITE;
> + info->iocb = iocb;
> + info->pnum = 0;
> + return info;
> +}
> +
> +typedef u32 key_mp_t;
> +static inline key_mp_t mp_hash(struct page *page, int buckets)
> +{
> + key_mp_t k;
> +#if BITS_PER_LONG == 64
> + k = ((((unsigned long)page << 32UL) >> 32UL) /
> + sizeof(struct page)) % buckets ;
> +#elif BITS_PER_LONG == 32
> + k = ((unsigned long)page / sizeof(struct page)) % buckets;
> +#endif
> +
> + return k;
> +}
> +
> +static void mp_hash_insert(struct page_pool *pool,
> + struct page *page, struct page_info *page_info)
> +{
> + struct page_info *tmp;
> + key_mp_t key = mp_hash(page, HASH_BUCKETS);
> + if (!pool->hash_table[key]) {
> + pool->hash_table[key] = page_info;
> + return;
> + }
> +
> + tmp = pool->hash_table[key];
> + while (tmp->next)
> + tmp = tmp->next;
> +
> + tmp->next = page_info;
> + page_info->prev = tmp;
> + return;
> +}
> +
> +static struct page_info *mp_hash_delete(struct page_pool *pool,
> + struct page_info *info)
> +{
> + key_mp_t key = mp_hash(info->pages[0], HASH_BUCKETS);
> + struct page_info *tmp = NULL;
> +
> + tmp = pool->hash_table[key];
> + while (tmp) {
> + if (tmp == info) {
> + if (!tmp->prev) {
> + pool->hash_table[key] = tmp->next;
> + if (tmp->next)
> + tmp->next->prev = NULL;
> + } else {
> + tmp->prev->next = tmp->next;
> + if (tmp->next)
> + tmp->next->prev = tmp->prev;
> + }
> + return tmp;
> + }
> + tmp = tmp->next;
> + }
> + return tmp;
> +}
> +
> +static struct page_info *mp_hash_lookup(struct page_pool *pool,
> + struct page *page)
> +{
> + key_mp_t key = mp_hash(page, HASH_BUCKETS);
> + struct page_info *tmp = NULL;
> +
> + int i;
> + tmp = pool->hash_table[key];
> + while (tmp) {
> + for (i = 0; i < tmp->pnum; i++) {
> + if (tmp->pages[i] == page)
> + return tmp;
> + }
> + tmp = tmp->next;
> + }
> + return tmp;
> +}
> +
> +/* The main function to transform the guest user space address
> + * to host kernel address via get_user_pages(). Thus the hardware
> + * can do DMA directly to the external buffer address.
> + */
> +static struct page_info *alloc_page_info(struct page_pool *pool,
> + struct kiocb *iocb, struct iovec *iov,
> + int count, struct frag *frags,
> + int npages, int total)
> +{
> + int rc;
> + int i, j, n = 0;
> + int len;
> + unsigned long base, lock_limit;
> + struct page_info *info = NULL;
> +
> + lock_limit = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur;
> + lock_limit >>= PAGE_SHIFT;
> +
> + if (pool->lock_pages + count > lock_limit && npages) {
> + printk(KERN_INFO "exceed the locked memory rlimit.");
> + return NULL;
> + }
> +
> + info = kmem_cache_alloc(ext_page_info_cache, GFP_KERNEL);
> +
> + if (!info)
> + return NULL;
> + info->skb = NULL;
> + info->next = info->prev = NULL;
> +
> + for (i = j = 0; i < count; i++) {
> + base = (unsigned long)iov[i].iov_base;
> + len = iov[i].iov_len;
> +
> + if (!len)
> + continue;
> + n = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
> +
> + rc = get_user_pages_fast(base, n, npages ? 1 : 0,
> + &info->pages[j]);
> + if (rc != n)
> + goto failed;
> +
> + while (n--) {
> + frags[j].offset = base & ~PAGE_MASK;
> + frags[j].size = min_t(int, len,
> + PAGE_SIZE - frags[j].offset);
> + len -= frags[j].size;
> + base += frags[j].size;
> + j++;
> + }
> + }
> +
> +#ifdef CONFIG_HIGHMEM
> + if (npages && !(dev->features & NETIF_F_HIGHDMA)) {
> + for (i = 0; i < j; i++) {
> + if (PageHighMem(info->pages[i]))
> + goto failed;
> + }
> + }
> +#endif
> +
> + info->ext_page.dtor = page_dtor;
> + info->ext_page.page = info->pages[0];
> + info->pool = pool;
> + info->pnum = j;
> + info->iocb = iocb;
> + if (!npages)
> + info->flags = INFO_WRITE;
> + else
> + info->flags = INFO_READ;
> +
> + if (info->flags == INFO_READ) {
> + if (frags[0].offset == 0 && iocb->ki_iovec[0].iov_len) {
> + frags[0].offset = iocb->ki_iovec[0].iov_len;
> + pool->port.vnet_hlen = iocb->ki_iovec[0].iov_len;
> + }
> + for (i = 0; i < j; i++)
> + mp_hash_insert(pool, info->pages[i], info);
> + }
> + /* increment the number of locked pages */
> + pool->lock_pages += j;
> + return info;
> +
> +failed:
> + for (i = 0; i < j; i++)
> + put_page(info->pages[i]);
> +
> + kmem_cache_free(ext_page_info_cache, info);
> +
> + return NULL;
> +}
> +
> +static void mp_sock_destruct(struct sock *sk)
> +{
> + struct mp_struct *mp = container_of(sk, struct mp_sock, sk)->mp;
> + kfree(mp);
> +}
> +
> +static void mp_sock_state_change(struct sock *sk)
> +{
> + if (sk_has_sleeper(sk))
> + wake_up_interruptible_sync_poll(sk->sk_sleep, POLLIN);
> +}
> +
> +static void mp_sock_write_space(struct sock *sk)
> +{
> + if (sk_has_sleeper(sk))
> + wake_up_interruptible_sync_poll(sk->sk_sleep, POLLOUT);
> +}
> +
> +static void mp_sock_data_ready(struct sock *sk, int coming)
> +{
> + struct mp_struct *mp = container_of(sk, struct mp_sock, sk)->mp;
> + struct page_pool *pool = NULL;
> + struct sk_buff *skb = NULL;
> + struct page_info *info = NULL;
> + int len;
> +
> + pool = mp->pool;
> + if (!pool)
> + return;
> +
> + while ((skb = skb_dequeue(&sk->sk_receive_queue)) != NULL) {
> + struct page *page;
> + int off;
> + int size = 0, i = 0;
> + struct skb_shared_info *shinfo = skb_shinfo(skb);
> + struct skb_ext_page *ext_page =
> + (struct skb_ext_page *)(shinfo->destructor_arg);
> + struct virtio_net_hdr_mrg_rxbuf hdr = {
> + .hdr.flags = 0,
> + .hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE
> + };
> +
> + if (skb->ip_summed == CHECKSUM_COMPLETE)
> + printk(KERN_INFO "Complete checksum occurs\n");
> +
> + if (shinfo->frags[0].page == ext_page->page) {
> + info = container_of(ext_page,
> + struct page_info,
> + ext_page);
> + if (shinfo->nr_frags)
> + hdr.num_buffers = shinfo->nr_frags;
> + else
> + hdr.num_buffers = shinfo->nr_frags + 1;
> + } else {
> + info = container_of(ext_page,
> + struct page_info,
> + ext_page);
> + hdr.num_buffers = shinfo->nr_frags + 1;
> + }
> + skb_push(skb, ETH_HLEN);
> +
> + if (skb_is_gso(skb)) {
> + hdr.hdr.hdr_len = skb_headlen(skb);
> + hdr.hdr.gso_size = shinfo->gso_size;
> + if (shinfo->gso_type & SKB_GSO_TCPV4)
> + hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
> + else if (shinfo->gso_type & SKB_GSO_TCPV6)
> + hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
> + else if (shinfo->gso_type & SKB_GSO_UDP)
> + hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_UDP;
> + else
> + BUG();
> + if (shinfo->gso_type & SKB_GSO_TCP_ECN)
> + hdr.hdr.gso_type |= VIRTIO_NET_HDR_GSO_ECN;
> +
> + } else
> + hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE;
> +
> + if (skb->ip_summed == CHECKSUM_PARTIAL) {
> + hdr.hdr.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
> + hdr.hdr.csum_start =
> + skb->csum_start - skb_headroom(skb);
> + hdr.hdr.csum_offset = skb->csum_offset;
> + }
> +
> + off = info->hdr[0].iov_len;
> + len = memcpy_toiovec(info->iov, (unsigned char *)&hdr, off);
> + if (len) {
> + pr_debug("Unable to write vnet_hdr at addr '%p': '%d'\n",
> + info->iov, len);
> + goto clean;
> + }
> +
> + memcpy_toiovec(info->iov, skb->data, skb_headlen(skb));
> +
> + info->iocb->ki_left = hdr.num_buffers;
> + if (shinfo->frags[0].page == ext_page->page) {
> + size = shinfo->frags[0].size +
> + shinfo->frags[0].page_offset - off;
> + i = 1;
> + } else {
> + size = skb_headlen(skb);
> + i = 0;
> + }
> + create_iocb(info, off + size);
> + for (i = i; i < shinfo->nr_frags; i++) {
> + page = shinfo->frags[i].page;
> + info = mp_hash_lookup(pool, shinfo->frags[i].page);
> + create_iocb(info, shinfo->frags[i].size);
> + }
> + info->skb = skb;
> + shinfo->nr_frags = 0;
> + shinfo->destructor_arg = NULL;
> + continue;
> +clean:
> + kfree_skb(skb);
> + for (i = 0; i < info->pnum; i++)
> + put_page(info->pages[i]);
> + kmem_cache_free(ext_page_info_cache, info);
> + }
> + return;
> +}
> +
> +static inline struct sk_buff *mp_alloc_skb(struct sock *sk, size_t prepad,
> + size_t len, size_t linear,
> + int noblock, int *err)
> +{
> + struct sk_buff *skb;
> +
> + /* Under a page? Don't bother with paged skb. */
> + if (prepad + len < PAGE_SIZE || !linear)
> + linear = len;
> +
> + skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock,
> + err);
> + if (!skb)
> + return NULL;
> +
> + skb_reserve(skb, prepad);
> + skb_put(skb, linear);
> + skb->data_len = len - linear;
> + skb->len += len - linear;
> +
> + return skb;
> +}
> +
> +static int mp_skb_from_vnet_hdr(struct sk_buff *skb,
> + struct virtio_net_hdr *vnet_hdr)
> +{
> + unsigned short gso_type = 0;
> + if (vnet_hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
> + switch (vnet_hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
> + case VIRTIO_NET_HDR_GSO_TCPV4:
> + gso_type = SKB_GSO_TCPV4;
> + break;
> + case VIRTIO_NET_HDR_GSO_TCPV6:
> + gso_type = SKB_GSO_TCPV6;
> + break;
> + case VIRTIO_NET_HDR_GSO_UDP:
> + gso_type = SKB_GSO_UDP;
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + if (vnet_hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
> + gso_type |= SKB_GSO_TCP_ECN;
> +
> + if (vnet_hdr->gso_size == 0)
> + return -EINVAL;
> + }
> +
> + if (vnet_hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> + if (!skb_partial_csum_set(skb, vnet_hdr->csum_start,
> + vnet_hdr->csum_offset))
> + return -EINVAL;
> + }
> +
> + if (vnet_hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
> + skb_shinfo(skb)->gso_size = vnet_hdr->gso_size;
> + skb_shinfo(skb)->gso_type = gso_type;
> +
> + /* Header must be checked, and gso_segs computed. */
> + skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY;
> + skb_shinfo(skb)->gso_segs = 0;
> + }
> + return 0;
> +}
> +
> +static int mp_sendmsg(struct kiocb *iocb, struct socket *sock,
> + struct msghdr *m, size_t total_len)
> +{
> + struct mp_struct *mp = container_of(sock->sk, struct mp_sock, sk)->mp;
> + struct virtio_net_hdr vnet_hdr = {0};
> + int hdr_len = 0;
> + struct page_pool *pool;
> + struct iovec *iov = m->msg_iov;
> + struct page_info *info = NULL;
> + struct frag frags[MAX_SKB_FRAGS];
> + struct sk_buff *skb;
> + int count = m->msg_iovlen;
> + int total = 0, header, n, i, len, rc;
> + unsigned long base;
> +
> + pool = mp->pool;
> + if (!pool)
> + return -ENODEV;
> +
> + total = iov_length(iov, count);
> +
> + if (total < ETH_HLEN)
> + return -EINVAL;
> +
> + if (total <= COPY_THRESHOLD)
> + goto copy;
> +
> + n = 0;
> + for (i = 0; i < count; i++) {
> + base = (unsigned long)iov[i].iov_base;
> + len = iov[i].iov_len;
> + if (!len)
> + continue;
> + n += ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
> + if (n > MAX_SKB_FRAGS)
> + return -EINVAL;
> + }
> +
> +copy:
> + hdr_len = sizeof(vnet_hdr);
> + if ((total - iocb->ki_iovec[0].iov_len) < 0)
> + return -EINVAL;
> +
> + rc = memcpy_fromiovecend((void *)&vnet_hdr, iocb->ki_iovec, 0, hdr_len);
> + if (rc < 0)
> + return -EINVAL;
> +
> + if ((vnet_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
> + vnet_hdr.csum_start + vnet_hdr.csum_offset + 2 >
> + vnet_hdr.hdr_len)
> + vnet_hdr.hdr_len = vnet_hdr.csum_start +
> + vnet_hdr.csum_offset + 2;
> +
> + if (vnet_hdr.hdr_len > total)
> + return -EINVAL;
> +
> + header = total > COPY_THRESHOLD ? COPY_HDR_LEN : total;
> +
> + skb = mp_alloc_skb(sock->sk, NET_IP_ALIGN, header,
> + iocb->ki_iovec[0].iov_len, 1, &rc);
> +
> + if (!skb)
> + goto drop;
> +
> + skb_set_network_header(skb, ETH_HLEN);
> + memcpy_fromiovec(skb->data, iov, header);
> +
> + skb_reset_mac_header(skb);
> + skb->protocol = eth_hdr(skb)->h_proto;
> +
> + rc = mp_skb_from_vnet_hdr(skb, &vnet_hdr);
> + if (rc)
> + goto drop;
> +
> + if (header == total) {
> + rc = total;
> + info = alloc_small_page_info(pool, iocb, total);
> + } else {
> + info = alloc_page_info(pool, iocb, iov, count, frags, 0, total);
> + if (info)
> + for (i = 0; i < info->pnum; i++) {
> + skb_add_rx_frag(skb, i, info->pages[i],
> + frags[i].offset, frags[i].size);
> + info->pages[i] = NULL;
> + }
> + }
> + if (!pool->lock_pages)
> + sock->sk->sk_state_change(sock->sk);
> +
> + if (info != NULL) {
> + info->desc_pos = iocb->ki_pos;
> + info->skb = skb;
> + skb_shinfo(skb)->destructor_arg = &info->ext_page;
> + skb->dev = mp->dev;
> + create_iocb(info, total);
> + dev_queue_xmit(skb);
> + return 0;
> + }
> +drop:
> + kfree_skb(skb);
> + if (info) {
> + for (i = 0; i < info->pnum; i++)
> + put_page(info->pages[i]);
> + kmem_cache_free(ext_page_info_cache, info);
> + }
> + mp->dev->stats.tx_dropped++;
> + return -ENOMEM;
> +}
> +
> +static int mp_recvmsg(struct kiocb *iocb, struct socket *sock,
> + struct msghdr *m, size_t total_len,
> + int flags)
> +{
> + struct mp_struct *mp = container_of(sock->sk, struct mp_sock, sk)->mp;
> + struct page_pool *pool;
> + struct iovec *iov = m->msg_iov;
> + int count = m->msg_iovlen;
> + int npages, payload;
> + struct page_info *info;
> + struct frag frags[MAX_SKB_FRAGS];
> + unsigned long base;
> + int i, len;
> + unsigned long flag;
> +
> + if (!(flags & MSG_DONTWAIT))
> + return -EINVAL;
> +
> + pool = mp->pool;
> + if (!pool)
> + return -EINVAL;
> +
> + /* Error detections in case invalid external buffer */
> + if (count > 2 && iov[1].iov_len < pool->port.hdr_len &&
> + mp->dev->features & NETIF_F_SG) {
> + return -EINVAL;
> + }
> +
> + npages = pool->port.npages;
> + payload = pool->port.data_len;
> +
> + /* If KVM guest virtio-net FE driver use SG feature */
> + if (count > 2) {
> + for (i = 2; i < count; i++) {
> + base = (unsigned long)iov[i].iov_base & ~PAGE_MASK;
> + len = iov[i].iov_len;
> + if (npages == 1)
> + len = min_t(int, len, PAGE_SIZE - base);
> + else if (base)
> + break;
> + payload -= len;
> + if (payload <= 0)
> + goto proceed;
> + if (npages == 1 || (len & ~PAGE_MASK))
> + break;
> + }
> + }
> +
> + if ((((unsigned long)iov[1].iov_base & ~PAGE_MASK)
> + - NET_SKB_PAD - NET_IP_ALIGN) >= 0)
> + goto proceed;
> +
> + return -EINVAL;
> +
> +proceed:
> + /* skip the virtnet head */
> + if (count > 1) {
> + iov++;
> + count--;
> + }
> +
> + if (!pool->lock_pages) {
> + set_memlock_rlimit(pool, RLIMIT_MEMLOCK,
> + iocb->ki_user_data * 4096 * 2,
> + iocb->ki_user_data * 4096 * 2);
> + }
> +
> + /* Translate address to kernel */
> + info = alloc_page_info(pool, iocb, iov, count, frags, npages, 0);
> + if (!info)
> + return -ENOMEM;
> + info->hdr[0].iov_base = iocb->ki_iovec[0].iov_base;
> + info->hdr[0].iov_len = iocb->ki_iovec[0].iov_len;
> + iocb->ki_iovec[0].iov_len = 0;
> + iocb->ki_left = 0;
> + info->desc_pos = iocb->ki_pos;
> +
> + if (count > 1) {
> + iov--;
> + count++;
> + }
> +
> + memcpy(info->iov, iov, sizeof(struct iovec) * count);
> +
> + spin_lock_irqsave(&pool->read_lock, flag);
> + list_add_tail(&info->list, &pool->readq);
> + spin_unlock_irqrestore(&pool->read_lock, flag);
> +
> + return 0;
> +}
> +
> +/* Ops structure to mimic raw sockets with mp device */
> +static const struct proto_ops mp_socket_ops = {
> + .sendmsg = mp_sendmsg,
> + .recvmsg = mp_recvmsg,
> +};
> +
> +static struct proto mp_proto = {
> + .name = "mp",
> + .owner = THIS_MODULE,
> + .obj_size = sizeof(struct mp_sock),
> +};
> +
> +static int mp_chr_open(struct inode *inode, struct file * file)
> +{
> + struct mp_file *mfile;
> + cycle_kernel_lock();
> +
> + pr_debug("mp: mp_chr_open\n");
> + mfile = kzalloc(sizeof(*mfile), GFP_KERNEL);
> + if (!mfile)
> + return -ENOMEM;
> + atomic_set(&mfile->count, 0);
> + mfile->mp = NULL;
> + mfile->net = get_net(current->nsproxy->net_ns);
> + file->private_data = mfile;
> + return 0;
> +}
> +
> +static int mp_attach(struct mp_struct *mp, struct file *file)
> +{
> + struct mp_file *mfile = file->private_data;
> + int err;
> +
> + netif_tx_lock_bh(mp->dev);
> +
> + err = -EINVAL;
> +
> + if (mfile->mp)
> + goto out;
> +
> + err = -EBUSY;
> + if (mp->mfile)
> + goto out;
> +
> + err = 0;
> + mfile->mp = mp;
> + mp->mfile = mfile;
> + mp->socket.file = file;
> + dev_hold(mp->dev);
> + sock_hold(mp->socket.sk);
> + atomic_inc(&mfile->count);
> +
> +out:
> + netif_tx_unlock_bh(mp->dev);
> + return err;
> +}
> +
> +static int do_unbind(struct mp_file *mfile)
> +{
> + struct mp_struct *mp = mp_get(mfile);
> +
> + if (!mp)
> + return -EINVAL;
> +
> + mp_detach(mp);
> + sock_put(mp->socket.sk);
> + mp_put(mfile);
> + return 0;
> +}
> +
> +static long mp_chr_ioctl(struct file *file, unsigned int cmd,
> + unsigned long arg)
> +{
> + struct mp_file *mfile = file->private_data;
> + struct mp_struct *mp;
> + struct net_device *dev;
> + void __user* argp = (void __user *)arg;
> + struct ifreq ifr;
> + struct sock *sk;
> + int ret;
> +
> + ret = -EINVAL;
> +
> + switch (cmd) {
> + case MPASSTHRU_BINDDEV:
> + ret = -EFAULT;
> + if (copy_from_user(&ifr, argp, sizeof ifr))
> + break;
> +
> + ifr.ifr_name[IFNAMSIZ-1] = '\0';
> +
> + ret = -ENODEV;
> +
> + rtnl_lock();
> + dev = dev_get_by_name(mfile->net, ifr.ifr_name);
> + if (!dev) {
> + rtnl_unlock();
> + break;
> + }
> +
> + mutex_lock(&mp_mutex);
> +
> + ret = -EBUSY;
> +
> + /* the device can be only bind once */
> + if (dev_is_mpassthru(dev))
> + goto err_dev_put;
> +
> + mp = mfile->mp;
> + if (mp)
> + goto err_dev_put;
> +
> + mp = kzalloc(sizeof(*mp), GFP_KERNEL);
> + if (!mp) {
> + ret = -ENOMEM;
> + goto err_dev_put;
> + }
> + mp->dev = dev;
> + ret = -ENOMEM;
> +
> + sk = sk_alloc(mfile->net, AF_UNSPEC, GFP_KERNEL, &mp_proto);
> + if (!sk)
> + goto err_free_mp;
> +
> + init_waitqueue_head(&mp->socket.wait);
> + mp->socket.ops = &mp_socket_ops;
> + sock_init_data(&mp->socket, sk);
> + sk->sk_sndbuf = INT_MAX;
> + container_of(sk, struct mp_sock, sk)->mp = mp;
> +
> + sk->sk_destruct = mp_sock_destruct;
> + sk->sk_data_ready = mp_sock_data_ready;
> + sk->sk_write_space = mp_sock_write_space;
> + sk->sk_state_change = mp_sock_state_change;
> + ret = mp_attach(mp, file);
> + if (ret < 0)
> + goto err_free_sk;
> +
> + ret = page_pool_attach(mp);
> + if (ret < 0)
> + goto err_free_sk;
> + dev_change_flags(mp->dev, mp->dev->flags & (~IFF_UP));
> + dev_change_flags(mp->dev, mp->dev->flags | IFF_UP);
> + sk->sk_state_change(sk);
> +out:
> + mutex_unlock(&mp_mutex);
> + rtnl_unlock();
> + break;
> +err_free_sk:
> + sk_free(sk);
> +err_free_mp:
> + kfree(mp);
> +err_dev_put:
> + dev_put(dev);
> + goto out;
> +
> + case MPASSTHRU_UNBINDDEV:
> + rtnl_lock();
> + ret = do_unbind(mfile);
> + rtnl_unlock();
> + break;
> +
> + default:
> + break;
> + }
> + return ret;
> +}
> +
> +static unsigned int mp_chr_poll(struct file *file, poll_table * wait)
> +{
> + struct mp_file *mfile = file->private_data;
> + struct mp_struct *mp = mp_get(mfile);
> + struct sock *sk;
> + unsigned int mask = 0;
> +
> + if (!mp)
> + return POLLERR;
> +
> + sk = mp->socket.sk;
> +
> + poll_wait(file, &mp->socket.wait, wait);
> +
> + if (!skb_queue_empty(&sk->sk_receive_queue))
> + mask |= POLLIN | POLLRDNORM;
> +
> + if (sock_writeable(sk) ||
> + (!test_and_set_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags) &&
> + sock_writeable(sk)))
> + mask |= POLLOUT | POLLWRNORM;
> +
> + if (mp->dev->reg_state != NETREG_REGISTERED)
> + mask = POLLERR;
> +
> + mp_put(mfile);
> + return mask;
> +}
> +
> +static ssize_t mp_chr_aio_write(struct kiocb *iocb, const struct iovec *iov,
> + unsigned long count, loff_t pos)
> +{
> + struct file *file = iocb->ki_filp;
> + struct mp_struct *mp = mp_get(file->private_data);
> + struct sock *sk = mp->socket.sk;
> + struct sk_buff *skb;
> + int len, err;
> + ssize_t result = 0;
> +
> + if (!mp)
> + return -EBADFD;
> +
> + /* currently, async is not supported.
> + * but we may support real async aio from user application,
> + * maybe qemu virtio-net backend.
> + */
> + if (!is_sync_kiocb(iocb))
> + return -EFAULT;
> +
> + len = iov_length(iov, count);
> +
> + if (unlikely(len < ETH_HLEN))
> + return -EINVAL;
> +
> + skb = sock_alloc_send_skb(sk, len + NET_IP_ALIGN,
> + file->f_flags & O_NONBLOCK, &err);
> +
> + if (!skb)
> + return -ENOMEM;
> +
> + skb_reserve(skb, NET_IP_ALIGN);
> + skb_put(skb, len);
> +
> + if (skb_copy_datagram_from_iovec(skb, 0, iov, 0, len)) {
> + kfree_skb(skb);
> + return -EAGAIN;
> + }
> +
> + skb->protocol = eth_type_trans(skb, mp->dev);
> + skb->dev = mp->dev;
> +
> + dev_queue_xmit(skb);
> +
> + mp_put(file->private_data);
> + return result;
> +}
> +
> +static int mp_chr_close(struct inode *inode, struct file *file)
> +{
> + struct mp_file *mfile = file->private_data;
> +
> + /*
> + * Ignore return value since an error only means there was nothing to
> + * do
> + */
> + do_unbind(mfile);
> +
> + put_net(mfile->net);
> + kfree(mfile);
> +
> + return 0;
> +}
> +
> +#ifdef CONFIG_COMPAT
> +static long mp_chr_compat_ioctl(struct file *f, unsigned int ioctl,
> + unsigned long arg)
> +{
> + return mp_chr_ioctl(f, ioctl, (unsigned long)compat_ptr(arg));
> +}
> +#endif
> +
> +static const struct file_operations mp_fops = {
> + .owner = THIS_MODULE,
> + .llseek = no_llseek,
> + .write = do_sync_write,
> + .aio_write = mp_chr_aio_write,
> + .poll = mp_chr_poll,
> + .unlocked_ioctl = mp_chr_ioctl,
> +#ifdef CONFIG_COMPAT
> + .compat_ioctl = mp_chr_compat_ioctl,
> +#endif
> + .open = mp_chr_open,
> + .release = mp_chr_close,
> +};
> +
> +static struct miscdevice mp_miscdev = {
> + .minor = MISC_DYNAMIC_MINOR,
> + .name = "mp",
> + .nodename = "net/mp",
> + .fops = &mp_fops,
> +};
> +
> +static int mp_device_event(struct notifier_block *unused,
> + unsigned long event, void *ptr)
> +{
> + struct net_device *dev = ptr;
> + struct mp_port *port;
> + struct mp_struct *mp = NULL;
> + struct socket *sock = NULL;
> + struct sock *sk;
> +
> + port = dev->mp_port;
> + if (port == NULL)
> + return NOTIFY_DONE;
> +
> + switch (event) {
> + case NETDEV_UNREGISTER:
> + sock = dev->mp_port->sock;
> + mp = container_of(sock->sk, struct mp_sock, sk)->mp;
> + do_unbind(mp->mfile);
> + break;
> + case NETDEV_CHANGE:
> + sk = dev->mp_port->sock->sk;
> + sk->sk_state_change(sk);
> + break;
> + }
> + return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block mp_notifier_block __read_mostly = {
> + .notifier_call = mp_device_event,
> +};
> +
> +static int mp_init(void)
> +{
> + int err = 0;
> +
> + ext_page_info_cache = kmem_cache_create("skb_page_info",
> + sizeof(struct page_info),
> + 0, SLAB_HWCACHE_ALIGN, NULL);
> + if (!ext_page_info_cache)
> + return -ENOMEM;
> +
> + err = misc_register(&mp_miscdev);
> + if (err) {
> + printk(KERN_ERR "mp: Can't register misc device\n");
> + kmem_cache_destroy(ext_page_info_cache);
> + } else {
> + printk(KERN_INFO "Registering mp misc device - minor = %d\n",
> + mp_miscdev.minor);
> + register_netdevice_notifier(&mp_notifier_block);
> + }
> + return err;
> +}
> +
> +void mp_exit(void)
> +{
> + unregister_netdevice_notifier(&mp_notifier_block);
> + misc_deregister(&mp_miscdev);
> + kmem_cache_destroy(ext_page_info_cache);
> +}
> +
> +/* Get an underlying socket object from mp file. Returns error unless file is
> + * attached to a device. The returned object works like a packet socket, it
> + * can be used for sock_sendmsg/sock_recvmsg. The caller is responsible for
> + * holding a reference to the file for as long as the socket is in use. */
> +struct socket *mp_get_socket(struct file *file)
> +{
> + struct mp_file *mfile = file->private_data;
> + struct mp_struct *mp;
> +
> + if (file->f_op != &mp_fops)
> + return ERR_PTR(-EINVAL);
> + mp = mp_get(mfile);
> + if (!mp)
> + return ERR_PTR(-EBADFD);
> + mp_put(mfile);
> + return &mp->socket;
> +}
> +EXPORT_SYMBOL_GPL(mp_get_socket);
> +
> +module_init(mp_init);
> +module_exit(mp_exit);
> +MODULE_AUTHOR(DRV_COPYRIGHT);
> +MODULE_DESCRIPTION(DRV_DESCRIPTION);
> +MODULE_LICENSE("GPL v2");
> --
> 1.7.3
^ permalink raw reply
* Re: [PATCHv3 net-next-2.6 3/5] XFRM,IPv6: Add IRO src/dst address remapping XFRM types and i/o handlers
From: Arnaud Ebalard @ 2010-10-03 13:41 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, eric.dumazet, yoshfuji, netdev
In-Reply-To: <20101002103205.GA3879@gondor.apana.org.au>
Hi Herbert,
Herbert Xu <herbert@gondor.apana.org.au> writes:
> On Sat, Oct 02, 2010 at 12:17:35PM +0200, Arnaud Ebalard wrote:
>>
>> and I see no reason not to keep the lock we have on the state until the
>> end of the function when the state is valid (when we break), instead of
>> releasing it to get it again later. Something like the following would
>> allow removing the spin_lock()/spin_unlock() calls from all mip6 input
>> handlers (mip6_{destopt,rthdr,iro_src,iro_dst}_input()):
>
> No I moved the state lock down precisely because it should not
> be taken at a higher level as that breaks asynchronous IPsec
> processing and the fact that it isn't needed in most places.
>
> If your code needs it then you should take it rather than impose
> it on real IPsec users.
Understood. Note that I am on your side with this: my primary concern
while pushing the feature is *to not break or slow down standard IPsec*.
I do not expect my code to be accepted or even read otherwise.
As for the current point raised by David on the position of the locks in
my input handlers, they are based on the position of the locks in the
*existing* RH2 (mip6_rthdr_input()) and HAO (mip6_destopt_input())
handlers. As they serve the same purpose (src/dst address check against
state's address) and the code is basically the same, I have no reason to
do things differently from what is currently upstream.
After your reply, I took a (too long) look at the history of
xfrm6_input_addr() to understand why it is as it is. If it can spare you
some time, here is what I think happened:
- Initially (commit fbd9a5b4, Aug 23 2006), the checks on the status of
state, the call to x->type->input() and the changes on state's
processing stats (x->curlft changes) were *globally* protected by a
call to spin_lock(). The same day, a related commit (3d126890) added
support for RH2/HAO input handler. No lock inside the handler. The
content of xfrm6_input_addr() was:
    spin_lock(&x->lock);
    <...snip...>
    nh = x->type->input(x, skb);
    if (nh <= 0) {
            spin_unlock(&x->lock);
            xfrm_state_put(x);
            x = NULL;
            continue;
    }
    x->curlft.bytes += skb->len;
    x->curlft.packets++;
    spin_unlock(&x->lock);
- Then, as you wrote, the state lock was moved in all input handlers
(commit 0ebea8ef, Nov 13 2007), including RH2/HAO ones:
    @@ -128,12 +128,15 @@ static int mip6_destopt_input(struct xfrm_state *x, struct sk_buff *skb)
     {
            struct ipv6hdr *iph = ipv6_hdr(skb);
            struct ipv6_destopt_hdr *destopt = (struct ipv6_destopt_hdr *)skb->data;
    +       int err = destopt->nexthdr;
    +       spin_lock(&x->lock);
            if (!ipv6_addr_equal(&iph->saddr, (struct in6_addr *)x->coaddr) &&
                !ipv6_addr_any((struct in6_addr *)x->coaddr))
    -               return -ENOENT;
    +               err = -ENOENT;
    +       spin_unlock(&x->lock);
    -       return destopt->nexthdr;
    +       return err;
     }
With that commit, I think a deadlock was introduced in MIPv6 code
because xfrm6_input_addr() was left unchanged, i.e. x->type->input()
was called with the lock held. Am I correct?
- The code of xfrm6_input_addr() was then optimized by commit a002c6fd
in such a way that x->type->input() was then put outside the
protection of the lock, which (if I am not mistaken) removed the
deadlock:
    spin_lock(&x->lock);
    if ((!i || (x->props.flags & XFRM_STATE_WILDRECV)) &&
        likely(x->km.state == XFRM_STATE_VALID) &&
        !xfrm_state_check_expire(x)) {
            spin_unlock(&x->lock);
            if (x->type->input(x, skb) > 0) {
                    /* found a valid state */
                    break;
            }
    } else
            spin_unlock(&x->lock);
I don't know if this was intentional.
But the main question remains on the position of the lock. Here,
checks are done on the status of the state, lock is released,
reacquired in the input handler to do additional check and then
released again, to be reacquired later in the function to act on
statistics. Is my reading of the code correct?
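To make the three stages above easier to compare, here is a minimal user-space sketch of the two caller shapes. It is not kernel code: `struct toy_lock` and every `toy_*` function are hypothetical names invented for this illustration, the model is single-threaded, and a failed acquisition simply stands in for the place where a real non-recursive spin_lock() would spin forever. It assumes the history is as described above.

```c
#include <stdbool.h>

/* Toy, single-threaded stand-in for a non-recursive spinlock. */
struct toy_lock {
	bool held;
	int acquisitions;	/* successful acquisitions so far */
};

/* Returns false where a real spin_lock() would spin forever. */
static bool toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	l->acquisitions++;
	return true;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held = false;
}

/* Handler that takes the state lock itself (shape after 0ebea8ef). */
static int toy_handler(struct toy_lock *l)
{
	if (!toy_lock_acquire(l))
		return -1;	/* lock already held by our caller */
	/* the address check against the state would go here */
	toy_lock_release(l);
	return 1;
}

/* Caller that keeps the lock held across the handler. */
static int toy_input_lock_held(struct toy_lock *l)
{
	int nh;

	if (!toy_lock_acquire(l))
		return -1;
	nh = toy_handler(l);
	toy_lock_release(l);
	return nh;
}

/* Caller that drops the lock before the handler (shape after a002c6fd). */
static int toy_input_lock_dropped(struct toy_lock *l)
{
	int nh;

	if (!toy_lock_acquire(l))	/* state validity checks */
		return -1;
	toy_lock_release(l);

	nh = toy_handler(l);		/* second acquisition, in the handler */
	if (nh <= 0)
		return nh;

	if (!toy_lock_acquire(l))	/* third acquisition, for statistics */
		return -1;
	toy_lock_release(l);
	return nh;
}
```

In this model, toy_input_lock_held() fails inside the handler after a single acquisition, which corresponds to the suspected deadlock, while toy_input_lock_dropped() succeeds but acquires the lock three times for one packet, which is the pattern the question above is about.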
Herbert, you certainly have a better understanding of XFRM code than I
have and can probably tell if the locking behavior above is valid or
buggy. Yoshifuji-san, David or Eric may also have good ideas on that.
As a side note (I think I was not explicit enough in my previous email),
I think the possible changes to xfrm6_input_addr() and MIPv6 handlers we
are discussing are not expected to impact standard IPsec code because
there are 2 different cases in which state input handlers are called
(i.e. x->type->input()):
- xfrm_input(): for standard IPsec case (incl. async resumption). This
is only for esp, ah, ipcomp and tunneling.
- xfrm6_input_addr(): for MIPv6 extension header, i.e. RH2 and HAO in
destopt.
and we are discussing the second.
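The separation between the two entry points can be sketched as follows. This is only a user-space model of the claim above, not the kernel's dispatch code: the `toy_*` names, the `enum toy_proto` tags and the `hold_lock` parameter are all hypothetical, and the assumption that each entry point only ever reaches states of its own protocol set is taken from the description above.

```c
#include <stdbool.h>

/* Hypothetical protocol tags (not the kernel's IPPROTO_* values). */
enum toy_proto { TOY_ESP, TOY_AH, TOY_COMP, TOY_ROUTING, TOY_DSTOPTS };

struct toy_state {
	enum toy_proto proto;
	bool caller_held_lock;	/* how the handler was last entered */
	int (*input)(struct toy_state *x);
};

/* Stand-in for any x->type->input() handler. */
static int toy_handler(struct toy_state *x)
{
	(void)x;
	return 1;
}

/* Models xfrm_input(): ESP/AH/IPComp states only, lock not held. */
static int toy_xfrm_input(struct toy_state *x)
{
	if (x->proto != TOY_ESP && x->proto != TOY_AH &&
	    x->proto != TOY_COMP)
		return -1;
	x->caller_held_lock = false;
	return x->input(x);
}

/* Models xfrm6_input_addr(): RH2/HAO states only.
 * hold_lock is the locking policy under discussion. */
static int toy_xfrm6_input_addr(struct toy_state *x, bool hold_lock)
{
	if (x->proto != TOY_ROUTING && x->proto != TOY_DSTOPTS)
		return -1;
	x->caller_held_lock = hold_lock;
	return x->input(x);
}
```

Whatever value hold_lock takes in toy_xfrm6_input_addr(), an ESP, AH or IPComp state is rejected before the policy is applied, so in this model the handlers used by standard IPsec never see the change.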
David, as for my patches, if this is ok for you, I will keep the code of
my input handlers aligned with the code of the RH2/HAO handlers and will
modify it later based on any corrections made to those upstream.
Don't hesitate to slap me if I made some mistakes in my analysis ;-)
Cheers,
a+
^ permalink raw reply