Netdev List
* Re: r8169, enabling TX checksumming breaks things?
From: David Dillow @ 2009-09-23 14:02 UTC (permalink / raw)
  To: Denys Fedoryschenko; +Cc: romieu, netdev
In-Reply-To: <200909230915.27854.denys@visp.net.lb>

On Wed, 2009-09-23 at 09:15 +0300, Denys Fedoryschenko wrote:
> Hi
> 
> Is it expected that:
> 1) TX checksumming is off by default
> 2) If I try to enable it with ethtool -K eth0 tx on, TCP sessions on the proxy
> get stuck, even though everything looks fine in tcpdump and packets reach the
> destination? I don't understand the reason for the failure.
> If this feature is not supposed to work, maybe the user should not be able to
> turn it on?

It is broken for large swaths of the hardware -- I have patches that got
it and TSO working on my hardware, and they provide a framework to see
about getting it working on yours.

Basically, the fields are in different places depending on the chip
revision. I'll try to dig those out tonight and send them along so we
can experiment.


* [PATCH] skge: Make sure both ports initialize correctly
From: Mike McCormack @ 2009-09-23 13:50 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

If allocation of the second port fails, make sure that hw->ports
is not 2; otherwise we'll crash trying to access the second port.

This fix is copied from a similar fix in the sky2 driver (ca519274...),
but is untested, as I don't have a skge card.

Signed-off-by: Mike McCormack <mikem@ring3k.org>
---
 drivers/net/skge.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/skge.c b/drivers/net/skge.c
index 62e852e..21b816f 100644
--- a/drivers/net/skge.c
+++ b/drivers/net/skge.c
@@ -3982,14 +3982,17 @@ static int __devinit skge_probe(struct pci_dev *pdev,
 	}
 	skge_show_addr(dev);
 
-	if (hw->ports > 1 && (dev1 = skge_devinit(hw, 1, using_dac))) {
-		if (register_netdev(dev1) == 0)
+	if (hw->ports > 1) {
+		dev1 = skge_devinit(hw, 1, using_dac);
+		if (dev1 && register_netdev(dev1) == 0)
 			skge_show_addr(dev1);
 		else {
 			/* Failure to register second port need not be fatal */
 			dev_warn(&pdev->dev, "register of second port failed\n");
 			hw->dev[1] = NULL;
-			free_netdev(dev1);
+			hw->ports = 1;
+			if (dev1)
+				free_netdev(dev1);
 		}
 	}
 	pci_set_drvdata(pdev, hw);
-- 
1.5.6.5
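
For readers without the driver source at hand, here is a minimal user-space
sketch of the pattern the patch applies: when the optional second port cannot
be allocated or registered, drop the port count so later code never touches
dev[1]. All names (toy_adapter, toy_port, toy_init_second_port) are
hypothetical and are not part of skge; the two flags simulate the outcome of
skge_devinit() and register_netdev().

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures. */
struct toy_port { int id; };
struct toy_adapter {
	int ports;                 /* number of usable ports */
	struct toy_port *dev[2];   /* per-port devices */
};

/*
 * Try to bring up the optional second port.
 * alloc_ok and register_ok simulate the two steps that can fail.
 * Returns 1 if the second port is usable, 0 otherwise.
 */
static int toy_init_second_port(struct toy_adapter *hw,
				int alloc_ok, int register_ok)
{
	struct toy_port *p1 = NULL;

	if (hw->ports <= 1)
		return 0;

	if (alloc_ok) {
		p1 = malloc(sizeof(*p1));
		if (p1) {
			p1->id = 1;
			hw->dev[1] = p1;
		}
	}

	if (p1 && register_ok)
		return 1;

	/* Failure is not fatal, but the adapter must look single-ported. */
	hw->dev[1] = NULL;
	hw->ports = 1;
	free(p1);                  /* free(NULL) is a no-op */
	return 0;
}
```

After a failed call, code that loops over hw->ports only visits port 0, which
is the property the commit message asks for.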



* Re: [PATCH] net: Fix sock_wfree() race
From: Eric Dumazet @ 2009-09-23 13:44 UTC (permalink / raw)
  To: David Miller; +Cc: albcamus, parag.lkml, linux-kernel, netdev
In-Reply-To: <20090911.125242.244008840.davem@davemloft.net>

David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Fri, 11 Sep 2009 11:43:37 -0700 (PDT)
> 
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Wed, 09 Sep 2009 00:49:31 +0200
>>
>>> [PATCH] net: Fix sock_wfree() race
>>>
>>> Commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
>>> (net: No more expensive sock_hold()/sock_put() on each tx)
>>> opens a window in sock_wfree() where another cpu
>>> might free the socket we are working on.
>>>
>>> Fix is to call sk->sk_write_space(sk) only
>>> while still holding a reference on sk.
>>>
>>> Since doing this call is done before the 
>>> atomic_sub(truesize, &sk->sk_wmem_alloc), we should pass truesize as 
>>> a bias for possible sk_wmem_alloc evaluations.
>>>
>>> Reported-by: Jike Song <albcamus@gmail.com>
>>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>> Applied to net-next-2.6, thanks.  I'll queue up your simpler
>> version for -stable.
> 
> Eric, I have to revert, as you didn't update the callbacks
> of several protocols such as SCTP and RDS in this change.
> 
> Let me know when you have a fixed version of this patch :-)

Sorry for the delay, David, but this is complex. I am not
sure we can do this in a clean and safe way, not to mention
the added bloat.

If we do :

void sock_wfree(struct sk_buff *skb)
{
        struct sock *sk = skb->sk;
        int res;

        if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE))
                sk->sk_write_space(sk, skb->truesize);

        res = atomic_sub_return(skb->truesize, &sk->sk_wmem_alloc);
        /*
         * if sk_wmem_alloc reached 0, we are last user and should
         * free this sock, as sk_free() call could not do it.
         */
        if (res == 0)
                __sk_free(sk);
}


There is still a possibility that multiple CPUs call sock_wfree()
for the same socket and all call sk_write_space() with their own
bias, so the protocol can still see an estimate of sk_wmem_alloc
that is too large.

We could then fail to wake up a blocked writer when a low
sk->sk_sndbuf value is set. (One could argue that with small
sk_sndbuf values we should not have many packets in flight, but
keep in mind that sk_sndbuf can be lowered by the user.)


With second patch we instead have :

void sock_wfree(struct sk_buff *skb)
{
	struct sock *sk = skb->sk;
	unsigned int len = skb->truesize;

	if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE)) {
		/*
		 * Keep a reference on sk_wmem_alloc, this will be released
		 * after sk_write_space() call
		 */
		atomic_sub(len - 1, &sk->sk_wmem_alloc);
		sk->sk_write_space(sk);
		len = 1;
	}
	/*
	 * if sk_wmem_alloc reaches 0, we must finish what sk_free()
	 * could not do because of in-flight packets
	 */
	if (atomic_sub_return(len, &sk->sk_wmem_alloc) == 0)
		__sk_free(sk);
}

The accumulated transient error on sk_wmem_alloc is then < num_online_cpus(),
which should be OK even for very small sk_sndbuf values.

Of course TCP doesn't have to pay the price of sk_write_space() and the second
atomic operation re-added by this fix.
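
To make the "keep one unit as a reference" idea concrete, here is a small
user-space sketch using C11 atomics. It is only a model under stated
assumptions: toy_sock, toy_write_space() and toy_wfree() are hypothetical
names, the callback just records the counter value it observes, and a flag
stands in for the point where __sk_free() would run.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sock: only what the sketch needs. */
struct toy_sock {
	atomic_uint wmem_alloc;    /* plays the role of sk_wmem_alloc */
	bool use_write_queue;      /* plays the role of SOCK_USE_WRITE_QUEUE */
	unsigned int seen_in_cb;   /* counter value observed by the callback */
	bool freed;                /* set where __sk_free() would be called */
};

/* Stand-in for sk->sk_write_space(): record what the protocol would see. */
static void toy_write_space(struct toy_sock *sk)
{
	sk->seen_in_cb = atomic_load(&sk->wmem_alloc);
}

static void toy_wfree(struct toy_sock *sk, unsigned int truesize)
{
	unsigned int len = truesize;

	if (!sk->use_write_queue) {
		/*
		 * Release all but one unit, so the counter cannot hit zero
		 * (and the socket cannot be freed) during the callback.
		 */
		atomic_fetch_sub(&sk->wmem_alloc, len - 1);
		toy_write_space(sk);
		len = 1;
	}
	/* atomic_fetch_sub() returns the old value: old == len means zero now. */
	if (atomic_fetch_sub(&sk->wmem_alloc, len) == len)
		sk->freed = true;
}
```

With this shape, each CPU inside the callback overstates the counter by
exactly one unit, which is where the "< num_online_cpus()" bound on the
transient error comes from.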

Here is the patch for reference :

[PATCH] net: Fix sock_wfree() race

Commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
opens a window in sock_wfree() where another cpu
might free the socket we are working on.

A fix is to call sk->sk_write_space(sk) while still
holding a reference on sk.


Reported-by: Jike Song <albcamus@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/sock.c |   19 ++++++++++++-------
 1 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 30d5446..e1f034e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1228,17 +1228,22 @@ void __init sk_init(void)
 void sock_wfree(struct sk_buff *skb)
 {
 	struct sock *sk = skb->sk;
-	int res;
+	unsigned int len = skb->truesize;
 
-	/* In case it might be waiting for more memory. */
-	res = atomic_sub_return(skb->truesize, &sk->sk_wmem_alloc);
-	if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE))
+	if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE)) {
+		/*
+		 * Keep a reference on sk_wmem_alloc, this will be released
+		 * after sk_write_space() call
+		 */
+		atomic_sub(len - 1, &sk->sk_wmem_alloc);
 		sk->sk_write_space(sk);
+		len = 1;
+	}
 	/*
-	 * if sk_wmem_alloc reached 0, we are last user and should
-	 * free this sock, as sk_free() call could not do it.
+	 * if sk_wmem_alloc reaches 0, we must finish what sk_free()
+	 * could not do because of in-flight packets
 	 */
-	if (res == 0)
+	if (atomic_sub_return(len, &sk->sk_wmem_alloc) == 0)
 		__sk_free(sk);
 }
 EXPORT_SYMBOL(sock_wfree);



* [PATCH] Phonet: fix race for port number in concurrent bind()
From: Rémi Denis-Courmont @ 2009-09-23 13:17 UTC (permalink / raw)
  To: netdev; +Cc: Rémi Denis-Courmont

From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>

Allocating a port number to a socket and hashing that socket must be
an atomic operation with regard to other port allocations. Otherwise,
we could allocate a port that is already being allocated to another
socket.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
---
 net/phonet/socket.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index 7a4ee39..07aa9f0 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -113,6 +113,8 @@ void pn_sock_unhash(struct sock *sk)
 }
 EXPORT_SYMBOL(pn_sock_unhash);
 
+static DEFINE_MUTEX(port_mutex);
+
 static int pn_socket_bind(struct socket *sock, struct sockaddr *addr, int len)
 {
 	struct sock *sk = sock->sk;
@@ -140,9 +142,11 @@ static int pn_socket_bind(struct socket *sock, struct sockaddr *addr, int len)
 		err = -EINVAL; /* attempt to rebind */
 		goto out;
 	}
+	WARN_ON(sk_hashed(sk));
+	mutex_lock(&port_mutex);
 	err = sk->sk_prot->get_port(sk, pn_port(handle));
 	if (err)
-		goto out;
+		goto out_port;
 
 	/* get_port() sets the port, bind() sets the address if applicable */
 	pn->sobject = pn_object(saddr, pn_port(pn->sobject));
@@ -150,6 +154,8 @@ static int pn_socket_bind(struct socket *sock, struct sockaddr *addr, int len)
 
 	/* Enable RX on the socket */
 	sk->sk_prot->hash(sk);
+out_port:
+	mutex_unlock(&port_mutex);
 out:
 	release_sock(sk);
 	return err;
@@ -357,8 +363,6 @@ const struct proto_ops phonet_stream_ops = {
 };
 EXPORT_SYMBOL(phonet_stream_ops);
 
-static DEFINE_MUTEX(port_mutex);
-
 /* allocate port for a socket */
 int pn_sock_get_port(struct sock *sk, unsigned short sport)
 {
@@ -370,9 +374,7 @@ int pn_sock_get_port(struct sock *sk, unsigned short sport)
 
 	memset(&try_sa, 0, sizeof(struct sockaddr_pn));
 	try_sa.spn_family = AF_PHONET;
-
-	mutex_lock(&port_mutex);
-
+	WARN_ON(!mutex_is_locked(&port_mutex));
 	if (!sport) {
 		/* search free port */
 		int port, pmin, pmax;
@@ -401,8 +403,6 @@ int pn_sock_get_port(struct sock *sk, unsigned short sport)
 		else
 			sock_put(tmpsk);
 	}
-	mutex_unlock(&port_mutex);
-
 	/* the port must be in use already */
 	return -EADDRINUSE;
 
-- 
1.6.0.4
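
As an illustration of why the lock has to cover both steps, here is a
user-space sketch with a pthread mutex. It is not the Phonet code: the port
range, the toy_hashed table and the toy_get_port()/toy_bind() helpers are
hypothetical. The point it shows is that the port search and the "hash" step
(making the port visible to other lookups) happen under one lock hold, so two
concurrent binders cannot both pick the same free port.

```c
#include <pthread.h>
#include <stdbool.h>

#define TOY_PORT_MIN 64
#define TOY_PORT_MAX 67

static pthread_mutex_t toy_port_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool toy_hashed[TOY_PORT_MAX + 1];  /* ports visible to lookups */

/*
 * Caller must hold toy_port_mutex.
 * wanted == 0 means "pick any free port". Returns the port, or -1.
 */
static int toy_get_port(int wanted)
{
	int port;

	if (wanted) {
		if (wanted < TOY_PORT_MIN || wanted > TOY_PORT_MAX)
			return -1;
		return toy_hashed[wanted] ? -1 : wanted;
	}
	for (port = TOY_PORT_MIN; port <= TOY_PORT_MAX; port++)
		if (!toy_hashed[port])
			return port;
	return -1;
}

/* Allocate and publish the port in one critical section. */
static int toy_bind(int wanted)
{
	int port;

	pthread_mutex_lock(&toy_port_mutex);
	port = toy_get_port(wanted);
	if (port >= 0)
		toy_hashed[port] = true;  /* hash while still holding the lock */
	pthread_mutex_unlock(&toy_port_mutex);
	return port;
}
```

If the lock were dropped between toy_get_port() and the assignment to
toy_hashed[], a second caller could be handed the same port in that window,
which is the race the patch description refers to.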



* [PATCH] Phonet: error on broadcast sending (unimplemented)
From: Rémi Denis-Courmont @ 2009-09-23 13:17 UTC (permalink / raw)
  To: netdev; +Cc: Rémi Denis-Courmont
In-Reply-To: <1253711831-7947-1-git-send-email-remi@remlab.net>

From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>

If we ever implement this, then we can stop returning an error.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
---
 include/linux/phonet.h |    1 +
 net/phonet/af_phonet.c |    6 ++++++
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/include/linux/phonet.h b/include/linux/phonet.h
index 1ef5a07..e5126cf 100644
--- a/include/linux/phonet.h
+++ b/include/linux/phonet.h
@@ -38,6 +38,7 @@
 #define PNPIPE_IFINDEX		2
 
 #define PNADDR_ANY		0
+#define PNADDR_BROADCAST	0xFC
 #define PNPORT_RESOURCE_ROUTING	0
 
 /* Values for PNPIPE_ENCAP option */
diff --git a/net/phonet/af_phonet.c b/net/phonet/af_phonet.c
index a662e62..f60c0c2 100644
--- a/net/phonet/af_phonet.c
+++ b/net/phonet/af_phonet.c
@@ -168,6 +168,12 @@ static int pn_send(struct sk_buff *skb, struct net_device *dev,
 		goto drop;
 	}
 
+	/* Broadcast sending is not implemented */
+	if (pn_addr(dst) == PNADDR_BROADCAST) {
+		err = -EOPNOTSUPP;
+		goto drop;
+	}
+
 	skb_reset_transport_header(skb);
 	WARN_ON(skb_headroom(skb) & 1); /* HW assumes word alignment */
 	skb_push(skb, sizeof(struct phonethdr));
-- 
1.6.0.4



* Re: [PATCH 1/3] iwmc3200top: Add Intel Wireless MultiCom 3200 top driver.
From: Tomas Winkler @ 2009-09-23 12:29 UTC (permalink / raw)
  To: Johannes Berg
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, linville-2XuSBdqkA4R54TAoqtyWWQ,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	linux-mmc-u79uwXL29TY76Z2rM5mHXA, yi.zhu-ral2JQCrhuEAvxtiuMwx3w,
	inaky.perez-gonzalez-ral2JQCrhuEAvxtiuMwx3w,
	cindy.h.kao-ral2JQCrhuEAvxtiuMwx3w,
	guy.cohen-ral2JQCrhuEAvxtiuMwx3w,
	ron.rindjunsky-ral2JQCrhuEAvxtiuMwx3w
In-Reply-To: <1253691283.4458.38.camel-YfaajirXv2244ywRPIzf9A@public.gmane.org>

On Wed, Sep 23, 2009 at 10:34 AM, Johannes Berg
<johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org> wrote:
> On Wed, 2009-09-23 at 10:23 +0300, Tomas Winkler wrote:
>
>> From a HW perspective your assumption is not exactly correct. All the
>> devices are visible on the SDIO bus but they are not operational
>> (probe won't succeed) until TOP downloads the firmware and kicks the
>> devices. From a SW perspective, creating another bus layer is an option.
>> I'm not sure it isn't the more complicated one.
>
> Ah, ok, so it is quite different. Not sure how sdio probing works, so I
> guess I can't say much here.

This is not about SDIO probing; this is a rather unusual HW design.
Anyhow, all comments and ideas are welcome.

Thanks
Tomas


* Re: fanotify as syscalls
From: Arjan van de Ven @ 2009-09-23 11:32 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Davide Libenzi, Andreas Gruenbacher, Jamie Lokier, Eric Paris,
	Linus Torvalds, Evgeniy Polyakov, David Miller,
	Linux Kernel Mailing List, linux-fsdevel@vger.kernel.org,
	netdev@vger.kernel.org, viro@zeniv.linux.org.uk,
	alan@linux.intel.com, hch@infradead.org
In-Reply-To: <200909230939.34003.tvrtko.ursulin@sophos.com>

On Wed, 23 Sep 2009 09:39:33 +0100
Tvrtko Ursulin <tvrtko.ursulin@sophos.com> wrote:

> Lived with it because there was no other option. We used LSM while it
> was available for modules but then it was taken away. 

... at which point you could have submitted your LSM module for
inclusion... you'd be the first (and only?) Anti Virus vendor that
would be in the mainline kernel.. speaking of competitive advantage,
coming out of the box in all distributions.

sadly this road hasn't been chosen....



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


* Re: fanotify as syscalls
From: hch @ 2009-09-23 11:20 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Davide Libenzi, Andreas Gruenbacher, Jamie Lokier, Eric Paris,
	Linus Torvalds, Evgeniy Polyakov, David Miller,
	Linux Kernel Mailing List, linux-fsdevel@vger.kernel.org,
	netdev@vger.kernel.org, viro@zeniv.linux.org.uk,
	alan@linux.intel.com, hch@infradead.org
In-Reply-To: <200909230939.34003.tvrtko.ursulin@sophos.com>

On Wed, Sep 23, 2009 at 09:39:33AM +0100, Tvrtko Ursulin wrote:
> Lived with it because there was no other option. We used LSM while it was 
> available for modules but then it was taken away. 
> 
> And not all vendors even use syscall interception, not even across platforms, 
> of which you sound so sure about. You can't even scan something which is not 
> in your namespace if you are at the syscall level. And you can't catch things 
> like kernel nfsd. No, syscall interception is not really appropriate at all.

The "Anti-Malware" industry is just snake oil anyway.  I think the
proper approach to support it is just to add various no-op exports that
claim to do something, and all the people requiring anti-virus on Linux
will be just as happy with it.



* Re: [PATCH][RESEND 3] IPv6: 6rd tunnel mode
From: Alexandre Cassen @ 2009-09-23 11:07 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki; +Cc: netdev
In-Reply-To: <20090923184314.a2a2701d.yoshfuji@linux-ipv6.org>

On Wed, 2009-09-23 at 18:43 +0900, YOSHIFUJI Hideaki wrote:
> Hello.
> 
> First of all, thank you for this work.
> 
> On Wed, 23 Sep 2009 00:02:51 +0200
> Alexandre Cassen <acassen@freebox.fr> wrote:
> 
> > This patch adds support for 6rd tunnel mode as described in
> > draft-despres-6rd-03.
> > 
> > Patch history :
> > * http://patchwork.ozlabs.org/patch/26870/
> > * http://patchwork.ozlabs.org/patch/34026/
> > * http://patchwork.ozlabs.org/patch/34045/
> > 
> > IPv6 rapid deployment (draft-despres-6rd-03) builds upon mechanisms
> 
> Well, I was confused.  I think draft-softwire-ipv6-6rd
> is the latest one, no?

draft-despres-6rd-03    : targeting informational RFC
                          (=> currently pending)
draft-softwire-ipv6-6rd : targeting standards track

After the last IETF meeting, the previous draft-townsley-ipv6-6rd was
pushed to the IETF softwire WG.

So you are right, the reference should be draft-softwire-ipv6-6rd.

> Another comment is that we should combine 6to4
> and 6rd.

Completely agree. 6to4 is a special case of 6rd.

Okay, so I will stop producing (fixing according to the latest comments)
and resending new patches for 6rd.

Regards,
Alexandre



* Re: Resend: [PATCH] TCP Early Retransmit: reduce required dupacks for triggering fast retrans
From: Ilpo Järvinen @ 2009-09-23  9:58 UTC (permalink / raw)
  To: Christian Samsel, David Miller; +Cc: Netdev
In-Reply-To: <fab65f44d75b.4ab891f2@rwth-aachen.de>

On Tue, 22 Sep 2009, Christian Samsel wrote:

> This patch implements draft-ietf-tcpm-early-rexmt. The early retransmit 
> mechanism allows the transport to reduce the number of duplicate
> acknowledgments required to trigger a fast retransmission in case we
> don't expect enough dupacks, (e.g. because there are not enough
> packets inflight and nothing to send). This allows the transport to use
> fast retransmit to recover packet losses that would otherwise require
> a lengthy retransmission timeout.
> 
> See: http://tools.ietf.org/html/draft-ietf-tcpm-early-rexmt-01
> 
> Signed-off-by: Christian Samsel <christian.samsel@rwth-aachen.de>
> 
> ---
>  net/ipv4/tcp_input.c |   16 ++++++++++++++++
>  1 files changed, 16 insertions(+), 0 deletions(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index af6d6fa..c0cc4fd 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -2913,6 +2913,7 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
>   int do_lost = is_dupack || ((flag & FLAG_DATA_SACKED) &&
>                                      (tcp_fackets_out(tp) > tp->reordering));
>   int fast_rexmit = 0, mib_idx;
> + u32 in_flight;
>  
>   if (WARN_ON(!tp->packets_out && tp->sacked_out))
>           tp->sacked_out = 0;
> @@ -3062,6 +3063,21 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
>   if (do_lost || (tcp_is_fack(tp) && tcp_head_timedout(sk)))
>           tcp_update_scoreboard(sk, fast_rexmit);
>   tcp_cwnd_down(sk, flag);
> +       
> +
> + /* draft-ietf-tcpm-early-rexmt: lowers dup ack threshold to prevent rto
> +         * in case we don't expect enough dup ack. if number of outstanding
> +         * packets is less than four and there is either no unsent data ready
> +         * for transmission or the advertised window does not permit new
> +         * segments.
> +         */
> + in_flight = tcp_packets_in_flight(tp);
> + if ( in_flight < 4 && (skb_queue_empty(&sk->sk_write_queue) ||
> +         tcp_may_send_now(sk) == 0) )
> +         tp->reordering = in_flight - 1;
> + else if (tp->reordering != sysctl_tcp_reordering)
> +         tp->reordering = sysctl_tcp_reordering;
> +
>   tcp_xmit_retransmit_queue(sk);
>  }

This is an entirely flawed approach. I'd recommend you start from
scratch; almost nothing of the current one is worth keeping (except
parts of the comment). ...It will just not work in many cases; however,
it's nice that you tried nevertheless.

First, the right place to change is in tcp_time_to_recover(). Another 
thing you need is to add a min() into tcp_update_scoreboard. Also, I don't 
think you should be touching tp->reordering at all to artificially lower 
the threshold for a period, just calculate the artificial value on the 
fly. And skb_queue_empty is not doing what you want, in fact I'm unsure 
what you want it to do in the first place? Instead of four, use 
tp->reordering. (I could have coded all that in couple of minutes, in fact 
in less than writing this mail but it's more useful that you go to those 
places, learn and code that instead :-)).
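
As a rough illustration of "calculate the artificial value on the fly", here
is a tiny user-space sketch. It is not kernel code and not a proposed patch:
toy_dupack_threshold() and its parameters are hypothetical, can_send_new
stands for "new data could be sent now", and the exact condition
(in_flight - 1 < reordering) is my assumption about how "instead of four, use
tp->reordering" would translate. The stored reordering value is never
modified; the lowered threshold is derived each time it is needed.

```c
#include <stdbool.h>

/*
 * Return the duplicate-ACK threshold to use for this ACK.
 * in_flight:    packets currently outstanding
 * reordering:   the configured/estimated threshold (left untouched)
 * can_send_new: true if new segments could still be transmitted
 */
static unsigned int toy_dupack_threshold(unsigned int in_flight,
					 unsigned int reordering,
					 bool can_send_new)
{
	unsigned int thresh = reordering;

	/*
	 * With one packet lost, at most in_flight - 1 dupacks can arrive.
	 * If that is fewer than the normal threshold and nothing new can
	 * be sent to generate more, lower the threshold for this check only.
	 * Assumption: never go below 1 duplicate ACK.
	 */
	if (!can_send_new && in_flight >= 2 && in_flight - 1 < thresh)
		thresh = in_flight - 1;

	return thresh;
}
```

For example, with reordering = 3 and three packets in flight and nothing new
to send, the sketch yields a threshold of 2, while ten packets in flight keep
the normal threshold of 3.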

With all the cases that I know do not work with this _submitted_ version,
I doubt that it is well tested, if tested at all. ...I hope you're not
submitting somebody else's work without understanding at all what the code
does and what it doesn't...?

Also, before starting, please go through what is written in 
Documentation/CodingStyle.

-- 
 i.


* Re: [PATCH][RESEND 3] IPv6: 6rd tunnel mode
From: YOSHIFUJI Hideaki @ 2009-09-23  9:43 UTC (permalink / raw)
  To: Alexandre Cassen; +Cc: yoshfuji, netdev
In-Reply-To: <20090922220251.GA22874@lnxos.staff.proxad.net>

Hello.

First of all, thank you for this work.

On Wed, 23 Sep 2009 00:02:51 +0200
Alexandre Cassen <acassen@freebox.fr> wrote:

> This patch adds support for 6rd tunnel mode as described in
> draft-despres-6rd-03.
> 
> Patch history :
> * http://patchwork.ozlabs.org/patch/26870/
> * http://patchwork.ozlabs.org/patch/34026/
> * http://patchwork.ozlabs.org/patch/34045/
> 
> IPv6 rapid deployment (draft-despres-6rd-03) builds upon mechanisms

Well, I was confused.  I think draft-softwire-ipv6-6rd
is the latest one, no?

Another comment is that we should combine 6to4
and 6rd.

In fact, I've been taking care of it since I met
with Mark Townsley last week.  Here's my tentative
version for reference.

Several points:
 - based on latest version.
 - share code path with 6to4.

(If anyone can invent better bitops,
it would be a great help...)

Regards,

--yoshfuji

----
From 7c82f67d361155a2e8ee831c66c9663617ae45bc Mon Sep 17 00:00:00 2001
From: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Date: Tue, 22 Sep 2009 16:29:54 +0900
Subject: [PATCH] ipv6 sit: 6rd (IPv6 Rapid Deployment) Support.

IPv6 Rapid Deployment (6rd; draft-ietf-softwire-ipv6-6rd) builds upon
mechanisms of 6to4 (RFC3056) to enable a service provider to rapidly
deploy IPv6 unicast service to IPv4 sites to which it provides
customer premise equipment.  Like 6to4, it utilizes stateless IPv6 in
IPv4 encapsulation in order to transit IPv4-only network
infrastructure.  Unlike 6to4, a 6rd service provider uses an IPv6
prefix of its own in place of the fixed 6to4 prefix.

With this option enabled, the SIT driver offers 6rd functionality by
providing an additional ioctl API to configure the IPv6 prefix used
instead of the static 2002::/16 for 6to4.

Original patch was done by Alexandre Cassen <acassen@freebox.fr>
based on old Internet-Draft.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
---
 include/linux/if_tunnel.h |   11 ++++
 include/net/ipip.h        |   13 +++++
 net/ipv6/Kconfig          |   19 +++++++
 net/ipv6/sit.c            |  124 ++++++++++++++++++++++++++++++++++++++++++---
 4 files changed, 159 insertions(+), 8 deletions(-)

diff --git a/include/linux/if_tunnel.h b/include/linux/if_tunnel.h
index 5eb9b0f..cab4938 100644
--- a/include/linux/if_tunnel.h
+++ b/include/linux/if_tunnel.h
@@ -15,6 +15,10 @@
 #define SIOCADDPRL      (SIOCDEVPRIVATE + 5)
 #define SIOCDELPRL      (SIOCDEVPRIVATE + 6)
 #define SIOCCHGPRL      (SIOCDEVPRIVATE + 7)
+#define SIOCGET6RD      (SIOCDEVPRIVATE + 8)
+#define SIOCADD6RD      (SIOCDEVPRIVATE + 9)
+#define SIOCDEL6RD      (SIOCDEVPRIVATE + 10)
+#define SIOCCHG6RD      (SIOCDEVPRIVATE + 11)
 
 #define GRE_CSUM	__cpu_to_be16(0x8000)
 #define GRE_ROUTING	__cpu_to_be16(0x4000)
@@ -51,6 +55,13 @@ struct ip_tunnel_prl {
 /* PRL flags */
 #define	PRL_DEFAULT		0x0001
 
+struct ip_tunnel_6rd {
+	struct in6_addr		prefix;
+	__be32			relay_prefix;
+	__u16			prefixlen;
+	__u16			relay_prefixlen;
+};
+
 enum
 {
 	IFLA_GRE_UNSPEC,
diff --git a/include/net/ipip.h b/include/net/ipip.h
index 5d3036f..157be1c 100644
--- a/include/net/ipip.h
+++ b/include/net/ipip.h
@@ -7,6 +7,15 @@
 /* Keep error state on tunnel for 30 sec */
 #define IPTUNNEL_ERR_TIMEO	(30*HZ)
 
+/* 6rd prefix/relay information */
+struct ip_tunnel_6rd_parm
+{
+	struct in6_addr		prefix;
+	__be32			relay_prefix;
+	u16			prefixlen;
+	u16			relay_prefixlen;
+};
+
 struct ip_tunnel
 {
 	struct ip_tunnel	*next;
@@ -24,6 +33,10 @@ struct ip_tunnel
 
 	struct ip_tunnel_parm	parms;
 
+	/* for SIT */
+#ifdef CONFIG_IPV6_SIT_6RD
+	struct ip_tunnel_6rd_parm	ip6rd;
+#endif
 	struct ip_tunnel_prl_entry	*prl;		/* potential router list */
 	unsigned int			prl_count;	/* # of entries in PRL */
 };
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index ead6c7a..f561998 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -170,6 +170,25 @@ config IPV6_SIT
 
 	  Saying M here will produce a module called sit. If unsure, say Y.
 
+config IPV6_SIT_6RD
+	bool "IPv6: IPv6 Rapid Deployment (6RD) (EXPERIMENTAL)"
+	depends on IPV6_SIT && EXPERIMENTAL
+	default n
+	---help---
+	  IPv6 Rapid Deployment (6rd; draft-ietf-softwire-ipv6-6rd) builds upon
+	  mechanisms of 6to4 (RFC3056) to enable a service provider to rapidly
+	  deploy IPv6 unicast service to IPv4 sites to which it provides
+	  customer premise equipment.  Like 6to4, it utilizes stateless IPv6 in
+	  IPv4 encapsulation in order to transit IPv4-only network
+	  infrastructure.  Unlike 6to4, a 6rd service provider uses an IPv6
+	  prefix of its own in place of the fixed 6to4 prefix.
+
+	  With this option enabled, the SIT driver offers 6rd functionality by
+	  providing an additional ioctl API to configure the IPv6 prefix used
+	  instead of the static 2002::/16 for 6to4.
+
+	  If unsure, say N.
+
 config IPV6_NDISC_NODETYPE
 	bool
 
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 0ae4f64..14bd503 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -162,6 +162,21 @@ static void ipip6_tunnel_link(struct sit_net *sitn, struct ip_tunnel *t)
 	write_unlock_bh(&ipip6_lock);
 }
 
+static void ipip6_tunnel_clone_6rd(struct ip_tunnel *t, struct sit_net *sitn)
+{
+#ifdef CONFIG_IPV6_SIT_6RD
+	if (t->dev == sitn->fb_tunnel_dev) {
+		ipv6_addr_set(&t->ip6rd.prefix, htonl(0x20020000), 0, 0, 0);
+		t->ip6rd.relay_prefix = 0;
+		t->ip6rd.prefixlen = 16;
+		t->ip6rd.relay_prefixlen = 0;
+	} else {
+		struct ip_tunnel *t0 = netdev_priv(sitn->fb_tunnel_dev);
+		memcpy(&t->ip6rd, &t0->ip6rd, sizeof(t->ip6rd));
+	}
+#endif
+}
+
 static struct ip_tunnel * ipip6_tunnel_locate(struct net *net,
 		struct ip_tunnel_parm *parms, int create)
 {
@@ -214,6 +229,8 @@ static struct ip_tunnel * ipip6_tunnel_locate(struct net *net,
 
 	dev_hold(dev);
 
+	ipip6_tunnel_clone_6rd(t, sitn);
+
 	ipip6_tunnel_link(sitn, nt);
 	return nt;
 
@@ -590,17 +607,41 @@ out:
 	return 0;
 }
 
-/* Returns the embedded IPv4 address if the IPv6 address
-   comes from 6to4 (RFC 3056) addr space */
-
-static inline __be32 try_6to4(struct in6_addr *v6dst)
+/*
+ * Returns the embedded IPv4 address if the IPv6 address
+ * comes from 6rd / 6to4 (RFC 3056) addr space.
+ */
+static inline
+__be32 try_6rd(struct in6_addr *v6dst, struct ip_tunnel *tunnel)
 {
 	__be32 dst = 0;
 
+#ifdef CONFIG_IPV6_SIT_6RD
+	if (ipv6_prefix_equal(v6dst, &tunnel->ip6rd.prefix,
+			      tunnel->ip6rd.prefixlen)) {
+		unsigned pbw0, pbi0;
+		int pbi1;
+		u32 d;
+
+		pbw0 = tunnel->ip6rd.prefixlen >> 5;
+		pbi0 = tunnel->ip6rd.prefixlen & 0x1f;
+
+		d = (ntohl(v6dst->s6_addr32[pbw0]) << pbi0) >>
+		    tunnel->ip6rd.relay_prefixlen;
+
+		pbi1 = pbi0 - tunnel->ip6rd.relay_prefixlen;
+		if (pbi1 > 0)
+			d |= ntohl(v6dst->s6_addr32[pbw0 + 1]) >>
+			     (32 - pbi1);
+
+		dst = tunnel->ip6rd.relay_prefix | htonl(d);
+	}
+#else
 	if (v6dst->s6_addr16[0] == htons(0x2002)) {
 		/* 6to4 v6 addr has 16 bits prefix, 32 v4addr, 16 SLA, ... */
 		memcpy(&dst, &v6dst->s6_addr16[1], 4);
 	}
+#endif
 	return dst;
 }
 
@@ -658,7 +699,7 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 	}
 
 	if (!dst)
-		dst = try_6to4(&iph6->daddr);
+		dst = try_6rd(&iph6->daddr, tunnel);
 
 	if (!dst) {
 		struct neighbour *neigh = NULL;
@@ -851,9 +892,15 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 	struct ip_tunnel *t;
 	struct net *net = dev_net(dev);
 	struct sit_net *sitn = net_generic(net, sit_net_id);
+#ifdef CONFIG_IPV6_SIT_6RD
+	struct ip_tunnel_6rd ip6rd;
+#endif
 
 	switch (cmd) {
 	case SIOCGETTUNNEL:
+#ifdef CONFIG_IPV6_SIT_6RD
+	case SIOCGET6RD:
+#endif
 		t = NULL;
 		if (dev == sitn->fb_tunnel_dev) {
 			if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p))) {
@@ -864,9 +911,25 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 		}
 		if (t == NULL)
 			t = netdev_priv(dev);
-		memcpy(&p, &t->parms, sizeof(p));
-		if (copy_to_user(ifr->ifr_ifru.ifru_data, &p, sizeof(p)))
-			err = -EFAULT;
+
+		err = -EFAULT;
+		if (cmd == SIOCGETTUNNEL) {
+			memcpy(&p, &t->parms, sizeof(p));
+			if (copy_to_user(ifr->ifr_ifru.ifru_data, &p,
+					 sizeof(p)))
+				goto done;
+#ifdef CONFIG_IPV6_SIT_6RD
+		} else {
+			ipv6_addr_copy(&ip6rd.prefix, &t->ip6rd.prefix);
+			ip6rd.relay_prefix = t->ip6rd.relay_prefix;
+			ip6rd.prefixlen = t->ip6rd.prefixlen;
+			ip6rd.relay_prefixlen = t->ip6rd.relay_prefixlen;
+			if (copy_to_user(ifr->ifr_ifru.ifru_data, &ip6rd,
+					 sizeof(ip6rd)))
+				goto done;
+#endif
+		}
+		err = 0;
 		break;
 
 	case SIOCADDTUNNEL:
@@ -987,6 +1050,51 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 		netdev_state_change(dev);
 		break;
 
+#ifdef CONFIG_IPV6_SIT_6RD
+	case SIOCADD6RD:
+	case SIOCCHG6RD:
+	case SIOCDEL6RD:
+		err = -EPERM;
+		if (!capable(CAP_NET_ADMIN))
+			goto done;
+
+		err = -EFAULT;
+		if (copy_from_user(&ip6rd, ifr->ifr_ifru.ifru_data,
+				   sizeof(ip6rd)))
+			goto done;
+
+		t = netdev_priv(dev);
+
+		if (cmd != SIOCDEL6RD) {
+			struct in6_addr prefix;
+			__be32 relay_prefix;
+
+			err = -EINVAL;
+			if (ip6rd.relay_prefixlen > 32 ||
+			    ip6rd.prefixlen + (32 - ip6rd.relay_prefixlen) > 64)
+				goto done;
+
+			ipv6_addr_prefix(&prefix, &ip6rd.prefix,
+					 ip6rd.prefixlen);
+			if (!ipv6_addr_equal(&prefix, &ip6rd.prefix))
+				goto done;
+			relay_prefix = ip6rd.relay_prefix &
+				       htonl(0xffffffffUL <<
+					     (32 - ip6rd.relay_prefixlen));
+			if (relay_prefix != ip6rd.relay_prefix)
+				goto done;
+
+			ipv6_addr_copy(&t->ip6rd.prefix, &prefix);
+			t->ip6rd.relay_prefix = relay_prefix;
+			t->ip6rd.prefixlen = ip6rd.prefixlen;
+			t->ip6rd.relay_prefixlen = ip6rd.relay_prefixlen;
+		} else
+			ipip6_tunnel_clone_6rd(t, sitn);
+
+		err = 0;
+		break;
+#endif
+
 	default:
 		err = -EINVAL;
 	}
-- 
1.5.6.5


--yoshfuji
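
On the bitops question, one possible alternative is to look at the two
adjacent 32-bit words through a single 64-bit window, so only two shifts are
needed. The sketch below is user-space C and only an illustration:
toy_6rd_extract_v4() is a hypothetical name, all values are in host byte
order, and it assumes the same constraint the ioctl enforces
(prefixlen + (32 - relay_prefixlen) <= 64).

```c
#include <stdint.h>

/*
 * Extract the IPv4 destination embedded in a 6rd address.
 * v6:              the IPv6 destination as four host-order 32-bit words
 * prefixlen:       length of the 6rd IPv6 prefix in bits
 * relay_prefix:    common IPv4 prefix (host order, low bits zero)
 * relay_prefixlen: number of IPv4 bits NOT carried in the IPv6 address
 */
static uint32_t toy_6rd_extract_v4(const uint32_t v6[4], unsigned int prefixlen,
				   uint32_t relay_prefix,
				   unsigned int relay_prefixlen)
{
	unsigned int word = prefixlen >> 5;    /* word where the v4 bits start */
	unsigned int bit = prefixlen & 0x1f;   /* bit offset inside that word */
	unsigned int nbits;
	uint64_t win;

	if (relay_prefixlen >= 32)
		return relay_prefix;           /* nothing embedded */
	nbits = 32 - relay_prefixlen;          /* embedded v4 bits, 1..32 */

	/* 64-bit window over the starting word and its successor. */
	win = (uint64_t)v6[word] << 32;
	if (word + 1 < 4)
		win |= v6[word + 1];

	/* Drop the prefix bits at the top, then keep the top nbits. */
	return relay_prefix | (uint32_t)((win << bit) >> (64 - nbits));
}
```

For plain 6to4 (prefixlen 16, relay_prefixlen 0), the address
2002:c000:0201:: gives 0xc0000201, that is 192.0.2.1, matching what the
existing memcpy()-based code extracts.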


* [PATCH] genetlink: fix netns vs. netlink table locking (2)
From: Johannes Berg @ 2009-09-23  9:34 UTC (permalink / raw)
  To: netdev

Similar to commit d136f1bd366fdb7e747ca7e0218171e7a00a98a5,
there's a bug when unregistering a generic netlink family,
which is caught by the might_sleep() added in that commit:

    BUG: sleeping function called from invalid context at net/netlink/af_netlink.c:183
    in_atomic(): 1, irqs_disabled(): 0, pid: 1510, name: rmmod
    2 locks held by rmmod/1510:
     #0:  (genl_mutex){+.+.+.}, at: [<ffffffff8138283b>] genl_unregister_family+0x2b/0x130
     #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8138270c>] __genl_unregister_mc_group+0x1c/0x120
    Pid: 1510, comm: rmmod Not tainted 2.6.31-wl #444
    Call Trace:
     [<ffffffff81044ff9>] __might_sleep+0x119/0x150
     [<ffffffff81380501>] netlink_table_grab+0x21/0x100
     [<ffffffff813813a3>] netlink_clear_multicast_users+0x23/0x60
     [<ffffffff81382761>] __genl_unregister_mc_group+0x71/0x120
     [<ffffffff81382866>] genl_unregister_family+0x56/0x130
     [<ffffffffa0007d85>] nl80211_exit+0x15/0x20 [cfg80211]
     [<ffffffffa000005a>] cfg80211_exit+0x1a/0x40 [cfg80211]

Fix in the same way by grabbing the netlink table lock
before doing rcu_read_lock().

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
---
 include/linux/netlink.h  |    1 +
 net/netlink/af_netlink.c |   19 +++++++++++--------
 net/netlink/genetlink.c  |    4 +++-
 3 files changed, 15 insertions(+), 9 deletions(-)

--- wireless-testing.orig/include/linux/netlink.h	2009-09-23 11:15:56.000000000 +0200
+++ wireless-testing/include/linux/netlink.h	2009-09-23 11:16:14.000000000 +0200
@@ -187,6 +187,7 @@ extern struct sock *netlink_kernel_creat
 extern void netlink_kernel_release(struct sock *sk);
 extern int __netlink_change_ngroups(struct sock *sk, unsigned int groups);
 extern int netlink_change_ngroups(struct sock *sk, unsigned int groups);
+extern void __netlink_clear_multicast_users(struct sock *sk, unsigned int group);
 extern void netlink_clear_multicast_users(struct sock *sk, unsigned int group);
 extern void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err);
 extern int netlink_has_listeners(struct sock *sk, unsigned int group);
--- wireless-testing.orig/net/netlink/af_netlink.c	2009-09-23 11:09:44.000000000 +0200
+++ wireless-testing/net/netlink/af_netlink.c	2009-09-23 11:14:52.000000000 +0200
@@ -1610,6 +1610,16 @@ int netlink_change_ngroups(struct sock *
 }
 EXPORT_SYMBOL(netlink_change_ngroups);
 
+void __netlink_clear_multicast_users(struct sock *ksk, unsigned int group)
+{
+	struct sock *sk;
+	struct hlist_node *node;
+	struct netlink_table *tbl = &nl_table[ksk->sk_protocol];
+
+	sk_for_each_bound(sk, node, &tbl->mc_list)
+		netlink_update_socket_mc(nlk_sk(sk), group, 0);
+}
+
 /**
  * netlink_clear_multicast_users - kick off multicast listeners
  *
@@ -1620,15 +1630,8 @@ EXPORT_SYMBOL(netlink_change_ngroups);
  */
 void netlink_clear_multicast_users(struct sock *ksk, unsigned int group)
 {
-	struct sock *sk;
-	struct hlist_node *node;
-	struct netlink_table *tbl = &nl_table[ksk->sk_protocol];
-
 	netlink_table_grab();
-
-	sk_for_each_bound(sk, node, &tbl->mc_list)
-		netlink_update_socket_mc(nlk_sk(sk), group, 0);
-
+	__netlink_clear_multicast_users(ksk, group);
 	netlink_table_ungrab();
 }
 EXPORT_SYMBOL(netlink_clear_multicast_users);
--- wireless-testing.orig/net/netlink/genetlink.c	2009-09-23 11:09:46.000000000 +0200
+++ wireless-testing/net/netlink/genetlink.c	2009-09-23 11:16:50.000000000 +0200
@@ -220,10 +220,12 @@ static void __genl_unregister_mc_group(s
 	struct net *net;
 	BUG_ON(grp->family != family);
 
+	netlink_table_grab();
 	rcu_read_lock();
 	for_each_net_rcu(net)
-		netlink_clear_multicast_users(net->genl_sock, grp->id);
+		__netlink_clear_multicast_users(net->genl_sock, grp->id);
 	rcu_read_unlock();
+	netlink_table_ungrab();
 
 	clear_bit(grp->id, mc_groups);
 	list_del(&grp->list);
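
The ordering rule behind this fix can be sketched in a toy userspace model. Every name below is a stand-in invented for illustration, not a kernel API: a counter plays the role of the RCU read-side section, and the "grab" function counts a violation whenever it is called while that counter is raised, loosely mimicking what might_sleep() reports.

```c
#include <assert.h>

/* Toy userspace model; none of these functions are kernel APIs. */
static int atomic_depth;  /* > 0 while inside the modelled RCU read-side section */
static int violations;    /* how often a may-sleep function ran while atomic */

static void model_rcu_read_lock(void)
{
	atomic_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(atomic_depth > 0);
	atomic_depth--;
}

/* Stands in for netlink_table_grab(), which may sleep. */
static void model_table_grab(void)
{
	if (atomic_depth > 0)
		violations++;
}

/* Old ordering: the may-sleep lock is taken per namespace, inside the RCU section. */
int old_ordering(int nr_namespaces)
{
	assert(nr_namespaces >= 0);
	violations = 0;
	model_rcu_read_lock();
	for (int i = 0; i < nr_namespaces; i++)
		model_table_grab();
	model_rcu_read_unlock();
	return violations;
}

/* New ordering: the may-sleep lock is taken once, before the RCU section. */
int new_ordering(int nr_namespaces)
{
	int cleared = 0;

	assert(nr_namespaces >= 0);
	violations = 0;
	model_table_grab();
	model_rcu_read_lock();
	for (int i = 0; i < nr_namespaces; i++)
		cleared++;  /* the lock-free helper would run here */
	model_rcu_read_unlock();
	assert(cleared == nr_namespaces);
	return violations;
}
```

In this model old_ordering(n) reports one violation per namespace, while new_ordering(n) reports none, which is the shape of the change to __genl_unregister_mc_group() above.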



^ permalink raw reply

* Re: fanotify as syscalls
From: Tvrtko Ursulin @ 2009-09-23  8:39 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Andreas Gruenbacher, Jamie Lokier, Eric Paris, Linus Torvalds,
	Evgeniy Polyakov, David Miller, Linux Kernel Mailing List,
	linux-fsdevel@vger.kernel.org, netdev@vger.kernel.org,
	viro@zeniv.linux.org.uk, alan@linux.intel.com, hch@infradead.org
In-Reply-To: <alpine.DEB.2.00.0909220836200.10460@makko.or.mcafeemobile.com>

On Tuesday 22 September 2009 17:04:44 Davide Libenzi wrote:
> On Tue, 22 Sep 2009, Andreas Gruenbacher wrote:
> > The fatal flaw of syscall interception is race conditions: you look up a
> > pathname in your interception layer; then when you call into the proper
> > syscall, the kernel again looks up the same pathname. There is no way to
> > guarantee that you end up at the same object in both lookups. The
> > security and fsnotify hooks are placed in the appropriate spots to avoid
> > exactly that.
>
> Fatal? You mean, for this corner case that the anti-malware industry lived
> with for so much time (in Linux and Windows), you're prepared in pushing
> all the logic that is currently implemented into their modules, into the
> kernel?

We lived with it because there was no other option. We used LSM while it was
available to modules, but then it was taken away.

And not all vendors use syscall interception, not even across platforms,
which you sound so sure about. You can't even scan something which is not
in your namespace if you are at the syscall level. And you can't catch things
like kernel nfsd. No, syscall interception is not really appropriate at all.

Tvrtko
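
The double lookup described in the quoted text can be sketched in userspace C. This is only an illustration under assumptions: lookups_agree() is a hypothetical helper, stat() stands in for the interception layer's lookup and open() for the real syscall's lookup. It shows that the two lookups are separate steps with a window between them; it does not close that window.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Resolve a path twice and report whether both lookups reached the same
 * object. Returns 1 if they match, 0 if they differ, -1 on error.
 */
int lookups_agree(const char *path)
{
	struct stat before, after;
	int fd;

	assert(path != NULL);
	if (stat(path, &before) != 0)   /* first lookup: the "interceptor" */
		return -1;
	/* The path can be renamed or replaced in this window. */
	fd = open(path, O_RDONLY);      /* second lookup: the "real" syscall */
	if (fd < 0)
		return -1;
	if (fstat(fd, &after) != 0) {
		close(fd);
		return -1;
	}
	close(fd);
	return before.st_dev == after.st_dev && before.st_ino == after.st_ino;
}
```

On a quiet system the two lookups agree; the point of the quoted argument is that nothing guarantees it, whereas a hook that sees the already-resolved object has no second lookup to go wrong.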

^ permalink raw reply

* Re: [PATCH 1/3] iwmc3200top: Add Intel Wireless MultiCom 3200 top driver.
From: Johannes Berg @ 2009-09-23  7:34 UTC (permalink / raw)
  To: Tomas Winkler
  Cc: davem, linville, netdev, linux-wireless, linux-mmc, yi.zhu,
	inaky.perez-gonzalez, cindy.h.kao, guy.cohen, ron.rindjunsky
In-Reply-To: <1ba2fa240909230023v17fe2b49v4981d464dba469ed@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 510 bytes --]

On Wed, 2009-09-23 at 10:23 +0300, Tomas Winkler wrote:

> From a HW perspective your assumption is not exactly correct. All the
> devices are visible on the SDIO bus but they are not operational
> (probe won't succeed) until TOP downloads the firmware and kicks the
> devices. From a SW perspective, creating another bus layer is an option,
> but I'm not sure it isn't the more complicated one.

Ah, ok, so it is quite different. Not sure how sdio probing works, so I
guess I can't say much here.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: [PATCH 1/3] iwmc3200top: Add Intel Wireless MultiCom 3200 top driver.
From: Tomas Winkler @ 2009-09-23  7:23 UTC (permalink / raw)
  To: Johannes Berg
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, linville-2XuSBdqkA4R54TAoqtyWWQ,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	linux-mmc-u79uwXL29TY76Z2rM5mHXA, yi.zhu-ral2JQCrhuEAvxtiuMwx3w,
	inaky.perez-gonzalez-ral2JQCrhuEAvxtiuMwx3w,
	cindy.h.kao-ral2JQCrhuEAvxtiuMwx3w,
	guy.cohen-ral2JQCrhuEAvxtiuMwx3w,
	ron.rindjunsky-ral2JQCrhuEAvxtiuMwx3w
In-Reply-To: <1253689036.4458.22.camel-YfaajirXv2244ywRPIzf9A@public.gmane.org>

On Wed, Sep 23, 2009 at 9:57 AM, Johannes Berg
<johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org> wrote:
> On Wed, 2009-09-23 at 02:38 +0300, Tomas Winkler wrote:
>
>> +config IWMC3200TOP
>> +        tristate "Intel Wireless MultiCom Top Driver"
>> +        depends on MMC && EXPERIMENTAL
>> +        select FW_LOADER
>> +     ---help---
>> +       Intel Wireless MultiCom 3200 Top driver is responsible for
>> +       for firmware load and enabled coms enumeration
>
> This seems like the wrong approach to me.
>
> To me, it seems like you have a device that contains an internal bus and
> allows bus enumeration. Typically, we would surface that bus in the
> driver/device model and allow sub-drivers to bind to that by way of
> exposing the internal bus, like e.g. drivers/ssb/.

From a HW perspective your assumption is not exactly correct. All the
devices are visible on the SDIO bus but they are not operational
(probe won't succeed) until TOP downloads the firmware and kicks the
devices. From a SW perspective, creating another bus layer is an option,
but I'm not sure it isn't the more complicated one.

Thanks
Tomas
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 1/3] iwmc3200top: Add Intel Wireless MultiCom 3200 top driver.
From: Johannes Berg @ 2009-09-23  6:57 UTC (permalink / raw)
  To: Tomas Winkler
  Cc: davem, linville, netdev, linux-wireless, linux-mmc, yi.zhu,
	inaky.perez-gonzalez, cindy.h.kao, guy.cohen, ron.rindjunsky
In-Reply-To: <1253662724-16497-2-git-send-email-tomas.winkler@intel.com>

[-- Attachment #1: Type: text/plain, Size: 671 bytes --]

On Wed, 2009-09-23 at 02:38 +0300, Tomas Winkler wrote:

> +config IWMC3200TOP
> +        tristate "Intel Wireless MultiCom Top Driver"
> +        depends on MMC && EXPERIMENTAL
> +        select FW_LOADER
> +	---help---
> +	  Intel Wireless MultiCom 3200 Top driver is responsible for
> +	  for firmware load and enabled coms enumeration

This seems like the wrong approach to me.

To me, it seems like you have a device that contains an internal bus and
allows bus enumeration. Typically, we would surface that bus in the
driver/device model and allow sub-drivers to bind to that by way of
exposing the internal bus, like e.g. drivers/ssb/.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* r8169, enabling TX checksumming breaks things?
From: Denys Fedoryschenko @ 2009-09-23  6:15 UTC (permalink / raw)
  To: romieu, netdev

Hi

Is it expected that:
1) TX checksumming is off by default?
2) If I try to enable it with ethtool -K eth0 tx on, TCP sessions on the proxy
get stuck, even though in tcpdump everything looks fine and packets reach the
destination? I don't understand the reason for the failure.
Maybe if this feature is not supposed to work, the user should not be able to
turn it on?

Checksum OFF, connection established, no data received.
www.nuclearcat.com/files/r8169_tx_off.txt

Checksum ON, connection established, no data received.
www.nuclearcat.com/files/r8169_tx_on.txt

If required, I can capture binary pcap files.

^ permalink raw reply

* Re: [PATCH][RESEND 3] IPv6: 6rd tunnel mode
From: Alexandre Cassen @ 2009-09-23  6:07 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <4AB94D2C.4000006@gmail.com>



On Wed, 23 Sep 2009, Eric Dumazet wrote:
>> +#ifdef CONFIG_IPV6_SIT_6RD
>> +	case SIOCGET6RD:
>> +		err = -EINVAL;
>> +		if (dev == sitn->fb_tunnel_dev)
>> +			goto done;
>> +		err = -ENOENT;
>> +		if (!(t = netdev_priv(dev)))
>> +			goto done;
>
>> +		memcpy(&ip6rd, &t->ip6rd_prefix, sizeof(ip6rd));
>
> Just wondering why you need a temporary ip6rd here;
> why don't you copy_to_user(ifr->ifr_ifru.ifru_data, &t->ip6rd_prefix, sizeof(ip6rd))?
>
>> +		if (copy_to_user(ifr->ifr_ifru.ifru_data, &ip6rd, sizeof(ip6rd)))
>> +			err = -EFAULT;
>> +		else
>> +			err = 0;
>> +		break;

agreed. will fix.

^ permalink raw reply

* pktgen: tricks
From: Stephen Hemminger @ 2009-09-23  5:49 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Robert Olsson; +Cc: netdev

I thought others might want to know how to get maximum speed out of pktgen.

1. Run nothing else (even X11), just a command line
2. Make sure ethernet flow control is disabled
   ethtool -A eth0 autoneg off rx off tx off
3. Make sure the clocksource is TSC.  On my old SMP Opterons I
   needed a patch, since in 2.6.30 or later the clock gurus
   decided to remove it on all non-Intel machines.  Look for the patch
   that enables "tsc=reliable"
4. Compile Ethernet drivers in; the overhead of the indirect
   function call required for modules (or the cache footprint)
   slows things down.
5. Increase transmit ring size to 1000
   ethtool -G eth0 tx 1000

Result: OK: 70408581(c70405979+d2602) nsec, 100000000 (60byte,0frags)
  1420281pps 681Mb/sec (681734880bps) errors: 0
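
The figures in the result line can be cross-checked with a little integer arithmetic. The sketch below is not pktgen's code; rate_pps() and rate_bps() are hypothetical helpers. It assumes the elapsed figure is in microseconds even though the output labels it nsec, because that is the only reading under which 100000000 packets yield the reported 1420281pps.

```c
#include <assert.h>
#include <stdint.h>

/* Packets per second from a packet count and an elapsed time in microseconds. */
uint64_t rate_pps(uint64_t packets, uint64_t elapsed_usec)
{
	assert(elapsed_usec > 0);
	return packets * 1000000ULL / elapsed_usec;
}

/* Bits per second for a fixed packet size in bytes. */
uint64_t rate_bps(uint64_t pps, uint64_t pkt_bytes)
{
	return pps * 8 * pkt_bytes;
}
```

With 100000000 packets, 70408581 elapsed units and 60-byte packets, these give 1420281 pps and 681734880 bps, matching the line above. The two parts in parentheses also add up: 70405979 + 2602 = 70408581.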

^ permalink raw reply

* Re: [RFC] skb align patch
From: Thomas Graf @ 2009-09-23  5:47 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, shemminger, jesse.brandeburg, hawk, netdev
In-Reply-To: <20090921.222940.258576265.davem@davemloft.net>

On Mon, Sep 21, 2009 at 10:29:40PM -0700, David Miller wrote:
> The alignment in this patch is a real big deal for 64 byte forwarding
> tests, where the entire packet is a whole PCI-E cacheline.  But not
> if it isn't aligned properly.

As I pointed out to Herbert already, this alignment change may actually
make things worse or even break things as long as compare_ether_header()
used in __napi_gro_receive() expects the IP header to be aligned to 4
bytes. That can be fixed of course, just wanted to mention it.
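
The alignment concern can be illustrated with the usual offset arithmetic. This is a sketch under assumptions, not code from the patch: the helper names are made up, the Ethernet header is taken as the standard 14 bytes, and the receive buffer is assumed to start on a 4-byte boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ETH_HEADER_LEN 14  /* destination MAC (6) + source MAC (6) + EtherType (2) */

/* Offset of the IP header when the frame starts `pad` bytes into a 4-byte-aligned buffer. */
size_t ip_header_offset(size_t pad)
{
	return pad + ETH_HEADER_LEN;
}

/* True when the IP header lands on a 4-byte boundary. */
bool ip_header_is_aligned(size_t pad)
{
	return ip_header_offset(pad) % 4 == 0;
}

/* Smallest pad that 4-byte-aligns whatever follows a link header of the given length. */
size_t pad_for_alignment(size_t link_header_len)
{
	assert(link_header_len > 0);
	return (4 - link_header_len % 4) % 4;
}
```

With a 2-byte pad the IP header sits at offset 16 and is aligned; with no pad, as when the frame itself starts on an aligned boundary, it sits at offset 14 and is not. Code that assumes the former layout would need adjusting for the latter.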

^ permalink raw reply

* [PATCH 1/2] pktgen: T_TERMINATE flag is unused
From: Stephen Hemminger @ 2009-09-23  5:41 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev
In-Reply-To: <20090923054141.932043798@vyatta.com>

[-- Attachment #1: pktgen-terminate.patch --]
[-- Type: text/plain, Size: 872 bytes --]

Get rid of unused flag bit.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/core/pktgen.c	2009-09-21 20:35:50.106198752 -0700
+++ b/net/core/pktgen.c	2009-09-21 20:36:03.349310985 -0700
@@ -192,11 +192,10 @@
 #define F_QUEUE_MAP_CPU (1<<14)	/* queue map mirrors smp_processor_id() */
 
 /* Thread control flag bits */
-#define T_TERMINATE   (1<<0)
-#define T_STOP        (1<<1)	/* Stop run */
-#define T_RUN         (1<<2)	/* Start run */
-#define T_REMDEVALL   (1<<3)	/* Remove all devs */
-#define T_REMDEV      (1<<4)	/* Remove one dev */
+#define T_STOP        (1<<0)	/* Stop run */
+#define T_RUN         (1<<1)	/* Start run */
+#define T_REMDEVALL   (1<<2)	/* Remove all devs */
+#define T_REMDEV      (1<<3)	/* Remove one dev */
 
 /* If lock -- can be removed after some work */
 #define   if_lock(t)           spin_lock(&(t->if_lock));

-- 


^ permalink raw reply

* [PATCH 2/2] pktgen: better scheduler friendliness
From: Stephen Hemminger @ 2009-09-23  5:41 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev
In-Reply-To: <20090923054141.932043798@vyatta.com>

[-- Attachment #1: pktgen-fix.patch --]
[-- Type: text/plain, Size: 6846 bytes --]

The previous update did not reschedule in the inner loop, causing watchdog warnings.
Rewrite inner loop to:
  * account for delays better with less clock calls
  * more accurate timing of delay:
    - only delay if packet was successfully sent
    - if delay is 100ns and it takes 10ns to build packet then
      account for that
  * use wait_event_interruptible_timeout rather than open coding it.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/core/pktgen.c	2009-09-21 20:36:03.349310985 -0700
+++ b/net/core/pktgen.c	2009-09-21 20:49:53.647508802 -0700
@@ -2104,7 +2104,7 @@ static void pktgen_setup_inject(struct p
 
 static void spin(struct pktgen_dev *pkt_dev, ktime_t spin_until)
 {
-	ktime_t start;
+	ktime_t start_time, end_time;
 	s32 remaining;
 	struct hrtimer_sleeper t;
 
@@ -2115,7 +2115,7 @@ static void spin(struct pktgen_dev *pkt_
 	if (remaining <= 0)
 		return;
 
-	start = ktime_now();
+	start_time = ktime_now();
 	if (remaining < 100)
 		udelay(remaining); 	/* really small just spin */
 	else {
@@ -2134,7 +2134,10 @@ static void spin(struct pktgen_dev *pkt_
 		} while (t.task && pkt_dev->running && !signal_pending(current));
 		__set_current_state(TASK_RUNNING);
 	}
-	pkt_dev->idle_acc += ktime_to_ns(ktime_sub(ktime_now(), start));
+	end_time = ktime_now();
+
+	pkt_dev->idle_acc += ktime_to_ns(ktime_sub(end_time, start_time));
+	pkt_dev->next_tx = ktime_add_ns(end_time, pkt_dev->delay);
 }
 
 static inline void set_pkt_overhead(struct pktgen_dev *pkt_dev)
@@ -3364,19 +3367,29 @@ static void pktgen_rem_thread(struct pkt
 	mutex_unlock(&pktgen_thread_lock);
 }
 
-static void idle(struct pktgen_dev *pkt_dev)
+static void pktgen_resched(struct pktgen_dev *pkt_dev)
 {
 	ktime_t idle_start = ktime_now();
+	schedule();
+	pkt_dev->idle_acc += ktime_to_ns(ktime_sub(ktime_now(), idle_start));
+}
 
-	if (need_resched())
-		schedule();
-	else
-		cpu_relax();
+static void pktgen_wait_for_skb(struct pktgen_dev *pkt_dev)
+{
+	ktime_t idle_start = ktime_now();
 
+	while (atomic_read(&(pkt_dev->skb->users)) != 1) {
+		if (signal_pending(current))
+			break;
+
+		if (need_resched())
+			pktgen_resched(pkt_dev);
+		else
+			cpu_relax();
+	}
 	pkt_dev->idle_acc += ktime_to_ns(ktime_sub(ktime_now(), idle_start));
 }
 
-
 static void pktgen_xmit(struct pktgen_dev *pkt_dev)
 {
 	struct net_device *odev = pkt_dev->odev;
@@ -3386,36 +3399,21 @@ static void pktgen_xmit(struct pktgen_de
 	u16 queue_map;
 	int ret;
 
-	if (pkt_dev->delay) {
-		spin(pkt_dev, pkt_dev->next_tx);
-
-		/* This is max DELAY, this has special meaning of
-		 * "never transmit"
-		 */
-		if (pkt_dev->delay == ULLONG_MAX) {
-			pkt_dev->next_tx = ktime_add_ns(ktime_now(), ULONG_MAX);
-			return;
-		}
-	}
-
-	if (!pkt_dev->skb) {
-		set_cur_queue_map(pkt_dev);
-		queue_map = pkt_dev->cur_queue_map;
-	} else {
-		queue_map = skb_get_queue_mapping(pkt_dev->skb);
+	/* If device is offline, then don't send */
+	if (unlikely(!netif_running(odev) || !netif_carrier_ok(odev))) {
+		pktgen_stop_device(pkt_dev);
+		return;
 	}
 
-	txq = netdev_get_tx_queue(odev, queue_map);
-	/* Did we saturate the queue already? */
-	if (netif_tx_queue_stopped(txq) || netif_tx_queue_frozen(txq)) {
-		/* If device is down, then all queues are permnantly frozen */
-		if (netif_running(odev))
-			idle(pkt_dev);
-		else
-			pktgen_stop_device(pkt_dev);
+	/* This is max DELAY, this has special meaning of
+	 * "never transmit"
+	 */
+	if (unlikely(pkt_dev->delay == ULLONG_MAX)) {
+		pkt_dev->next_tx = ktime_add_ns(ktime_now(), ULONG_MAX);
 		return;
 	}
 
+	/* If no skb or clone count exhausted then get new one */
 	if (!pkt_dev->skb || (pkt_dev->last_ok &&
 			      ++pkt_dev->clone_count >= pkt_dev->clone_skb)) {
 		/* build a new pkt */
@@ -3434,54 +3432,45 @@ static void pktgen_xmit(struct pktgen_de
 		pkt_dev->clone_count = 0;	/* reset counter */
 	}
 
-	/* fill_packet() might have changed the queue */
+	if (pkt_dev->delay && pkt_dev->last_ok)
+		spin(pkt_dev, pkt_dev->next_tx);
+
 	queue_map = skb_get_queue_mapping(pkt_dev->skb);
 	txq = netdev_get_tx_queue(odev, queue_map);
 
 	__netif_tx_lock_bh(txq);
-	if (unlikely(netif_tx_queue_stopped(txq) || netif_tx_queue_frozen(txq)))
-		pkt_dev->last_ok = 0;
-	else {
-		atomic_inc(&(pkt_dev->skb->users));
+	atomic_inc(&(pkt_dev->skb->users));
 
-	retry_now:
+	if (unlikely(netif_tx_queue_stopped(txq) || netif_tx_queue_frozen(txq)))
+		ret = NETDEV_TX_BUSY;
+	else
 		ret = (*xmit)(pkt_dev->skb, odev);
-		switch (ret) {
-		case NETDEV_TX_OK:
-			txq_trans_update(txq);
-			pkt_dev->last_ok = 1;
-			pkt_dev->sofar++;
-			pkt_dev->seq_num++;
-			pkt_dev->tx_bytes += pkt_dev->cur_pkt_size;
-			break;
-		case NETDEV_TX_LOCKED:
-			cpu_relax();
-			goto retry_now;
-		default: /* Drivers are not supposed to return other values! */
-			if (net_ratelimit())
-				pr_info("pktgen: %s xmit error: %d\n",
-					odev->name, ret);
-			pkt_dev->errors++;
-			/* fallthru */
-		case NETDEV_TX_BUSY:
-			/* Retry it next time */
-			atomic_dec(&(pkt_dev->skb->users));
-			pkt_dev->last_ok = 0;
-		}
-
-		if (pkt_dev->delay)
-			pkt_dev->next_tx = ktime_add_ns(ktime_now(),
-							pkt_dev->delay);
+
+	switch (ret) {
+	case NETDEV_TX_OK:
+		txq_trans_update(txq);
+		pkt_dev->last_ok = 1;
+		pkt_dev->sofar++;
+		pkt_dev->seq_num++;
+		pkt_dev->tx_bytes += pkt_dev->cur_pkt_size;
+		break;
+	default: /* Drivers are not supposed to return other values! */
+		if (net_ratelimit())
+			pr_info("pktgen: %s xmit error: %d\n",
+				odev->name, ret);
+		pkt_dev->errors++;
+		/* fallthru */
+	case NETDEV_TX_LOCKED:
+	case NETDEV_TX_BUSY:
+		/* Retry it next time */
+		atomic_dec(&(pkt_dev->skb->users));
+		pkt_dev->last_ok = 0;
 	}
 	__netif_tx_unlock_bh(txq);
 
 	/* If pkt_dev->count is zero, then run forever */
 	if ((pkt_dev->count != 0) && (pkt_dev->sofar >= pkt_dev->count)) {
-		while (atomic_read(&(pkt_dev->skb->users)) != 1) {
-			if (signal_pending(current))
-				break;
-			idle(pkt_dev);
-		}
+		pktgen_wait_for_skb(pkt_dev);
 
 		/* Done with this */
 		pktgen_stop_device(pkt_dev);
@@ -3514,20 +3503,24 @@ static int pktgen_thread_worker(void *ar
 	while (!kthread_should_stop()) {
 		pkt_dev = next_to_run(t);
 
-		if (!pkt_dev &&
-		    (t->control & (T_STOP | T_RUN | T_REMDEVALL | T_REMDEV))
-		    == 0) {
-			prepare_to_wait(&(t->queue), &wait,
-					TASK_INTERRUPTIBLE);
-			schedule_timeout(HZ / 10);
-			finish_wait(&(t->queue), &wait);
+		if (unlikely(!pkt_dev && t->control == 0)) {
+			wait_event_interruptible_timeout(t->queue,
+							 t->control != 0,
+							 HZ/10);
+			continue;
 		}
 
 		__set_current_state(TASK_RUNNING);
 
-		if (pkt_dev)
+		if (likely(pkt_dev)) {
 			pktgen_xmit(pkt_dev);
 
+			if (need_resched())
+				pktgen_resched(pkt_dev);
+			else
+				cpu_relax();
+		}
+
 		if (t->control & T_STOP) {
 			pktgen_stop(t);
 			t->control &= ~(T_STOP);

-- 


^ permalink raw reply

* Re: [PATCH 3/3] i2400m-sdio: select IWMC3200TOP in Kconfig
From: Tomas Winkler @ 2009-09-23  5:36 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, linville-2XuSBdqkA4R54TAoqtyWWQ,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	linux-mmc-u79uwXL29TY76Z2rM5mHXA, yi.zhu-ral2JQCrhuEAvxtiuMwx3w,
	inaky.perez-gonzalez-ral2JQCrhuEAvxtiuMwx3w,
	cindy.h.kao-ral2JQCrhuEAvxtiuMwx3w,
	guy.cohen-ral2JQCrhuEAvxtiuMwx3w,
	ron.rindjunsky-ral2JQCrhuEAvxtiuMwx3w
In-Reply-To: <1253666644.2931.7.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>

On Wed, Sep 23, 2009 at 3:44 AM, Marcel Holtmann <marcel-kz+m5ild9QBg9hUCZPvPmw@public.gmane.org> wrote:
> Hi Tomas,
>
>> i2400m-sdio requires iwmc3200top for its operation
>>
>> Signed-off-by: Tomas Winkler <tomas.winkler-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>> ---
>>  drivers/net/wimax/i2400m/Kconfig |    1 +
>>  1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/net/wimax/i2400m/Kconfig b/drivers/net/wimax/i2400m/Kconfig
>> index d623b3d..7368ad5 100644
>> --- a/drivers/net/wimax/i2400m/Kconfig
>> +++ b/drivers/net/wimax/i2400m/Kconfig
>> @@ -25,6 +25,7 @@ config WIMAX_I2400M_SDIO
>>       tristate "Intel Wireless WiMAX Connection 2400 over SDIO"
>>       depends on WIMAX && MMC
>>       select WIMAX_I2400M
>> +     select IWMC3200TOP
>>       help
>>         Select if you have a device based on the Intel WiMAX
>>         Connection 2400 over SDIO.
>
> this is not actually true, since the WiMAX hardware in my laptop doesn't
> require the top driver.
>
SDIO?

Thanks
Tomas

^ permalink raw reply

* Re: igb VF allocation with quirk_i82576_sriov
From: Chris Wright @ 2009-09-23  5:12 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Chris Wright, netdev@vger.kernel.org, Ronciak, John,
	e1000-devel@lists.sourceforge.net
In-Reply-To: <4AB8F042.4030306@intel.com>

* Alexander Duyck (alexander.h.duyck@intel.com) wrote:
> Chris Wright wrote:
>> Is this known to work?  During recent virt testing for upcoming Fedora 12,
>> a box w/out SR-IOV support in BIOS was using quirk to create VF BAR space,
>> VF allocation worked enough to assign a device to the guest, but igbvf
>> was not actually functioning properly in the guest.
>>
>> Is it worth debugging this further, or is it already a known issue?
>
> You could be experiencing one of a couple different issues.
>
> First when you say you started SR-IOV on a box w/out SR-IOV support I  
> assume you are using "pci=assign-busses" in order to reserve the bus  
> space for the VFs, is that correct?  Also while your system may not  
> support SR-IOV does it at least support VT-d?  Without VT-d support you  
> won't be able to assign a device to the guest.

VT-d was definitely there, as for the rest I'll have to ask the tester
for more details.  I just wanted to verify that it's a known working
combo before spending more time on it.

Regarding the bus numbering, I don't think there's a bus issue.
The PF+VFs all stay within the same bus segment despite the large offset and
the stride (IIRC, this was the only device on bus 2, a dual-port igb on .0 and
.1; the offset is 128 and the stride is 2, so even with 8 VFs the max device
would be something like 2:11.7 or 2:12.0).

> My recommendations for further testing would be to test a VF on the host  
> kernel to see if that works.  If it does then you could also try direct  
> assigning an entire port to see if that works.  If the entire port  
> doesn't work then you probably don't have VT-d enabled.

Yeah, IIRC, igbvf at least loaded on the host (on the guest too, after
unbinding host driver).  I didn't get a chance to see if VF passed
traffic on the host, and from the report, it wasn't able to get a dhcp
address in the guest.  Will dig into it a bit more after plumbers.

thanks,
-chris
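
The 2:11.7 estimate above can be reproduced with a short sketch. It assumes the usual SR-IOV scheme in which a VF's device/function number is the PF's number plus the first-VF offset plus the stride times a zero-based VF index; the helper names are invented for illustration, and the sketch ignores the case where the sum spills onto the next bus.

```c
#include <assert.h>
#include <stdint.h>

/* devfn packs the 5-bit device (slot) number and the 3-bit function number. */
uint8_t make_devfn(unsigned slot, unsigned func)
{
	assert(slot < 32 && func < 8);
	return (uint8_t)((slot << 3) | func);
}

unsigned devfn_slot(uint8_t devfn)
{
	return devfn >> 3;
}

unsigned devfn_func(uint8_t devfn)
{
	return devfn & 7;
}

/* devfn of the VF with zero-based index vf_index, assumed to stay on the PF's bus. */
uint8_t vf_devfn(uint8_t pf_devfn, unsigned offset, unsigned stride, unsigned vf_index)
{
	assert(pf_devfn + offset + stride * vf_index < 256);
	return (uint8_t)(pf_devfn + offset + stride * vf_index);
}
```

For the PF at .1 with offset 128 and stride 2, the eighth VF (index 7) gets devfn 1 + 128 + 14 = 143, that is device 0x11, function 7, or 2:11.7 in the notation used above. The first VF of the PF at .0 lands on 2:10.0, so all of them stay on bus 2.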


^ permalink raw reply

* Re: [PATCH][RESEND 3] IPv6: 6rd tunnel mode
From: Brian Haley @ 2009-09-23  1:47 UTC (permalink / raw)
  To: Alexandre Cassen; +Cc: netdev
In-Reply-To: <20090922220251.GA22874@lnxos.staff.proxad.net>

Alexandre Cassen wrote:
>
> +/* 6RD parms */
> +struct ip_tunnel_6rd {
> +	struct in6_addr		addr;
> +	__u8			prefixlen;
> +};

Are you sure you're not going to want to add anything to this struct in
the future like ifindex or flags?  Since it's part of the API you'd want
to do that now.

-Brian

^ permalink raw reply

