Netdev List

* [PATCH] net: use synchronize_rcu_expedited()
From: Eric Dumazet @ 2011-05-24  9:07 UTC
  To: David Miller; +Cc: netdev, Paul E. McKenney

synchronize_rcu() is very slow in various situations (HZ=100,
CONFIG_NO_HZ=y, CONFIG_PREEMPT=n)

Extract from my (mostly idle) 8 core machine :

 synchronize_rcu() in 99985 us
 synchronize_rcu() in 79982 us
 synchronize_rcu() in 87612 us
 synchronize_rcu() in 79827 us
 synchronize_rcu() in 109860 us
 synchronize_rcu() in 98039 us
 synchronize_rcu() in 89841 us
 synchronize_rcu() in 79842 us
 synchronize_rcu() in 80151 us
 synchronize_rcu() in 119833 us
 synchronize_rcu() in 99858 us
 synchronize_rcu() in 73999 us
 synchronize_rcu() in 79855 us
 synchronize_rcu() in 79853 us


While holding the RTNL mutex, we would rather spend some CPU cycles than
block other processes waiting for this mutex for too long.

We also want to setup/dismantle network features as fast as possible at
boot/shutdown time.

This patch makes synchronize_net() call the expedited version if RTNL is
locked.

synchronize_rcu_expedited() typical delay is about 20 us on my machine.

 synchronize_rcu_expedited() in 18 us
 synchronize_rcu_expedited() in 18 us
 synchronize_rcu_expedited() in 18 us
 synchronize_rcu_expedited() in 18 us
 synchronize_rcu_expedited() in 20 us
 synchronize_rcu_expedited() in 16 us
 synchronize_rcu_expedited() in 20 us
 synchronize_rcu_expedited() in 18 us
 synchronize_rcu_expedited() in 18 us


Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Ben Greear <greearb@candelatech.com>
---
 net/core/dev.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index bcb05cb..ec11d75 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5954,7 +5954,10 @@ EXPORT_SYMBOL(free_netdev);
 void synchronize_net(void)
 {
 	might_sleep();
-	synchronize_rcu();
+	if (rtnl_is_locked())
+		synchronize_rcu_expedited();
+	else
+		synchronize_rcu();
 }
 EXPORT_SYMBOL(synchronize_net);
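
To put the two sets of timings side by side, the following userspace sketch averages the samples quoted in the commit message. It is only an illustration and not part of the patch; the helper name `mean_us` is made up for this example.

```c
#include <stddef.h>

/* Samples copied from the commit message above, in microseconds. */
static const unsigned long normal_us[] = {
	99985, 79982, 87612, 79827, 109860, 98039, 89841,
	79842, 80151, 119833, 99858, 73999, 79855, 79853,
};

static const unsigned long expedited_us[] = {
	18, 18, 18, 18, 20, 16, 20, 18, 18,
};

/* Hypothetical helper: arithmetic mean of n samples (0.0 for an empty set). */
static double mean_us(const unsigned long *v, size_t n)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += v[i];
	return n ? (double)sum / (double)n : 0.0;
}
```

On these samples the ordinary grace period averages roughly 90 ms against roughly 18 us for the expedited one, a difference of more than three orders of magnitude.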
 




* Re: [GIT PULL] Namespace file descriptors for 2.6.40
From: Eric W. Biederman @ 2011-05-24  8:11 UTC
  To: James Bottomley
  Cc: Ingo Molnar, netdev, linux-kernel, Geert Uytterhoeven,
	Linux Containers, Linus Torvalds
In-Reply-To: <1306221981.10201.8.camel@mulgrave.site>

James Bottomley <James.Bottomley@HansenPartnership.com> writes:

> On Tue, 2011-05-24 at 00:03 -0700, Eric W. Biederman wrote:
>> Ingo Molnar <mingo@elte.hu> writes:
>> 
>> > I agree with Linus's notion in this thread though, a core kernel change should 
>> > generally not worry about hooking up rare-arch system calls (concentrate on the 
>> > architectures that get tested most) - those are better enabled gradually 
>> > anyway.
>> 
>> The way I read it he was complaining about my having parisc bits and
>> asking for my branch to be merged before the parisc bits had been
>> merged.  Which I credit as a fair complaint.  If I am going to depend on
>> other peoples trees I should wait.
>> 
>> At the same time when I am busy looking for every possible source of
>> trouble and putting code into net-next to detect pending conflicts,
>> and when maintainers complain when I ask for review that my patches
>> conflict with their patches.  Being a contentious developer I am
                                         ^^^^^^^^^^^ conscientious
I didn't realize it was possible to make that typo.

>> inclined to do something.
>
> Right ... and the problem is that someone has to care, because the
> conflict will show up in linux-next.  I think Stephen Rothwell would
> appreciate us making his life easier rather than leaving it to him to
> sort out the problems.
>
>> Now that the reality has sunk in that it means waiting for other peoples
>> code to be merged before I request for my changes to be merged I don't
>> think I will structure a tree that way again while I remember.
>
> Right.   This is quite a common occurrence in SCSI (mostly changes
> entangled with block or libata).  If you don't feel comfortable running
> a postmerge tree, just send me the bits and I'll do it (after all it
> works either way around: I can pull in the syscalls and depend on your
> tree rather than vice versa).

Well for the moment I don't see too many problems.  I sent another pull
request to Linus earlier today now that your changes are in.  So I am
hoping either Linus will pull my tree or someone will educate me on what
Linus will accept.

Right now my tree is tested and in a good state.  Heck I'm running it
to send this email.  So I am reluctant to change anything without clear
feedback.

James when you refer to a postmerge tree what are the dynamics/semantics
usually associated with that?  Is this a tree that gets pulled a couple
of times?  Once with the non-conflicting bits.  Another time when the
bits it depends on have been merged?  Or is this a tree that gets pulled
after the merge window entirely?

Eric


* Re: [patch 1/1] net: convert %p usage to %pK
From: David Miller @ 2011-05-24  7:58 UTC
  To: eric.dumazet
  Cc: joe, mingo, akpm, netdev, drosenberg, a.p.zijlstra, eparis,
	eugeneteo, jmorris, kees.cook, tgraf
In-Reply-To: <1306223101.2638.43.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 24 May 2011 09:45:01 +0200

> Le mardi 24 mai 2011 à 00:35 -0700, Joe Perches a écrit :
> 
>> I think it'd be better without the casts
>> using the standard kernel.h macros.
>> 
>> 	void *ptr;
>> 
>> 	ptr = maybe_hide_ptr(sk);
>> 	r->id.idiag_cookie[0] = lower_32_bits(ptr);
>> 	r->id.idiag_cookie[1] = upper_32_bits(ptr);
>> 
> 
> I am not sure I want to patch lower_32_bits() and upper_32_bits() for
> this.
> 
> They don't work on pointers, but on "numbers", according to kerneldoc
> Andrew wrote years ago. gcc agrees :
> 
> net/ipv4/inet_diag.c: In function ‘inet_csk_diag_fill’:
> net/ipv4/inet_diag.c:119: warning: cast from pointer to integer of different size
> net/ipv4/inet_diag.c:120: error: invalid operands to binary >>
> make[1]: *** [net/ipv4/inet_diag.o] Error 1

Also you can't do this, the "cookie" is used by the kernel in future
lookups to find sockets.

The kernel pointer is part of the API, so sorry you can't "hide"
kernel pointers in this case without really breaking user visible
things.


* Re: [PATCH 1/3] vlan: Do not support clearing VLAN_FLAG_REORDER_HDR
From: Eric W. Biederman @ 2011-05-24  7:56 UTC
  To: David Miller
  Cc: shemminger, greearb, nicolas.2p.debian, jpirko, xiaosuo, netdev,
	kaber, fubar, eric.dumazet, andy, jesse
In-Reply-To: <20110524.011942.393855175233217324.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

> From: ebiederm@xmission.com (Eric W. Biederman)
> Date: Mon, 23 May 2011 15:05:54 -0700
>
>> 3) What do we do with pf_packet and vlan hardware acceleration when
>>    dumping not the vlan interface but the interface below the vlan
>>    interface?
>> 
>>    Do we provide an option to keep the vlan header?  Should that option
>>    be on by default?
>> 
>
> The vlan_tci in the V2 pf_packet auxdata was intended for this
> purpose.
>
> So no matter what variant of behavior is occurring, apps can properly
> reconstitute the VLAN header if they inspect the vlan_tci in the
> auxdata.

It sucks a little bit to deal with that but that is fair.

> The only thing that seems to be missing is an indication that a VLAN
> tag was present at all, ie. vlan_tx_tag_present(), in this manner an
> application could then differentiate between no VLAN header and a VLAN
> tag of zero.

Good point.

I had seen that we were putting the data there but I missed the fact
that we deleted the indicator.  That makes packets not destined for a
vlan but that just have priority bits set hard to detect.  Especially
if the priority bits are 0.

Would it cause many problems if we used tp_status to hold a flag
indicating the presence of a vlan header?

We also have an issue that the socket filter doesn't have access to any
of the vlan information at present.

Eric
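
The ambiguity discussed above can be illustrated in userspace. In the sketch below, a bare TCI of zero cannot tell "no tag" apart from "tag with VLAN ID 0 and priority 0", so a separate presence bit is carried next to it. This is an illustration only and not the pf_packet ABI: the struct and helper names are hypothetical, while the field layout (3-bit priority, 1-bit CFI/DEI, 12-bit VLAN ID, TPID 0x8100) is the 802.1Q one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical auxiliary data: the TCI plus an explicit "tag seen" bit. */
struct toy_vlan_aux {
	bool     present;  /* was there a VLAN tag at all? */
	uint16_t tci;      /* tag control information, host byte order */
};

/* Priority is the top 3 bits of the TCI. */
static unsigned int toy_tci_prio(uint16_t tci)
{
	return tci >> 13;
}

/* The VLAN ID is the low 12 bits of the TCI. */
static unsigned int toy_tci_vid(uint16_t tci)
{
	return tci & 0x0fff;
}

/*
 * Rebuild the 4-byte 802.1Q tag (TPID 0x8100 followed by the TCI) in
 * network byte order.  Returns the number of bytes written, which is 0
 * when no tag was present on the original frame.
 */
static size_t toy_rebuild_tag(const struct toy_vlan_aux *aux, uint8_t out[4])
{
	if (!aux->present)
		return 0;

	out[0] = 0x81;
	out[1] = 0x00;
	out[2] = (uint8_t)(aux->tci >> 8);
	out[3] = (uint8_t)(aux->tci & 0xff);
	return 4;
}
```

With only the TCI available, the first two cases in a caller (`present` false, and `present` true with a TCI of 0) would be indistinguishable, which is the gap the proposed flag is meant to close.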


* Re: [PATCHv2 10/14] virtio_net: limit xmit polling
From: Krishna Kumar2 @ 2011-05-24  7:54 UTC
  To: Michael S. Tsirkin
  Cc: habanero-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	lguest-uLR06cmDAlY/bJ5BZ2RsiQ, Shirley Ma,
	kvm-u79uwXL29TY76Z2rM5mHXA, Carsten Otte,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, Heiko Carstens,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	steved-r/Jw6+rmf7HQT0dZR+AlfA, Christian Borntraeger,
	Tom Lendacky, netdev-u79uwXL29TY76Z2rM5mHXA, Martin Schwidefsky,
	linux390-tA70FqPdS9bQT0dZR+AlfA
In-Reply-To: <20110523111900.GB27212-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

"Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote on 05/23/2011 04:49:00 PM:

> > To do this properly, we should really be using the actual number of sg
> > elements needed, but we'd have to do most of xmit_skb beforehand so we
> > know how many.
> >
> > Cheers,
> > Rusty.
>
> Maybe I'm confused here.  The problem isn't the failing
> add_buf for the given skb IIUC.  What we are trying to do here is stop
> the queue *before xmit_skb fails*. We can't look at the
> number of fragments in the current skb - the next one can be
> much larger.  That's why we check capacity after xmit_skb,
> not before it, right?

Maybe Rusty means it is a simpler model to free the amount
of space that this xmit needs. We will still fail anyway
at some time, but it is unlikely, since the earlier iteration
freed up at least the space that it was going to use. The
code could become much simpler:

start_xmit()
{
        num_sgs = get num_sgs for this skb;

        /* Free enough pending old buffers to enable queueing this one */
        free_old_xmit_skbs(vi, num_sgs * 2);     /* ?? */

        if (virtqueue_get_capacity() < num_sgs) {
                netif_stop_queue(dev);
                if (virtqueue_enable_cb_delayed(vi->svq) ||
                    free_old_xmit_skbs(vi, num_sgs)) {
                        /* Nothing freed up, or not enough freed up */
                        kfree_skb(skb);
                        return NETDEV_TX_OK;
                }
                netif_start_queue(dev);
                virtqueue_disable_cb(vi->svq);
        }

        /* xmit_skb cannot fail now, also pass 'num_sgs' */
        xmit_skb(vi, skb, num_sgs);
        virtqueue_kick(vi->svq);

        skb_orphan(skb);
        nf_reset(skb);

        return NETDEV_TX_OK;
}

We could even return TX_BUSY since that makes the dequeue
code more efficient. See dev_dequeue_skb() - you can skip a
lot of code (and avoid taking locks) to check if the queue
is already stopped but that code runs only if you return
TX_BUSY in the earlier iteration.

BTW, shouldn't the check in start_xmit be:
	if (likely(!free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
		...
	}

Thanks,

- KK
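
The "free first, then check capacity" idea in the pseudocode above can be tried out with a small single-threaded toy model. Everything here is hypothetical (`toy_ring`, `toy_free_old`, `toy_start_xmit`); it does not use the virtio API and, because nothing completes concurrently, it leaves out the re-check after enabling callbacks.

```c
#include <stdbool.h>

/* Toy stand-in for a transmit virtqueue; not the virtio API. */
struct toy_ring {
	unsigned int capacity;   /* free descriptors right now */
	unsigned int completed;  /* used descriptors the device is done with */
	bool stopped;            /* models the stopped state of the tx queue */
};

/* Reclaim up to 'want' completed descriptors; returns how many were freed. */
static unsigned int toy_free_old(struct toy_ring *r, unsigned int want)
{
	unsigned int n = r->completed < want ? r->completed : want;

	r->completed -= n;
	r->capacity += n;
	return n;
}

/* Returns true if the packet was queued, false if it was dropped. */
static bool toy_start_xmit(struct toy_ring *r, unsigned int num_sgs)
{
	/* Free enough old buffers to make room for this packet first. */
	toy_free_old(r, num_sgs * 2);

	if (r->capacity < num_sgs) {
		/* Still no room: stop the queue and drop the packet. */
		r->stopped = true;
		return false;
	}

	/* The add cannot fail at this point. */
	r->capacity -= num_sgs;
	return true;
}
```

In this model a packet is only dropped when fewer than `num_sgs` descriptors are free even after reclaiming everything that has completed.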


* Re: [patch 1/1] net: convert %p usage to %pK
From: Eric Dumazet @ 2011-05-24  7:45 UTC
  To: Joe Perches
  Cc: Ingo Molnar, David Miller, akpm, netdev, drosenberg, a.p.zijlstra,
	eparis, eugeneteo, jmorris, kees.cook, tgraf
In-Reply-To: <1306222507.2298.23.camel@Joe-Laptop>

Le mardi 24 mai 2011 à 00:35 -0700, Joe Perches a écrit :

> I think it'd be better without the casts
> using the standard kernel.h macros.
> 
> 	void *ptr;
> 
> 	ptr = maybe_hide_ptr(sk);
> 	r->id.idiag_cookie[0] = lower_32_bits(ptr);
> 	r->id.idiag_cookie[1] = upper_32_bits(ptr);
> 

I am not sure I want to patch lower_32_bits() and upper_32_bits() for
this.

They don't work on pointers, but on "numbers", according to kerneldoc
Andrew wrote years ago. gcc agrees :

net/ipv4/inet_diag.c: In function ‘inet_csk_diag_fill’:
net/ipv4/inet_diag.c:119: warning: cast from pointer to integer of different size
net/ipv4/inet_diag.c:120: error: invalid operands to binary >>
make[1]: *** [net/ipv4/inet_diag.o] Error 1
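
The compiler complaint above comes from applying integer macros to a pointer. One portable way to split a pointer into two 32-bit words is to convert it to an integer type first. The following is a userspace sketch with hypothetical helper names, not the inet_diag code:

```c
#include <stdint.h>

/* Split a pointer into two 32-bit words: low half first, high half second. */
static void toy_ptr_to_cookie(const void *ptr, uint32_t cookie[2])
{
	/* Convert to an integer first; shifts are not defined on pointers. */
	uint64_t v = (uint64_t)(uintptr_t)ptr;

	cookie[0] = (uint32_t)(v & 0xffffffffu);
	cookie[1] = (uint32_t)(v >> 32);
}

/* Reassemble the 64-bit value from the two words. */
static uint64_t toy_cookie_to_u64(const uint32_t cookie[2])
{
	return ((uint64_t)cookie[1] << 32) | cookie[0];
}
```

Widening to 64 bits before shifting keeps the shift by 32 well defined even where pointers are 32 bits wide, since shifting a 32-bit value by its full width is undefined in C.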





* Re: [PATCH 1/3] vlan: Do not support clearing VLAN_FLAG_REORDER_HDR
From: Michał Mirosław @ 2011-05-24  7:44 UTC
  To: Nicolas de Pesloüan
  Cc: Ben Greear, Eric W. Biederman, David Miller, shemminger, jpirko,
	xiaosuo, netdev, kaber, fubar, eric.dumazet, andy, jesse
In-Reply-To: <4DDB5A3F.8050007@gmail.com>

2011/5/24 Nicolas de Pesloüan <nicolas.2p.debian@gmail.com>:
>> If it turns out the NIC is not stripping VLAN tags for whatever reason,
>> we might be able to optimize things so that it never does the HW emulation
>> so that it never has to un-do it later.
> It might be very tricky to avoid any do-undo-redo situation. I will later
> try and describe all the possible situations: with/without hw-accel,
> with/without a protocol handler registered on the parent interface,
> with/without a child interface built on top of a particular parent interface
> and with the corresponding VLAN ID...

Hardware tag stripping could be disabled whenever AF_PACKET listener
is present on the base interface. It should be easy now with the new
features handling.

BTW, what's the performance gain from hw tag stripping? Besides saving
12-byte memmove() in one cacheline?

Best Regards,
Michał Mirosław
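
For reference, the 12-byte memmove() mentioned above is what software untagging amounts to: the two MAC addresses slide forward over the 4-byte 802.1Q tag. A userspace sketch follows, assuming a contiguous frame buffer; `toy_vlan_untag` and the `TOY_*` macros are hypothetical names, not kernel functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN  6  /* bytes in one MAC address */
#define TOY_VLAN_HLEN 4  /* TPID (2 bytes) + TCI (2 bytes) */

/*
 * Strip an 802.1Q tag in place.  Returns the new start of the frame and
 * stores the TCI in *tci, or returns NULL if the frame is too short or
 * does not carry TPID 0x8100.
 */
static uint8_t *toy_vlan_untag(uint8_t *frame, size_t *len, uint16_t *tci)
{
	if (*len < 2 * TOY_ETH_ALEN + TOY_VLAN_HLEN + 2)
		return NULL;
	if (frame[12] != 0x81 || frame[13] != 0x00)
		return NULL;

	*tci = (uint16_t)((frame[14] << 8) | frame[15]);

	/* Slide destination and source MAC (12 bytes) over the tag. */
	memmove(frame + TOY_VLAN_HLEN, frame, 2 * TOY_ETH_ALEN);
	*len -= TOY_VLAN_HLEN;
	return frame + TOY_VLAN_HLEN;
}
```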


* Re: [PATCH] vlan: Fix the b0rked ingress VLAN_FLAG_REORDER_HDR check.
From: Eric W. Biederman @ 2011-05-24  7:38 UTC
  To: David Miller
  Cc: shemminger, greearb, nicolas.2p.debian, jpirko, xiaosuo, netdev,
	kaber, fubar, eric.dumazet, andy, jesse
In-Reply-To: <20110524.022406.2228892895515155850.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

> From: ebiederm@xmission.com (Eric W. Biederman)
> Date: Mon, 23 May 2011 23:18:02 -0700
>
>> Feel free to read through the code, to convince yourself it is correct.
>> In addition the code is untouched from the vlan header insertion for
>> emulation of vlan header acceleration in dev_hard_start_xmit() which
>> presumably has been working for quite awhile.
>
> I'm not keeping code there that does eth_hdr(skb)->foo when there
> can be either a vlan_hdr(skb) or a eth_hdr(skb) there.
>
> That's just asking for trouble.

How so?  eth_hdr(skb) is a proper subset of vlan_hdr(skb)?

vlan_insert_tag() moves the ethernet header to make space for the vlan
header after it, but before the rest of the data.  With the appropriate
fixups applied to the skb->mac_pointer.

I can see not leaving a vlan_hdr(skb) reference, but beyond that I'm
not seeing the problem.

Certainly we need to do the insert before we update the statistics or
we will get rx_bytes wrong.

Would you be happier if the pkt_type check was moved earlier in the
function like say:

diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index c08dae7..7571637 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -22,21 +22,6 @@ bool vlan_do_receive(struct sk_buff **skbp)
 	if (unlikely(!skb))
 		return false;
 
-	skb->dev = vlan_dev;
-	if (unlikely(!(vlan_dev_info(vlan_dev)->flags & VLAN_FLAG_REORDER_HDR))) {
-		skb = *skbp = vlan_insert_tag(skb, skb->vlan_tci);
-		if (!skb)
-			return false;
-	}
-	skb->priority = vlan_get_ingress_priority(vlan_dev, skb->vlan_tci);
-	skb->vlan_tci = 0;
-
-	rx_stats = this_cpu_ptr(vlan_dev_info(vlan_dev)->vlan_pcpu_stats);
-
-	u64_stats_update_begin(&rx_stats->syncp);
-	rx_stats->rx_packets++;
-	rx_stats->rx_bytes += skb->len;
-
 	switch (skb->pkt_type) {
 	case PACKET_BROADCAST:
 		break;
@@ -52,6 +37,21 @@ bool vlan_do_receive(struct sk_buff **skbp)
 			skb->pkt_type = PACKET_HOST;
 		break;
 	}
+
+	skb->dev = vlan_dev;
+	if (unlikely(!(vlan_dev_info(vlan_dev)->flags & VLAN_FLAG_REORDER_HDR))) {
+		skb = *skbp = vlan_insert_tag(skb, skb->vlan_tci);
+		if (!skb)
+			return false;
+	}
+	skb->priority = vlan_get_ingress_priority(vlan_dev, skb->vlan_tci);
+	skb->vlan_tci = 0;
+
+	rx_stats = this_cpu_ptr(vlan_dev_info(vlan_dev)->vlan_pcpu_stats);
+
+	u64_stats_update_begin(&rx_stats->syncp);
+	rx_stats->rx_packets++;
+	rx_stats->rx_bytes += skb->len;
 	u64_stats_update_end(&rx_stats->syncp);
 
 	return true;

Eric
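
As a rough userspace picture of what re-inserting the tag involves, the sketch below moves the Ethernet addresses back by four bytes and writes the 802.1Q tag into the gap. It assumes the caller guarantees four bytes of headroom in front of the frame; `toy_vlan_insert_tag` and the `TOY_*` macros are hypothetical and this is not the kernel's vlan_insert_tag().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN  6  /* bytes in one MAC address */
#define TOY_VLAN_HLEN 4  /* TPID (2 bytes) + TCI (2 bytes) */

/*
 * Insert an 802.1Q tag after the MAC addresses.  The caller must provide
 * TOY_VLAN_HLEN bytes of headroom before 'frame'.  Returns the new start
 * of the frame and grows *len by the tag size.
 */
static uint8_t *toy_vlan_insert_tag(uint8_t *frame, size_t *len, uint16_t tci)
{
	uint8_t *start = frame - TOY_VLAN_HLEN;

	/* Move destination and source MAC back to open a 4-byte gap. */
	memmove(start, frame, 2 * TOY_ETH_ALEN);

	start[12] = 0x81;                    /* TPID 0x8100 */
	start[13] = 0x00;
	start[14] = (uint8_t)(tci >> 8);     /* TCI, network byte order */
	start[15] = (uint8_t)(tci & 0xff);

	*len += TOY_VLAN_HLEN;
	return start;
}
```

The frame length grows by four bytes here, which is why any byte counter has to be updated after the insertion rather than before it.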


* Re: [patch 1/1] net: convert %p usage to %pK
From: Joe Perches @ 2011-05-24  7:35 UTC
  To: Eric Dumazet
  Cc: Ingo Molnar, David Miller, akpm, netdev, drosenberg, a.p.zijlstra,
	eparis, eugeneteo, jmorris, kees.cook, tgraf
In-Reply-To: <1306220681.2638.40.camel@edumazet-laptop>

On Tue, 2011-05-24 at 09:04 +0200, Eric Dumazet wrote:
> Le mardi 24 mai 2011 à 08:57 +0200, Ingo Molnar a écrit :
> > * Joe Perches <joe@perches.com> wrote:
> > > Maybe for clarity it'd be better to use a switch/case
> > > or something like:
> Here we are, thanks.

Hey Eric.  Just more trivia:

> [PATCH v2] inet_diag: hide socket pointers
> Provide a mayber_hide_ptr() helper and use it in inet_diag to not

typo maybe_hide_ptr

> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> @@ -798,6 +798,26 @@ char *uuid_string(char *buf, char *end, const u8 *addr,
>  }
>  
>  int kptr_restrict __read_mostly;
> +/**
> + * maybe_hide_ptr - Eventually nullify a kernel pointer given to user

Not a great description.
Maybe something like:
 * maybe_hide_ptr - Set user values of kernel pointers to null
 *                  when appropriate
[]
> diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
> index 6ffe94c..b5646a3 100644
> --- a/net/ipv4/inet_diag.c
> +++ b/net/ipv4/inet_diag.c
> @@ -84,6 +84,7 @@ static int inet_csk_diag_fill(struct sock *sk,
>  	struct inet_diag_meminfo  *minfo = NULL;
>  	unsigned char	 *b = skb_tail_pointer(skb);
>  	const struct inet_diag_handler *handler;
> +	u64 ptr;
>  
>  	handler = inet_diag_table[unlh->nlmsg_type];
>  	BUG_ON(handler == NULL);
> @@ -114,8 +115,9 @@ static int inet_csk_diag_fill(struct sock *sk,
>  	r->idiag_retrans = 0;
>  
>  	r->id.idiag_if = sk->sk_bound_dev_if;
> -	r->id.idiag_cookie[0] = (u32)(unsigned long)sk;
> -	r->id.idiag_cookie[1] = (u32)(((unsigned long)sk >> 31) >> 1);
> +	ptr = (u64)maybe_hide_ptr(sk);
> +	r->id.idiag_cookie[0] = (u32)ptr;
> +	r->id.idiag_cookie[1] = (u32)(ptr >> 32);

I think it'd be better without the casts
using the standard kernel.h macros.

	void *ptr;

	ptr = maybe_hide_ptr(sk);
	r->id.idiag_cookie[0] = lower_32_bits(ptr);
	r->id.idiag_cookie[1] = upper_32_bits(ptr);




* Re: [GIT PULL] Namespace file descriptors for 2.6.40
From: James Bottomley @ 2011-05-24  7:26 UTC
  To: Eric W. Biederman
  Cc: Ingo Molnar, netdev, linux-kernel, Geert Uytterhoeven,
	Linux Containers, Linus Torvalds
In-Reply-To: <m1k4dgr35i.fsf@fess.ebiederm.org>

On Tue, 2011-05-24 at 00:03 -0700, Eric W. Biederman wrote:
> Ingo Molnar <mingo@elte.hu> writes:
> 
> > I agree with Linus's notion in this thread though, a core kernel change should 
> > generally not worry about hooking up rare-arch system calls (concentrate on the 
> > architectures that get tested most) - those are better enabled gradually 
> > anyway.
> 
> The way I read it he was complaining about my having parisc bits and
> asking for my branch to be merged before the parisc bits had been
> merged.  Which I credit as a fair complaint.  If I am going to depend on
> other peoples trees I should wait.
> 
> At the same time when I am busy looking for every possible source of
> trouble and putting code into net-next to detect pending conflicts,
> and when maintainers complain when I ask for review that my patches
> conflict with their patches.  Being a contentious developer I am
> inclined to do something.

Right ... and the problem is that someone has to care, because the
conflict will show up in linux-next.  I think Stephen Rothwell would
appreciate us making his life easier rather than leaving it to him to
sort out the problems.

> Now that the reality has sunk in that it means waiting for other peoples
> code to be merged before I request for my changes to be merged I don't
> think I will structure a tree that way again while I remember.

Right.   This is quite a common occurrence in SCSI (mostly changes
entangled with block or libata).  If you don't feel comfortable running
a postmerge tree, just send me the bits and I'll do it (after all it
works either way around: I can pull in the syscalls and depend on your
tree rather than vice versa).

James


* Re: [PATCH 1/3] vlan: Do not support clearing VLAN_FLAG_REORDER_HDR
From: Nicolas de Pesloüan @ 2011-05-24  7:19 UTC
  To: Jiri Pirko
  Cc: Eric W. Biederman, Changli Gao, Ben Greear, David Miller, netdev,
	shemminger, kaber, fubar, eric.dumazet, andy, Jesse Gross
In-Reply-To: <20110524055836.GA2955@psychotron>

Le 24/05/2011 07:58, Jiri Pirko a écrit :
<snip>

>>> Btw what's the rationale to move untag to earlier position?
>>
>> Maybe simply because we try to mimic hw-accel, and hw-accel untagging
>> definitely happens before we enter __netif_receive_skb and only
>> happens once.
>>
>> So having software untagging inside the __netif_receive_skb loop looks different.
>
> I understand. But what code prior to current vlan_untag position needs
> to see the skb untagged?

Any protocol handlers (ptype_all or ptype_base) registered on the parent interface may need to see 
the skb untagged, for all the reasons given in this thread. Arguably, doing software untagging 
earlier wouldn't help. :-) We need a strong logic to decide whether and when to untag and/or 
possibly re-tag.

	Nicolas.


* Re: [GIT PULL] Namespace file descriptors for 2.6.40
From: Ingo Molnar @ 2011-05-24  7:16 UTC
  To: Eric W. Biederman
  Cc: James Bottomley, Linus Torvalds, linux-kernel, Linux Containers,
	netdev, Geert Uytterhoeven
In-Reply-To: <m1k4dgr35i.fsf@fess.ebiederm.org>

* Eric W. Biederman <ebiederm@xmission.com> wrote:

> > Also, system call table conflicts are trivial to resolve. Merging in 
> > net-next to avoid such a conflict is like cracking a nut with a 
> > sledgehammer.
> 
> Well I still have trauma from how nasty it was to test with syscall numbers 
> continuing to change when I was working on the kexec_load system call.
> 
> As far as I can tell any one system call conflict on any one
> architecture is easy to resolve.  Resolving a conflict on all
> architectures would amount to at least 50 files that need to be resolved
> that feels a bit more than trivial.

Of course - and the straightforward solution is to not do it but concentrate on 
the 2-3 archs you find to be the primary target of your patches. How many 
parisc systems are there on the planet, which in the future will be upgraded to 
both kernel and user-space running your new syscall for real? Less than 10? How 
many ARM and x86 systems?

> My gut feel says we should really implement an 
> include/asm-generic/unistd-common.h to include all new system calls.
> 
> That way there would be only one file to touch instead of 50. Certainly it 
> works for include/asm-generic/unistd.h for the architectures that use it.  
> And all we really need is just a little abstraction on that concept.

I suppose that could be tried, although in practice it would probably be 
somewhat complex due to the various compat syscall handling differences.
So I guess this is one of the 'let's see how ugly/fragile it becomes' patches.

Thanks,

	Ingo


* Re: [PATCH 1/3] vlan: Do not support clearing VLAN_FLAG_REORDER_HDR
From: Nicolas de Pesloüan @ 2011-05-24  7:11 UTC
  To: Ben Greear
  Cc: Eric W. Biederman, David Miller, shemminger, jpirko, xiaosuo,
	netdev, kaber, fubar, eric.dumazet, andy, jesse
In-Reply-To: <4DDB3226.8010404@candelatech.com>

Le 24/05/2011 06:20, Ben Greear a écrit :
<snip>

> Then I think we should put it back with pf_packet logic. Possibly with
> a per-socket option to disable this and send it as only aux data if that
> is more efficient.

Why would we need a per-socket option for that?

If the socket listens on eth0, it probably expects to receive the raw packet. If it happens to expect 
the untagged packet, it should listen on vlan2000.

The only reason I can imagine to listen on eth0 while expecting the packet to be untagged is to 
listen to several VLANs at the same time on a single socket, while still expecting the kernel or the 
NIC to extract the vlan ID for us. But I don't have a real life use case for this in mind.

And maybe, for that particular situation, we should have a special vlan interface, with wildcard 
vlan ID:

        +--eth0.100
        |
eth0 --+--eth0.200
        |
        +--eth0.any

- Someone listening on eth0 would receive raw packets.
- Someone listening on eth0.100/eth0.200 would receive untagged packets originally having 100/200 as 
the VLAN ID.
- Someone listening on eth0.any would receive untagged packets originally having any VLAN ID and 
would not receive non-originally-tagged packets.
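
The delivery rules proposed in the list above can be written down as a small predicate. This is only a model of the proposal, with hypothetical names (`toy_listener`, `toy_frame`, `toy_delivers`); a wildcard interface such as eth0.any is the idea under discussion here and the sketch does not describe existing kernel behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* Kinds of interface a socket could listen on in the proposal. */
enum toy_listener {
	TOY_PARENT,    /* eth0: sees every frame, raw */
	TOY_VLAN_ID,   /* eth0.100, eth0.200: one specific VLAN ID */
	TOY_VLAN_ANY,  /* eth0.any: any tagged frame */
};

/* What is known about the frame as it arrived on the wire. */
struct toy_frame {
	bool     tagged;  /* did the frame carry a VLAN tag? */
	uint16_t vid;     /* VLAN ID, meaningful only if tagged */
};

/* Would a listener of the given kind receive this frame? */
static bool toy_delivers(enum toy_listener kind, uint16_t listener_vid,
			 const struct toy_frame *f)
{
	switch (kind) {
	case TOY_PARENT:
		return true;
	case TOY_VLAN_ID:
		return f->tagged && f->vid == listener_vid;
	case TOY_VLAN_ANY:
		return f->tagged;
	}
	return false;
}
```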

> If it turns out the NIC is not stripping VLAN tags for whatever reason,
> we might be able to optimize things so that it never does the HW emulation
> so that it never has to un-do it later.

It might be very tricky to avoid any do-undo-redo situation. I will later try and describe all the 
possible situations: with/without hw-accel, with/without a protocol handler registered on the parent 
interface, with/without a child interface built on top of a particular parent interface and with the 
corresponding VLAN ID...

	Nicolas.


* Re: [gianfar PATCH] RFC v2: Add rx_ntuple feature
From: Ben Hutchings @ 2011-05-24  7:09 UTC
  To: s.poehn; +Cc: netdev, sebastian.poehn
In-Reply-To: <14348.80.254.147.148.1305638298.squirrel@webmail.hs-esslingen.de>

On Tue, 2011-05-17 at 15:18 +0200, Sebastian Pöhn wrote:
> This patch implements rx ntuple filtering for the gianfar driver.
> Changes since last version:
> #Added code is now re-entrant
> #Consolidation of convert routines
> 
> I am planning to do some final testing. After that I hope I can send the
> final patch.
> 
> Signed-off-by: Sebastian Poehn <sebastian.poehn@belden.com>
> ---
> 
>  drivers/net/gianfar.c         |   16 +-
>  drivers/net/gianfar.h         |   44 ++
>  drivers/net/gianfar_ethtool.c | 1006 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 1062 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
> index ff60b23..ddd4007 100644
> --- a/drivers/net/gianfar.c
> +++ b/drivers/net/gianfar.c
[...]
> @@ -1042,6 +1048,9 @@ static int gfar_probe(struct platform_device *ofdev)
>  	if (priv->device_flags & FSL_GIANFAR_DEV_HAS_VLAN)
>  		dev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
> 
> +	if (priv->device_flags & FSL_GIANFAR_DEV_HAS_RX_FILER)
> +		dev->features |= NETIF_F_NTUPLE;
> +

It should be possible to turn this feature on and off, so add it to
hw_features and handle it in gfar_set_features() by clearing filters
when it's being turned off.

[...]
> --- a/drivers/net/gianfar.h
> +++ b/drivers/net/gianfar.h
[...]
> @@ -1066,6 +1069,9 @@ struct gfar_private {
> 
>  	struct vlan_group *vlgrp;
> 
> +	/* RX queue filer rule set*/
> +	struct ethtool_rx_ntuple_list ntuple_list;
> +	struct mutex rx_queue_access;
> 
>  	/* Hash registers and their width */
>  	u32 __iomem *hash_regs[16];

Don't use struct ethtool_rx_ntuple_list; it is going to be removed once
ixgbe is converted to implement RX NFC.

Really, at this point I would say: please implement RX NFC, not RX
n-tuple.

> @@ -1140,6 +1146,16 @@ static inline void gfar_write_filer(struct gfar_private *priv,
>  	gfar_write(&regs->rqfpr, fpr);
>  }
> 
> +static inline void gfar_read_filer(struct gfar_private *priv,
> +		unsigned int far, unsigned int *fcr, unsigned int *fpr)
> +{
> +	struct gfar __iomem *regs = priv->gfargrp[0].regs;
> +
> +	gfar_write(&regs->rqfar, far);
> +	*fcr = gfar_read(&regs->rqfcr);
> +	*fpr = gfar_read(&regs->rqfpr);
> +}
> +
>  extern void lock_rx_qs(struct gfar_private *priv);
>  extern void lock_tx_qs(struct gfar_private *priv);
>  extern void unlock_rx_qs(struct gfar_private *priv);
> @@ -1157,4 +1173,32 @@ int gfar_set_features(struct net_device *dev, u32 features);
> 
>  extern const struct ethtool_ops gfar_ethtool_ops;
> 
> +#define ESWFULL 160
> +#define EHWFULL 161
> +#define EOUTOFRANGE 162

You can't just make up new error codes like this.

[...]
> --- a/drivers/net/gianfar_ethtool.c
> +++ b/drivers/net/gianfar_ethtool.c
[...]
> +static int gfar_add_table_entry(struct ethtool_rx_ntuple_flow_spec *flow,
> +		struct ethtool_rx_ntuple_list *list)
> +{
> +	struct ethtool_rx_ntuple_flow_spec_container *temp;
> +	temp = kmalloc(sizeof(struct ethtool_rx_ntuple_flow_spec_container),
> +			GFP_KERNEL);
> +	if (temp == NULL)
> +		return -ENOMEM;
> +	memcpy(&temp->fs, flow, sizeof(struct ethtool_rx_ntuple_flow_spec));
> +	list_add_tail(&temp->list, &list->list);
> +	list->count++;
> +
> +	if (~flow->data_mask)
> +		printk(KERN_WARNING
> +			"User-specific data is not supported by hardware!\n");
> +	if (flow->flow_type == IP_USER_FLOW)
> +		if (flow->m_u.usr_ip4_spec.ip_ver != 255)
> +			printk(KERN_WARNING
> +				"IP-Version is not supported by hardware!\n");

That's not right; the value of ip_ver must be 4 and the mask must be 0.

[...]
> +static int gfar_process_filer_changes(struct ethtool_rx_ntuple_flow_spec *flow,
> +		struct gfar_private *priv)
> +{
> +	struct ethtool_rx_ntuple_flow_spec_container *j;
> +	struct filer_table *tab;
> +	s32 i = 0;
> +	s32 ret = 0;
> +
> +	/*So index is set to zero, too!*/
> +	tab = kzalloc(sizeof(struct filer_table), GFP_KERNEL);
> +	if (tab == NULL) {
> +		printk(KERN_WARNING "Can not get memory!\n");
> +		return -ENOMEM;
> +	}
> +
> +	j = gfar_search_table_entry(flow, &priv->ntuple_list);
> +
> +	if (flow->action == ETHTOOL_RXNTUPLE_ACTION_CLEAR) {
> +		if (j != NULL)
> +			gfar_del_table_entry(j, &priv->ntuple_list);
> +		else {
> +			printk(KERN_WARNING
> +			"Deleting this rule is not possible,"
> +			" because it does not exist!\n");
> +			return -1;

-1 is -EPERM.  You should return -ENOENT and not log anything.

> +		}
> +	} else if (j != NULL) {
> +		printk(KERN_WARNING "Adding this rule is not possible,"
> +			" because it already exists!\n");
> +		return -1;

You should return -EEXIST and not log anything.

> +	}
> +
> +	/*Now convert the existing filer data from flow_spec into
> +	* filer tables binary format*/
> +	list_for_each_entry(j, &priv->ntuple_list.list, list) {
> +		ret = gfar_convert_to_filer(&j->fs, tab);
> +		if (ret == -ESWFULL) {
> +			printk(KERN_WARNING
> +			"Adding this rule is not possible,"
> +			" because there is not space left!\n");
> +			goto end;
> +		}
> +	}
[...]

I think the error code should be -EBUSY if the hardware or software
table is full.

Also, please run checkpatch.pl over your changes and fix the style
errors it finds.
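
To make the error-code suggestions above concrete, here is a userspace sketch of a tiny rule table that returns only standard errno values and logs nothing. The table, its size and the function names are hypothetical and unrelated to the gianfar driver.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_RULES 4

/* Hypothetical rule table keyed by a plain integer id. */
struct toy_rule_table {
	unsigned int ids[TOY_MAX_RULES];
	size_t count;
};

/* Returns the index of 'id' in the table, or -1 if it is not there. */
static int toy_rule_find(const struct toy_rule_table *t, unsigned int id)
{
	size_t i;

	for (i = 0; i < t->count; i++)
		if (t->ids[i] == id)
			return (int)i;
	return -1;
}

/* Add a rule: -EEXIST for a duplicate, -EBUSY when the table is full. */
static int toy_rule_add(struct toy_rule_table *t, unsigned int id)
{
	if (toy_rule_find(t, id) >= 0)
		return -EEXIST;
	if (t->count == TOY_MAX_RULES)
		return -EBUSY;
	t->ids[t->count++] = id;
	return 0;
}

/* Delete a rule: -ENOENT when it does not exist. */
static int toy_rule_del(struct toy_rule_table *t, unsigned int id)
{
	int i = toy_rule_find(t, id);

	if (i < 0)
		return -ENOENT;
	/* Fill the hole with the last entry; order is not preserved. */
	t->ids[i] = t->ids[--t->count];
	return 0;
}
```

Returning distinct standard codes lets the caller tell the three failure cases apart without private definitions such as ESWFULL or a bare -1.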

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* Re: [patch 1/1] net: convert %p usage to %pK
From: Eric Dumazet @ 2011-05-24  7:04 UTC
  To: Ingo Molnar
  Cc: Joe Perches, David Miller, akpm, netdev, drosenberg, a.p.zijlstra,
	eparis, eugeneteo, jmorris, kees.cook, tgraf
In-Reply-To: <20110524065715.GB12816@elte.hu>

Le mardi 24 mai 2011 à 08:57 +0200, Ingo Molnar a écrit :
> * Joe Perches <joe@perches.com> wrote:
> 
> > Maybe for clarity it'd be better to use a switch/case
> > or something like:
> > 
> > 	if (kptr_restrict == 0)
> > 		return ptr;
> > 	if (kptr_restrict == 1 &&
> > 	    has_capability_noaudit(current, CAP_SYSLOG))
> > 		return ptr;
> > 	return NULL;
> 
> While not breaking that line really - so something like:
> 
> 	if (kptr_restrict == 0)
> 		return ptr;
> 
> 	if (kptr_restrict == 1 && has_capability_noaudit(current, CAP_SYSLOG))
> 		return ptr;
> 
> 	return NULL;
> 
> Thanks,
> 
> 	Ingo

Here we are, thanks.

[PATCH v2] inet_diag: hide socket pointers

Provide a maybe_hide_ptr() helper and use it in inet_diag so that kernel
pointers are not disclosed to user space, with the following
kptr_restrict logic:

kptr_restrict = 0 : kernel pointers are not mangled
kptr_restrict = 1 : if the current user does not have CAP_SYSLOG,
		    kernel pointers are replaced by 0
kptr_restrict = 2 : kernel pointers are replaced by 0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
v2: kerneldoc and cleanup (Joe & Ingo feedback)

 include/linux/printk.h |    1 +
 lib/vsprintf.c         |   25 +++++++++++++++++++++----
 net/ipv4/inet_diag.c   |    6 ++++--
 3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index ee048e7..47c0cef 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -111,6 +111,7 @@ extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
 extern int printk_delay_msec;
 extern int dmesg_restrict;
 extern int kptr_restrict;
+void *maybe_hide_ptr(void *ptr);
 
 void log_buf_kexec_setup(void);
 #else
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 1d659d7..db1a193 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -798,6 +798,26 @@ char *uuid_string(char *buf, char *end, const u8 *addr,
 }
 
 int kptr_restrict __read_mostly;
+/**
+ * maybe_hide_ptr - Possibly nullify a kernel pointer given to user
+ * @ptr: kernel pointer
+ *
+ * kptr_restrict = 0 : kernel pointer not changed
+ * kptr_restrict = 1 : if the current user does not have CAP_SYSLOG,
+ *		       kernel pointer is replaced by 0
+ * kptr_restrict = 2 : kernel pointer is replaced by 0 for all users.
+ */
+void *maybe_hide_ptr(void *ptr)
+{
+	if (!kptr_restrict)
+		return ptr;
+
+	if (kptr_restrict == 1 && has_capability_noaudit(current, CAP_SYSLOG))
+		return ptr;
+
+	return NULL;
+}
+EXPORT_SYMBOL(maybe_hide_ptr);
 
 /*
  * Show a '%p' thing.  A kernel extension is that the '%p' is followed
@@ -911,10 +931,7 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
 				spec.field_width = 2 * sizeof(void *);
 			return string(buf, end, "pK-error", spec);
 		}
-		if (!((kptr_restrict == 0) ||
-		      (kptr_restrict == 1 &&
-		       has_capability_noaudit(current, CAP_SYSLOG))))
-			ptr = NULL;
+		ptr = maybe_hide_ptr(ptr);
 		break;
 	}
 	spec.flags |= SMALL;
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 6ffe94c..b5646a3 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -84,6 +84,7 @@ static int inet_csk_diag_fill(struct sock *sk,
 	struct inet_diag_meminfo  *minfo = NULL;
 	unsigned char	 *b = skb_tail_pointer(skb);
 	const struct inet_diag_handler *handler;
+	u64 ptr;
 
 	handler = inet_diag_table[unlh->nlmsg_type];
 	BUG_ON(handler == NULL);
@@ -114,8 +115,9 @@ static int inet_csk_diag_fill(struct sock *sk,
 	r->idiag_retrans = 0;
 
 	r->id.idiag_if = sk->sk_bound_dev_if;
-	r->id.idiag_cookie[0] = (u32)(unsigned long)sk;
-	r->id.idiag_cookie[1] = (u32)(((unsigned long)sk >> 31) >> 1);
+	ptr = (u64)(unsigned long)maybe_hide_ptr(sk);
+	r->id.idiag_cookie[0] = (u32)ptr;
+	r->id.idiag_cookie[1] = (u32)(ptr >> 32);
 
 	r->id.idiag_sport = inet->inet_sport;
 	r->id.idiag_dport = inet->inet_dport;
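
The cookie split in the last hunk can be tried out in user space. The
sketch below is an illustration only: cookie_from_ptr() and
cookie_to_u64() are hypothetical helpers, not kernel functions. Widening
to 64 bits before shifting avoids a shift by 32 on a 32-bit type, which
is what the old "(x >> 31) >> 1" form worked around.

```c
#include <stdint.h>

/* Split a pointer-sized value into two 32-bit cookie words. */
void cookie_from_ptr(const void *p, uint32_t cookie[2])
{
	/* Go through uintptr_t first so the widening to 64 bits is explicit. */
	uint64_t v = (uint64_t)(uintptr_t)p;

	cookie[0] = (uint32_t)v;		/* low 32 bits */
	cookie[1] = (uint32_t)(v >> 32);	/* high 32 bits, 0 on 32-bit */
}

/* Rebuild the 64-bit value from the two cookie words. */
uint64_t cookie_to_u64(const uint32_t cookie[2])
{
	return ((uint64_t)cookie[1] << 32) | cookie[0];
}
```

A hidden (NULL) pointer yields an all-zero cookie with this scheme.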



^ permalink raw reply related

* Re: [GIT PULL] Namespace file descriptors for 2.6.40
From: Eric W. Biederman @ 2011-05-24  7:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: James Bottomley, Linus Torvalds, linux-kernel, Linux Containers,
	netdev, Geert Uytterhoeven
In-Reply-To: <20110522084224.GA12279@elte.hu>

Ingo Molnar <mingo@elte.hu> writes:

> I agree with Linus's notion in this thread though, a core kernel change should 
> generally not worry about hooking up rare-arch system calls (concentrate on the 
> architectures that get tested most) - those are better enabled gradually 
> anyway.

The way I read it he was complaining about my having parisc bits and
asking for my branch to be merged before the parisc bits had been
merged.  Which I credit as a fair complaint.  If I am going to depend on
other people's trees I should wait.

At the same time, I am busy looking for every possible source of
trouble and putting code into net-next to detect pending conflicts,
and maintainers complain when I ask for review that my patches
conflict with their patches.  Being a conscientious developer, I am
inclined to do something.

Now that the reality has sunk in that it means waiting for other people's
code to be merged before I request that my changes be merged, I don't
think I will structure a tree that way again while I remember.

> Also, system call table conflicts are trivial to resolve. Merging in net-next 
> to avoid such a conflict is like cracking a nut with a sledgehammer.

Well I still have trauma from how nasty it was to test with syscall
numbers continuing to change when I was working on the kexec_load system
call.

As far as I can tell any one system call conflict on any one
architecture is easy to resolve.  Resolving a conflict on all
architectures would amount to at least 50 files that need to be
resolved, which feels a bit more than trivial.

My gut feel says we should really implement an
include/asm-generic/unistd-common.h to include all new system calls.

That way there would be only one file to touch instead of 50.
Certainly it works for include/asm-generic/unistd.h for the
architectures that use it.  And all we really need is just a little
abstraction on that concept.

Eric

^ permalink raw reply

* Re: [patch 1/1] net: convert %p usage to %pK
From: Ingo Molnar @ 2011-05-24  6:57 UTC (permalink / raw)
  To: Joe Perches
  Cc: Eric Dumazet, David Miller, akpm, netdev, drosenberg,
	a.p.zijlstra, eparis, eugeneteo, jmorris, kees.cook, tgraf
In-Reply-To: <1306218793.2298.10.camel@Joe-Laptop>


* Joe Perches <joe@perches.com> wrote:

> Maybe for clarity it'd be better to use a switch/case
> or something like:
> 
> 	if (kptr_restrict == 0)
> 		return ptr;
> 	if (kptr_restrict == 1 &&
> 	    has_capability_noaudit(current, CAP_SYSLOG))
> 		return ptr;
> 	return NULL;

While not breaking that line really - so something like:

	if (kptr_restrict == 0)
		return ptr;

	if (kptr_restrict == 1 && has_capability_noaudit(current, CAP_SYSLOG))
		return ptr;

	return NULL;

Thanks,

	Ingo

^ permalink raw reply

* Re: INFO: suspicious rcu_dereference_check() usage.
From: Justin Mattock @ 2011-05-24  6:50 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org; +Cc: netdev
In-Reply-To: <4DDB491A.2070901@gmail.com>

On Mon, May 23, 2011 at 10:58 PM, Justin P. Mattock
<justinmattock@gmail.com> wrote:
> seems this is just info, but probably should post this so you guys are
> aware..:
>
>
>
> [ 1308.117068]
> [ 1308.117072] ===================================================
> [ 1308.117080] [ INFO: suspicious rcu_dereference_check() usage. ]
> [ 1308.117085] ---------------------------------------------------
> [ 1308.117091] kernel/sched_rt.c:1266 invoked rcu_dereference_check()
> without protection!
> [ 1308.117096]
> [ 1308.117097] other info that might help us debug this:
> [ 1308.117099]
> [ 1308.117104]
> [ 1308.117106] rcu_scheduler_active = 1, debug_locks = 1
> [ 1308.117112] 1 lock held by watchdog/1/16:
> [ 1308.117116]  #0:  (&rq->lock){-.-.-.}, at: [<ffffffff8103dac0>]
> post_schedule.part.25+0x14/0x4a
> [ 1308.117137]
> [ 1308.117139] stack backtrace:
> [ 1308.117146] Pid: 16, comm: watchdog/1 Not tainted
> 2.6.39-04906-g5e152b4-dirty #2
> [ 1308.117151] Call Trace:
> [ 1308.117163]  [<ffffffff81079f6f>] lockdep_rcu_dereference+0x9a/0xa2
> [ 1308.117172]  [<ffffffff8103d256>] find_lowest_rq+0xfe/0x179
> [ 1308.117182]  [<ffffffff81043f64>] push_rt_task.part.60+0x7b/0x1c8
> [ 1308.117190]  [<ffffffff810441bc>] post_schedule_rt+0x20/0x28
> [ 1308.117198]  [<ffffffff8103dadc>] post_schedule.part.25+0x30/0x4a
> [ 1308.117209]  [<ffffffff8148cce5>] schedule+0x725/0x773
> [ 1308.117217]  [<ffffffff810a3864>] ? watchdog_enable+0x198/0x198
> [ 1308.117226]  [<ffffffff8106e4b8>] ? cpu_clock+0x47/0x51
> [ 1308.117234]  [<ffffffff810a3864>] ? watchdog_enable+0x198/0x198
> [ 1308.117241]  [<ffffffff810a38d1>] watchdog+0x6d/0xb0
> [ 1308.117249]  [<ffffffff8106877d>] kthread+0xa8/0xb0
> [ 1308.117259]  [<ffffffff81496e64>] kernel_thread_helper+0x4/0x10
> [ 1308.117268]  [<ffffffff8148f594>] ? retint_restore_args+0x13/0x13
> [ 1308.117276]  [<ffffffff810686d5>] ? __init_kthread_worker+0x5a/0x5a
> [ 1308.117283]  [<ffffffff81496e60>] ? gs_change+0x13/0x13
> [ 1308.117675]
> [ 1308.117678] ===================================================
> [ 1308.117685] [ INFO: suspicious rcu_dereference_check() usage. ]
> [ 1308.117690] ---------------------------------------------------
> [ 1308.117695] kernel/sched.c:619 invoked rcu_dereference_check() without
> protection!
> [ 1308.117700]
> [ 1308.117701] other info that might help us debug this:
> [ 1308.117704]
> [ 1308.117708]
> [ 1308.117710] rcu_scheduler_active = 1, debug_locks = 1
> [ 1308.117715] 2 locks held by rcuc0/7:
> [ 1308.117719]  #0:  (&rq->lock){-.-.-.}, at: [<ffffffff8148c6a1>]
> schedule+0xe1/0x773
> [ 1308.117736]  #1:  (&rq->lock/1){..-.-.}, at: [<ffffffff8103f13e>]
> double_lock_balance+0x6c/0x78
> [ 1308.117753]
> [ 1308.117755] stack backtrace:
> [ 1308.117761] Pid: 7, comm: rcuc0 Not tainted 2.6.39-04906-g5e152b4-dirty
> #2
> [ 1308.117766] Call Trace:
> [ 1308.117776]  [<ffffffff81079f6f>] lockdep_rcu_dereference+0x9a/0xa2
> [ 1308.117784]  [<ffffffff8103ca8e>] task_group+0x7f/0xbe
> [ 1308.117792]  [<ffffffff8103cae4>] set_task_rq+0x17/0x70
> [ 1308.117800]  [<ffffffff81043c8d>] set_task_cpu+0xf7/0x14c
> [ 1308.117810]  [<ffffffff81217099>] ? plist_check_head+0x9a/0x9e
> [ 1308.117818]  [<ffffffff812171cf>] ? plist_del+0x70/0x79
> [ 1308.117825]  [<ffffffff8103e8b4>] ? dequeue_task_rt+0x38/0x3d
> [ 1308.117833]  [<ffffffff8103eb5c>] ? dequeue_task+0x87/0x8e
> [ 1308.117841]  [<ffffffff81043df7>] pull_rt_task+0x115/0x169
> [ 1308.117849]  [<ffffffff81043ee7>] pre_schedule_rt+0x1e/0x20
> [ 1308.117857]  [<ffffffff8148c889>] schedule+0x2c9/0x773
> [ 1308.117865]  [<ffffffff81068e22>] ? prepare_to_wait+0x6c/0x78
> [ 1308.117875]  [<ffffffff810a998a>] rcu_cpu_kthread+0xeb/0x34d
> [ 1308.117883]  [<ffffffff81068ef8>] ? __init_waitqueue_head+0x4b/0x4b
> [ 1308.117892]  [<ffffffff810a989f>] ? rcu_needs_cpu+0x1bd/0x1bd
> [ 1308.117899]  [<ffffffff8106877d>] kthread+0xa8/0xb0
> [ 1308.117908]  [<ffffffff81496e64>] kernel_thread_helper+0x4/0x10
> [ 1308.117916]  [<ffffffff8148f594>] ? retint_restore_args+0x13/0x13
> [ 1308.117924]  [<ffffffff810686d5>] ? __init_kthread_worker+0x5a/0x5a
> [ 1308.117933]  [<ffffffff81496e60>] ? gs_change+0x13/0x13
>
>
> full dmesg is here:
> http://fpaste.org/IqRy/
>
> Justin P. Mattock
>


well, after posting the above, I noticed some others as well.
(system got sluggish, but is responsive).

[ 2862.310349] WARNING: at net/ipv4/route.c:1668 ip_rt_bug+0x5c/0x62()
[ 2862.310353] Hardware name: MacBookPro2,2
[ 2862.310355] Modules linked in: fuse 8021q garp stp cpufreq_ondemand
llc acpi_cpufreq freq_table mperf ipt_REJECT nf_conntrack_ipv4
nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables arc4
ath9k mac80211 radeon ath9k_common ath9k_hw ath cfg80211 btusb
bluetooth ttm drm_kms_helper drm uvcvideo joydev microcode appletouch
applesmc input_polldev iTCO_wdt videodev iTCO_vendor_support
v4l2_compat_ioctl32 i2c_i801 rfkill sky2 i2c_algo_bit i2c_core
apple_bl video [last unloaded: scsi_wait_scan]
[ 2862.310414] Pid: 6153, comm: gcm-session Not tainted
2.6.39-04906-g5e152b4-dirty #2
[ 2862.310417] Call Trace:
[ 2862.310424]  [<ffffffff8104c634>] warn_slowpath_common+0x83/0x9b
[ 2862.310430]  [<ffffffff8104c666>] warn_slowpath_null+0x1a/0x1c
[ 2862.310434]  [<ffffffff814095c9>] ip_rt_bug+0x5c/0x62
[ 2862.310439]  [<ffffffff814112a1>] dst_output+0x19/0x1d
[ 2862.310443]  [<ffffffff81412aa0>] ip_local_out+0x20/0x25
[ 2862.310448]  [<ffffffff814139c9>] ip_send_skb+0x19/0x58
[ 2862.310453]  [<ffffffff8142fa4e>] udp_send_skb+0x239/0x29b
[ 2862.310458]  [<ffffffff814310f0>] udp_sendmsg+0x5a1/0x7d4
[ 2862.310464]  [<ffffffff81079408>] ? trace_hardirqs_off+0xd/0xf
[ 2862.310469]  [<ffffffff8141139c>] ? ip_select_ident+0x3d/0x3d
[ 2862.310475]  [<ffffffff810525b8>] ? local_bh_enable_ip+0xe/0x10
[ 2862.310481]  [<ffffffff8148f131>] ? _raw_spin_unlock_bh+0x31/0x35
[ 2862.310486]  [<ffffffff813d41a6>] ? release_sock+0x14c/0x155
[ 2862.310490]  [<ffffffff814386ac>] inet_sendmsg+0x66/0x6f
[ 2862.310495]  [<ffffffff813d02b0>] sock_sendmsg+0xe6/0x109
[ 2862.310501]  [<ffffffff8107d63f>] ? lock_acquire+0xe1/0x109
[ 2862.310505]  [<ffffffff8107d535>] ? lock_release+0x1aa/0x1d3
[ 2862.310512]  [<ffffffff810ed549>] ? might_fault+0xa5/0xac
[ 2862.310516]  [<ffffffff813ceb34>] ? copy_from_user+0x2f/0x31
[ 2862.310521]  [<ffffffff813d1f34>] sys_sendto+0x132/0x174
[ 2862.310526]  [<ffffffff81495cfa>] ? sysret_check+0x2e/0x69
[ 2862.310531]  [<ffffffff8107b016>] ? trace_hardirqs_on_caller+0x13f/0x172
[ 2862.310537]  [<ffffffff8109fd4d>] ? audit_syscall_entry+0x11c/0x148
[ 2862.310542]  [<ffffffff8121debe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 2862.310546]  [<ffffffff81495cc2>] system_call_fastpath+0x16/0x1b
[ 2862.310549] ---[ end trace 2d2332adaa8bf2b5 ]---
[ 2863.373889] ip_rt_bug: 10.0.0.10 -> 255.255.255.255, ?
[ 2863.373903] ------------[ cut here ]------------
[ 2863.373910] WARNING: at net/ipv4/route.c:1668 ip_rt_bug+0x5c/0x62()
[ 2863.373919] Hardware name: MacBookPro2,2
[ 2863.373921] Modules linked in: fuse 8021q garp stp cpufreq_ondemand
llc acpi_cpufreq freq_table mperf ipt_REJECT nf_conntrack_ipv4
nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables arc4
ath9k mac80211 radeon ath9k_common ath9k_hw ath cfg80211 btusb
bluetooth ttm drm_kms_helper drm uvcvideo joydev microcode appletouch
applesmc input_polldev iTCO_wdt videodev iTCO_vendor_support
v4l2_compat_ioctl32 i2c_i801 rfkill sky2 i2c_algo_bit i2c_core
apple_bl video [last unloaded: scsi_wait_scan]
[ 2863.373986] Pid: 6153, comm: gcm-session Tainted: G        W
2.6.39-04906-g5e152b4-dirty #2
[ 2863.373989] Call Trace:
[ 2863.373996]  [<ffffffff8104c634>] warn_slowpath_common+0x83/0x9b
[ 2863.374058]  [<ffffffff8104c666>] warn_slowpath_null+0x1a/0x1c
[ 2863.374063]  [<ffffffff814095c9>] ip_rt_bug+0x5c/0x62
[ 2863.374068]  [<ffffffff814112a1>] dst_output+0x19/0x1d
[ 2863.374073]  [<ffffffff81412aa0>] ip_local_out+0x20/0x25
[ 2863.374079]  [<ffffffff814139c9>] ip_send_skb+0x19/0x58
[ 2863.374084]  [<ffffffff8142fa4e>] udp_send_skb+0x239/0x29b
[ 2863.374089]  [<ffffffff814310f0>] udp_sendmsg+0x5a1/0x7d4
[ 2863.374095]  [<ffffffff81079408>] ? trace_hardirqs_off+0xd/0xf
[ 2863.374100]  [<ffffffff8141139c>] ? ip_select_ident+0x3d/0x3d
[ 2863.374106]  [<ffffffff810525b8>] ? local_bh_enable_ip+0xe/0x10
[ 2863.374111]  [<ffffffff8148f131>] ? _raw_spin_unlock_bh+0x31/0x35
[ 2863.374117]  [<ffffffff813d41a6>] ? release_sock+0x14c/0x155
[ 2863.374122]  [<ffffffff814386ac>] inet_sendmsg+0x66/0x6f
[ 2863.374127]  [<ffffffff813d02b0>] sock_sendmsg+0xe6/0x109
[ 2863.374132]  [<ffffffff8107d63f>] ? lock_acquire+0xe1/0x109
[ 2863.374137]  [<ffffffff8107d535>] ? lock_release+0x1aa/0x1d3
[ 2863.374143]  [<ffffffff810ed549>] ? might_fault+0xa5/0xac
[ 2863.374148]  [<ffffffff813ceb34>] ? copy_from_user+0x2f/0x31
[ 2863.374153]  [<ffffffff813d1f34>] sys_sendto+0x132/0x174
[ 2863.374158]  [<ffffffff81495cfa>] ? sysret_check+0x2e/0x69
[ 2863.374163]  [<ffffffff8107b016>] ? trace_hardirqs_on_caller+0x13f/0x172
[ 2863.374169]  [<ffffffff8109fd4d>] ? audit_syscall_entry+0x11c/0x148
[ 2863.374175]  [<ffffffff8121debe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 2863.374180]  [<ffffffff81495cc2>] system_call_fastpath+0x16/0x1b
[ 2863.374184] ---[ end trace 2d2332adaa8bf2b6 ]---

full dmesg here:
http://fpaste.org/W4UK/

-- 
Justin P. Mattock

^ permalink raw reply

* Re: [PATCH v2]net:8021q:vlan.c Fix pr_info to just give the vlan fullname and version.
From: Justin P. Mattock @ 2011-05-24  6:45 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel, joe, greearb
In-Reply-To: <20110524.022144.2121988243667976554.davem@davemloft.net>

On 05/23/2011 11:21 PM, David Miller wrote:
> From: "Justin P. Mattock"<justinmattock@gmail.com>
> Date: Mon, 23 May 2011 22:40:47 -0700
>
>> The below patch removes vlan_buggyright and vlan_copyright from vlan_proto_init,
>> so that it prints out just the fullname of vlan and the version number.
>
> Come on Justin, you're making various strings now completely
> unreferenced.  Don't just leave them there, remove them.
>

Ah, I did think about it, but was wary of removing them. Resent with
the references removed (hopefully).

Justin P. Mattock

^ permalink raw reply

* [PATCH v3]net:8021q:vlan.c Fix pr_info to just give the vlan fullname and version.
From: Justin P. Mattock @ 2011-05-24  6:43 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, Justin P. Mattock, Joe Perches, David S. Miller,
	Ben Greear

The below patch removes vlan_buggyright and vlan_copyright from vlan_proto_init, 
so that it prints out just the fullname of vlan and the version number.

before:

[   30.438203] 802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
[   30.441542] All bugs added by David S. Miller <davem@redhat.com>

after:

[   31.513910] 802.1Q VLAN Support v1.8

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
CC: Joe Perches <joe@perches.com>
CC: David S. Miller <davem@davemloft.net>
CC: Ben Greear <greearb@candelatech.com>
---
 net/8021q/vlan.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index b2274d1..c7a581a 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -46,8 +46,6 @@ int vlan_net_id __read_mostly;
 
 const char vlan_fullname[] = "802.1Q VLAN Support";
 const char vlan_version[] = DRV_VERSION;
-static const char vlan_copyright[] = "Ben Greear <greearb@candelatech.com>";
-static const char vlan_buggyright[] = "David S. Miller <davem@redhat.com>";
 
 /* End of global variables definitions. */
 
@@ -673,8 +671,7 @@ static int __init vlan_proto_init(void)
 {
 	int err;
 
-	pr_info("%s v%s %s\n", vlan_fullname, vlan_version, vlan_copyright);
-	pr_info("All bugs added by %s\n", vlan_buggyright);
+	pr_info("%s v%s\n", vlan_fullname, vlan_version);
 
 	err = register_pernet_subsys(&vlan_net_ops);
 	if (err < 0)
-- 
1.7.5.1

^ permalink raw reply related

* Re: [patch 1/1] net: convert %p usage to %pK
From: Joe Perches @ 2011-05-24  6:33 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, akpm, netdev, drosenberg, a.p.zijlstra, eparis,
	eugeneteo, jmorris, kees.cook, mingo, tgraf
In-Reply-To: <1306217837.2638.36.camel@edumazet-laptop>

On Tue, 2011-05-24 at 08:17 +0200, Eric Dumazet wrote:
> We probably need to extend this to inet_diag as well.
> Provide a maybe_hide_ptr() helper and use it in inet_diag to not
> disclose kernel pointers to user, with kptr_restrict logic :
> kptr_restrict = 0 : kernel pointers are not mangled
> kptr_restrict = 1 : if the current user does not have CAP_SYSLOG,
> kernel pointers are replaced by 0
> kptr_restrict = 2 : kernel pointers are replaced by 0
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
[]
> +void *maybe_hide_ptr(void *ptr)
> +{
> +	if (!((kptr_restrict == 0) ||
> +	      (kptr_restrict == 1 &&
> +	       has_capability_noaudit(current, CAP_SYSLOG))))
> +		ptr = NULL;
> +	return ptr;
> +}
> +EXPORT_SYMBOL(maybe_hide_ptr);

Makes sense to me.

Maybe for clarity it'd be better to use a switch/case
or something like:

	if (kptr_restrict == 0)
		return ptr;
	if (kptr_restrict == 1 &&
	    has_capability_noaudit(current, CAP_SYSLOG))
		return ptr;
	return NULL;



^ permalink raw reply

* [PATCH] be2net: hash key for rss-config cmd not set
From: Sathya Perla @ 2011-05-24  6:29 UTC (permalink / raw)
  To: netdev; +Cc: Sathya Perla

A non-zero, nondescript value is needed as the hash key. The hash
variable was left uninitialized, so it sometimes ended up with a zero
value, which makes hashing ineffective. The constant value used now has
no particular significance but seems to work fine.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
---
 drivers/net/benet/be_cmds.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index 2463b1c..81654ae 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -1703,7 +1703,8 @@ int be_cmd_rss_config(struct be_adapter *adapter, u8 *rsstable, u16 table_size)
 {
 	struct be_mcc_wrb *wrb;
 	struct be_cmd_req_rss_config *req;
-	u32 myhash[10];
+	u32 myhash[10] = {0x0123, 0x4567, 0x89AB, 0xCDEF, 0x01EF,
+			0x0123, 0x4567, 0x89AB, 0xCDEF, 0x01EF};
 	int status;
 
 	if (mutex_lock_interruptible(&adapter->mbox_lock))
-- 
1.7.4
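
Why a zero key defeats hashing can be shown with a small user-space
sketch. The patch does not say which hash the adapter computes; the code
below assumes a Toeplitz-style hash, the usual choice for RSS, and
toeplitz_hash() is a hypothetical helper, not part of be2net.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash of @len input bytes under @key.
 * @key must hold at least @len + 4 bytes.
 */
uint32_t toeplitz_hash(const uint8_t *key, const uint8_t *data, size_t len)
{
	/* 32-bit window over the key, initially the first four key bytes. */
	uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
			  ((uint32_t)key[2] << 8) | key[3];
	uint32_t hash = 0;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		for (bit = 7; bit >= 0; bit--) {
			/* XOR in the window for every set input bit. */
			if (data[i] & (1u << bit))
				hash ^= window;
			/* Slide the window one bit further along the key. */
			window <<= 1;
			if (key[i + 4] & (1u << bit))
				window |= 1;
		}
	}
	return hash;
}
```

With an all-zero key the window is always zero, so every input hashes
to 0 and all flows would land on the same RSS queue. Any key with some
bits set avoids that degenerate case.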


^ permalink raw reply related

* Re: [PATCH] vlan: Fix the b0rked ingress VLAN_FLAG_REORDER_HDR check.
From: David Miller @ 2011-05-24  6:24 UTC (permalink / raw)
  To: ebiederm
  Cc: shemminger, greearb, nicolas.2p.debian, jpirko, xiaosuo, netdev,
	kaber, fubar, eric.dumazet, andy, jesse
In-Reply-To: <m1aaectyed.fsf@fess.ebiederm.org>

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Mon, 23 May 2011 23:18:02 -0700

> Feel free to read through the code, to convince yourself it is correct.
> In addition the code is untouched from the vlan header insertion for
> emulation of vlan header acceleration in dev_hard_start_xmit() which
> presumably has been working for quite a while.

I'm not keeping code there that does eth_hdr(skb)->foo when there
can be either a vlan_hdr(skb) or a eth_hdr(skb) there.

That's just asking for trouble.
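
A user-space sketch of the hazard, under the assumption of a plain
Ethernet frame with at most one 802.1Q tag: the EtherType field sits at
a different offset depending on whether a tag is present, so code that
assumes one layout reads the wrong bytes for the other.
frame_payload_proto() is a hypothetical helper, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN	6
#define ETH_P_8021Q	0x8100	/* 802.1Q VLAN tag protocol ID */

/* Return the EtherType of the payload, skipping one 802.1Q tag if present. */
uint16_t frame_payload_proto(const uint8_t *frame, size_t len)
{
	size_t off = 2 * ETH_ALEN;	/* EtherType follows the two MAC addresses */
	uint16_t proto;

	if (len < off + 2)
		return 0;
	proto = (uint16_t)((frame[off] << 8) | frame[off + 1]);
	if (proto == ETH_P_8021Q) {
		/* Tagged: the real EtherType sits 4 bytes later, after the TCI. */
		off += 4;
		if (len < off + 2)
			return 0;
		proto = (uint16_t)((frame[off] << 8) | frame[off + 1]);
	}
	return proto;
}
```

Reading only the first EtherType of a tagged frame would return 0x8100
rather than the encapsulated protocol.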

^ permalink raw reply

* Re: [PATCH] be2net: hash key for rss-config cmd not set
From: David Miller @ 2011-05-24  6:22 UTC (permalink / raw)
  To: Sathya.Perla; +Cc: netdev
In-Reply-To: <3367B80B08154D42A3B2BC708B5D41F63CC4DA8EC4@EXMAIL.ad.emulex.com>

From: <Sathya.Perla@Emulex.Com>
Date: Mon, 23 May 2011 22:46:48 -0700

> Pls let me know if you want me to re-send the patch with a more-descriptive changelog...

Yes, I do.

^ permalink raw reply

* Re: [PATCH v2]net:8021q:vlan.c Fix pr_info to just give the vlan fullname and version.
From: David Miller @ 2011-05-24  6:21 UTC (permalink / raw)
  To: justinmattock; +Cc: netdev, linux-kernel, joe, greearb
In-Reply-To: <1306215647-2857-1-git-send-email-justinmattock@gmail.com>

From: "Justin P. Mattock" <justinmattock@gmail.com>
Date: Mon, 23 May 2011 22:40:47 -0700

> The below patch removes vlan_buggyright and vlan_copyright from vlan_proto_init, 
> so that it prints out just the fullname of vlan and the version number.

Come on Justin, you're making various strings now completely
unreferenced.  Don't just leave them there, remove them.

^ permalink raw reply
