Netdev List
 help / color / mirror / Atom feed
* Re: rps perfomance WAS(Re: rps: question
From: David Miller @ 2010-04-15  8:57 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, hadi, shemminger, netdev, robert, xiaosuo, andi
In-Reply-To: <1271278661.16881.1761.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 14 Apr 2010 22:57:41 +0200

> RPS can be tuned (Changli wants a finer tuning...), it would be
> intereting to tune multiqueue devices too. I dont know if its possible
> right now.

Only NIU allows real detailed control over queue selection and
stuff like that, because the hardware has a real TCAM for
packet matching and packets which match in TCAM entries can
steer to different collections of queues.

We have ethtool interfaces for this (ETHTOOL_GRXCLS*), so you can
change it.

For most other chips we only have interfaces for modifying the
RX hashing algorithm or what the RX hash covers, stuff like
that.

See also ETHTOOL_GRXFH, ETHTOOL_SRXFH, ETHTOOL_SRXNTUPLE, and
ETHTOOL_GRXNTUPLE, the latter two of which were added for Intel
NICs.

^ permalink raw reply

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
From: Eric Dumazet @ 2010-04-15  8:51 UTC (permalink / raw)
  To: David Miller; +Cc: krkumar2, netdev, nuclearcat
In-Reply-To: <20100415.012607.15449571.davem@davemloft.net>

Le jeudi 15 avril 2010 à 01:26 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 15 Apr 2010 10:02:33 +0200
> 
> > Denys got a crash that we cannot explain yet. He said he has no
> > multiqueue devices, so obviously my patch cant help him.
> > 
> > But this patch was fixing a real issue, I believe I pointed it twice
> > already...
> 
> Ok, I thought your patch was specifically meant to fix Denys's bug.
> 
> The confusion comes from the fact that you mention Denys's crash in
> your commit message:
> 
> --------------------
> When dev_pick_tx() caches tx queue_index on a socket, we must check
> socket dst_entry matches skb one, or risk a crash later, as reported by
> Denys Fedorysychenko, if old packets are in flight during a route
> change, involving devices with different number of queues.
> --------------------
> 
> Anyways, I studied your patch once more and read the thread discussion
> with Krishna again and your patch looks fine.  I'll apply it to
> net-2.6, thanks!
> 

In any case, I think there is a fundamental problem with this sk
caching. Because one packet can travel in many stacked devices before
hitting the wire.

(bonding, vlan, ethernet) for example.

Socket cache is meaningfull for one level only...




^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: David Miller @ 2010-04-15  8:51 UTC (permalink / raw)
  To: hadi; +Cc: shemminger, eric.dumazet, therbert, netdev, robert, xiaosuo, andi
In-Reply-To: <1271276568.4567.59.camel@bigi>

From: jamal <hadi@cyberus.ca>
Date: Wed, 14 Apr 2010 16:22:48 -0400

> On Wed, 2010-04-14 at 12:44 -0700, Stephen Hemminger wrote:
> 
>> RPS might also interact with the core turbo boost functionality on Intel chips.
>> Newer chips will make a single core faster if other core can be kept idle.
> 
> how well does it work with Linux?

It's completely transparent and should just happen without any
BIOS tweaks.


^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: David Miller @ 2010-04-15  8:51 UTC (permalink / raw)
  To: eric.dumazet; +Cc: shemminger, hadi, therbert, netdev, robert, xiaosuo, andi
In-Reply-To: <1271275130.16881.1749.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 14 Apr 2010 21:58:50 +0200

> Every time we change RPS to be on or off, we might have some extra
> noise. Maybe we already have this problem with irqbalance ?

irqbalance should never move network device interrupts around
under normal circumstances.  Arjan assured me that there is
specific logic in the irqbalance daemon to not move NIC
interrupts around once a target has been choosen.


^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: David Miller @ 2010-04-15  8:50 UTC (permalink / raw)
  To: shemminger; +Cc: hadi, eric.dumazet, therbert, netdev, robert, xiaosuo, andi
In-Reply-To: <20100414124426.6aee95c3@nehalam>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 14 Apr 2010 12:44:26 -0700

> On Wed, 14 Apr 2010 14:53:42 -0400
> jamal <hadi@cyberus.ca> wrote:
> 
>> Agreed. So to enumerate, the benefits come in if:
>> a) you have many processors
>> b) you have single-queue nic
>> c) at sub-threshold traffic you dont care about a little latency
> 
> There probably needs to be better autotuning for this, there is no reason
> that RPS to be steering packets unless the queue is getting backed up.

I disagree, if the goal is to migrate the bulk of packet processing
to where the app will actually sink and process the data then it should
forward to RPS marked cpus regardless of local queue levels.


^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: Eric Dumazet @ 2010-04-15  8:49 UTC (permalink / raw)
  To: Changli Gao; +Cc: David Miller, therbert, eparis, netdev
In-Reply-To: <u2o412e6f7f1004150047nc9250ffuaef68be066a7575c@mail.gmail.com>

Le jeudi 15 avril 2010 à 15:47 +0800, Changli Gao a écrit :
> On Thu, Apr 15, 2010 at 3:37 PM, David Miller <davem@davemloft.net> wrote:
> > From: Changli Gao <xiaosuo@gmail.com>
> > Date: Thu, 15 Apr 2010 15:30:44 +0800
> >
> >> Should netif_rx() be used only when preemption is disabled? If not,
> >> netif_rx_ni() should be used instead.?
> >
> > netif_rx() must be invoked from a hardware or software interrupt,
> > which implies preemption disabled.
> >
> > In netif_rx_ni(), the "ni" means "not interrupt".
> >
> 
> yea, I know netif_rx_ni()'s meaning. It means that the following
> changes aren't necessary.
> 
>  #else
> -       cpu = smp_processor_id();
> +       ret = enqueue_to_backlog(skb, get_cpu());
> +       put_cpu();
> 
> ret = enqueue_to_backlog(skb, smp_processor_id()); should be OK.
> 
>  #endif
> -
> -       return enqueue_to_backlog(skb, cpu);
> +       return ret;
>  }

netif_rx is meant to be called from interrupts because it doesn't wake
up ksoftirqd.  For calling from outside interrupts, netif_rx_ni exists,
to make _sure_ do_softirq() is called.

However, netif_rx() _could_ be called from process context, it was safe,
but sofirq was a bit delayed.

Now, after RPS changes this can trigger this :

[   14.203970] BUG: using smp_processor_id() in preemptible [00000000]
code: avahi-daemon/2093
[   14.204025] caller is netif_rx+0xfa/0x110
[   14.204032] Pid: 2093, comm: avahi-daemon Tainted: G        W
2.6.34-rc3-next-20100412+ #65
[   14.204035] Call Trace:
[   14.204064]  [<ffffffff81278fe5>] debug_smp_processor_id+0x105/0x110
[   14.204070]  [<ffffffff8142163a>] netif_rx+0xfa/0x110
[   14.204090]  [<ffffffff8145b631>] ip_dev_loopback_xmit+0x71/0xa0
[   14.204095]  [<ffffffff8145b892>] ip_mc_output+0x192/0x2c0
[   14.204099]  [<ffffffff8145d610>] ip_local_out+0x20/0x30
[   14.204105]  [<ffffffff8145d8ad>] ip_push_pending_frames+0x28d/0x3d0
[   14.204119]  [<ffffffff8147f1cc>] udp_push_pending_frames+0x14c/0x400
[   14.204125]  [<ffffffff814803fc>] udp_sendmsg+0x39c/0x790
[   14.204137]  [<ffffffff814891d5>] inet_sendmsg+0x45/0x80
[   14.204149]  [<ffffffff8140af91>] sock_sendmsg+0xf1/0x110
[   14.204177]  [<ffffffff810e3a89>] ? might_fault+0xb9/0xd0
[   14.204184]  [<ffffffff810e3a3e>] ? might_fault+0x6e/0xd0
[   14.204189]  [<ffffffff8140dc6c>] sys_sendmsg+0x20c/0x380
[   14.204205]  [<ffffffff811107f1>] ? do_sync_write+0xd1/0x110
[   14.204211]  [<ffffffff810e3a3e>] ? might_fault+0x6e/0xd0
[   14.204233]  [<ffffffff8100ad82>] system_call_fastpath+0x16/0x1b


We have two possibilities :

1) Make sure no netif_rx() caller is in process context, preemption
enabled.

2) Change netif_rx() to meet its initial behavior : It _can_ be called
from process context, preemption enabled. Frame might be delayed a bit
as before.

We chose 2), but David, I am not sure this is OK given git history, some
calling points were changed to avoid "'NOHZ: local_softirq_pending 08' "
messages...

----------------------------------------------------------------------------------------

commit 481a8199142c050b72bff8a1956a49fd0a75bbe0
Author: Oliver Hartkopp <oliver@hartkopp.net>
Date:   Tue Sep 15 01:31:34 2009 -0700

    can: fix NOHZ local_softirq_pending 08 warning
    
    When using nanosleep() in an userspace application we get a
ratelimit warning
    
    NOHZ: local_softirq_pending 08
    
    for 10 times.
    
    The echo of CAN frames is done from process context and softirq
context only.
    Therefore the usage of netif_rx() was wrong (for years).
    
    This patch replaces netif_rx() with netif_rx_ni() which has to be
used from
    process/softirq context. It also adds a missing comment that
can_send() must
    no be used from hardirq context.
    
    Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
    Signed-off-by: Urs Thuermann <urs@isnogud.escape.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/can/vcan.c b/drivers/net/can/vcan.c
index 6971f6c..80ac563 100644
--- a/drivers/net/can/vcan.c
+++ b/drivers/net/can/vcan.c
@@ -80,7 +80,7 @@ static void vcan_rx(struct sk_buff *skb, struct
net_device *dev)
        skb->dev       = dev;
        skb->ip_summed = CHECKSUM_UNNECESSARY;
 
-       netif_rx(skb);
+       netif_rx_ni(skb);
 }
 
 static netdev_tx_t vcan_tx(struct sk_buff *skb, struct net_device *dev)
diff --git a/net/can/af_can.c b/net/can/af_can.c
index ef1c43a..6068321 100644
--- a/net/can/af_can.c
+++ b/net/can/af_can.c
@@ -199,6 +199,8 @@ static int can_create(struct net *net, struct socket
*sock, int protocol)
  * @skb: pointer to socket buffer with CAN frame in data section
  * @loop: loopback for listeners on local CAN sockets (recommended
default!)
  *
+ * Due to the loopback this routine must not be called from hardirq
context.
+ *
  * Return:
  *  0 on success
  *  -ENETDOWN when the selected interface is down
@@ -278,7 +280,7 @@ int can_send(struct sk_buff *skb, int loop)
        }
 
        if (newskb)
-               netif_rx(newskb);
+               netif_rx_ni(newskb);
 
        /* update statistics */
        can_stats.tx_frames++;


commit ae3e0fcf901e4b7df87aef7ab39093e142a8de8b
Author: Holger Schurig <hs4233@mail.mn-solutions.de>
Date:   Wed Jan 16 15:48:44 2008 +0100

    libertas cs/sdio: fix 'NOHZ: local_softirq_pending 08' message
    
    netif_rx should be called only from interrupt context. if_cs and
if_sdio receive
    packets from other contexts, and thus should call netif_rx_ni.
    
    Signed-off-by: Marc Pignat <marc.pignat@hevs.ch>
    Acked-by: Holger Schurig <hs4233@mail.mn-solutions.de>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>

diff --git a/drivers/net/wireless/libertas/rx.c
b/drivers/net/wireless/libertas/rx.c
index 6332fd4..149557a 100644
--- a/drivers/net/wireless/libertas/rx.c
+++ b/drivers/net/wireless/libertas/rx.c
@@ -247,7 +247,10 @@ int lbs_process_rxed_packet(struct lbs_private
*priv, struct sk_buff *skb)
        priv->stats.rx_packets++;
 
        skb->protocol = eth_type_trans(skb, dev);
-       netif_rx(skb);
+       if (in_interrupt())
+               netif_rx(skb);
+       else
+               netif_rx_ni(skb);
 
        ret = 0;
 done:

-----------------------------------------------------------------------------------------
Maybe we should add a new function after all...

int netif_rx_any(struct sk_buff *skb) 
{
       if (in_interrupt())
               return netif_rx(skb);

	return netif_rx_ni(skb);
}




^ permalink raw reply related

* Re: rps perfomance WAS(Re: rps: question
From: David Miller @ 2010-04-15  8:48 UTC (permalink / raw)
  To: hadi; +Cc: eric.dumazet, therbert, netdev, robert, xiaosuo, andi
In-Reply-To: <1271271222.4567.51.camel@bigi>

From: jamal <hadi@cyberus.ca>
Date: Wed, 14 Apr 2010 14:53:42 -0400

> On Wed, 2010-04-14 at 20:04 +0200, Eric Dumazet wrote:
> 
>> Yes, multiqueue is far better of course, but in case of hardware lacking
>> multiqueue, RPS can help many workloads, where application has _some_
>> work to do, not only counting frames or so...
> 
> Agreed. So to enumerate, the benefits come in if:
> a) you have many processors
> b) you have single-queue nic
> c) at sub-threshold traffic you dont care about a little latency
> d) you have a specific cache hierachy
> e) app is working hard to process incoming messages

A single-queue NIC is actually not a requirement, RPS helps also in
cases where you have 'N' application threads and N is less than the
number of CPUs your multi-queue NIC is distributing traffic to.

Moving the bulk of the input packet processing to the cpus where
the applications actually sit had a non-trivial benefit.  RFS takes
this aspect to yet another level.

> I think the main challenge for my pedantic mind is missing details. Is
> there a paper on rps? Example for #d above, the commit log mentions that
> rps benefits if you have certain types of "cache hierachy". Probably
> some arch with large shared L2/3 (maybe inclusive) cache will benefit.
> example: it does well on Nehalem and probably opterons as long (as you
> dont start stacking these things on some interconnect like QPI or HT).
> But what happens when you have FSB sharing across cores (still a very
> common setup)? etc etc

I think for the case where application locality is important,
RPS/RFS can help regardless of cache details.

^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: David Miller @ 2010-04-15  8:42 UTC (permalink / raw)
  To: hadi; +Cc: therbert, eric.dumazet, netdev, robert, xiaosuo, andi
In-Reply-To: <1271245986.3943.55.camel@bigi>

From: jamal <hadi@cyberus.ca>
Date: Wed, 14 Apr 2010 07:53:06 -0400

> I base tested against no rps being used and a kernel which didnt
> have any RPS config on.  [BTW, I had to hand-edit the .config since
> i couldnt do it from menuconfig (Is there any reason for it to be
> so?)]

The RPS config is merely an indirect dependency on SMP as we have it
coded up in the Kconfig files, it's not meant to be user selectable
and is intended to be unconditionally on for SMP builds.

^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: David Miller @ 2010-04-15  8:33 UTC (permalink / raw)
  To: xiaosuo; +Cc: eric.dumazet, therbert, eparis, netdev
In-Reply-To: <k2y412e6f7f1004150127u4e7ec668r80f066bfb3efea81@mail.gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Thu, 15 Apr 2010 16:27:19 +0800

> I think the following patch from Eric should be applied instead.
> 
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index c65f18e..d1bcc9f 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -120,7 +120,7 @@ static int ip_dev_loopback_xmit(struct sk_buff *newskb)
>        newskb->pkt_type = PACKET_LOOPBACK;
>        newskb->ip_summed = CHECKSUM_UNNECESSARY;
>        WARN_ON(!skb_dst(newskb));
> -       netif_rx(newskb);
> +       netif_rx_ni(newskb);
>        return 0;
>  }

Yes, this looks more reasonable.  Eric if you agree please (re-)submit
this formally, I must have missed this somehow, sorry.

And this is a bug fix in any kernel, not just one's that have RPS
patches applied.

If we are not called from some interrupt context, there is no sure
trigger to make sure software interrupts will be executed after the
packet is queued locally.  netif_rx_ni() makes sure that any pending
software interrupts will run in such cases.

^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: Changli Gao @ 2010-04-15  8:27 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, therbert, eparis, netdev
In-Reply-To: <20100415.005726.205428265.davem@davemloft.net>

On Thu, Apr 15, 2010 at 3:57 PM, David Miller <davem@davemloft.net> wrote:
>
> Why?  If we are in an interrupt (either soft or hard) then
> smp_processor_id() is stable, and valid.
>
> Changli, I find it very frustrating to communicate with you, you are
> very terse in your descriptions and analysis and you make many simple
> errors that would be avoided if you spent more time thinking about
> things before sending your emails. :-/
>
> Instead of just showing some pseudo patch, state WHY it is needed.
> Talk about the execution state of environment and what rules or other
> things are being violated which must be corrected.
>

Sorry. English isn't my native language, so sometimes I can't describe
myself clearly.

I think the following patch from Eric should be applied instead.

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index c65f18e..d1bcc9f 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -120,7 +120,7 @@ static int ip_dev_loopback_xmit(struct sk_buff *newskb)
       newskb->pkt_type = PACKET_LOOPBACK;
       newskb->ip_summed = CHECKSUM_UNNECESSARY;
       WARN_ON(!skb_dst(newskb));
-       netif_rx(newskb);
+       netif_rx_ni(newskb);
       return 0;
 }

As you know  "netif_rx() must be invoked from a hardware or software
interrupt, which implies preemption disabled.", obviously
ip_dev_loopback_xmit() doesn't obey this rule, so the crash isn't the
fault of net_rx(). If there are other users, who don't obey this rule,
they should be fixed too.

For this patch:

-       cpu = smp_processor_id();
+       ret = enqueue_to_backlog(skb, get_cpu());
+       put_cpu();

You said: " If we are in an interrupt (either soft or hard) then
smp_processor_id() is stable, and valid.". so we don't need to call
get_cpu() instead of smp_processor_id(). get_cpu() brings no good but
additional cost preempt_disable().

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply related

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
From: David Miller @ 2010-04-15  8:26 UTC (permalink / raw)
  To: eric.dumazet; +Cc: krkumar2, netdev, nuclearcat
In-Reply-To: <1271318553.16881.2161.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 15 Apr 2010 10:02:33 +0200

> Denys got a crash that we cannot explain yet. He said he has no
> multiqueue devices, so obviously my patch cant help him.
> 
> But this patch was fixing a real issue, I believe I pointed it twice
> already...

Ok, I thought your patch was specifically meant to fix Denys's bug.

The confusion comes from the fact that you mention Denys's crash in
your commit message:

--------------------
When dev_pick_tx() caches tx queue_index on a socket, we must check
socket dst_entry matches skb one, or risk a crash later, as reported by
Denys Fedorysychenko, if old packets are in flight during a route
change, involving devices with different number of queues.
--------------------

Anyways, I studied your patch once more and read the thread discussion
with Krishna again and your patch looks fine.  I'll apply it to
net-2.6, thanks!


^ permalink raw reply

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
From: Eric Dumazet @ 2010-04-15  8:02 UTC (permalink / raw)
  To: David Miller; +Cc: krkumar2, netdev, nuclearcat
In-Reply-To: <20100414.235256.190096561.davem@davemloft.net>

Le mercredi 14 avril 2010 à 23:52 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 12 Apr 2010 09:18:17 +0200
> 
> > [PATCH] net: dev_pick_tx() fix
> > 
> > When dev_pick_tx() caches tx queue_index on a socket, we must check
> > socket dst_entry matches skb one, or risk a crash later, as reported by
> > Denys Fedorysychenko, if old packets are in flight during a route
> > change, involving devices with different number of queues.
> > 
> > Bug introduced by commit a4ee3ce3
> > (net: Use sk_tx_queue_mapping for connected sockets)
> > 
> > Reported-by: Denys Fedorysychenko <nuclearcat@nuclearcat.com>
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> It looks like Denys is still getting crashes even with this patch
> applied.  And I also think there is some meric to some of Krishna's
> analysis.
> 


> To me it seems to make more sense to validate the SKB's queue against
> the real actual choosen device's range.
> 
> The socket queue index will catch up and eventually become valid
> because the dst reset will invalidate the queue setting, and we'll
> thus recompute it as needed, as Krishna stated.
> 

??? 


> So I'm tossing this patch for now since it doesn't even aparently
> fix the bug.

I am a bit lost here David.

Denys got a crash that we cannot explain yet. He said he has no
multiqueue devices, so obviously my patch cant help him.

But this patch was fixing a real issue, I believe I pointed it twice
already...

I'll try to setup an environment to trigger this bug for real, but this
will take time, my dev machines are not multiqueue.




^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: David Miller @ 2010-04-15  7:57 UTC (permalink / raw)
  To: xiaosuo; +Cc: eric.dumazet, therbert, eparis, netdev
In-Reply-To: <u2o412e6f7f1004150047nc9250ffuaef68be066a7575c@mail.gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Thu, 15 Apr 2010 15:47:26 +0800

> On Thu, Apr 15, 2010 at 3:37 PM, David Miller <davem@davemloft.net> wrote:
>> From: Changli Gao <xiaosuo@gmail.com>
>> Date: Thu, 15 Apr 2010 15:30:44 +0800
>>
>>> Should netif_rx() be used only when preemption is disabled? If not,
>>> netif_rx_ni() should be used instead.?
>>
>> netif_rx() must be invoked from a hardware or software interrupt,
>> which implies preemption disabled.
>>
>> In netif_rx_ni(), the "ni" means "not interrupt".
>>
> 
> yea, I know netif_rx_ni()'s meaning. It means that the following
> changes aren't necessary.
> 
>  #else
> -       cpu = smp_processor_id();
> +       ret = enqueue_to_backlog(skb, get_cpu());
> +       put_cpu();
> 

Why?  If we are in an interrupt (either soft or hard) then
smp_processor_id() is stable, and valid.

Changli, I find it very frustrating to communicate with you, you are
very terse in your descriptions and analysis and you make many simple
errors that would be avoided if you spent more time thinking about
things before sending your emails. :-/

Instead of just showing some pseudo patch, state WHY it is needed.
Talk about the execution state of environment and what rules or other
things are being violated which must be corrected.

^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: Changli Gao @ 2010-04-15  7:47 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, therbert, eparis, netdev
In-Reply-To: <20100415.003711.159334670.davem@davemloft.net>

On Thu, Apr 15, 2010 at 3:37 PM, David Miller <davem@davemloft.net> wrote:
> From: Changli Gao <xiaosuo@gmail.com>
> Date: Thu, 15 Apr 2010 15:30:44 +0800
>
>> Should netif_rx() be used only when preemption is disabled? If not,
>> netif_rx_ni() should be used instead.?
>
> netif_rx() must be invoked from a hardware or software interrupt,
> which implies preemption disabled.
>
> In netif_rx_ni(), the "ni" means "not interrupt".
>

yea, I know netif_rx_ni()'s meaning. It means that the following
changes aren't necessary.

 #else
-       cpu = smp_processor_id();
+       ret = enqueue_to_backlog(skb, get_cpu());
+       put_cpu();

ret = enqueue_to_backlog(skb, smp_processor_id()); should be OK.

 #endif
-
-       return enqueue_to_backlog(skb, cpu);
+       return ret;
 }

^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: David Miller @ 2010-04-15  7:37 UTC (permalink / raw)
  To: xiaosuo; +Cc: eric.dumazet, therbert, eparis, netdev
In-Reply-To: <l2p412e6f7f1004150030o8a406133s91d6614cbb106796@mail.gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Thu, 15 Apr 2010 15:30:44 +0800

> Should netif_rx() be used only when preemption is disabled? If not,
> netif_rx_ni() should be used instead.?

netif_rx() must be invoked from a hardware or software interrupt,
which implies preemption disabled.

In netif_rx_ni(), the "ni" means "not interrupt".

^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: Changli Gao @ 2010-04-15  7:30 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, therbert, eparis, netdev
In-Reply-To: <20100415.001446.244372815.davem@davemloft.net>

On Thu, Apr 15, 2010 at 3:14 PM, David Miller <davem@davemloft.net> wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 13 Apr 2010 09:14:17 +0200
>
>> [PATCH net-next-2.6] net: netif_rx() must disable preemption
>>
>> Eric Paris reported netif_rx() is calling smp_processor_id() from
>> preemptible context, in particular when caller is
>> ip_dev_loopback_xmit().
>>
>> RPS commit added this smp_processor_id() call, this patch makes sure
>> preemption is disabled. rps_get_cpus() wants rcu_read_lock() anyway, we
>> can dot it a bit earlier.
>>
>> Reported-by: Eric Paris <eparis@redhat.com>
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> I've applied this with some coding style fixups.
>
> Thanks!
>
> --------------------
> net: netif_rx() must disable preemption
>
> Eric Paris reported netif_rx() is calling smp_processor_id() from
> preemptible context, in particular when caller is
> ip_dev_loopback_xmit().
>
> RPS commit added this smp_processor_id() call, this patch makes sure
> preemption is disabled. rps_get_cpus() wants rcu_read_lock() anyway, we
> can dot it a bit earlier.
>
> Reported-by: Eric Paris <eparis@redhat.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>  net/core/dev.c |   25 +++++++++++++++----------
>  1 files changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 876b111..e8041eb 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2206,6 +2206,7 @@ DEFINE_PER_CPU(struct netif_rx_stats, netdev_rx_stat) = { 0, };
>  /*
>  * get_rps_cpu is called from netif_receive_skb and returns the target
>  * CPU from the RPS map of the receiving queue for a given skb.
> + * rcu_read_lock must be held on entry.
>  */
>  static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
>  {
> @@ -2217,8 +2218,6 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
>        u8 ip_proto;
>        u32 addr1, addr2, ports, ihl;
>
> -       rcu_read_lock();
> -
>        if (skb_rx_queue_recorded(skb)) {
>                u16 index = skb_get_rx_queue(skb);
>                if (unlikely(index >= dev->num_rx_queues)) {
> @@ -2296,7 +2295,6 @@ got_hash:
>        }
>
>  done:
> -       rcu_read_unlock();
>        return cpu;
>  }
>
> @@ -2392,7 +2390,7 @@ enqueue:
>
>  int netif_rx(struct sk_buff *skb)
>  {
> -       int cpu;
> +       int ret;
>
>        /* if netpoll wants it, pretend we never saw it */
>        if (netpoll_rx(skb))
> @@ -2402,14 +2400,21 @@ int netif_rx(struct sk_buff *skb)
>                net_timestamp(skb);
>
>  #ifdef CONFIG_RPS
> -       cpu = get_rps_cpu(skb->dev, skb);
> -       if (cpu < 0)
> -               cpu = smp_processor_id();
> +       {
> +               int cpu;
> +
> +               rcu_read_lock();
> +               cpu = get_rps_cpu(skb->dev, skb);
> +               if (cpu < 0)
> +                       cpu = smp_processor_id();
> +               ret = enqueue_to_backlog(skb, cpu);
> +               rcu_read_unlock();
> +       }
>  #else
> -       cpu = smp_processor_id();
> +       ret = enqueue_to_backlog(skb, get_cpu());
> +       put_cpu();
>  #endif
> -
> -       return enqueue_to_backlog(skb, cpu);
> +       return ret;
>  }
>  EXPORT_SYMBOL(netif_rx);
>

Should netif_rx() be used only when preemption is disabled? If not,
netif_rx_ni() should be used instead.?


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH 1/2] igb: double increment nr_frags
From: David Miller @ 2010-04-15  7:28 UTC (permalink / raw)
  To: sanagi.koki
  Cc: netdev, e1000-devel, jeffrey.t.kirsher, jesse.brandeburg,
	bruce.w.allan, peter.p.waskiewicz.jr, john.ronciak
In-Reply-To: <4BC6BE3C.6070204@jp.fujitsu.com>


Your email client corrupted your patch by applying text
formatting to it.

Please disable all text formatting in your email client
and resend your changes.

You can read Documentation/email-clients.txt for help in
this area.

^ permalink raw reply

* [PATCH 2/2] igbvf: dobule increment nr_frags
From: Koki Sanagi @ 2010-04-15  7:23 UTC (permalink / raw)
  To: netdev, e1000-devel
  Cc: davem, jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
	peter.p.waskiewicz.jr, john.ronciak

There is no need to increment nr_frags becasue skb_fill_page_desc increments
it.

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 drivers/net/igbvf/netdev.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/igbvf/netdev.c b/drivers/net/igbvf/netdev.c
index ea8abf5..97cfc60 100644
--- a/drivers/net/igbvf/netdev.c
+++ b/drivers/net/igbvf/netdev.c
@@ -288,7 +288,7 @@ static bool igbvf_clean_rx_irq(struct igbvf_adapter *adapter,
 			               PCI_DMA_FROMDEVICE);
 			buffer_info->page_dma = 0;
 
-			skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags++,
+			skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags,
 			                   buffer_info->page,
 			                   buffer_info->page_offset,
 			                   length);


^ permalink raw reply related

* [PATCH 1/2] igb: double increment nr_frags
From: Koki Sanagi @ 2010-04-15  7:20 UTC (permalink / raw)
  To: netdev, e1000-devel
  Cc: davem, jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
	peter.p.waskiewicz.jr, john.ronciak

There is no need to increment nr_frags becasue skb_fill_page_desc increments
it.

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
  drivers/net/igb/igb_main.c |    2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 78cc742..8bde6c3 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -5254,7 +5254,7 @@ static bool igb_clean_rx_irq_adv(struct igb_q_vector *q_vector,
  				       PAGE_SIZE / 2, PCI_DMA_FROMDEVICE);
  			buffer_info->page_dma = 0;
  
-			skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags++,
+			skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags,
  						buffer_info->page,
  						buffer_info->page_offset,
  						length);


^ permalink raw reply related

* Re: [PATCH] CONFIG_SMP should be CONFIG_RPS
From: David Miller @ 2010-04-15  7:16 UTC (permalink / raw)
  To: eric.dumazet; +Cc: xiaosuo, netdev, therbert
In-Reply-To: <1271141947.16881.177.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 13 Apr 2010 08:59:07 +0200

> Thanks Changli, this is part of RFS patches :)
> 

I'll apply this, Tom has to respin RFS for other reasons
already...

^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: David Miller @ 2010-04-15  7:14 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, eparis, netdev
In-Reply-To: <1271142857.16881.193.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 13 Apr 2010 09:14:17 +0200

> [PATCH net-next-2.6] net: netif_rx() must disable preemption
> 
> Eric Paris reported netif_rx() is calling smp_processor_id() from
> preemptible context, in particular when caller is
> ip_dev_loopback_xmit().
> 
> RPS commit added this smp_processor_id() call, this patch makes sure
> preemption is disabled. rps_get_cpus() wants rcu_read_lock() anyway, we
> can dot it a bit earlier.
> 
> Reported-by: Eric Paris <eparis@redhat.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

I've applied this with some coding style fixups.

Thanks!

--------------------
net: netif_rx() must disable preemption

Eric Paris reported netif_rx() is calling smp_processor_id() from
preemptible context, in particular when caller is
ip_dev_loopback_xmit().

RPS commit added this smp_processor_id() call, this patch makes sure
preemption is disabled. rps_get_cpus() wants rcu_read_lock() anyway, we
can dot it a bit earlier.

Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/core/dev.c |   25 +++++++++++++++----------
 1 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 876b111..e8041eb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2206,6 +2206,7 @@ DEFINE_PER_CPU(struct netif_rx_stats, netdev_rx_stat) = { 0, };
 /*
  * get_rps_cpu is called from netif_receive_skb and returns the target
  * CPU from the RPS map of the receiving queue for a given skb.
+ * rcu_read_lock must be held on entry.
  */
 static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
 {
@@ -2217,8 +2218,6 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
 	u8 ip_proto;
 	u32 addr1, addr2, ports, ihl;
 
-	rcu_read_lock();
-
 	if (skb_rx_queue_recorded(skb)) {
 		u16 index = skb_get_rx_queue(skb);
 		if (unlikely(index >= dev->num_rx_queues)) {
@@ -2296,7 +2295,6 @@ got_hash:
 	}
 
 done:
-	rcu_read_unlock();
 	return cpu;
 }
 
@@ -2392,7 +2390,7 @@ enqueue:
 
 int netif_rx(struct sk_buff *skb)
 {
-	int cpu;
+	int ret;
 
 	/* if netpoll wants it, pretend we never saw it */
 	if (netpoll_rx(skb))
@@ -2402,14 +2400,21 @@ int netif_rx(struct sk_buff *skb)
 		net_timestamp(skb);
 
 #ifdef CONFIG_RPS
-	cpu = get_rps_cpu(skb->dev, skb);
-	if (cpu < 0)
-		cpu = smp_processor_id();
+	{
+		int cpu;
+
+		rcu_read_lock();
+		cpu = get_rps_cpu(skb->dev, skb);
+		if (cpu < 0)
+			cpu = smp_processor_id();
+		ret = enqueue_to_backlog(skb, cpu);
+		rcu_read_unlock();
+	}
 #else
-	cpu = smp_processor_id();
+	ret = enqueue_to_backlog(skb, get_cpu());
+	put_cpu();
 #endif
-
-	return enqueue_to_backlog(skb, cpu);
+	return ret;
 }
 EXPORT_SYMBOL(netif_rx);
 
-- 
1.7.0.4


^ permalink raw reply related

* Re: [PATCH 2.6.34-git] 8139too: fix Coding Styles
From: David Miller @ 2010-04-15  7:08 UTC (permalink / raw)
  To: jblanco; +Cc: netdev
In-Reply-To: <ba865f7b49c5df1b71bc393ad42ff40a.squirrel@neurowork.net>

From: "Javier Blanco de Torres (Neurowork)" <jblanco@neurowork.net>
Date: Mon, 12 Apr 2010 09:36:43 +0200

> Fixed coding styles in the 8139too net driver.
> 
> Signed-off-by: Javier Blanco de Torres <jblanco@neurowork.net>
> Signed-off-by: Alejandro Sánchez Acosta <asanchez@neurowork.net>

This is why I absolutely hate pure checkpatch.pl patches, people just
try to make the tool happy and don't think about what the tool is
trying to tell them.

The worst of this is this "typedef enum" part of your changes:

-typedef enum {
+enum {
 	RTL8139 = 0,
 	RTL8129,
 } board_t;

and checkpatch was telling you:

WARNING: do not add new typedefs
#220: FILE: net/8139too.c:220:
+typedef enum {

Well, you're still adding a new type!  Getting rid of the type name is
what it's telling you to stop doing.

It's still a newly named type after your change, it wants you to get
rid of the "board_t" thing altogether.

Give the enum a real "enum" name like:

enum rtl8139_board_t {

Then use _THAT_ in the sources:

	enum rtl8139_board_t x;

The typedef section of Documentation/CodingStyle makes this very
clear.

But your entire patch is like this, the changes are largely pointless
and many of them are false interpreations of what checkpatch complains
about.

Therefore I really don't encourage that you pursue this any further,
sorry.

^ permalink raw reply

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
From: David Miller @ 2010-04-15  6:52 UTC (permalink / raw)
  To: eric.dumazet; +Cc: krkumar2, netdev, nuclearcat
In-Reply-To: <1271056697.16881.7.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 12 Apr 2010 09:18:17 +0200

> [PATCH] net: dev_pick_tx() fix
> 
> When dev_pick_tx() caches tx queue_index on a socket, we must check
> socket dst_entry matches skb one, or risk a crash later, as reported by
> Denys Fedorysychenko, if old packets are in flight during a route
> change, involving devices with different number of queues.
> 
> Bug introduced by commit a4ee3ce3
> (net: Use sk_tx_queue_mapping for connected sockets)
> 
> Reported-by: Denys Fedorysychenko <nuclearcat@nuclearcat.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

It looks like Denys is still getting crashes even with this patch
applied.  And I also think there is some meric to some of Krishna's
analysis.

To me it seems to make more sense to validate the SKB's queue against
the real actual choosen device's range.

The socket queue index will catch up and eventually become valid
because the dst reset will invalidate the queue setting, and we'll
thus recompute it as needed, as Krishna stated.

So I'm tossing this patch for now since it doesn't even aparently
fix the bug.


^ permalink raw reply

* Re: [PATCH] drivers/net: remove network drivers' last few uses of IRQF_SAMPLE_RANDOM
From: David Miller @ 2010-04-15  6:42 UTC (permalink / raw)
  To: cpeterso; +Cc: netdev, linux-kernel, mpm
In-Reply-To: <1270877380-23813-1-git-send-email-cpeterso@cpeterso.com>

From: Chris Peterson <cpeterso@cpeterso.com>
Date: Sat, 10 Apr 2010 01:29:40 -0400

> feature-removal-schedule.txt says a new /dev/random entropy model is planned 
> (for July 2009). Until then, this patch removes the few remaining uses of 
> IRQF_SAMPLE_RANDOM from drivers/net/*. 12 network drivers are affected, one 
> more than last year (netxen_nic_main.c). If idle servers need more entropy, 
> they ought to use a hardware RNG or an entropy-gathering userspace daemon.
> 
> Signed-off-by: Chris Peterson <cpeterso@cpeterso.com>

Well, that feature-removal-schedule.txt entry states that new
add_*_randomness() interfaces will be created such that things
like network devices can indicate what kind of entropy they
are providing.

Those interfaces have not yet been created as far as I can tell, and
the idea is that network drivers could use those not-yet-written new
interfaces.

Until that time I feel it does more harm than good to remove these
annotations since frankly there never ever was good agreement about
whether we should do this or not.

It may seem stupid and pointless to just do nothing, but at this time
I think that's still the best thing to do.

Therefore, I'm not applying your patch, sorry.

^ permalink raw reply

* Re: [linux-2.6.34-rc3] drivers/net: ks8851 MLL ethernet network driver
From: David Miller @ 2010-04-15  6:38 UTC (permalink / raw)
  To: David.Choi; +Cc: netdev


I cannot apply this, there are too many issues:

@@ -378,34 +384,34 @@ union ks_tx_hdr {
 
 /**
  * struct ks_net - KS8851 driver private data
- * @net_device 	: The network device we're bound to
+ * @net_device	: The network device we're bound to
  * @hw_addr	: start address of data register.

Do not mix lots of coding style and functional changes.

If you want to fix coding style, do it later in a seperate
patch after you implement your functional change.

+	if (!reg_data & CCR_ENDIAN) {

This test will never do what you want it to do.

First the compiler will evaluate "!reg_data" which will be either "0"
or "1", then it will and it with the CCR_ENDIAN mask, which is (1 <<
10).  And this can never be true.

I also do not like all of the endian ifdefs peppered all over the
driver.  Try consolidate this by creating a header file that contains
the structure and macro definitions, and in there create helper inline
functions which do the ifdefs.  This way the ks8851_mll.c file can
stay ifdef clean.

Thanks.


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox