Netdev List

Netdev List
 help / color / mirror / Atom feed

* RE: [RFC][PATCH v3 1/3] A device for zero-copy based on KVM virtio-net.
From: Xin, Xiaohui @ 2010-04-15  9:01 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mst@redhat.com, mingo@elte.hu,
	davem@davemloft.net, jdike@linux.intel.com
In-Reply-To: <201004141655.21885.arnd@arndb.de>

Arnd,
>> From: Xin Xiaohui <xiaohui.xin@intel.com>
>> 
>> Add a device to utilize the vhost-net backend driver for
>> copy-less data transfer between guest FE and host NIC.
>> It pins the guest user space to the host memory and
>> provides proto_ops as sendmsg/recvmsg to vhost-net.

>Sorry for taking so long before finding the time to look
>at your code in more detail.

>It seems that you are duplicating a lot of functionality that
>is already in macvtap. I've asked about this before but then
>didn't look at your newer versions. Can you explain the value
>of introducing another interface to user land?

>I'm still planning to add zero-copy support to macvtap,
>hopefully reusing parts of your code, but do you think there
>is value in having both?

I have not looked into your macvtap code in detail before.
Does the two interface exactly the same? We just want to create a simple
way to do zero-copy. Now it can only support vhost, but in future
we also want it to support directly read/write operations from user space too.

Basically, compared to the interface, I'm more worried about the modification
to net core we have made to implement zero-copy now. If this hardest part
can be done, then any user space interface modifications or integrations are 
more easily to be done after that.

>> diff --git a/drivers/vhost/mpassthru.c b/drivers/vhost/mpassthru.c
>> new file mode 100644
>> index 0000000..86d2525
>> --- /dev/null
>> +++ b/drivers/vhost/mpassthru.c
> >@@ -0,0 +1,1264 @@
> >+
> >+#ifdef MPASSTHRU_DEBUG
>> +static int debug;
>> +
>> +#define DBG  if (mp->debug) printk
> >+#define DBG1 if (debug == 2) printk
> >+#else
> >+#define DBG(a...)
> >+#define DBG1(a...)
> >+#endif

>This should probably just use the existing dev_dbg/pr_debug infrastructure.

	Thanks. Will try that.
> [... skipping buffer management code for now]

> >+static int mp_sendmsg(struct kiocb *iocb, struct socket *sock,
> >+			struct msghdr *m, size_t total_len)
> >+{
> >[...]

>This function looks like we should be able to easily include it into
>macvtap and get zero-copy transmits without introducing the new
>user-level interface.

>> +static int mp_recvmsg(struct kiocb *iocb, struct socket *sock,
>> +			struct msghdr *m, size_t total_len,
>> +			int flags)
>> +{
>> +	struct mp_struct *mp = container_of(sock->sk, struct mp_sock, sk)->mp;
>> +	struct page_ctor *ctor;
>> +	struct vhost_virtqueue *vq = (struct vhost_virtqueue *)(iocb->private);

>It smells like a layering violation to look at the iocb->private field
>from a lower-level driver. I would have hoped that it's possible to implement
>this without having this driver know about the higher-level vhost driver
>internals. Can you explain why this is needed?

I don't like this too, but since the kiocb is maintained by vhost with a list_head.
And mp device is responsible to collect the kiocb into the list_head,
We need something known by vhost/mp both.
 
>> +	spin_lock_irqsave(&ctor->read_lock, flag);
>> +	list_add_tail(&info->list, &ctor->readq);
> >+	spin_unlock_irqrestore(&ctor->read_lock, flag);
> >+
> >+	if (!vq->receiver) {
> >+		vq->receiver = mp_recvmsg_notify;
> >+		set_memlock_rlimit(ctor, RLIMIT_MEMLOCK,
> >+				   vq->num * 4096,
> >+				   vq->num * 4096);
> >+	}
> >+
> >+	return 0;
> >+}

>Not sure what I'm missing, but who calls the vq->receiver? This seems
>to be neither in the upstream version of vhost nor introduced by your
>patch.

See Patch v3 2/3 I have sent out, it is called by handle_rx() in vhost.

>> +static void __mp_detach(struct mp_struct *mp)
>> +{
> >+	mp->mfile = NULL;
> >+
> >+	mp_dev_change_flags(mp->dev, mp->dev->flags & ~IFF_UP);
> >+	page_ctor_detach(mp);
> >+	mp_dev_change_flags(mp->dev, mp->dev->flags | IFF_UP);
> >+
> >+	/* Drop the extra count on the net device */
> >+	dev_put(mp->dev);
> >+}
> >+
> >+static DEFINE_MUTEX(mp_mutex);
> >+
> >+static void mp_detach(struct mp_struct *mp)
> >+{
> >+	mutex_lock(&mp_mutex);
> >+	__mp_detach(mp);
> >+	mutex_unlock(&mp_mutex);
> >+}
> >+
> >+static void mp_put(struct mp_file *mfile)
> >+{
> >+	if (atomic_dec_and_test(&mfile->count))
> >+		mp_detach(mfile->mp);
> >+}
> >+
> >+static int mp_release(struct socket *sock)
> >+{
> >+	struct mp_struct *mp = container_of(sock->sk, struct mp_sock, sk)->mp;
> >+	struct mp_file *mfile = mp->mfile;
> >+
> >+	mp_put(mfile);
> >+	sock_put(mp->socket.sk);
> >+	put_net(mfile->net);
> >+
> >+	return 0;
> >+}

>Doesn't this prevent the underlying interface from going away while the chardev
>is open? You also have logic to handle that case, so why do you keep the extra
>reference on the netdev?

Let me think.

>> +/* Ops structure to mimic raw sockets with mp device */
>> +static const struct proto_ops mp_socket_ops = {
>> +	.sendmsg = mp_sendmsg,
>> +	.recvmsg = mp_recvmsg,
>> +	.release = mp_release,
>> +};

>> +static int mp_chr_open(struct inode *inode, struct file * file)
>> +{
>> +	struct mp_file *mfile;
>>+	cycle_kernel_lock();

>I don't think you really want to use the BKL here, just kill that line.

>> +static long mp_chr_ioctl(struct file *file, unsigned int cmd,
>> +		unsigned long arg)
>> +{
>> +	struct mp_file *mfile = file->private_data;
>> +	struct mp_struct *mp;
>> +	struct net_device *dev;
>> +	void __user* argp = (void __user *)arg;
>> +	struct ifreq ifr;
>> +	struct sock *sk;
>> +	int ret;
>> +
>> +	ret = -EINVAL;
>> +
>> +	switch (cmd) {
>> +	case MPASSTHRU_BINDDEV:
>> +		ret = -EFAULT;
>> +		if (copy_from_user(&ifr, argp, sizeof ifr))
>> +			break;

>This is broken for 32 bit compat mode ioctls, because struct ifreq
>is different between 32 and 64 bit systems. Since you are only
>using the device name anyway, a fixed length string or just the
>interface index would be simpler and work better.

 Thanks, will look into this.

>> +		ifr.ifr_name[IFNAMSIZ-1] = '\0';
>> +
>> +		ret = -EBUSY;
>> +
>> +		if (ifr.ifr_flags & IFF_MPASSTHRU_EXCL)
>> +			break;

>Your current use of the IFF_MPASSTHRU* flags does not seem to make
>any sense whatsoever. You check that this flag is never set, but set
>it later yourself and then ignore all flags.

Using that flag is tried to prevent if another one wants to bind the same device
Again. But I will see if it really ignore all other flags.

>> +		ret = -ENODEV;
>> +		dev = dev_get_by_name(mfile->net, ifr.ifr_name);
>> +		if (!dev)
>> +			break;

>There is no permission checking on who can access what device, which
>seems a bit simplistic. Any user that has access to the mpassthru device
>seems to be able to bind to any network interface in the namespace.
>This is one point where the macvtap model seems more appropriate, it
>separates the permissions for creating logical interfaces and using them.

Yes, that's a problem I have not addressed yet.

>> +static ssize_t mp_chr_aio_write(struct kiocb *iocb, const struct iovec *iov,
>> +				unsigned long count, loff_t pos)
>> +{
>> +	struct file *file = iocb->ki_filp;
>> +	struct mp_struct *mp = mp_get(file->private_data);
>> +	struct sock *sk = mp->socket.sk;
>> +	struct sk_buff *skb;
>> +	int len, err;
>> +	ssize_t result;

>Can you explain what this function is even there for? AFAICT, vhost-net
>doesn't call it, the interface is incompatible with the existing
>tap interface, and you don't provide a read function.

 I saw Michael have given the answer already.

>> diff --git a/include/linux/mpassthru.h b/include/linux/mpassthru.h
>> new file mode 100644
>> index 0000000..2be21c5
>> --- /dev/null
>> +++ b/include/linux/mpassthru.h
>> @@ -0,0 +1,29 @@
>> +#ifndef __MPASSTHRU_H
>> +#define __MPASSTHRU_H
>> +
> >+#include <linux/types.h>
>> +#include <linux/if_ether.h>
>> +
>> +/* ioctl defines */
>> +#define MPASSTHRU_BINDDEV      _IOW('M', 213, int)
> >+#define MPASSTHRU_UNBINDDEV    _IOW('M', 214, int)

>These definitions are slightly wrong, because you pass more than just an 'int'.

 Thanks. I wrote them too quickly. :-(

>> +/* MPASSTHRU ifc flags */
>> +#define IFF_MPASSTHRU		0x0001
>> +#define IFF_MPASSTHRU_EXCL	0x0002

>As mentioned above, these flags don't make any sense with your current code.

I used them try to prevent the one who want to bind the same device again.
	Arnd

^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: Eric Dumazet @ 2010-04-15  8:58 UTC (permalink / raw)
  To: David Miller; +Cc: xiaosuo, therbert, eparis, netdev
In-Reply-To: <20100415.013347.98375530.davem@davemloft.net>

Le jeudi 15 avril 2010 à 01:33 -0700, David Miller a écrit :

> Yes, this looks more reasonable.  Eric if you agree please (re-)submit
> this formally, I must have missed this somehow, sorry.
> 
> And this is a bug fix in any kernel, not just one's that have RPS
> patches applied.
> 
> If we are not called from some interrupt context, there is no sure
> trigger to make sure software interrupts will be executed after the
> packet is queued locally.  netif_rx_ni() makes sure that any pending
> software interrupts will run in such cases.

Our mails crossed ;)

Yes I think it's more reasonable to fix it like this, I'll submit a
patch after fully testing it :)




^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: David Miller @ 2010-04-15  8:57 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, hadi, shemminger, netdev, robert, xiaosuo, andi
In-Reply-To: <1271278661.16881.1761.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 14 Apr 2010 22:57:41 +0200

> RPS can be tuned (Changli wants a finer tuning...), it would be
> intereting to tune multiqueue devices too. I dont know if its possible
> right now.

Only NIU allows real detailed control over queue selection and
stuff like that, because the hardware has a real TCAM for
packet matching and packets which match in TCAM entries can
steer to different collections of queues.

We have ethtool interfaces for this (ETHTOOL_GRXCLS*), so you can
change it.

For most other chips we only have interfaces for modifying the
RX hashing algorithm or what the RX hash covers, stuff like
that.

See also ETHTOOL_GRXFH, ETHTOOL_SRXFH, ETHTOOL_SRXNTUPLE, and
ETHTOOL_GRXNTUPLE, the latter two of which were added for Intel
NICs.

^ permalink raw reply

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
From: Eric Dumazet @ 2010-04-15  8:51 UTC (permalink / raw)
  To: David Miller; +Cc: krkumar2, netdev, nuclearcat
In-Reply-To: <20100415.012607.15449571.davem@davemloft.net>

Le jeudi 15 avril 2010 à 01:26 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 15 Apr 2010 10:02:33 +0200
> 
> > Denys got a crash that we cannot explain yet. He said he has no
> > multiqueue devices, so obviously my patch cant help him.
> > 
> > But this patch was fixing a real issue, I believe I pointed it twice
> > already...
> 
> Ok, I thought your patch was specifically meant to fix Denys's bug.
> 
> The confusion comes from the fact that you mention Denys's crash in
> your commit message:
> 
> --------------------
> When dev_pick_tx() caches tx queue_index on a socket, we must check
> socket dst_entry matches skb one, or risk a crash later, as reported by
> Denys Fedorysychenko, if old packets are in flight during a route
> change, involving devices with different number of queues.
> --------------------
> 
> Anyways, I studied your patch once more and read the thread discussion
> with Krishna again and your patch looks fine.  I'll apply it to
> net-2.6, thanks!
> 

In any case, I think there is a fundamental problem with this sk
caching. Because one packet can travel in many stacked devices before
hitting the wire.

(bonding, vlan, ethernet) for example.

Socket cache is meaningfull for one level only...




^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: David Miller @ 2010-04-15  8:51 UTC (permalink / raw)
  To: hadi; +Cc: shemminger, eric.dumazet, therbert, netdev, robert, xiaosuo, andi
In-Reply-To: <1271276568.4567.59.camel@bigi>

From: jamal <hadi@cyberus.ca>
Date: Wed, 14 Apr 2010 16:22:48 -0400

> On Wed, 2010-04-14 at 12:44 -0700, Stephen Hemminger wrote:
> 
>> RPS might also interact with the core turbo boost functionality on Intel chips.
>> Newer chips will make a single core faster if other core can be kept idle.
> 
> how well does it work with Linux?

It's completely transparent and should just happen without any
BIOS tweaks.


^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: David Miller @ 2010-04-15  8:51 UTC (permalink / raw)
  To: eric.dumazet; +Cc: shemminger, hadi, therbert, netdev, robert, xiaosuo, andi
In-Reply-To: <1271275130.16881.1749.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 14 Apr 2010 21:58:50 +0200

> Every time we change RPS to be on or off, we might have some extra
> noise. Maybe we already have this problem with irqbalance ?

irqbalance should never move network device interrupts around
under normal circumstances.  Arjan assured me that there is
specific logic in the irqbalance daemon to not move NIC
interrupts around once a target has been choosen.

^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: David Miller @ 2010-04-15  8:50 UTC (permalink / raw)
  To: shemminger; +Cc: hadi, eric.dumazet, therbert, netdev, robert, xiaosuo, andi
In-Reply-To: <20100414124426.6aee95c3@nehalam>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 14 Apr 2010 12:44:26 -0700

> On Wed, 14 Apr 2010 14:53:42 -0400
> jamal <hadi@cyberus.ca> wrote:
> 
>> Agreed. So to enumerate, the benefits come in if:
>> a) you have many processors
>> b) you have single-queue nic
>> c) at sub-threshold traffic you dont care about a little latency
> 
> There probably needs to be better autotuning for this, there is no reason
> that RPS to be steering packets unless the queue is getting backed up.

I disagree, if the goal is to migrate the bulk of packet processing
to where the app will actually sink and process the data then it should
forward to RPS marked cpus regardless of local queue levels.


^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: Eric Dumazet @ 2010-04-15  8:49 UTC (permalink / raw)
  To: Changli Gao; +Cc: David Miller, therbert, eparis, netdev
In-Reply-To: <u2o412e6f7f1004150047nc9250ffuaef68be066a7575c@mail.gmail.com>

Le jeudi 15 avril 2010 à 15:47 +0800, Changli Gao a écrit :
> On Thu, Apr 15, 2010 at 3:37 PM, David Miller <davem@davemloft.net> wrote:
> > From: Changli Gao <xiaosuo@gmail.com>
> > Date: Thu, 15 Apr 2010 15:30:44 +0800
> >
> >> Should netif_rx() be used only when preemption is disabled? If not,
> >> netif_rx_ni() should be used instead.?
> >
> > netif_rx() must be invoked from a hardware or software interrupt,
> > which implies preemption disabled.
> >
> > In netif_rx_ni(), the "ni" means "not interrupt".
> >
> 
> yea, I know netif_rx_ni()'s meaning. It means that the following
> changes aren't necessary.
> 
>  #else
> -       cpu = smp_processor_id();
> +       ret = enqueue_to_backlog(skb, get_cpu());
> +       put_cpu();
> 
> ret = enqueue_to_backlog(skb, smp_processor_id()); should be OK.
> 
>  #endif
> -
> -       return enqueue_to_backlog(skb, cpu);
> +       return ret;
>  }

netif_rx is meant to be called from interrupts because it doesn't wake
up ksoftirqd.  For calling from outside interrupts, netif_rx_ni exists,
to make _sure_ do_softirq() is called.

However, netif_rx() _could_ be called from process context, it was safe,
but sofirq was a bit delayed.

Now, after RPS changes this can trigger this :

[   14.203970] BUG: using smp_processor_id() in preemptible [00000000]
code: avahi-daemon/2093
[   14.204025] caller is netif_rx+0xfa/0x110
[   14.204032] Pid: 2093, comm: avahi-daemon Tainted: G        W
2.6.34-rc3-next-20100412+ #65
[   14.204035] Call Trace:
[   14.204064]  [<ffffffff81278fe5>] debug_smp_processor_id+0x105/0x110
[   14.204070]  [<ffffffff8142163a>] netif_rx+0xfa/0x110
[   14.204090]  [<ffffffff8145b631>] ip_dev_loopback_xmit+0x71/0xa0
[   14.204095]  [<ffffffff8145b892>] ip_mc_output+0x192/0x2c0
[   14.204099]  [<ffffffff8145d610>] ip_local_out+0x20/0x30
[   14.204105]  [<ffffffff8145d8ad>] ip_push_pending_frames+0x28d/0x3d0
[   14.204119]  [<ffffffff8147f1cc>] udp_push_pending_frames+0x14c/0x400
[   14.204125]  [<ffffffff814803fc>] udp_sendmsg+0x39c/0x790
[   14.204137]  [<ffffffff814891d5>] inet_sendmsg+0x45/0x80
[   14.204149]  [<ffffffff8140af91>] sock_sendmsg+0xf1/0x110
[   14.204177]  [<ffffffff810e3a89>] ? might_fault+0xb9/0xd0
[   14.204184]  [<ffffffff810e3a3e>] ? might_fault+0x6e/0xd0
[   14.204189]  [<ffffffff8140dc6c>] sys_sendmsg+0x20c/0x380
[   14.204205]  [<ffffffff811107f1>] ? do_sync_write+0xd1/0x110
[   14.204211]  [<ffffffff810e3a3e>] ? might_fault+0x6e/0xd0
[   14.204233]  [<ffffffff8100ad82>] system_call_fastpath+0x16/0x1b


We have two possibilities :

1) Make sure no netif_rx() caller is in process context, preemption
enabled.

2) Change netif_rx() to meet its initial behavior : It _can_ be called
from process context, preemption enabled. Frame might be delayed a bit
as before.

We chose 2), but David, I am not sure this is OK given git history, some
calling points were changed to avoid "'NOHZ: local_softirq_pending 08' "
messages...

----------------------------------------------------------------------------------------

commit 481a8199142c050b72bff8a1956a49fd0a75bbe0
Author: Oliver Hartkopp <oliver@hartkopp.net>
Date:   Tue Sep 15 01:31:34 2009 -0700

    can: fix NOHZ local_softirq_pending 08 warning
    
    When using nanosleep() in an userspace application we get a
ratelimit warning
    
    NOHZ: local_softirq_pending 08
    
    for 10 times.
    
    The echo of CAN frames is done from process context and softirq
context only.
    Therefore the usage of netif_rx() was wrong (for years).
    
    This patch replaces netif_rx() with netif_rx_ni() which has to be
used from
    process/softirq context. It also adds a missing comment that
can_send() must
    no be used from hardirq context.
    
    Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
    Signed-off-by: Urs Thuermann <urs@isnogud.escape.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/can/vcan.c b/drivers/net/can/vcan.c
index 6971f6c..80ac563 100644
--- a/drivers/net/can/vcan.c
+++ b/drivers/net/can/vcan.c
@@ -80,7 +80,7 @@ static void vcan_rx(struct sk_buff *skb, struct
net_device *dev)
        skb->dev       = dev;
        skb->ip_summed = CHECKSUM_UNNECESSARY;
 
-       netif_rx(skb);
+       netif_rx_ni(skb);
 }
 
 static netdev_tx_t vcan_tx(struct sk_buff *skb, struct net_device *dev)
diff --git a/net/can/af_can.c b/net/can/af_can.c
index ef1c43a..6068321 100644
--- a/net/can/af_can.c
+++ b/net/can/af_can.c
@@ -199,6 +199,8 @@ static int can_create(struct net *net, struct socket
*sock, int protocol)
  * @skb: pointer to socket buffer with CAN frame in data section
  * @loop: loopback for listeners on local CAN sockets (recommended
default!)
  *
+ * Due to the loopback this routine must not be called from hardirq
context.
+ *
  * Return:
  *  0 on success
  *  -ENETDOWN when the selected interface is down
@@ -278,7 +280,7 @@ int can_send(struct sk_buff *skb, int loop)
        }
 
        if (newskb)
-               netif_rx(newskb);
+               netif_rx_ni(newskb);
 
        /* update statistics */
        can_stats.tx_frames++;


commit ae3e0fcf901e4b7df87aef7ab39093e142a8de8b
Author: Holger Schurig <hs4233@mail.mn-solutions.de>
Date:   Wed Jan 16 15:48:44 2008 +0100

    libertas cs/sdio: fix 'NOHZ: local_softirq_pending 08' message
    
    netif_rx should be called only from interrupt context. if_cs and
if_sdio receive
    packets from other contexts, and thus should call netif_rx_ni.
    
    Signed-off-by: Marc Pignat <marc.pignat@hevs.ch>
    Acked-by: Holger Schurig <hs4233@mail.mn-solutions.de>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>

diff --git a/drivers/net/wireless/libertas/rx.c
b/drivers/net/wireless/libertas/rx.c
index 6332fd4..149557a 100644
--- a/drivers/net/wireless/libertas/rx.c
+++ b/drivers/net/wireless/libertas/rx.c
@@ -247,7 +247,10 @@ int lbs_process_rxed_packet(struct lbs_private
*priv, struct sk_buff *skb)
        priv->stats.rx_packets++;
 
        skb->protocol = eth_type_trans(skb, dev);
-       netif_rx(skb);
+       if (in_interrupt())
+               netif_rx(skb);
+       else
+               netif_rx_ni(skb);
 
        ret = 0;
 done:

-----------------------------------------------------------------------------------------
Maybe we should add a new function after all...

int netif_rx_any(struct sk_buff *skb) 
{
       if (in_interrupt())
               return netif_rx(skb);

	return netif_rx_ni(skb);
}




^ permalink raw reply related

* Re: rps perfomance WAS(Re: rps: question
From: David Miller @ 2010-04-15  8:48 UTC (permalink / raw)
  To: hadi; +Cc: eric.dumazet, therbert, netdev, robert, xiaosuo, andi
In-Reply-To: <1271271222.4567.51.camel@bigi>

From: jamal <hadi@cyberus.ca>
Date: Wed, 14 Apr 2010 14:53:42 -0400

> On Wed, 2010-04-14 at 20:04 +0200, Eric Dumazet wrote:
> 
>> Yes, multiqueue is far better of course, but in case of hardware lacking
>> multiqueue, RPS can help many workloads, where application has _some_
>> work to do, not only counting frames or so...
> 
> Agreed. So to enumerate, the benefits come in if:
> a) you have many processors
> b) you have single-queue nic
> c) at sub-threshold traffic you dont care about a little latency
> d) you have a specific cache hierachy
> e) app is working hard to process incoming messages

A single-queue NIC is actually not a requirement, RPS helps also in
cases where you have 'N' application threads and N is less than the
number of CPUs your multi-queue NIC is distributing traffic to.

Moving the bulk of the input packet processing to the cpus where
the applications actually sit had a non-trivial benefit.  RFS takes
this aspect to yet another level.

> I think the main challenge for my pedantic mind is missing details. Is
> there a paper on rps? Example for #d above, the commit log mentions that
> rps benefits if you have certain types of "cache hierachy". Probably
> some arch with large shared L2/3 (maybe inclusive) cache will benefit.
> example: it does well on Nehalem and probably opterons as long (as you
> dont start stacking these things on some interconnect like QPI or HT).
> But what happens when you have FSB sharing across cores (still a very
> common setup)? etc etc

I think for the case where application locality is important,
RPS/RFS can help regardless of cache details.

^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: David Miller @ 2010-04-15  8:42 UTC (permalink / raw)
  To: hadi; +Cc: therbert, eric.dumazet, netdev, robert, xiaosuo, andi
In-Reply-To: <1271245986.3943.55.camel@bigi>

From: jamal <hadi@cyberus.ca>
Date: Wed, 14 Apr 2010 07:53:06 -0400

> I base tested against no rps being used and a kernel which didnt
> have any RPS config on.  [BTW, I had to hand-edit the .config since
> i couldnt do it from menuconfig (Is there any reason for it to be
> so?)]

The RPS config is merely an indirect dependency on SMP as we have it
coded up in the Kconfig files, it's not meant to be user selectable
and is intended to be unconditionally on for SMP builds.

^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: David Miller @ 2010-04-15  8:33 UTC (permalink / raw)
  To: xiaosuo; +Cc: eric.dumazet, therbert, eparis, netdev
In-Reply-To: <k2y412e6f7f1004150127u4e7ec668r80f066bfb3efea81@mail.gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Thu, 15 Apr 2010 16:27:19 +0800

> I think the following patch from Eric should be applied instead.
> 
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index c65f18e..d1bcc9f 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -120,7 +120,7 @@ static int ip_dev_loopback_xmit(struct sk_buff *newskb)
>        newskb->pkt_type = PACKET_LOOPBACK;
>        newskb->ip_summed = CHECKSUM_UNNECESSARY;
>        WARN_ON(!skb_dst(newskb));
> -       netif_rx(newskb);
> +       netif_rx_ni(newskb);
>        return 0;
>  }

Yes, this looks more reasonable.  Eric if you agree please (re-)submit
this formally, I must have missed this somehow, sorry.

And this is a bug fix in any kernel, not just one's that have RPS
patches applied.

If we are not called from some interrupt context, there is no sure
trigger to make sure software interrupts will be executed after the
packet is queued locally.  netif_rx_ni() makes sure that any pending
software interrupts will run in such cases.

^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: Changli Gao @ 2010-04-15  8:27 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, therbert, eparis, netdev
In-Reply-To: <20100415.005726.205428265.davem@davemloft.net>

On Thu, Apr 15, 2010 at 3:57 PM, David Miller <davem@davemloft.net> wrote:
>
> Why?  If we are in an interrupt (either soft or hard) then
> smp_processor_id() is stable, and valid.
>
> Changli, I find it very frustrating to communicate with you, you are
> very terse in your descriptions and analysis and you make many simple
> errors that would be avoided if you spent more time thinking about
> things before sending your emails. :-/
>
> Instead of just showing some pseudo patch, state WHY it is needed.
> Talk about the execution state of environment and what rules or other
> things are being violated which must be corrected.
>

Sorry. English isn't my native language, so sometimes I can't describe
myself clearly.

I think the following patch from Eric should be applied instead.

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index c65f18e..d1bcc9f 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -120,7 +120,7 @@ static int ip_dev_loopback_xmit(struct sk_buff *newskb)
       newskb->pkt_type = PACKET_LOOPBACK;
       newskb->ip_summed = CHECKSUM_UNNECESSARY;
       WARN_ON(!skb_dst(newskb));
-       netif_rx(newskb);
+       netif_rx_ni(newskb);
       return 0;
 }

As you know  "netif_rx() must be invoked from a hardware or software
interrupt, which implies preemption disabled.", obviously
ip_dev_loopback_xmit() doesn't obey this rule, so the crash isn't the
fault of net_rx(). If there are other users, who don't obey this rule,
they should be fixed too.

For this patch:

-       cpu = smp_processor_id();
+       ret = enqueue_to_backlog(skb, get_cpu());
+       put_cpu();

You said: " If we are in an interrupt (either soft or hard) then
smp_processor_id() is stable, and valid.". so we don't need to call
get_cpu() instead of smp_processor_id(). get_cpu() brings no good but
additional cost preempt_disable().

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply related

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
From: David Miller @ 2010-04-15  8:26 UTC (permalink / raw)
  To: eric.dumazet; +Cc: krkumar2, netdev, nuclearcat
In-Reply-To: <1271318553.16881.2161.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 15 Apr 2010 10:02:33 +0200

> Denys got a crash that we cannot explain yet. He said he has no
> multiqueue devices, so obviously my patch cant help him.
> 
> But this patch was fixing a real issue, I believe I pointed it twice
> already...

Ok, I thought your patch was specifically meant to fix Denys's bug.

The confusion comes from the fact that you mention Denys's crash in
your commit message:

--------------------
When dev_pick_tx() caches tx queue_index on a socket, we must check
socket dst_entry matches skb one, or risk a crash later, as reported by
Denys Fedorysychenko, if old packets are in flight during a route
change, involving devices with different number of queues.
--------------------

Anyways, I studied your patch once more and read the thread discussion
with Krishna again and your patch looks fine.  I'll apply it to
net-2.6, thanks!


^ permalink raw reply

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
From: Eric Dumazet @ 2010-04-15  8:02 UTC (permalink / raw)
  To: David Miller; +Cc: krkumar2, netdev, nuclearcat
In-Reply-To: <20100414.235256.190096561.davem@davemloft.net>

Le mercredi 14 avril 2010 à 23:52 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 12 Apr 2010 09:18:17 +0200
> 
> > [PATCH] net: dev_pick_tx() fix
> > 
> > When dev_pick_tx() caches tx queue_index on a socket, we must check
> > socket dst_entry matches skb one, or risk a crash later, as reported by
> > Denys Fedorysychenko, if old packets are in flight during a route
> > change, involving devices with different number of queues.
> > 
> > Bug introduced by commit a4ee3ce3
> > (net: Use sk_tx_queue_mapping for connected sockets)
> > 
> > Reported-by: Denys Fedorysychenko <nuclearcat@nuclearcat.com>
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> It looks like Denys is still getting crashes even with this patch
> applied.  And I also think there is some meric to some of Krishna's
> analysis.
> 


> To me it seems to make more sense to validate the SKB's queue against
> the real actual choosen device's range.
> 
> The socket queue index will catch up and eventually become valid
> because the dst reset will invalidate the queue setting, and we'll
> thus recompute it as needed, as Krishna stated.
> 

??? 


> So I'm tossing this patch for now since it doesn't even aparently
> fix the bug.

I am a bit lost here David.

Denys got a crash that we cannot explain yet. He said he has no
multiqueue devices, so obviously my patch cant help him.

But this patch was fixing a real issue, I believe I pointed it twice
already...

I'll try to setup an environment to trigger this bug for real, but this
will take time, my dev machines are not multiqueue.




^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: David Miller @ 2010-04-15  7:57 UTC (permalink / raw)
  To: xiaosuo; +Cc: eric.dumazet, therbert, eparis, netdev
In-Reply-To: <u2o412e6f7f1004150047nc9250ffuaef68be066a7575c@mail.gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Thu, 15 Apr 2010 15:47:26 +0800

> On Thu, Apr 15, 2010 at 3:37 PM, David Miller <davem@davemloft.net> wrote:
>> From: Changli Gao <xiaosuo@gmail.com>
>> Date: Thu, 15 Apr 2010 15:30:44 +0800
>>
>>> Should netif_rx() be used only when preemption is disabled? If not,
>>> netif_rx_ni() should be used instead.?
>>
>> netif_rx() must be invoked from a hardware or software interrupt,
>> which implies preemption disabled.
>>
>> In netif_rx_ni(), the "ni" means "not interrupt".
>>
> 
> yea, I know netif_rx_ni()'s meaning. It means that the following
> changes aren't necessary.
> 
>  #else
> -       cpu = smp_processor_id();
> +       ret = enqueue_to_backlog(skb, get_cpu());
> +       put_cpu();
> 

Why?  If we are in an interrupt (either soft or hard) then
smp_processor_id() is stable, and valid.

Changli, I find it very frustrating to communicate with you, you are
very terse in your descriptions and analysis and you make many simple
errors that would be avoided if you spent more time thinking about
things before sending your emails. :-/

Instead of just showing some pseudo patch, state WHY it is needed.
Talk about the execution state of environment and what rules or other
things are being violated which must be corrected.

^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: Changli Gao @ 2010-04-15  7:47 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, therbert, eparis, netdev
In-Reply-To: <20100415.003711.159334670.davem@davemloft.net>

On Thu, Apr 15, 2010 at 3:37 PM, David Miller <davem@davemloft.net> wrote:
> From: Changli Gao <xiaosuo@gmail.com>
> Date: Thu, 15 Apr 2010 15:30:44 +0800
>
>> Should netif_rx() be used only when preemption is disabled? If not,
>> netif_rx_ni() should be used instead.?
>
> netif_rx() must be invoked from a hardware or software interrupt,
> which implies preemption disabled.
>
> In netif_rx_ni(), the "ni" means "not interrupt".
>

yea, I know netif_rx_ni()'s meaning. It means that the following
changes aren't necessary.

 #else
-       cpu = smp_processor_id();
+       ret = enqueue_to_backlog(skb, get_cpu());
+       put_cpu();

ret = enqueue_to_backlog(skb, smp_processor_id()); should be OK.

 #endif
-
-       return enqueue_to_backlog(skb, cpu);
+       return ret;
 }

^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: David Miller @ 2010-04-15  7:37 UTC (permalink / raw)
  To: xiaosuo; +Cc: eric.dumazet, therbert, eparis, netdev
In-Reply-To: <l2p412e6f7f1004150030o8a406133s91d6614cbb106796@mail.gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Thu, 15 Apr 2010 15:30:44 +0800

> Should netif_rx() be used only when preemption is disabled? If not,
> netif_rx_ni() should be used instead.?

netif_rx() must be invoked from a hardware or software interrupt,
which implies preemption disabled.

In netif_rx_ni(), the "ni" means "not interrupt".

^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: Changli Gao @ 2010-04-15  7:30 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, therbert, eparis, netdev
In-Reply-To: <20100415.001446.244372815.davem@davemloft.net>

On Thu, Apr 15, 2010 at 3:14 PM, David Miller <davem@davemloft.net> wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 13 Apr 2010 09:14:17 +0200
>
>> [PATCH net-next-2.6] net: netif_rx() must disable preemption
>>
>> Eric Paris reported netif_rx() is calling smp_processor_id() from
>> preemptible context, in particular when caller is
>> ip_dev_loopback_xmit().
>>
>> RPS commit added this smp_processor_id() call, this patch makes sure
>> preemption is disabled. rps_get_cpus() wants rcu_read_lock() anyway, we
>> can dot it a bit earlier.
>>
>> Reported-by: Eric Paris <eparis@redhat.com>
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> I've applied this with some coding style fixups.
>
> Thanks!
>
> --------------------
> net: netif_rx() must disable preemption
>
> Eric Paris reported netif_rx() is calling smp_processor_id() from
> preemptible context, in particular when caller is
> ip_dev_loopback_xmit().
>
> RPS commit added this smp_processor_id() call, this patch makes sure
> preemption is disabled. rps_get_cpus() wants rcu_read_lock() anyway, we
> can dot it a bit earlier.
>
> Reported-by: Eric Paris <eparis@redhat.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>  net/core/dev.c |   25 +++++++++++++++----------
>  1 files changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 876b111..e8041eb 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2206,6 +2206,7 @@ DEFINE_PER_CPU(struct netif_rx_stats, netdev_rx_stat) = { 0, };
>  /*
>  * get_rps_cpu is called from netif_receive_skb and returns the target
>  * CPU from the RPS map of the receiving queue for a given skb.
> + * rcu_read_lock must be held on entry.
>  */
>  static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
>  {
> @@ -2217,8 +2218,6 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
>        u8 ip_proto;
>        u32 addr1, addr2, ports, ihl;
>
> -       rcu_read_lock();
> -
>        if (skb_rx_queue_recorded(skb)) {
>                u16 index = skb_get_rx_queue(skb);
>                if (unlikely(index >= dev->num_rx_queues)) {
> @@ -2296,7 +2295,6 @@ got_hash:
>        }
>
>  done:
> -       rcu_read_unlock();
>        return cpu;
>  }
>
> @@ -2392,7 +2390,7 @@ enqueue:
>
>  int netif_rx(struct sk_buff *skb)
>  {
> -       int cpu;
> +       int ret;
>
>        /* if netpoll wants it, pretend we never saw it */
>        if (netpoll_rx(skb))
> @@ -2402,14 +2400,21 @@ int netif_rx(struct sk_buff *skb)
>                net_timestamp(skb);
>
>  #ifdef CONFIG_RPS
> -       cpu = get_rps_cpu(skb->dev, skb);
> -       if (cpu < 0)
> -               cpu = smp_processor_id();
> +       {
> +               int cpu;
> +
> +               rcu_read_lock();
> +               cpu = get_rps_cpu(skb->dev, skb);
> +               if (cpu < 0)
> +                       cpu = smp_processor_id();
> +               ret = enqueue_to_backlog(skb, cpu);
> +               rcu_read_unlock();
> +       }
>  #else
> -       cpu = smp_processor_id();
> +       ret = enqueue_to_backlog(skb, get_cpu());
> +       put_cpu();
>  #endif
> -
> -       return enqueue_to_backlog(skb, cpu);
> +       return ret;
>  }
>  EXPORT_SYMBOL(netif_rx);
>

Should netif_rx() be used only when preemption is disabled? If not,
netif_rx_ni() should be used instead.?


-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH 1/2] igb: double increment nr_frags
From: David Miller @ 2010-04-15  7:28 UTC (permalink / raw)
  To: sanagi.koki
  Cc: netdev, e1000-devel, jeffrey.t.kirsher, jesse.brandeburg,
	bruce.w.allan, peter.p.waskiewicz.jr, john.ronciak
In-Reply-To: <4BC6BE3C.6070204@jp.fujitsu.com>

Your email client corrupted your patch by applying text
formatting to it.

Please disable all text formatting in your email client
and resend your changes.

You can read Documentation/email-clients.txt for help in
this area.

^ permalink raw reply

* [PATCH 2/2] igbvf: dobule increment nr_frags
From: Koki Sanagi @ 2010-04-15  7:23 UTC (permalink / raw)
  To: netdev, e1000-devel
  Cc: davem, jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
	peter.p.waskiewicz.jr, john.ronciak

There is no need to increment nr_frags becasue skb_fill_page_desc increments
it.

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 drivers/net/igbvf/netdev.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/igbvf/netdev.c b/drivers/net/igbvf/netdev.c
index ea8abf5..97cfc60 100644
--- a/drivers/net/igbvf/netdev.c
+++ b/drivers/net/igbvf/netdev.c
@@ -288,7 +288,7 @@ static bool igbvf_clean_rx_irq(struct igbvf_adapter *adapter,
 			               PCI_DMA_FROMDEVICE);
 			buffer_info->page_dma = 0;
 
-			skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags++,
+			skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags,
 			                   buffer_info->page,
 			                   buffer_info->page_offset,
 			                   length);


^ permalink raw reply related

* [PATCH 1/2] igb: double increment nr_frags
From: Koki Sanagi @ 2010-04-15  7:20 UTC (permalink / raw)
  To: netdev, e1000-devel
  Cc: davem, jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
	peter.p.waskiewicz.jr, john.ronciak

There is no need to increment nr_frags becasue skb_fill_page_desc increments
it.

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
  drivers/net/igb/igb_main.c |    2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 78cc742..8bde6c3 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -5254,7 +5254,7 @@ static bool igb_clean_rx_irq_adv(struct igb_q_vector *q_vector,
  				       PAGE_SIZE / 2, PCI_DMA_FROMDEVICE);
  			buffer_info->page_dma = 0;
  
-			skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags++,
+			skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags,
  						buffer_info->page,
  						buffer_info->page_offset,
  						length);


^ permalink raw reply related

* Re: [PATCH] CONFIG_SMP should be CONFIG_RPS
From: David Miller @ 2010-04-15  7:16 UTC (permalink / raw)
  To: eric.dumazet; +Cc: xiaosuo, netdev, therbert
In-Reply-To: <1271141947.16881.177.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 13 Apr 2010 08:59:07 +0200

> Thanks Changli, this is part of RFS patches :)
> 

I'll apply this, Tom has to respin RFS for other reasons
already...

^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: David Miller @ 2010-04-15  7:14 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, eparis, netdev
In-Reply-To: <1271142857.16881.193.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 13 Apr 2010 09:14:17 +0200

> [PATCH net-next-2.6] net: netif_rx() must disable preemption
> 
> Eric Paris reported netif_rx() is calling smp_processor_id() from
> preemptible context, in particular when caller is
> ip_dev_loopback_xmit().
> 
> RPS commit added this smp_processor_id() call, this patch makes sure
> preemption is disabled. rps_get_cpus() wants rcu_read_lock() anyway, we
> can dot it a bit earlier.
> 
> Reported-by: Eric Paris <eparis@redhat.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

I've applied this with some coding style fixups.

Thanks!

--------------------
net: netif_rx() must disable preemption

Eric Paris reported netif_rx() is calling smp_processor_id() from
preemptible context, in particular when caller is
ip_dev_loopback_xmit().

RPS commit added this smp_processor_id() call, this patch makes sure
preemption is disabled. rps_get_cpus() wants rcu_read_lock() anyway, we
can dot it a bit earlier.

Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/core/dev.c |   25 +++++++++++++++----------
 1 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 876b111..e8041eb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2206,6 +2206,7 @@ DEFINE_PER_CPU(struct netif_rx_stats, netdev_rx_stat) = { 0, };
 /*
  * get_rps_cpu is called from netif_receive_skb and returns the target
  * CPU from the RPS map of the receiving queue for a given skb.
+ * rcu_read_lock must be held on entry.
  */
 static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
 {
@@ -2217,8 +2218,6 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
 	u8 ip_proto;
 	u32 addr1, addr2, ports, ihl;
 
-	rcu_read_lock();
-
 	if (skb_rx_queue_recorded(skb)) {
 		u16 index = skb_get_rx_queue(skb);
 		if (unlikely(index >= dev->num_rx_queues)) {
@@ -2296,7 +2295,6 @@ got_hash:
 	}
 
 done:
-	rcu_read_unlock();
 	return cpu;
 }
 
@@ -2392,7 +2390,7 @@ enqueue:
 
 int netif_rx(struct sk_buff *skb)
 {
-	int cpu;
+	int ret;
 
 	/* if netpoll wants it, pretend we never saw it */
 	if (netpoll_rx(skb))
@@ -2402,14 +2400,21 @@ int netif_rx(struct sk_buff *skb)
 		net_timestamp(skb);
 
 #ifdef CONFIG_RPS
-	cpu = get_rps_cpu(skb->dev, skb);
-	if (cpu < 0)
-		cpu = smp_processor_id();
+	{
+		int cpu;
+
+		rcu_read_lock();
+		cpu = get_rps_cpu(skb->dev, skb);
+		if (cpu < 0)
+			cpu = smp_processor_id();
+		ret = enqueue_to_backlog(skb, cpu);
+		rcu_read_unlock();
+	}
 #else
-	cpu = smp_processor_id();
+	ret = enqueue_to_backlog(skb, get_cpu());
+	put_cpu();
 #endif
-
-	return enqueue_to_backlog(skb, cpu);
+	return ret;
 }
 EXPORT_SYMBOL(netif_rx);
 
-- 
1.7.0.4


^ permalink raw reply related

* Re: [PATCH 2.6.34-git] 8139too: fix Coding Styles
From: David Miller @ 2010-04-15  7:08 UTC (permalink / raw)
  To: jblanco; +Cc: netdev
In-Reply-To: <ba865f7b49c5df1b71bc393ad42ff40a.squirrel@neurowork.net>

From: "Javier Blanco de Torres (Neurowork)" <jblanco@neurowork.net>
Date: Mon, 12 Apr 2010 09:36:43 +0200

> Fixed coding styles in the 8139too net driver.
> 
> Signed-off-by: Javier Blanco de Torres <jblanco@neurowork.net>
> Signed-off-by: Alejandro Sánchez Acosta <asanchez@neurowork.net>

This is why I absolutely hate pure checkpatch.pl patches, people just
try to make the tool happy and don't think about what the tool is
trying to tell them.

The worst of this is this "typedef enum" part of your changes:

-typedef enum {
+enum {
 	RTL8139 = 0,
 	RTL8129,
 } board_t;

and checkpatch was telling you:

WARNING: do not add new typedefs
#220: FILE: net/8139too.c:220:
+typedef enum {

Well, you're still adding a new type!  Getting rid of the type name is
what it's telling you to stop doing.

It's still a newly named type after your change, it wants you to get
rid of the "board_t" thing altogether.

Give the enum a real "enum" name like:

enum rtl8139_board_t {

Then use _THAT_ in the sources:

	enum rtl8139_board_t x;

The typedef section of Documentation/CodingStyle makes this very
clear.

But your entire patch is like this, the changes are largely pointless
and many of them are false interpreations of what checkpatch complains
about.

Therefore I really don't encourage that you pursue this any further,
sorry.

^ permalink raw reply

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
From: David Miller @ 2010-04-15  6:52 UTC (permalink / raw)
  To: eric.dumazet; +Cc: krkumar2, netdev, nuclearcat
In-Reply-To: <1271056697.16881.7.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 12 Apr 2010 09:18:17 +0200

> [PATCH] net: dev_pick_tx() fix
> 
> When dev_pick_tx() caches tx queue_index on a socket, we must check
> socket dst_entry matches skb one, or risk a crash later, as reported by
> Denys Fedorysychenko, if old packets are in flight during a route
> change, involving devices with different number of queues.
> 
> Bug introduced by commit a4ee3ce3
> (net: Use sk_tx_queue_mapping for connected sockets)
> 
> Reported-by: Denys Fedorysychenko <nuclearcat@nuclearcat.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

It looks like Denys is still getting crashes even with this patch
applied.  And I also think there is some meric to some of Krishna's
analysis.

To me it seems to make more sense to validate the SKB's queue against
the real actual choosen device's range.

The socket queue index will catch up and eventually become valid
because the dst reset will invalidate the queue setting, and we'll
thus recompute it as needed, as Krishna stated.

So I'm tossing this patch for now since it doesn't even aparently
fix the bug.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox