Netdev List
 help / color / mirror / Atom feed
* Patch "IPv6: only notify protocols if address is completely gone" has been added to the 2.6.34-stable tree
From: gregkh @ 2010-07-28  0:25 UTC (permalink / raw)
  To: shemminger, davem, emils.tantilov, emil.s.tantilov, gregkh, greg,
	netdev
  Cc: stable, stable-commits
In-Reply-To: <20100524113300.2ef38e12@nehalam>


This is a note to let you know that I've just added the patch titled

    IPv6: only notify protocols if address is completely gone

to the 2.6.34-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
    ipv6-only-notify-protocols-if-address-is-completely-gone.patch
and it can be found in the queue-2.6.34 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@kernel.org> know about it.


>From shemminger@vyatta.com  Tue Jul 27 16:57:29 2010
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Mon, 24 May 2010 11:33:00 -0700
Subject: IPv6: only notify protocols if address is completely gone
To: Emil S Tantilov <emils.tantilov@gmail.com>, "David S. Miller" <davem@davemloft.net>, Greg KH <greg@kroah.com>
Cc: NetDev <netdev@vger.kernel.org>, "Tantilov, Emil S" <emil.s.tantilov@intel.com>, stable@kernel.org
Message-ID: <20100524113300.2ef38e12@nehalam>

From: Stephen Hemminger <shemminger@vyatta.com>

(cherry picked from commit 8595805aafc8b077e01804c9a3668e9aa3510e89)

The notifier for address down should only be called if address is completely
gone, not just being marked as tentative on link transition. The code
in net-next would case bonding/sctp/s390 to see address disappear on link
down, but they would never see it reappear on link up.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 net/ipv6/addrconf.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2729,7 +2729,9 @@ static int addrconf_ifdown(struct net_de
 		write_unlock_bh(&idev->lock);
 
 		__ipv6_ifa_notify(RTM_DELADDR, ifa);
-		atomic_notifier_call_chain(&inet6addr_chain, NETDEV_DOWN, ifa);
+		if (ifa->dead)
+			atomic_notifier_call_chain(&inet6addr_chain,
+						   NETDEV_DOWN, ifa);
 		in6_ifa_put(ifa);
 
 		write_lock_bh(&idev->lock);


Patches currently in stable-queue which might be from shemminger@vyatta.com are

queue-2.6.34/sky2-restore-multicast-after-restart.patch
queue-2.6.34/ipv6-keep-route-for-tentative-address.patch
queue-2.6.34/ipv6-only-notify-protocols-if-address-is-completely-gone.patch
queue-2.6.34/ipv6-fix-null-reference-in-proxy-neighbor-discovery.patch
queue-2.6.34/bridge-fdb-cleanup-runs-too-often.patch

^ permalink raw reply

* Re: Make vhost multi-threaded and associate each thread to its guest's cgroup
From: Sridhar Samudrala @ 2010-07-28  0:41 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, lkml, kvm@vger.kernel.org, Tejun Heo, Li Zefan
In-Reply-To: <20100727204254.GA17947@redhat.com>

On Tue, 2010-07-27 at 23:42 +0300, Michael S. Tsirkin wrote:
> Sridhar,
> I pushed a patchset with all known issues fixed,
> on my vhost-net-next branch.
> 
> For now this ignores the cpu mask issue, addressing
> only the cgroups issue.
> 
> Would appreciate testing and reports.

I had to apply the following patch to get it build.
With this patch, i am seeing similar results as i saw earlier.

Thanks
Sridhar

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 996e751..8543898 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -23,6 +23,7 @@
 #include <linux/highmem.h>
 #include <linux/slab.h>
 #include <linux/kthread.h>
+#include <linux/cgroup.h>
 
 #include <linux/net.h>
 #include <linux/if_packet.h>
@@ -252,7 +253,7 @@ static long vhost_dev_set_owner(struct vhost_dev *dev)
 	}
 
 	dev->worker = worker;
-	err = cgroup_attach_task_current_cg(poller);
+	err = cgroup_attach_task_current_cg(worker);
 	if (err)
 		goto err_cgroup;
 	wake_up_process(worker);	/* avoid contributing to loadavg */

^ permalink raw reply related

* Re: Make vhost multi-threaded and associate each thread to its guest's cgroup
From: Michael S. Tsirkin @ 2010-07-28  1:12 UTC (permalink / raw)
  To: Sridhar Samudrala; +Cc: netdev, lkml, kvm@vger.kernel.org, Tejun Heo, Li Zefan
In-Reply-To: <1280277701.27059.11.camel@w-sridhar.beaverton.ibm.com>

On Tue, Jul 27, 2010 at 05:41:41PM -0700, Sridhar Samudrala wrote:
> On Tue, 2010-07-27 at 23:42 +0300, Michael S. Tsirkin wrote:
> > Sridhar,
> > I pushed a patchset with all known issues fixed,
> > on my vhost-net-next branch.
> > 
> > For now this ignores the cpu mask issue, addressing
> > only the cgroups issue.
> > 
> > Would appreciate testing and reports.
> 
> I had to apply the following patch to get it build.
> With this patch, i am seeing similar results as i saw earlier.
> 
> Thanks
> Sridhar


Excellent, thanks for the testing.


> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 996e751..8543898 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -23,6 +23,7 @@
>  #include <linux/highmem.h>
>  #include <linux/slab.h>
>  #include <linux/kthread.h>
> +#include <linux/cgroup.h>
>  
>  #include <linux/net.h>
>  #include <linux/if_packet.h>
> @@ -252,7 +253,7 @@ static long vhost_dev_set_owner(struct vhost_dev *dev)
>  	}
>  
>  	dev->worker = worker;
> -	err = cgroup_attach_task_current_cg(poller);
> +	err = cgroup_attach_task_current_cg(worker);
>  	if (err)
>  		goto err_cgroup;
>  	wake_up_process(worker);	/* avoid contributing to loadavg */
> 

^ permalink raw reply

* vhost mergeable buffers guest's cgroup
From: Michael S. Tsirkin @ 2010-07-28  1:27 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: netdev, lkml, kvm@vger.kernel.org, Tejun Heo, Li Zefan, dlstevens

Sridhar, David,
I pushed a mergeable buffers patchset on my vhost-net-next branch,
on top of the threading changes.
 
This is a minimal patch, which in my testing has zero impact on
non-mergeable path.

Please give this a spin and let me know.

Thanks!

-- 
MST

^ permalink raw reply

* Re: vhost mergeable buffers guest's cgroup
From: Michael S. Tsirkin @ 2010-07-28  1:31 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: netdev, lkml, kvm@vger.kernel.org, Tejun Heo, Li Zefan, dlstevens
In-Reply-To: <20100728012735.GA23243@redhat.com>

On Wed, Jul 28, 2010 at 04:27:35AM +0300, Michael S. Tsirkin wrote:
> Sridhar, David,
> I pushed a mergeable buffers patchset on my vhost-net-next branch,
> on top of the threading changes.
>  
> This is a minimal patch, which in my testing has zero impact on
> non-mergeable path.
> 
> Please give this a spin and let me know.
> 
> Thanks!

The userspace bits can be found on my qemu-kvm tree at kernel.org,
branch vhost_mergeable.
Please use these for testing.

> -- 
> MST

^ permalink raw reply

* Re: [PATCH net-next] drivers/net/vxge/vxge-main.c: Use pr_<level> and netdev_<level>
From: Jon Mason @ 2010-07-28  3:21 UTC (permalink / raw)
  To: Joe Perches
  Cc: Ramkrishna Vepa, Sreenivasa Honnur, David S. Miller, netdev, LKML
In-Reply-To: <1280267223.24054.44.camel@Joe-Laptop.home>

On Tue, Jul 27, 2010 at 02:47:03PM -0700, Joe Perches wrote:
> Use pr_fmt, pr_<level> and netdev_<level> where appropriate.
> 
> Signed-off-by: Joe Perches <joe@perches.com>

Acked-by: Jon Mason <jon.mason@exar.com>

> ---
>  drivers/net/vxge/vxge-main.c |   27 +++++++++++----------------
>  1 files changed, 11 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/net/vxge/vxge-main.c b/drivers/net/vxge/vxge-main.c
> index 94d87e8..c7c5605 100644
> --- a/drivers/net/vxge/vxge-main.c
> +++ b/drivers/net/vxge/vxge-main.c
> @@ -41,6 +41,8 @@
>  *
>  ******************************************************************************/
>  
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
>  #include <linux/if_vlan.h>
>  #include <linux/pci.h>
>  #include <linux/slab.h>
> @@ -144,7 +146,7 @@ vxge_callback_link_up(struct __vxge_hw_device *hldev)
>  
>  	vxge_debug_entryexit(VXGE_TRACE, "%s: %s:%d",
>  		vdev->ndev->name, __func__, __LINE__);
> -	printk(KERN_NOTICE "%s: Link Up\n", vdev->ndev->name);
> +	netdev_notice(vdev->ndev, "Link Up\n");
>  	vdev->stats.link_up++;
>  
>  	netif_carrier_on(vdev->ndev);
> @@ -168,7 +170,7 @@ vxge_callback_link_down(struct __vxge_hw_device *hldev)
>  
>  	vxge_debug_entryexit(VXGE_TRACE,
>  		"%s: %s:%d", vdev->ndev->name, __func__, __LINE__);
> -	printk(KERN_NOTICE "%s: Link Down\n", vdev->ndev->name);
> +	netdev_notice(vdev->ndev, "Link Down\n");
>  
>  	vdev->stats.link_down++;
>  	netif_carrier_off(vdev->ndev);
> @@ -2679,7 +2681,7 @@ vxge_open(struct net_device *dev)
>  
>  	if (vxge_hw_device_link_state_get(vdev->devh) == VXGE_HW_LINK_UP) {
>  		netif_carrier_on(vdev->ndev);
> -		printk(KERN_NOTICE "%s: Link Up\n", vdev->ndev->name);
> +		netdev_notice(vdev->ndev, "Link Up\n");
>  		vdev->stats.link_up++;
>  	}
>  
> @@ -2817,7 +2819,7 @@ int do_vxge_close(struct net_device *dev, int do_io)
>  	}
>  
>  	netif_carrier_off(vdev->ndev);
> -	printk(KERN_NOTICE "%s: Link Down\n", vdev->ndev->name);
> +	netdev_notice(vdev->ndev, "Link Down\n");
>  	netif_tx_stop_all_queues(vdev->ndev);
>  
>  	/* Note that at this point xmit() is stopped by upper layer */
> @@ -3844,9 +3846,7 @@ static pci_ers_result_t vxge_io_slot_reset(struct pci_dev *pdev)
>  	struct vxgedev *vdev = netdev_priv(netdev);
>  
>  	if (pci_enable_device(pdev)) {
> -		printk(KERN_ERR "%s: "
> -			"Cannot re-enable device after reset\n",
> -			VXGE_DRIVER_NAME);
> +		netdev_err(netdev, "Cannot re-enable device after reset\n");
>  		return PCI_ERS_RESULT_DISCONNECT;
>  	}
>  
> @@ -3871,9 +3871,8 @@ static void vxge_io_resume(struct pci_dev *pdev)
>  
>  	if (netif_running(netdev)) {
>  		if (vxge_open(netdev)) {
> -			printk(KERN_ERR "%s: "
> -				"Can't bring device back up after reset\n",
> -				VXGE_DRIVER_NAME);
> +			netdev_err(netdev,
> +				   "Can't bring device back up after reset\n");
>  			return;
>  		}
>  	}
> @@ -4430,13 +4429,9 @@ static int __init
>  vxge_starter(void)
>  {
>  	int ret = 0;
> -	char version[32];
> -	snprintf(version, 32, "%s", DRV_VERSION);
>  
> -	printk(KERN_INFO "%s: Copyright(c) 2002-2010 Exar Corp.\n",
> -		VXGE_DRIVER_NAME);
> -	printk(KERN_INFO "%s: Driver version: %s\n",
> -			VXGE_DRIVER_NAME, version);
> +	pr_info("Copyright(c) 2002-2010 Exar Corp.\n");
> +	pr_info("Driver version: %s\n", DRV_VERSION);
>  
>  	verify_bandwidth();
>  
> 
> 

^ permalink raw reply

* Re: [PATCH net-next 0/8] bnx2x: move bnx2x to separate folder and divide to files
From: David Miller @ 2010-07-28  3:36 UTC (permalink / raw)
  To: dmitry; +Cc: netdev, eilong
In-Reply-To: <1280270447.11551.119.camel@lb-tlvb-dmitry>

From: "Dmitry Kravkov" <dmitry@broadcom.com>
Date: Wed, 28 Jul 2010 01:40:47 +0300

> resubmit the series with fixes for git-apply. Thanks

All applied, thanks.

^ permalink raw reply

* Re: [net-next 2/3] stmmac: fix timer setup when use dual mac Kconfig
From: David Miller @ 2010-07-28  3:45 UTC (permalink / raw)
  To: peppe.cavallaro; +Cc: netdev
In-Reply-To: <1280225387-26240-2-git-send-email-peppe.cavallaro@st.com>

From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Tue, 27 Jul 2010 12:09:46 +0200

> The driver erroneously sets the tmrate to zero when the
> TMU initialisation fails. This actually generates problems
> while using the dual GMAC configuration.
> 
> With this patch, enabling both the dual gmac and the timer
> optimisation, the first interface opened will use the tmu
> channel 2, the second one won't be able to use the timer but
> will continue to work without mitigating the interrupts by
> using the external timer (i.e. TMU channel 2).
> 
> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>

This is not how we do things.

All of the options that influence the driver should be right next
to the main driver option.

What the platform SOC Kconfig's can do is 'select' those option.

But even better is to get rid of all of these feature Kconfig options,
and communicate the capability in the platform_device probe
information or similar.

^ permalink raw reply

* Re: [net-next 2/3] stmmac: fix timer setup when use dual mac Kconfig
From: David Miller @ 2010-07-28  3:45 UTC (permalink / raw)
  To: peppe.cavallaro; +Cc: netdev
In-Reply-To: <1280225387-26240-2-git-send-email-peppe.cavallaro@st.com>

From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Tue, 27 Jul 2010 12:09:46 +0200

> The driver erroneously sets the tmrate to zero when the
> TMU initialisation fails. This actually generates problems
> while using the dual GMAC configuration.
> 
> With this patch, enabling both the dual gmac and the timer
> optimisation, the first interface opened will use the tmu
> channel 2, the second one won't be able to use the timer but
> will continue to work without mitigating the interrupts by
> using the external timer (i.e. TMU channel 2).
> 
> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>

Applied.

^ permalink raw reply

* Re: [net-next 3/3] stmmac: fix automatic PAD/FCS stripping
From: David Miller @ 2010-07-28  3:45 UTC (permalink / raw)
  To: peppe.cavallaro; +Cc: netdev
In-Reply-To: <1280225387-26240-3-git-send-email-peppe.cavallaro@st.com>

From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Tue, 27 Jul 2010 12:09:47 +0200

> For Simple Ethernet frames (802.2 and 802.3) the GMAC Core
> never strips pad and fcs. This means the ACS has no effect
> on IPv4/6 frames.
> The FL bits, in the RDES0, include the FCS so the driver
> has to remove it in SW.
> For 802.3 frame format with LLC or LLC-SNAP, when set the ACS
> bit, the HW strips both PAD and FCS.
> The FL bits, in the RDES0, actually represents the frame length
> already stripped.
> This patch fixes this logic within the device driver that
> erroneously removed 4byte from 802.3 frames already stripped
> corrupting the payload.
> 
> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>

Applied.

^ permalink raw reply

* Re: [net-next 2/3] stmmac: fix timer setup when use dual mac Kconfig
From: David Miller @ 2010-07-28  3:46 UTC (permalink / raw)
  To: peppe.cavallaro; +Cc: netdev
In-Reply-To: <20100727.204528.149852941.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Tue, 27 Jul 2010 20:45:28 -0700 (PDT)

Sorry, I meant to say this in reply to patch #1 not #2 :)

> From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
> Date: Tue, 27 Jul 2010 12:09:46 +0200
> 
>> The driver erroneously sets the tmrate to zero when the
>> TMU initialisation fails. This actually generates problems
>> while using the dual GMAC configuration.
>> 
>> With this patch, enabling both the dual gmac and the timer
>> optimisation, the first interface opened will use the tmu
>> channel 2, the second one won't be able to use the timer but
>> will continue to work without mitigating the interrupts by
>> using the external timer (i.e. TMU channel 2).
>> 
>> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> 
> This is not how we do things.
> 
> All of the options that influence the driver should be right next
> to the main driver option.
> 
> What the platform SOC Kconfig's can do is 'select' those option.
> 
> But even better is to get rid of all of these feature Kconfig options,
> and communicate the capability in the platform_device probe
> information or similar.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2] ks8842: Support DMA when accessed via timberdale
From: David Miller @ 2010-07-28  3:48 UTC (permalink / raw)
  To: richard.rojfors; +Cc: netdev
In-Reply-To: <1280271421.11916.2.camel@debian>

From: Richard Röjfors <richard.rojfors@pelagicore.com>
Date: Wed, 28 Jul 2010 00:57:01 +0200

> This patch adds support for RX and TX DMA via the DMA API,
> this is only supported when the KS8842 is accessed via timberdale.
> 
> There is no support for DMA on the generic bus interface it self,
> a state machine inside the FPGA is handling RX and TX transfers to/from
> buffers in the FPGA. The host CPU can do DMA to and from these buffers.
> 
> The FPGA has to handle the RX interrupts, so these must be enabled in
> the ks8842 but not in the FPGA. The driver must not disable the RX interrupt
> that would mean that the data transfers into the FPGA buffers would stop.
> 
> The host shall not enable TX interrupts since TX is handled by the FPGA,
> the host is notified by DMA callbacks when transfers are finished.
> 
> Which DMA channels to use are added as parameters in the platform data struct.
> 
> Signed-off-by: Richard Röjfors <richard.rojfors@pelagicore.com>

Applied.

^ permalink raw reply

* Re: [patch -next] ixgbe: potential null dereference
From: David Miller @ 2010-07-28  3:48 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: error27, jesse.brandeburg, bruce.w.allan, alexander.h.duyck,
	peter.p.waskiewicz.jr, john.ronciak, donald.c.skidmore,
	mallikarjuna.chilakala, e1000-devel, netdev, kernel-janitors
In-Reply-To: <AANLkTim5LGj0KGs-0fGGAka7Fde3=NVVXWYBLbb_kVMs@mail.gmail.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 27 Jul 2010 17:10:02 -0700

> On Tue, Jul 27, 2010 at 03:05, Dan Carpenter <error27@gmail.com> wrote:
>> The e_dev_err() macro dereferences "adapter" which is NULL here.
>>
>> Signed-off-by: Dan Carpenter <error27@gmail.com>
> 
> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 0/8] bnx2x: move bnx2x to separate folder and divide to files
From: David Miller @ 2010-07-28  3:56 UTC (permalink / raw)
  To: dmitry; +Cc: netdev, eilong
In-Reply-To: <20100727.203640.173839576.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Tue, 27 Jul 2010 20:36:40 -0700 (PDT)

> From: "Dmitry Kravkov" <dmitry@broadcom.com>
> Date: Wed, 28 Jul 2010 01:40:47 +0300
> 
>> resubmit the series with fixes for git-apply. Thanks
> 
> All applied, thanks.

Ummm, so what in the world did you think was going to happen
the next time I try to merge net-2.6 into net-next-2.6?

Any clue what might happen?

Any idea?

Since you not only moved the driver into a new directory,
but also moved functions all over the damn place into new
files too, the bnx2x bug fixes in net-2.6 have to be applied
by hand by me during the merge, bit by bit.

This is one of the millions of reasons I absolutely detest multi-file
drivers, people move crap around, patches from one tree can't easily
be munged into another, etc.

If it's too big to fit in one file, your driver is too damn bloated.
End of story.  Put the thing into one file under drivers/net and
simplify _everyones_ life.

I'm fixing this merge mess up, but I'm very not happy about how
you guys staged this sequence of events.

^ permalink raw reply

* Re: [PATCH v2] macvlan: Fix rx counters update in macvlan_handle_frame()
From: David Miller @ 2010-07-28  4:03 UTC (permalink / raw)
  To: herbert; +Cc: sri, netdev
In-Reply-To: <20100728001404.GA14038@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 28 Jul 2010 08:14:04 +0800

> On Tue, Jul 27, 2010 at 12:10:07PM -0700, Sridhar Samudrala wrote:
>> Fix macvlan_handle_frame() to update the rx counters based
>> on the return value of the vlan->receive call.
>> 
>> Updated the patch to not do any packet count drops when the interface
>> is down based on Herber'ts comments.
>> 
>> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
> 
> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied to net-next-2.6, thanks.

^ permalink raw reply

* Re: [net-2.6 PATCH] e1000e: 82577/82578 PHY register access issues
From: David Miller @ 2010-07-28  4:06 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bphilips, stable, bruce.w.allan
In-Reply-To: <20100727222808.25500.11033.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 27 Jul 2010 15:28:46 -0700

> From: Bruce Allan <bruce.w.allan@intel.com>
> 
> The MAC-PHY interconnect on 82577/82578 uses a power management feature
> (called K1) which must be disabled when in 1Gbps due to a hardware issue on
> these parts.  The #define bit setting used to enable/disable K1 is
> incorrect and can cause PHY register accesses to stop working altogether
> until the next device reset.  This patch sets the register correctly.
> 
> This issue is present in kernels since 2.6.32.
> 
> CC: stable@kernel.org
> Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] Driver-core: Fix bluetooth network device rename regression
From: Kay Sievers @ 2010-07-28  4:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg KH, Greg KH, Johannes Berg, Andrew Morton, Rafael J. Wysocki,
	Maciej W. Rozycki, netdev
In-Reply-To: <m1aapcejf5.fsf@fess.ebiederm.org>

On Tue, Jul 27, 2010 at 22:53, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Kay Sievers <kay.sievers@vrfy.org> writes:

>> It should only be used if it's really needed for known used userspace
>> interfaces. A few others that got converted already did not need it.
>
> Interesting.  The symlink creation is slightly buggy in that it is
> created after the uevent for device creation has been sent.  Which can
> lead to some interesting races in userspace.

At uevent time the 'subsystem' is specified by the 'sybsystem' link
and the SUBSYSTEM property in the event environment. The device should
not really rely on finding itself linked in class. The class and bus
link collections are only to find a group of devices of the same
subsystem, and this should not be a real world problem here.

Also there are plans to merge struct class and struct bus_type
completely and keep only a few flags around where stuff should show up
for compatibility. At that point all stuff will be created at the same
time.

> As for the rest the bus compat code is similar but not quite the same
> as the class code, so I would be extremely reluctant to deploy it
> except in extremely limited cases.  Backwards compatibility is
> important, and we should strive our best to maintain backwards
> compatibility it for the kernel<->userspace ABIs.

Today, the only real difference between class and bus devices are
these 'collection links', the devices otherwise look completely the
same. There should be no important difference.

Buses are very much preferred over classes today, no new stuff should
create any class. The bus directories are extendable and have a
reasonable layout with the devices/ subdirectory, unlike the flat
class directories where people got the silly idea to mix devices lists
with attributes to confuse everything.

Kay

^ permalink raw reply

* Re: [PATCH] Driver-core: Fix bluetooth network device rename  regression
From: Eric W. Biederman @ 2010-07-28  5:12 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Greg KH, Greg KH, Johannes Berg, Andrew Morton, Rafael J. Wysocki,
	Maciej W. Rozycki, netdev
In-Reply-To: <AANLkTimA9Mo0B6MOn31KQ=L+raeCUg6eH3Mqf=vB5AYz@mail.gmail.com>

Kay Sievers <kay.sievers@vrfy.org> writes:

> On Tue, Jul 27, 2010 at 22:53, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Kay Sievers <kay.sievers@vrfy.org> writes:
>
>>> It should only be used if it's really needed for known used userspace
>>> interfaces. A few others that got converted already did not need it.
>>
>> Interesting.  The symlink creation is slightly buggy in that it is
>> created after the uevent for device creation has been sent.  Which can
>> lead to some interesting races in userspace.
>
> At uevent time the 'subsystem' is specified by the 'sybsystem' link
> and the SUBSYSTEM property in the event environment. The device should
> not really rely on finding itself linked in class. The class and bus
> link collections are only to find a group of devices of the same
> subsystem, and this should not be a real world problem here.

I agree that there should be no real world problems.
The bottom line is that every sysfs attribute should be created
before we send the uevent or else we get horribly subtle
races or we get user space code that starts looping looking
for the interface.

> Also there are plans to merge struct class and struct bus_type
> completely and keep only a few flags around where stuff should show up
> for compatibility. At that point all stuff will be created at the same
> time.

That part seems reasonable.

>> As for the rest the bus compat code is similar but not quite the same
>> as the class code, so I would be extremely reluctant to deploy it
>> except in extremely limited cases.  Backwards compatibility is
>> important, and we should strive our best to maintain backwards
>> compatibility it for the kernel<->userspace ABIs.
>
> Today, the only real difference between class and bus devices are
> these 'collection links', the devices otherwise look completely the
> same. There should be no important difference.

I don't see the class subdirectories created for bus devices, and I
don't see any equivalent.  At least for the network devices this is a
huge difference, because the device namespace is controlled by
userspace and it is NOT ok to have namespace conflicts with arbitrary
sysfs attributes.

> Buses are very much preferred over classes today, no new stuff should
> create any class. The bus directories are extendable and have a
> reasonable layout with the devices/ subdirectory, unlike the flat
> class directories where people got the silly idea to mix devices lists
> with attributes to confuse everything.

Which is generally reasonable.  However busses appear to have the silly
idea that it is ok to mix child child device lists of different kinds
of children with attributes and confuse everything.

At the subsystem level bus devices look better.
At the individual device level bus devices stacked on bus devices 
appear to be a namespace disaster.

Eric


^ permalink raw reply

* Re: [PATCH] Driver-core: Fix bluetooth network device rename regression
From: Kay Sievers @ 2010-07-28  5:26 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg KH, Greg KH, Johannes Berg, Andrew Morton, Rafael J. Wysocki,
	Maciej W. Rozycki, netdev
In-Reply-To: <m1aapcqje4.fsf@fess.ebiederm.org>

On Wed, Jul 28, 2010 at 07:12, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Kay Sievers <kay.sievers@vrfy.org> writes:
>> On Tue, Jul 27, 2010 at 22:53, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>> Kay Sievers <kay.sievers@vrfy.org> writes:
>>
>>>> It should only be used if it's really needed for known used userspace
>>>> interfaces. A few others that got converted already did not need it.
>>>
>>> Interesting.  The symlink creation is slightly buggy in that it is
>>> created after the uevent for device creation has been sent.  Which can
>>> lead to some interesting races in userspace.
>>
>> At uevent time the 'subsystem' is specified by the 'sybsystem' link
>> and the SUBSYSTEM property in the event environment. The device should
>> not really rely on finding itself linked in class. The class and bus
>> link collections are only to find a group of devices of the same
>> subsystem, and this should not be a real world problem here.
>
> I agree that there should be no real world problems.
> The bottom line is that every sysfs attribute should be created
> before we send the uevent or else we get horribly subtle
> races or we get user space code that starts looping looking
> for the interface.

Yeah, but most of these things we should have fixed over the last
years. There is no single WAIT_FOR instruction left in udev rules. :)

>> Also there are plans to merge struct class and struct bus_type
>> completely and keep only a few flags around where stuff should show up
>> for compatibility. At that point all stuff will be created at the same
>> time.
>
> That part seems reasonable.
>
>>> As for the rest the bus compat code is similar but not quite the same
>>> as the class code, so I would be extremely reluctant to deploy it
>>> except in extremely limited cases.  Backwards compatibility is
>>> important, and we should strive our best to maintain backwards
>>> compatibility it for the kernel<->userspace ABIs.
>>
>> Today, the only real difference between class and bus devices are
>> these 'collection links', the devices otherwise look completely the
>> same. There should be no important difference.
>
> I don't see the class subdirectories created for bus devices, and I
> don't see any equivalent.

Sure, I meant, if the class-compat stuff is used when a conversion is done.

> At least for the network devices this is a
> huge difference, because the device namespace is controlled by
> userspace and it is NOT ok to have namespace conflicts with arbitrary
> sysfs attributes.

Yeah, that's why the class-glue directories exist. But so far we did
not support stacking of classes of different types.

>> Buses are very much preferred over classes today, no new stuff should
>> create any class. The bus directories are extendable and have a
>> reasonable layout with the devices/ subdirectory, unlike the flat
>> class directories where people got the silly idea to mix devices lists
>> with attributes to confuse everything.
>
> Which is generally reasonable.  However buses appear to have the silly
> idea that it is ok to mix child child device lists of different kinds
> of children with attributes and confuse everything.

Classes do the same. They are usually distinguished by its name, like
mouseX, inputX, sdX, sdX3, they are all different, but still belong to
the same 'subsystem'.

> At the subsystem level bus devices look better.
> At the individual device level bus devices stacked on bus devices
> appear to be a namespace disaster.

They are usually created by the same code, in many cases by the same
drivers, and have not been a real problem so far. As you said, network
devices are special here, because of the ability to rename them from
userspace.

At some time in the future, when buses and classes are merged, I
expect stuff can just set a flag to have a 'glue dir' created or not.

For now 'glue dirs' are limited to be created between a bus and a
class device. It could possibly be extended to be created between
classes of different types to handle issues like this.

Kay

^ permalink raw reply

* [RFC PATCH v4 0/5] netdev: show a process of packets
From: Koki Sanagi @ 2010-07-28  6:26 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro,
	nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec,
	mathieu.desnoyers

CHANGE-LOG since v3:
    1) change arguments of softirq tracepoint into original one.
    2) remove tracepoint of dev_kfree_skb_irq and skb_free_datagram_locked
       and add trace_kfree_skb before __kfree_skb instead of them.
    3) add tracepoint to netif_rx and display it by netdev-times script.

These patch-set adds tracepoints to show us a process of packets.
Using these tracepoints and existing points, we can get the time when
packet passes through some points in transmit or receive sequence.
For example, this is an output of perf script which is attached by patch 5/5.

106133.171439sec cpu=0
  irq_entry(+0.000msec irq=24:eth4)
         |
  softirq_entry(+0.006msec)
         |
         |---netif_receive_skb(+0.010msec skb=f2d15900 len=100)
         |            |
         |      skb_copy_datagram_iovec(+0.039msec 10291::10291)
         |
  napi_poll_exit(+0.022msec eth4)

106134.175634sec cpu=1
  irq_entry(+0.000msec irq=28:eth1)
         |
         |---netif_rx(+0.009msec skb=f3ef0a00)
         |
  softirq_entry(+0.018msec)
         |
         |---netif_receive_skb(+0.021msec skb=f3ef0a00 len=84)
         |            |
         |      skb_copy_datagram_iovec(+0.033msec 0:swapper)
         |
  napi_poll_exit(+0.035msec (no_device))

The above is a receive side(eth4 is NAPI. eth1 is non-NAPI). Like this, it can
show receive sequence frominterrupt(irq_entry) to application
(skb_copy_datagram_iovec). 
This script shows one NET_RX softirq and events related to it. All relative
time bases on first irq_entry which raise NET_RX softirq.

   dev    len      Qdisc               netdevice             free
   eth4    74 106125.030004sec        0.006msec             0.009msec
   eth4    87 106125.041020sec        0.007msec             0.023msec
   eth4    66 106125.042291sec        0.003msec             0.012msec
   eth4    66 106125.043274sec        0.006msec             0.004msec
   eth4   850 106125.044283sec        0.007msec             0.018msec

The above is a transmit side. There are three check-time-points.
Point1 is before putting a packet to Qdisc. point2 is after ndo_start_xmit in
dev_hard_start_xmit. It indicates finishing putting a packet to driver.
point3 is in consume_skb and kfree_skb. It indicates freeing a transmitted packet.
Values of this script are, from left, device name, length of a packet, a time of
point1, an interval time between point1 and point2 and an interval time between
point2 and point3.

These times are useful to analyze a performance or to detect a point where
packet delays. For example,
- NET_RX softirq calling is late.
- Application is late to take a packet.
- It takes much time to put a transmitting packet to driver
  (It may be caused by packed queue)

And also, these tracepoint help us to investigate a network driver's trouble
from memory dump because ftrace records it to memory. And ftrace is so light
even if always trace on. So, in a case investigating a problem which doesn't
reproduce, it is useful.

Thanks,
Koki Sanagi.

^ permalink raw reply

* [RFC PATCH v4 1/5] irq: add tracepoint to softirq_raise
From: Koki Sanagi @ 2010-07-28  6:29 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro,
	nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec,
	mathieu.desnoyers
In-Reply-To: <4C4FCDA1.3000803@jp.fujitsu.com>

CHANGE-LOG since v3:
  -change arguments of softirq tracepoint into original one.

From: Lai Jiangshan <laijs@cn.fujitsu.com>

Add a tracepoint for tracing when softirq action is raised.

It and the existed tracepoints complete softirq's tracepoints:
softirq_raise, softirq_entry and softirq_exit.

And when this tracepoint is used in combination with
the softirq_entry tracepoint we can determine
the softirq raise latency.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

[ factorize softirq events with DECLARE_EVENT_CLASS ]
Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 include/linux/interrupt.h  |    8 +++++++-
 include/trace/events/irq.h |   26 ++++++++++++++++++++++++--
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index c233113..5460db4 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -18,6 +18,7 @@
 #include <asm/atomic.h>
 #include <asm/ptrace.h>
 #include <asm/system.h>
+#include <trace/events/irq.h>
 
 /*
  * These correspond to the IORESOURCE_IRQ_* defines in
@@ -402,7 +403,12 @@ asmlinkage void do_softirq(void);
 asmlinkage void __do_softirq(void);
 extern void open_softirq(int nr, void (*action)(struct softirq_action *));
 extern void softirq_init(void);
-#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
+static inline void __raise_softirq_irqoff(unsigned int nr)
+{
+	trace_softirq_raise((struct softirq_action *)&nr, NULL);
+	or_softirq_pending(1UL << nr);
+}
+
 extern void raise_softirq_irqoff(unsigned int nr);
 extern void raise_softirq(unsigned int nr);
 extern void wakeup_softirqd(void);
diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
index 0e4cfb6..3ddda02 100644
--- a/include/trace/events/irq.h
+++ b/include/trace/events/irq.h
@@ -5,7 +5,9 @@
 #define _TRACE_IRQ_H
 
 #include <linux/tracepoint.h>
-#include <linux/interrupt.h>
+
+struct irqaction;
+struct softirq_action;
 
 #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
 #define show_softirq_name(val)				\
@@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq,
 	),
 
 	TP_fast_assign(
-		__entry->vec = (int)(h - vec);
+		if (vec)
+			__entry->vec = (int)(h - vec);
+		else
+			__entry->vec = *((int *)h);
 	),
 
 	TP_printk("vec=%d [action=%s]", __entry->vec,
@@ -136,6 +141,23 @@ DEFINE_EVENT(softirq, softirq_exit,
 	TP_ARGS(h, vec)
 );
 
+/**
+ * softirq_raise - called immediately when a softirq is raised
+ * @h: pointer to struct softirq_action
+ * @vec: pointer to first struct softirq_action in softirq_vec array
+ *
+ * The @h parameter contains a pointer to the softirq vector number which is
+ * raised. @vec is NULL and it means @h includes vector number not
+ * softirq_action. When used in combination with the softirq_entry tracepoint
+ * we can determine the softirq raise latency.
+ */
+DEFINE_EVENT(softirq, softirq_raise,
+
+	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
+
+	TP_ARGS(h, vec)
+);
+
 #endif /*  _TRACE_IRQ_H */
 
 /* This part must be outside protection */


^ permalink raw reply related

* [RFC PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT
From: Koki Sanagi @ 2010-07-28  6:30 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro,
	nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec,
	mathieu.desnoyers
In-Reply-To: <4C4FCDA1.3000803@jp.fujitsu.com>

From: Neil Horman <nhorman@tuxdriver.com>

This patch converts trace_napi_poll from DECLARE_EVENT to TRACE_EVENT to improve
the usability of napi_poll tracepoint.

          <idle>-0     [001] 241302.750777: napi_poll: napi poll on napi struct f6acc480 for device eth3
          <idle>-0     [000] 241302.852389: napi_poll: napi poll on napi struct f5d0d70c for device eth1

An original patch is below.
http://marc.info/?l=linux-kernel&m=126021713809450&w=2
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

And add a fix by Steven Rostedt.
http://marc.info/?l=linux-kernel&m=126150506519173&w=2

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 include/trace/events/napi.h |   25 +++++++++++++++++++++++--
 1 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/napi.h b/include/trace/events/napi.h
index 188deca..8fe1e93 100644
--- a/include/trace/events/napi.h
+++ b/include/trace/events/napi.h
@@ -6,10 +6,31 @@
 
 #include <linux/netdevice.h>
 #include <linux/tracepoint.h>
+#include <linux/ftrace.h>
+
+#define NO_DEV "(no_device)"
+
+TRACE_EVENT(napi_poll,
 
-DECLARE_TRACE(napi_poll,
 	TP_PROTO(struct napi_struct *napi),
-	TP_ARGS(napi));
+
+	TP_ARGS(napi),
+
+	TP_STRUCT__entry(
+		__field(	struct napi_struct *,	napi)
+		__string(	dev_name, napi->dev ? napi->dev->name : NO_DEV)
+	),
+
+	TP_fast_assign(
+		__entry->napi = napi;
+		__assign_str(dev_name, napi->dev ? napi->dev->name : NO_DEV);
+	),
+
+	TP_printk("napi poll on napi struct %p for device %s",
+		__entry->napi, __get_str(dev_name))
+);
+
+#undef NO_DEV
 
 #endif /* _TRACE_NAPI_H_ */
 


^ permalink raw reply related

* [RFC PATCH v4 3/5] netdev: add tracepoints to netdev layer
From: Koki Sanagi @ 2010-07-28  6:31 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro,
	nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec,
	mathieu.desnoyers
In-Reply-To: <4C4FCDA1.3000803@jp.fujitsu.com>

CHANGE-LOG since v3:
   -add tracepoint to netif_rx.


This patch adds tracepoint to dev_queue_xmit, dev_hard_start_xmit, netif_rx and
netif_receive_skb. These tracepoints help you to monitor network driver's
input/output.

          <idle>-0     [001] 112447.902030: netif_rx: dev=eth1 skbaddr=f3ef0900 len=84
          <idle>-0     [001] 112447.902039: netif_receive_skb: dev=eth1 skbaddr=f3ef0900 len=84
            sshd-6828  [000] 112447.903257: net_dev_queue: dev=eth4 skbaddr=f3fca538 len=226
            sshd-6828  [000] 112447.903260: net_dev_xmit: dev=eth4 skbaddr=f3fca538 len=226 rc=0

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 include/trace/events/net.h |   82 ++++++++++++++++++++++++++++++++++++++++++++
 net/core/dev.c             |    6 +++
 net/core/net-traces.c      |    1 +
 3 files changed, 89 insertions(+), 0 deletions(-)

diff --git a/include/trace/events/net.h b/include/trace/events/net.h
new file mode 100644
index 0000000..5f247f5
--- /dev/null
+++ b/include/trace/events/net.h
@@ -0,0 +1,82 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM net
+
+#if !defined(_TRACE_NET_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_NET_H
+
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/ip.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(net_dev_xmit,
+
+	TP_PROTO(struct sk_buff *skb,
+		 int rc),
+
+	TP_ARGS(skb, rc),
+
+	TP_STRUCT__entry(
+		__field(	void *,		skbaddr		)
+		__field(	unsigned int,	len		)
+		__field(	int,		rc		)
+		__string(	name,		skb->dev->name	)
+	),
+
+	TP_fast_assign(
+		__entry->skbaddr = skb;
+		__entry->len = skb->len;
+		__entry->rc = rc;
+		__assign_str(name, skb->dev->name);
+	),
+
+	TP_printk("dev=%s skbaddr=%p len=%u rc=%d",
+		__get_str(name), __entry->skbaddr, __entry->len, __entry->rc)
+);
+
+DECLARE_EVENT_CLASS(net_dev_template,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb),
+
+	TP_STRUCT__entry(
+		__field(	void *,		skbaddr		)
+		__field(	unsigned int,	len		)
+		__string(	name,		skb->dev->name	)
+	),
+
+	TP_fast_assign(
+		__entry->skbaddr = skb;
+		__entry->len = skb->len;
+		__assign_str(name, skb->dev->name);
+	),
+
+	TP_printk("dev=%s skbaddr=%p len=%u",
+		__get_str(name), __entry->skbaddr, __entry->len)
+)
+
+DEFINE_EVENT(net_dev_template, net_dev_queue,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb)
+);
+
+DEFINE_EVENT(net_dev_template, netif_receive_skb,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb)
+);
+
+DEFINE_EVENT(net_dev_template, netif_rx,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb)
+);
+#endif /* _TRACE_NET_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/net/core/dev.c b/net/core/dev.c
index 6e1b437..9ea9c1e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -130,6 +130,7 @@
 #include <linux/jhash.h>
 #include <linux/random.h>
 #include <trace/events/napi.h>
+#include <trace/events/net.h>
 #include <linux/pci.h>
 
 #include "net-sysfs.h"
@@ -1981,6 +1982,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 		}
 
 		rc = ops->ndo_start_xmit(skb, dev);
+		trace_net_dev_xmit(skb, rc);
 		if (rc == NETDEV_TX_OK)
 			txq_trans_update(txq);
 		return rc;
@@ -2001,6 +2003,7 @@ gso:
 			skb_dst_drop(nskb);
 
 		rc = ops->ndo_start_xmit(nskb, dev);
+		trace_net_dev_xmit(nskb, rc);
 		if (unlikely(rc != NETDEV_TX_OK)) {
 			if (rc & ~NETDEV_TX_MASK)
 				goto out_kfree_gso_skb;
@@ -2189,6 +2192,7 @@ int dev_queue_xmit(struct sk_buff *skb)
 #ifdef CONFIG_NET_CLS_ACT
 	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
 #endif
+	trace_net_dev_queue(skb);
 	if (q->enqueue) {
 		rc = __dev_xmit_skb(skb, q, dev, txq);
 		goto out;
@@ -2515,6 +2519,7 @@ int netif_rx(struct sk_buff *skb)
 	if (netdev_tstamp_prequeue)
 		net_timestamp_check(skb);
 
+	trace_netif_rx(skb);
 #ifdef CONFIG_RPS
 	{
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
@@ -2829,6 +2834,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
 	if (!netdev_tstamp_prequeue)
 		net_timestamp_check(skb);
 
+	trace_netif_receive_skb(skb);
 	if (vlan_tx_tag_present(skb) && vlan_hwaccel_do_receive(skb))
 		return NET_RX_SUCCESS;
 
diff --git a/net/core/net-traces.c b/net/core/net-traces.c
index afa6380..7f1bb2a 100644
--- a/net/core/net-traces.c
+++ b/net/core/net-traces.c
@@ -26,6 +26,7 @@
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/skb.h>
+#include <trace/events/net.h>
 #include <trace/events/napi.h>
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kfree_skb);


^ permalink raw reply related

* [RFC PATCH v4 4/5] skb: add tracepoints to freeing skb
From: Koki Sanagi @ 2010-07-28  6:33 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro,
	nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec,
	mathieu.desnoyers
In-Reply-To: <4C4FCDA1.3000803@jp.fujitsu.com>

CHANGE-LOG since v3:
  -remove tracepoint of dev_kfree_skb_irq and skb_free_datagram_locked.
   Insead of them, add trace_kfree_skb before __kfree_skb in net_tx_action and
   skb_free_datagram_locked. And there is no change of about the location of
   trace_consume_skb to keep a symmetry against trace_kfree_skb.

This patch adds tracepoint to consume_skb and add trace_kfree_skb before
__kfree_skb in skb_free_datagram_locked and net_tx_action.
Combinating with tracepoint on dev_hard_start_xmit, we can check how long it
takes to free transmited packets. And using it, we can calculate how many
packets driver had at that time. It is useful when a drop of transmited packet
is a problem.

            sshd-6828  [000] 112689.258154: consume_skb: skbaddr=f2d99bb8

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 include/trace/events/skb.h |   17 +++++++++++++++++
 net/core/datagram.c        |    1 +
 net/core/dev.c             |    2 ++
 net/core/skbuff.c          |    1 +
 4 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 4b2be6d..75ce9d5 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -35,6 +35,23 @@ TRACE_EVENT(kfree_skb,
 		__entry->skbaddr, __entry->protocol, __entry->location)
 );
 
+TRACE_EVENT(consume_skb,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb),
+
+	TP_STRUCT__entry(
+		__field(	void *,	skbaddr	)
+	),
+
+	TP_fast_assign(
+		__entry->skbaddr = skb;
+	),
+
+	TP_printk("skbaddr=%p", __entry->skbaddr)
+);
+
 TRACE_EVENT(skb_copy_datagram_iovec,
 
 	TP_PROTO(const struct sk_buff *skb, int len),
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 251997a..282806b 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -243,6 +243,7 @@ void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
 	unlock_sock_fast(sk, slow);
 
 	/* skb is now orphaned, can be freed outside of locked section */
+	trace_kfree_skb(skb, skb_free_datagram_locked);
 	__kfree_skb(skb);
 }
 EXPORT_SYMBOL(skb_free_datagram_locked);
diff --git a/net/core/dev.c b/net/core/dev.c
index 9ea9c1e..92d79bf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -131,6 +131,7 @@
 #include <linux/random.h>
 #include <trace/events/napi.h>
 #include <trace/events/net.h>
+#include <trace/events/skb.h>
 #include <linux/pci.h>
 
 #include "net-sysfs.h"
@@ -2577,6 +2578,7 @@ static void net_tx_action(struct softirq_action *h)
 			clist = clist->next;
 
 			WARN_ON(atomic_read(&skb->users));
+			trace_kfree_skb(skb, net_tx_action);
 			__kfree_skb(skb);
 		}
 	}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7da58a2..9cb91a5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -466,6 +466,7 @@ void consume_skb(struct sk_buff *skb)
 		smp_rmb();
 	else if (likely(!atomic_dec_and_test(&skb->users)))
 		return;
+	trace_consume_skb(skb);
 	__kfree_skb(skb);
 }
 EXPORT_SYMBOL(consume_skb);


^ permalink raw reply related

* [RFC PATCH v4 5/5] perf:add a script shows a process of packet
From: Koki Sanagi @ 2010-07-28  6:35 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro,
	nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec,
	mathieu.desnoyers
In-Reply-To: <4C4FCDA1.3000803@jp.fujitsu.com>

Add a perf script which shows a process of packets and processed time.
It helps us to investigate networking or network device.

If you want to use it, install perf and record perf.data like following.

#perf trace record netdev-times [script]

If you set script, perf gathers records until it ends.
If not, you must Ctrl-C to stop recording.

And if you want a report from record,

#perf trace report netdev-times [options]

If you use some options, you can limit an output.
Option is below.

tx: show only process of tx packets
rx: show only process of rx packets
dev=: show a process specified with this option
debug: work with debug mode. It shows buffer status.

For example, if you want to show a process of received packets associated
with eth4,

#perf trace report netdev-times rx dev=eth4
106133.171439sec cpu=0
  irq_entry(+0.000msec irq=24:eth4)
         |
  softirq_entry(+0.006msec)
         |
         |---netif_receive_skb(+0.010msec skb=f2d15900 len=100)
         |            |
         |      skb_copy_datagram_iovec(+0.039msec 10291::10291)
         |
  napi_poll_exit(+0.022msec eth4)

This perf script helps us to analyze a process time of transmit/receive
sequence.

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 tools/perf/scripts/python/bin/netdev-times-record |    8 +
 tools/perf/scripts/python/bin/netdev-times-report |    5 +
 tools/perf/scripts/python/netdev-times.py         |  464 +++++++++++++++++++++
 3 files changed, 477 insertions(+), 0 deletions(-)

diff --git a/tools/perf/scripts/python/bin/netdev-times-record b/tools/perf/scripts/python/bin/netdev-times-record
new file mode 100644
index 0000000..2b59511
--- /dev/null
+++ b/tools/perf/scripts/python/bin/netdev-times-record
@@ -0,0 +1,8 @@
+#!/bin/bash
+perf record -c 1 -f -R -a -e net:net_dev_xmit -e net:net_dev_queue	\
+		-e net:netif_receive_skb -e net:netif_rx		\
+		-e skb:consume_skb -e skb:kfree_skb			\
+		-e skb:skb_copy_datagram_iovec -e napi:napi_poll	\
+		-e irq:irq_handler_entry -e irq:irq_handler_exit	\
+		-e irq:softirq_entry -e irq:softirq_exit		\
+		-e irq:softirq_raise $@
diff --git a/tools/perf/scripts/python/bin/netdev-times-report b/tools/perf/scripts/python/bin/netdev-times-report
new file mode 100644
index 0000000..c3d0a63
--- /dev/null
+++ b/tools/perf/scripts/python/bin/netdev-times-report
@@ -0,0 +1,5 @@
+#!/bin/bash
+# description: display a process of packet and processing time
+# args: [tx] [rx] [dev=] [debug]
+
+perf trace -s ~/libexec/perf-core/scripts/python/netdev-times.py $@
diff --git a/tools/perf/scripts/python/netdev-times.py b/tools/perf/scripts/python/netdev-times.py
new file mode 100644
index 0000000..9aa0a32
--- /dev/null
+++ b/tools/perf/scripts/python/netdev-times.py
@@ -0,0 +1,464 @@
+# Display a process of packets and processed time.
+# It helps us to investigate networking or network device.
+#
+# options
+# tx: show only tx chart
+# rx: show only rx chart
+# dev=: show only thing related to specified device
+# debug: work with debug mode. It shows buffer status.
+
+import os
+import sys
+
+sys.path.append(os.environ['PERF_EXEC_PATH'] + \
+	'/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
+
+from perf_trace_context import *
+from Core import *
+from Util import *
+
+all_event_list = []; # insert all tracepoint event related with this script
+irq_dic = {}; # key is cpu and value is a list which stacks irqs
+              # which raise NET_RX softirq
+net_rx_dic = {}; # key is cpu and value include time of NET_RX softirq-entry
+		 # and a list which stacks receive
+receive_hunk_list = []; # a list which include a sequence of receive events
+rx_skb_list = []; # received packet list for matching
+		       # skb_copy_datagram_iovec
+
+buffer_budget = 65536; # the budget of rx_skb_list, tx_queue_list and
+		       # tx_xmit_list
+of_count_rx_skb_list = 0; # overflow count
+
+tx_queue_list = []; # list of packets which pass through dev_queue_xmit
+of_count_tx_queue_list = 0; # overflow count
+
+tx_xmit_list = [];  # list of packets which pass through dev_hard_start_xmit
+of_count_tx_xmit_list = 0; # overflow count
+
+tx_free_list = [];  # list of packets which is freed
+
+# options
+show_tx = 0;
+show_rx = 0;
+dev = 0; # store a name of device specified by option "dev="
+debug = 0;
+
+# indices of event_info tuple
+EINFO_IDX_NAME=   0
+EINFO_IDX_CONTEXT=1
+EINFO_IDX_CPU=    2
+EINFO_IDX_TIME=   3
+EINFO_IDX_PID=    4
+EINFO_IDX_COMM=   5
+
+# Calculate a time interval(msec) from src(nsec) to dst(nsec)
+def diff_msec(src, dst):
+	return (dst - src) / 1000000.0
+
+# Display a process of transmitting a packet
+def print_transmit(hunk):
+	if dev != 0 and hunk['dev'].find(dev) < 0:
+		return
+	print "%7s %5d %6d.%06dsec %12.3fmsec      %12.3fmsec" % \
+		(hunk['dev'], hunk['len'],
+		nsecs_secs(hunk['queue_t']),
+		nsecs_nsecs(hunk['queue_t'])/1000,
+		diff_msec(hunk['queue_t'], hunk['xmit_t']),
+		diff_msec(hunk['xmit_t'], hunk['free_t']))
+
+# Format for displaying rx packet processing
+PF_IRQ_ENTRY= "  irq_entry(+%.3fmsec irq=%d:%s)"
+PF_SOFT_ENTRY="  softirq_entry(+%.3fmsec)"
+PF_NAPI_POLL= "  napi_poll_exit(+%.3fmsec %s)"
+PF_JOINT=     "         |"
+PF_WJOINT=    "         |            |"
+PF_NET_RECV=  "         |---netif_receive_skb(+%.3fmsec skb=%x len=%d)"
+PF_NET_RX=    "         |---netif_rx(+%.3fmsec skb=%x)"
+PF_CPY_DGRAM= "         |      skb_copy_datagram_iovec(+%.3fmsec %d:%s)"
+PF_KFREE_SKB= "         |      kfree_skb(+%.3fmsec location=%x)"
+PF_CONS_SKB=  "         |      consume_skb(+%.3fmsec)"
+
+# Display a process of received packets and interrputs associated with
+# a NET_RX softirq
+def print_receive(hunk):
+	show_hunk = 0
+	irq_list = hunk['irq_list']
+	cpu = irq_list[0]['cpu']
+	base_t = irq_list[0]['irq_ent_t']
+	# check if this hunk should be showed
+	if dev != 0:
+		for i in range(len(irq_list)):
+			if irq_list[i]['name'].find(dev) >= 0:
+				show_hunk = 1
+				break
+	else:
+		show_hunk = 1
+	if show_hunk == 0:
+		return
+
+	print "%d.%06dsec cpu=%d" % \
+		(nsecs_secs(base_t), nsecs_nsecs(base_t)/1000, cpu)
+	for i in range(len(irq_list)):
+		print PF_IRQ_ENTRY % \
+			(diff_msec(base_t, irq_list[i]['irq_ent_t']),
+			irq_list[i]['irq'], irq_list[i]['name'])
+		print PF_JOINT
+		irq_event_list = irq_list[i]['event_list']
+		for j in range(len(irq_event_list)):
+			irq_event = irq_event_list[j]
+			if irq_event['event'] == 'netif_rx':
+				print PF_NET_RX % \
+					(diff_msec(base_t, irq_event['time']),
+					irq_event['skbaddr'])
+				print PF_JOINT
+	print PF_SOFT_ENTRY % \
+		diff_msec(base_t, hunk['sirq_ent_t'])
+	print PF_JOINT
+	event_list = hunk['event_list']
+	for i in range(len(event_list)):
+		event = event_list[i]
+		if event['event_name'] == 'napi_poll':
+			print PF_NAPI_POLL % \
+			    (diff_msec(base_t, event['event_t']), event['dev'])
+			if i == len(event_list) - 1:
+				print ""
+			else:
+				print PF_JOINT
+		else:
+			print PF_NET_RECV % \
+			    (diff_msec(base_t, event['event_t']), event['skbaddr'],
+				event['len'])
+			if 'comm' in event.keys():
+				print PF_WJOINT
+				print PF_CPY_DGRAM % \
+					(diff_msec(base_t, event['comm_t']),
+					event['pid'], event['comm'])
+			elif 'handle' in event.keys():
+				print PF_WJOINT
+				if event['handle'] == "kfree_skb":
+					print PF_KFREE_SKB % \
+						(diff_msec(base_t,
+						event['comm_t']),
+						event['location'])
+				elif event['handle'] == "consume_skb":
+					print PF_CONS_SKB % \
+						diff_msec(base_t,
+							event['comm_t'])
+			print PF_JOINT
+
+def trace_begin():
+	global show_tx
+	global show_rx
+	global dev
+	global debug
+
+	for i in range(len(sys.argv)):
+		if i == 0:
+			continue
+		arg = sys.argv[i]
+		if arg == 'tx':
+			show_tx = 1
+		elif arg =='rx':
+			show_rx = 1
+		elif arg.find('dev=',0, 4) >= 0:
+			dev = arg[4:]
+		elif arg == 'debug':
+			debug = 1
+	if show_tx == 0  and show_rx == 0:
+		show_tx = 1
+		show_rx = 1
+
+def trace_end():
+	# order all events in time
+	all_event_list.sort(lambda a,b :cmp(a[EINFO_IDX_TIME],
+					    b[EINFO_IDX_TIME]))
+	# process all events
+	for i in range(len(all_event_list)):
+		event_info = all_event_list[i]
+		name = event_info[EINFO_IDX_NAME]
+		if name == 'irq__softirq_exit':
+			handle_irq_softirq_exit(event_info)
+		elif name == 'irq__softirq_entry':
+			handle_irq_softirq_entry(event_info)
+		elif name == 'irq__softirq_raise':
+			handle_irq_softirq_raise(event_info)
+		elif name == 'irq__irq_handler_entry':
+			handle_irq_handler_entry(event_info)
+		elif name == 'irq__irq_handler_exit':
+			handle_irq_handler_exit(event_info)
+		elif name == 'napi__napi_poll':
+			handle_napi_poll(event_info)
+		elif name == 'net__netif_receive_skb':
+			handle_netif_receive_skb(event_info)
+		elif name == 'net__netif_rx':
+			handle_netif_rx(event_info)
+		elif name == 'skb__skb_copy_datagram_iovec':
+			handle_skb_copy_datagram_iovec(event_info)
+		elif name == 'net__net_dev_queue':
+			handle_net_dev_queue(event_info)
+		elif name == 'net__net_dev_xmit':
+			handle_net_dev_xmit(event_info)
+		elif name == 'skb__kfree_skb':
+			handle_kfree_skb(event_info)
+		elif name == 'skb__consume_skb':
+			handle_consume_skb(event_info)
+	# display receive hunks
+	if show_rx:
+		for i in range(len(receive_hunk_list)):
+			print_receive(receive_hunk_list[i])
+	# display transmit hunks
+	if show_tx:
+		print "   dev    len      Qdisc        " \
+			"       netdevice             free"
+		for i in range(len(tx_free_list)):
+			print_transmit(tx_free_list[i])
+	if debug:
+		print "debug buffer status"
+		print "----------------------------"
+		print "xmit Qdisc:remain:%d overflow:%d" % \
+			(len(tx_queue_list), of_count_tx_queue_list)
+		print "xmit netdevice:remain:%d overflow:%d" % \
+			(len(tx_xmit_list), of_count_tx_xmit_list)
+		print "receive:remain:%d overflow:%d" % \
+			(len(rx_skb_list), of_count_rx_skb_list)
+
+# called from perf, when it finds a correspoinding event
+def irq__softirq_entry(name, context, cpu, sec, nsec, pid, comm, vec):
+	if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+		return
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+	all_event_list.append(event_info)
+
+def irq__softirq_exit(name, context, cpu, sec, nsec, pid, comm, vec):
+	if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+		return
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+	all_event_list.append(event_info)
+
+def irq__softirq_raise(name, context, cpu, sec, nsec, pid, comm, vec):
+	if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+		return
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+	all_event_list.append(event_info)
+
+def irq__irq_handler_entry(name, context, cpu, sec, nsec, pid, comm,
+			irq, irq_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			irq, irq_name)
+	all_event_list.append(event_info)
+
+def irq__irq_handler_exit(name, context, cpu, sec, nsec, pid, comm, irq, ret):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, irq, ret)
+	all_event_list.append(event_info)
+
+def napi__napi_poll(name, context, cpu, sec, nsec, pid, comm, napi, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			napi, dev_name)
+	all_event_list.append(event_info)
+
+def net__netif_receive_skb(name, context, cpu, sec, nsec, pid, comm, skbaddr,
+			skblen, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen, dev_name)
+	all_event_list.append(event_info)
+
+def net__netif_rx(name, context, cpu, sec, nsec, pid, comm, skbaddr,
+			skblen, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen, dev_name)
+	all_event_list.append(event_info)
+
+def net__net_dev_queue(name, context, cpu, sec, nsec, pid, comm,
+			skbaddr, skblen, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen, dev_name)
+	all_event_list.append(event_info)
+
+def net__net_dev_xmit(name, context, cpu, sec, nsec, pid, comm,
+			skbaddr, skblen, rc, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen, rc ,dev_name)
+	all_event_list.append(event_info)
+
+def skb__kfree_skb(name, context, cpu, sec, nsec, pid, comm,
+			skbaddr, protocol, location):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, protocol, location)
+	all_event_list.append(event_info)
+
+def skb__consume_skb(name, context, cpu, sec, nsec, pid, comm, skbaddr):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr)
+	all_event_list.append(event_info)
+
+def skb__skb_copy_datagram_iovec(name, context, cpu, sec, nsec, pid, comm,
+	skbaddr, skblen):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen)
+	all_event_list.append(event_info)
+
+def handle_irq_handler_entry(event_info):
+	(name, context, cpu, time, pid, comm, irq, irq_name) = event_info
+	if cpu not in irq_dic.keys():
+		irq_dic[cpu] = []
+	irq_record = {'irq':irq, 'name':irq_name, 'cpu':cpu, 'irq_ent_t':time}
+	irq_dic[cpu].append(irq_record)
+
+def handle_irq_handler_exit(event_info):
+	(name, context, cpu, time, pid, comm, irq, ret) = event_info
+	if cpu not in irq_dic.keys():
+		return
+	irq_record = irq_dic[cpu].pop()
+	if irq != irq_record['irq']:
+		return
+	irq_record.update({'irq_ext_t':time})
+	# if an irq doesn't include NET_RX softirq, drop.
+	if 'event_list' in irq_record.keys():
+		irq_dic[cpu].append(irq_record)
+
+def handle_irq_softirq_raise(event_info):
+	(name, context, cpu, time, pid, comm, vec) = event_info
+	if cpu not in irq_dic.keys() \
+	or len(irq_dic[cpu]) == 0:
+		return
+	irq_record = irq_dic[cpu].pop()
+	if 'event_list' in irq_record.keys():
+		irq_event_list = irq_record['event_list']
+	else:
+		irq_event_list = []
+	irq_event_list.append({'time':time, 'event':'sirq_raise'})
+	irq_record.update({'event_list':irq_event_list})
+	irq_dic[cpu].append(irq_record)
+
+def handle_irq_softirq_entry(event_info):
+	(name, context, cpu, time, pid, comm, vec) = event_info
+	net_rx_dic[cpu] = {'sirq_ent_t':time, 'event_list':[]}
+
+def handle_irq_softirq_exit(event_info):
+	(name, context, cpu, time, pid, comm, vec) = event_info
+	irq_list = []
+	event_list = 0
+	if cpu in irq_dic.keys():
+		irq_list = irq_dic[cpu]
+		del irq_dic[cpu]
+	if cpu in net_rx_dic.keys():
+		sirq_ent_t = net_rx_dic[cpu]['sirq_ent_t']
+		event_list = net_rx_dic[cpu]['event_list']
+		del net_rx_dic[cpu]
+	if irq_list == [] or event_list == 0:
+		return
+	rec_data = {'sirq_ent_t':sirq_ent_t, 'sirq_ext_t':time,
+		    'irq_list':irq_list, 'event_list':event_list}
+	# merge information realted to a NET_RX softirq
+	receive_hunk_list.append(rec_data)
+
+def handle_napi_poll(event_info):
+	(name, context, cpu, time, pid, comm, napi, dev_name) = event_info
+	if cpu in net_rx_dic.keys():
+		event_list = net_rx_dic[cpu]['event_list']
+		rec_data = {'event_name':'napi_poll',
+				'dev':dev_name, 'event_t':time}
+		event_list.append(rec_data)
+
+def handle_netif_rx(event_info):
+	(name, context, cpu, time, pid, comm,
+		skbaddr, skblen, dev_name) = event_info
+	if cpu not in irq_dic.keys() \
+	or len(irq_dic[cpu]) == 0:
+		return
+	irq_record = irq_dic[cpu].pop()
+	if 'event_list' in irq_record.keys():
+		irq_event_list = irq_record['event_list']
+	else:
+		irq_event_list = []
+	irq_event_list.append({'time':time, 'event':'netif_rx',
+		'skbaddr':skbaddr, 'skblen':skblen, 'dev_name':dev_name})
+	irq_record.update({'event_list':irq_event_list})
+	irq_dic[cpu].append(irq_record)
+
+def handle_netif_receive_skb(event_info):
+	global of_count_rx_skb_list
+
+	(name, context, cpu, time, pid, comm,
+		skbaddr, skblen, dev_name) = event_info
+	if cpu in net_rx_dic.keys():
+		rec_data = {'event_name':'netif_receive_skb',
+			    'event_t':time, 'skbaddr':skbaddr, 'len':skblen}
+		event_list = net_rx_dic[cpu]['event_list']
+		event_list.append(rec_data)
+		rx_skb_list.insert(0, rec_data)
+		if len(rx_skb_list) > buffer_budget:
+			rx_skb_list.pop()
+			of_count_rx_skb_list += 1
+
+def handle_net_dev_queue(event_info):
+	global of_count_tx_queue_list
+
+	(name, context, cpu, time, pid, comm,
+		skbaddr, skblen, dev_name) = event_info
+	skb = {'dev':dev_name, 'skbaddr':skbaddr, 'len':skblen, 'queue_t':time}
+	tx_queue_list.insert(0, skb)
+	if len(tx_queue_list) > buffer_budget:
+		tx_queue_list.pop()
+		of_count_tx_queue_list += 1
+
+def handle_net_dev_xmit(event_info):
+	global of_count_tx_xmit_list
+
+	(name, context, cpu, time, pid, comm,
+		skbaddr, skblen, rc, dev_name) = event_info
+	if rc == 0: # NETDEV_TX_OK
+		for i in range(len(tx_queue_list)):
+			skb = tx_queue_list[i]
+			if skb['skbaddr'] == skbaddr:
+				skb['xmit_t'] = time
+				tx_xmit_list.insert(0, skb)
+				del tx_queue_list[i]
+				if len(tx_xmit_list) > buffer_budget:
+					tx_xmit_list.pop()
+					of_count_tx_xmit_list += 1
+				return
+
+def handle_kfree_skb(event_info):
+	(name, context, cpu, time, pid, comm,
+		skbaddr, protocol, location) = event_info
+	for i in range(len(tx_queue_list)):
+		skb = tx_queue_list[i]
+		if skb['skbaddr'] == skbaddr:
+			del tx_queue_list[i]
+			return
+	for i in range(len(tx_xmit_list)):
+		skb = tx_xmit_list[i]
+		if skb['skbaddr'] == skbaddr:
+			skb['free_t'] = time
+			tx_free_list.append(skb)
+			del tx_xmit_list[i]
+			return
+	for i in range(len(rx_skb_list)):
+		rec_data = rx_skb_list[i]
+		if rec_data['skbaddr'] == skbaddr:
+			rec_data.update({'handle':"kfree_skb",
+					'comm':comm, 'pid':pid, 'comm_t':time})
+			del rx_skb_list[i]
+			return
+
+def handle_consume_skb(event_info):
+	(name, context, cpu, time, pid, comm, skbaddr) = event_info
+	for i in range(len(tx_xmit_list)):
+		skb = tx_xmit_list[i]
+		if skb['skbaddr'] == skbaddr:
+			skb['free_t'] = time
+			tx_free_list.append(skb)
+			del tx_xmit_list[i]
+			return
+
+def handle_skb_copy_datagram_iovec(event_info):
+	(name, context, cpu, time, pid, comm, skbaddr, skblen) = event_info
+	for i in range(len(rx_skb_list)):
+		rec_data = rx_skb_list[i]
+		if skbaddr == rec_data['skbaddr']:
+			rec_data.update({'handle':"skb_copy_datagram_iovec",
+					'comm':comm, 'pid':pid, 'comm_t':time})
+			del rx_skb_list[i]
+			return

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox