* Re: [PATCH] don't allow CAP_NET_ADMIN to load non-netdev kernel modules
From: Michael Tokarev @ 2011-02-28 9:29 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Michał Mirosław, Ben Hutchings, David Miller, segoon,
netdev, linux-kernel, kuznet, pekkas, jmorris, yoshfuji, kaber,
eric.dumazet, therbert, xiaosuo, jesse, kees.cook, eugene,
dan.j.rosenberg, akpm
In-Reply-To: <201102272122.52643.arnd@arndb.de>
27.02.2011 23:22, Arnd Bergmann wrote:
> On Friday 25 February 2011, Michał Mirosław wrote:
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index 54aaca6..0d09baa 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -1120,8 +1120,20 @@ void dev_load(struct net *net, const char *name)
>>> dev = dev_get_by_name_rcu(net, name);
>>> rcu_read_unlock();
>>>
>>> - if (!dev && capable(CAP_NET_ADMIN))
>>> - request_module("%s", name);
>>> + if (!dev && capable(CAP_NET_ADMIN)) {
>>> + /* Check whether the name looks like one that a net
>>> + * driver will generate initially. If not, require a
>>> + * module alias with a suitable prefix, so that this
>>> + * can't be used to load arbitrary modules.
>>> + */
>>> + if ((strncmp(name, "eth", 3) == 0 &&
>>> + isdigit((unsigned char)name[3])) ||
>>> + (strncmp(name, "wlan", 4) == 0 &&
>>> + isdigit((unsigned char)name[4])))
>>> + request_module("%s", name);
>>> + else
>>> + request_module("netdev-%s", name);
>>> + }
>>> }
>>> EXPORT_SYMBOL(dev_load);
>>>
>>
>> This might be better as:
>>
>> if (request_module("netdev-%s", name))
>> ... fallback
>>
>> Then after some years the fallback could be removed if announced properly.
>
> The backwards compatibility should mostly be for systems that today don't
> use split capabilities, right?
>
> The fallback could therefore rely on CAP_SYS_MODULE as well:
>
> if (request_module("netdev-%s", name)) {
> if (capable(CAP_SYS_MODULE))
> request_module("%s", name);
> }
>
> Not 100% solution, but should solve the capability escalation nicely without
> causing much pain.
To me this looks like the best solution so far - trivial and
compatible.
Thanks!
/mjt
^ permalink raw reply
* Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
From: Jiri Pirko @ 2011-02-28 9:22 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20110227.233013.226766890.davem@davemloft.net>
Mon, Feb 28, 2011 at 08:30:13AM CET, davem@davemloft.net wrote:
>From: Jiri Pirko <jpirko@redhat.com>
>Date: Mon, 28 Feb 2011 08:07:33 +0100
>
>> Mon, Feb 28, 2011 at 12:22:08AM CET, davem@davemloft.net wrote:
>>>From: Jiri Pirko <jpirko@redhat.com>
>>>Date: Sun, 27 Feb 2011 13:58:17 +0100
>>>
>>>> That is very true. And given that af_packet uses orig_dev to obtain
>>>> ifindex, it can be replaced by skb->skb_iif. That way we can get rid of
>>>> orig_dev parameter for good.
>>>
>>>I would rather see a complete patch set submitting at a unit, thanks.
>>>
>>>I've already marked your V3 last night as "changes requested" in
>>>patchwork for this reason.
>>
>> That's a pity. V3 is complete patch. The changes we talk about in
>> discussion are just taking advantage of changes in patch V3. Would be in
>> my opinion better to apply V3 now and followup with the rest after that.
>
>Fair enough, I'll aply it now, thanks.
Applied incorrectly. net/core/dev.c part is missing
^ permalink raw reply
* Re: Bug inkvm_set_irq
From: Jean-Philippe Menil @ 2011-02-28 8:56 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: kvm, netdev, virtualization
In-Reply-To: <20110227170031.GC17973@redhat.com>
Le 27/02/2011 18:00, Michael S. Tsirkin a écrit :
> On Fri, Feb 25, 2011 at 10:07:22AM +0100, Jean-Philippe Menil wrote:
>> Hi,
>>
>> Each time i try tou use vhost_net, i'm facing a kernel bug.
>> I do a "modprobe vhost_net", and start guest whith vhost=on.
>>
>> Following is a trace with a kernel 2.6.37, but i had the same
>> problem with 2.6.36 (cf https://lkml.org/lkml/2010/11/30/29).
> 2.6.36 had a theorectical race that could explain this,
> but it should be ok in 2.6.37.
>
>> The bug only occurs whith vhost_net charged, so i don't know if this
>> is a bug in kvm module code or in the vhost_net code.
> It could be a bug in eventfd which is the interface
> used by both kvm and vhost_net.
> Just for fun, you can try 3.6.38 - eventfd code has been changed
> a lot in 2.6.38 and if it does not trigger there
> it's a hint that irqfd is the reason.
>
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.243100] BUG: unable to handle kernel paging request at
>> 0000000000002458
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.243250] IP: [<ffffffffa041aa8a>] kvm_set_irq+0x2a/0x130 [kvm]
>
> Could you run markup_oops/ ksymoops on this please?
> As far as I can see kvm_set_irq can only get a wrong
> kvm pointer. Unless there's some general memory corruption,
> I'd guess
>
> You can also try comparing the irqfd->kvm pointer in
> kvm_irqfd_assign irqfd_wakeup and kvm_set_irq in
> virt/kvm/eventfd.c.
>
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.243378] PGD 45d363067 PUD 45e77a067 PMD 0
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.243556] Oops: 0000 [#1] SMP
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.243692] last sysfs file:
>> /sys/devices/pci0000:00/0000:00:0d.0/0000:05:00.0/0000:06:00.0/irq
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [ 685.243777] CPU 0
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.243820] Modules linked in: vhost_net macvtap macvlan tun
>> powernow_k8 mperf cpufreq_userspace cpufreq_stats cpufreq_powersave
>> cpufreq_ondemand fre
>> q_table cpufreq_conservative fuse xt_physdev ip6t_LOG
>> ip6table_filter ip6_tables ipt_LOG xt_multiport xt_limit xt_tcpudp
>> xt_state iptable_filter ip_tables x_tables nf_conntrack_tftp
>> nf_conntrack_ftp nf_connt
>> rack_ipv4 nf_defrag_ipv4 8021q bridge stp ext2 mbcache
>> dm_round_robin dm_multipath nf_conntrack_ipv6 nf_conntrack
>> nf_defrag_ipv6 kvm_amd kvm ipv6 snd_pcm snd_timer snd soundcore
>> snd_page_alloc tpm_tis tpm ps
>> mouse dcdbas tpm_bios processor i2c_nforce2 shpchp pcspkr ghes
>> serio_raw joydev evdev pci_hotplug i2c_core hed button thermal_sys
>> xfs exportfs dm_mod sg sr_mod cdrom usbhid hid usb_storage ses
>> sd_mod enclosu
>> re megaraid_sas ohci_hcd lpfc scsi_transport_fc scsi_tgt bnx2
>> scsi_mod ehci_hcd [last unloaded: scsi_wait_scan]
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [ 685.246123]
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] Pid: 10, comm: kworker/0:1 Not tainted
>> 2.6.37-dsiun-110105 #17 0K543T/PowerEdge M605
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] RIP: 0010:[<ffffffffa041aa8a>] [<ffffffffa041aa8a>]
>> kvm_set_irq+0x2a/0x130 [kvm]
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] RSP: 0018:ffff88045fc89d30 EFLAGS: 00010246
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] RAX: 0000000000000000 RBX: 000000000000001a RCX:
>> 0000000000000001
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
>> 0000000000000000
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] RBP: 0000000000000000 R08: 0000000000000001 R09:
>> ffff880856a91e48
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] R10: 0000000000000000 R11: 00000000ffffffff R12:
>> 0000000000000000
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] R13: 0000000000000001 R14: 0000000000000000 R15:
>> 0000000000000000
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] FS: 00007f617986c710(0000) GS:ffff88007f800000(0000)
>> knlGS:0000000000000000
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] CR2: 0000000000002458 CR3: 000000045d197000 CR4:
>> 00000000000006f0
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>> 0000000000000400
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] Process kworker/0:1 (pid: 10, threadinfo
>> ffff88045fc88000, task ffff88085fc53c30)
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [ 685.246123] Stack:
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] ffff88045fc89fd8 00000000000119c0 ffff88045fc88010
>> ffff88085fc53ee8
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] ffff88045fc89fd8 ffff88085fc53ee0 ffff88085fc53c30
>> 00000000000119c0
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] 00000000000119c0 ffffffff8137f7ce ffff88007f80df40
>> 00000000ffffffff
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] Call Trace:
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] [<ffffffff8137f7ce>] ? common_interrupt+0xe/0x13
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] [<ffffffffa041bc30>] ? irqfd_inject+0x0/0x50 [kvm]
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] [<ffffffffa041bc57>] ? irqfd_inject+0x27/0x50 [kvm]
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] [<ffffffffa041bc30>] ? irqfd_inject+0x0/0x50 [kvm]
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] [<ffffffff8106b6f2>] ? process_one_work+0x112/0x460
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] [<ffffffff8106be25>] ? worker_thread+0x145/0x410
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] [<ffffffff8103a3d0>] ? __wake_up_common+0x50/0x80
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] [<ffffffff8106bce0>] ? worker_thread+0x0/0x410
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] [<ffffffff8106bce0>] ? worker_thread+0x0/0x410
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] [<ffffffff8106f786>] ? kthread+0x96/0xa0
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] [<ffffffff81003ce4>] ? kernel_thread_helper+0x4/0x10
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] [<ffffffff8106f6f0>] ? kthread+0x0/0xa0
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] [<ffffffff81003ce0>] ? kernel_thread_helper+0x0/0x10
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] Code: ff 41 57 41 89 f7 41 56 41 55 41 89 cd 41 54 49 89
>> fc 55 53 89 d3 48 81 ec 98 00 00 00 8b 15 c6 79 03 00 85 d2 0f 85 c4
>> 00 00 00<4
>> 9> 8b 84 24 58 24 00 00 3b 98 28 01 00 00 73 5e 89 db 48 8b 84
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] RIP [<ffffffffa041aa8a>] kvm_set_irq+0x2a/0x130 [kvm]
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] RSP<ffff88045fc89d30>
>> Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
>> 685.246123] CR2: 0000000000002458
>>
>>
>> If someone can help me, on how to solve this.
>>
>> Regards.
>> _______________________________________________
>> Virtualization mailing list
>> Virtualization@lists.linux-foundation.org
>> https://lists.linux-foundation.org/mailman/listinfo/virtualization
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi,
thanks for your response.
This is what markup_oops.pl return me:
"No matching code found "
So this is not a vhost_net bug, or my oops is incomplete and markup_oops
can't find the good vma offset.
I will try to compare the pointers you indicate me, even it could be a
little difficult for me.
Maybe i will try a 2.6.38, will wait a response from the kvm team.
Regards.
--
Jean-Philippe Menil - Pôle réseau Service IRTS
DSI Université de Nantes
jean-philippe.menil@univ-nantes.fr
Tel : 02.53.48.49.27 - Fax : 02.53.48.49.09
^ permalink raw reply
* Re: [PATCH] cxgb{3,4}: streamline Kconfig options
From: Jan Beulich @ 2011-02-28 8:19 UTC (permalink / raw)
To: Dimitris Michailidis; +Cc: divy, David Miller, linux-kbuild, netdev
In-Reply-To: <4D680842.6050309@chelsio.com>
>>> On 25.02.11 at 20:51, Dimitris Michailidis <dm@chelsio.com> wrote:
> Dimitris Michailidis wrote:
>> Jan Beulich wrote:
>>
>>> As to that INET vs NET dependency - is it possible that the
>>> network drivers really just need NET, but the iSCSI ones need
>>> INET? In which case the only common dependency would be
>>> PCI - certainly not worth a custom helper option.
>>
>> Reading the commit message that introduced CHELSIO_T3_DEPENDS, it talks
>> of hidden dependencies that select does not see. I am not sure which
>> exactly but since it's been a few years since that commit I'll try to
>> see what the situation is today without the *_DEPENDS symbols and let
>> you know.
>
> I looked into this and found that with the current Kconfig the iSCSI driver
> does not appear in the SCSI menu until one first enables NETDEVICES and
> NETDEV_10000 in the network driver menu. It appears that the *_DEPENDS
> symbols were added to capture dependencies on such symbols within the
> network driver Kconfig, besides the dependencies the driver's entry listed
> explicitly.
>
> The patch below removes *T4*_DEPENDS and the network drivers' unnecessary
> dependency on INET, and updates the iSCSI driver's entry so it is visible
> without requiring any net driver options to be enabled first and has
> adequate selects to be able to build the net driver (this part is adapted
> from bnx2i's Kconfig entry). I still need to do the T3 part of this and
> check that there isn't a conflict with the current scsi tree. Just for
> review at this time.
Thanks, this looks good to me.
Jan
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index 0382332..0d314d5 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -2619,14 +2619,9 @@ config CHELSIO_T3
> To compile this driver as a module, choose M here: the module
> will be called cxgb3.
>
> -config CHELSIO_T4_DEPENDS
> - tristate
> - depends on PCI && INET
> - default y
> -
> config CHELSIO_T4
> tristate "Chelsio Communications T4 Ethernet support"
> - depends on CHELSIO_T4_DEPENDS
> + depends on PCI
> select FW_LOADER
> select MDIO
> help
> @@ -2644,14 +2639,9 @@ config CHELSIO_T4
> To compile this driver as a module choose M here; the module
> will be called cxgb4.
>
> -config CHELSIO_T4VF_DEPENDS
> - tristate
> - depends on PCI && INET
> - default y
> -
> config CHELSIO_T4VF
> tristate "Chelsio Communications T4 Virtual Function Ethernet support"
> - depends on CHELSIO_T4VF_DEPENDS
> + depends on PCI
> help
> This driver supports Chelsio T4-based gigabit and 10Gb Ethernet
> adapters with PCI-E SR-IOV Virtual Functions.
> diff --git a/drivers/scsi/cxgbi/cxgb4i/Kconfig
> b/drivers/scsi/cxgbi/cxgb4i/Kconfig
> index bb94b39..d5302c2 100644
> --- a/drivers/scsi/cxgbi/cxgb4i/Kconfig
> +++ b/drivers/scsi/cxgbi/cxgb4i/Kconfig
> @@ -1,6 +1,8 @@
> config SCSI_CXGB4_ISCSI
> tristate "Chelsio T4 iSCSI support"
> - depends on CHELSIO_T4_DEPENDS
> + depends on PCI && INET
> + select NETDEVICES
> + select NETDEV_10000
> select CHELSIO_T4
> select SCSI_ISCSI_ATTRS
> ---help---
^ permalink raw reply
* [PATCH v2] davinci_emac: Add Carrier Link OK check in Davinci RX Handler
From: Hegde, Vinay @ 2011-02-28 8:15 UTC (permalink / raw)
To: netdev; +Cc: khilman, davinci-linux-open-source, linux-arm-kernel,
Hegde, Vinay
This patch adds an additional check in the Davinci EMAC RX Handler,
which tests the __LINK_STATE_NOCARRIER flag along with the
__LINK_STATE_START flag as part EMAC shutting down procedure.
This avoids
WARNING: at drivers/net/davinci_emac.c:1040 emac_rx_handler+0xf8/0x120()
during rtcwake used to suspend the target for a specified duration.
Signed-off-by: Hegde, Vinay <vinay.hegde@ti.com>
Acked-by: Cyril Chemparathy <cyril@ti.com>
---
since v1: Changed the subject line
drivers/net/davinci_emac.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/davinci_emac.c b/drivers/net/davinci_emac.c
index 2a628d1..7018bfe 100644
--- a/drivers/net/davinci_emac.c
+++ b/drivers/net/davinci_emac.c
@@ -1008,7 +1008,7 @@ static void emac_rx_handler(void *token, int len, int status)
int ret;
/* free and bail if we are shutting down */
- if (unlikely(!netif_running(ndev))) {
+ if (unlikely(!netif_running(ndev) || !netif_carrier_ok(ndev))) {
dev_kfree_skb_any(skb);
return;
}
--
1.7.1
^ permalink raw reply related
* [PATCH V12 4/4] ptp: Added a clock driver for the National Semiconductor PHYTER.
From: Richard Cochran @ 2011-02-28 7:58 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ, Alan Cox, Arnd Bergmann,
Christoph Lameter, David Miller, John Stultz, Krzysztof Halasa,
Peter Zijlstra, Rodolfo Giometti, Thomas Gleixner,
Benjamin Herrenschmidt, Mike Frysinger, Paul Mackerras,
Russell King
In-Reply-To: <cover.1298878618.git.richard.cochran-3mrvs1K0uXizZXS1Dc/lvw@public.gmane.org>
This patch adds support for the PTP clock found on the DP83640.
The basic clock operations and one external time stamp have
been implemented.
Signed-off-by: Richard Cochran <richard.cochran-3mrvs1K0uXizZXS1Dc/lvw@public.gmane.org>
---
drivers/net/phy/Makefile | 1 +
drivers/net/phy/dp83640.c | 1012 +++++++++++++++++++++++++++++++++++++++++
drivers/net/phy/dp83640_reg.h | 267 +++++++++++
drivers/ptp/Kconfig | 19 +
4 files changed, 1299 insertions(+), 0 deletions(-)
create mode 100644 drivers/net/phy/dp83640.c
create mode 100644 drivers/net/phy/dp83640_reg.h
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 13bebab..2333215 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -19,6 +19,7 @@ obj-$(CONFIG_FIXED_PHY) += fixed.o
obj-$(CONFIG_MDIO_BITBANG) += mdio-bitbang.o
obj-$(CONFIG_MDIO_GPIO) += mdio-gpio.o
obj-$(CONFIG_NATIONAL_PHY) += national.o
+obj-$(CONFIG_DP83640_PHY) += dp83640.o
obj-$(CONFIG_STE10XP) += ste10Xp.o
obj-$(CONFIG_MICREL_PHY) += micrel.o
obj-$(CONFIG_MDIO_OCTEON) += mdio-octeon.o
diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
new file mode 100644
index 0000000..74f751e
--- /dev/null
+++ b/drivers/net/phy/dp83640.c
@@ -0,0 +1,1012 @@
+/*
+ * Driver for the National Semiconductor DP83640 PHYTER
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+#include <linux/ethtool.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/mii.h>
+#include <linux/module.h>
+#include <linux/net_tstamp.h>
+#include <linux/netdevice.h>
+#include <linux/phy.h>
+#include <linux/ptp_classify.h>
+#include <linux/ptp_clock_kernel.h>
+
+#include "dp83640_reg.h"
+
+#define DP83640_PHY_ID 0x20005ce1
+#define PAGESEL 0x13
+#define LAYER4 0x02
+#define LAYER2 0x01
+#define MAX_RXTS 4
+#define MAX_TXTS 4
+#define N_EXT_TS 1
+#define PSF_PTPVER 2
+#define PSF_EVNT 0x4000
+#define PSF_RX 0x2000
+#define PSF_TX 0x1000
+#define EXT_EVENT 1
+#define EXT_GPIO 1
+#define CAL_EVENT 2
+#define CAL_GPIO 9
+#define CAL_TRIGGER 2
+
+/* phyter seems to miss the mark by 16 ns */
+#define ADJTIME_FIX 16
+
+#if defined(__BIG_ENDIAN)
+#define ENDIAN_FLAG 0
+#elif defined(__LITTLE_ENDIAN)
+#define ENDIAN_FLAG PSF_ENDIAN
+#endif
+
+#define SKB_PTP_TYPE(__skb) (*(unsigned int *)((__skb)->cb))
+
+struct phy_rxts {
+ u16 ns_lo; /* ns[15:0] */
+ u16 ns_hi; /* overflow[1:0], ns[29:16] */
+ u16 sec_lo; /* sec[15:0] */
+ u16 sec_hi; /* sec[31:16] */
+ u16 seqid; /* sequenceId[15:0] */
+ u16 msgtype; /* messageType[3:0], hash[11:0] */
+};
+
+struct phy_txts {
+ u16 ns_lo; /* ns[15:0] */
+ u16 ns_hi; /* overflow[1:0], ns[29:16] */
+ u16 sec_lo; /* sec[15:0] */
+ u16 sec_hi; /* sec[31:16] */
+};
+
+struct rxts {
+ struct list_head list;
+ unsigned long tmo;
+ u64 ns;
+ u16 seqid;
+ u8 msgtype;
+ u16 hash;
+};
+
+struct dp83640_clock;
+
+struct dp83640_private {
+ struct list_head list;
+ struct dp83640_clock *clock;
+ struct phy_device *phydev;
+ struct work_struct ts_work;
+ int hwts_tx_en;
+ int hwts_rx_en;
+ int layer;
+ int version;
+ /* remember state of cfg0 during calibration */
+ int cfg0;
+ /* remember the last event time stamp */
+ struct phy_txts edata;
+ /* list of rx timestamps */
+ struct list_head rxts;
+ struct list_head rxpool;
+ struct rxts rx_pool_data[MAX_RXTS];
+ /* protects above three fields from concurrent access */
+ spinlock_t rx_lock;
+ /* queues of incoming and outgoing packets */
+ struct sk_buff_head rx_queue;
+ struct sk_buff_head tx_queue;
+};
+
+struct dp83640_clock {
+ /* protects extended registers from concurrent access */
+ struct mutex extreg_lock;
+ /* remembers which page was last selected */
+ int page;
+ /* our advertised capabilities */
+ struct ptp_clock_info caps;
+ /* the one phyter from which we shall read */
+ struct dp83640_private *chosen;
+ /* list of the other attached phyters, not chosen */
+ struct list_head phylist;
+ /* reference to our PTP hardware clock */
+ struct ptp_clock *ptp_clock;
+};
+
+/* globals */
+
+static int chosen_phy = -1;
+static ushort cal_gpio = 4;
+static char *mii_busid = "0";
+
+module_param(chosen_phy, int, 0444);
+module_param(cal_gpio, ushort, 0444);
+module_param(mii_busid, charp, 0444);
+
+MODULE_PARM_DESC(chosen_phy, \
+ "The address of the PHY to use for the ancillary clock features");
+MODULE_PARM_DESC(cal_gpio, \
+ "Which GPIO line to use for synchronizing multiple PHYs");
+MODULE_PARM_DESC(mii_busid, \
+ "The ID of the MII bus to which the PHYs are connected");
+
+static void rx_timestamp_work(struct work_struct *work);
+
+/* extended register access functions */
+
+#define BROADCAST_ADDR 31
+
+static inline int broadcast_write(struct mii_bus *bus, u32 regnum, u16 val)
+{
+ return mdiobus_write(bus, BROADCAST_ADDR, regnum, val);
+}
+
+static int ext_read(struct phy_device *phydev, int page, u32 regnum)
+{
+ struct dp83640_private *dp83640 = phydev->priv;
+ int val;
+
+ if (dp83640->clock->page != page) {
+ broadcast_write(phydev->bus, PAGESEL, page);
+ dp83640->clock->page = page;
+ }
+ val = phy_read(phydev, regnum);
+
+ return val;
+}
+
+static void ext_write(int broadcast, struct phy_device *phydev,
+ int page, u32 regnum, u16 val)
+{
+ struct dp83640_private *dp83640 = phydev->priv;
+
+ if (dp83640->clock->page != page) {
+ broadcast_write(phydev->bus, PAGESEL, page);
+ dp83640->clock->page = page;
+ }
+ if (broadcast)
+ broadcast_write(phydev->bus, regnum, val);
+ else
+ phy_write(phydev, regnum, val);
+}
+
+static int tdr_write(int bc, struct phy_device *dev,
+ const struct timespec *ts, u16 cmd)
+{
+ ext_write(bc, dev, PAGE4, PTP_TDR, ts->tv_nsec & 0xffff);/* ns[15:0] */
+ ext_write(bc, dev, PAGE4, PTP_TDR, ts->tv_nsec >> 16); /* ns[31:16] */
+ ext_write(bc, dev, PAGE4, PTP_TDR, ts->tv_sec & 0xffff); /* sec[15:0] */
+ ext_write(bc, dev, PAGE4, PTP_TDR, ts->tv_sec >> 16); /* sec[31:16]*/
+
+ ext_write(bc, dev, PAGE4, PTP_CTL, cmd);
+
+ return 0;
+}
+
+/* convert phy timestamps into driver timestamps */
+
+static void phy2rxts(struct phy_rxts *p, struct rxts *rxts)
+{
+ u32 sec;
+
+ sec = p->sec_lo;
+ sec |= p->sec_hi << 16;
+
+ rxts->ns = p->ns_lo;
+ rxts->ns |= (p->ns_hi & 0x3fff) << 16;
+ rxts->ns += ((u64)sec) * 1000000000ULL;
+ rxts->seqid = p->seqid;
+ rxts->msgtype = (p->msgtype >> 12) & 0xf;
+ rxts->hash = p->msgtype & 0x0fff;
+ rxts->tmo = jiffies + HZ;
+}
+
+static u64 phy2txts(struct phy_txts *p)
+{
+ u64 ns;
+ u32 sec;
+
+ sec = p->sec_lo;
+ sec |= p->sec_hi << 16;
+
+ ns = p->ns_lo;
+ ns |= (p->ns_hi & 0x3fff) << 16;
+ ns += ((u64)sec) * 1000000000ULL;
+
+ return ns;
+}
+
+/* ptp clock methods */
+
+static int ptp_dp83640_adjfreq(struct ptp_clock_info *ptp, s32 ppb)
+{
+ struct dp83640_clock *clock =
+ container_of(ptp, struct dp83640_clock, caps);
+ struct phy_device *phydev = clock->chosen->phydev;
+ u64 rate;
+ int neg_adj = 0;
+ u16 hi, lo;
+
+ if (ppb < 0) {
+ neg_adj = 1;
+ ppb = -ppb;
+ }
+ rate = ppb;
+ rate <<= 26;
+ rate = div_u64(rate, 1953125);
+
+ hi = (rate >> 16) & PTP_RATE_HI_MASK;
+ if (neg_adj)
+ hi |= PTP_RATE_DIR;
+
+ lo = rate & 0xffff;
+
+ mutex_lock(&clock->extreg_lock);
+
+ ext_write(1, phydev, PAGE4, PTP_RATEH, hi);
+ ext_write(1, phydev, PAGE4, PTP_RATEL, lo);
+
+ mutex_unlock(&clock->extreg_lock);
+
+ return 0;
+}
+
+static int ptp_dp83640_adjtime(struct ptp_clock_info *ptp, s64 delta)
+{
+ struct dp83640_clock *clock =
+ container_of(ptp, struct dp83640_clock, caps);
+ struct phy_device *phydev = clock->chosen->phydev;
+ struct timespec ts;
+ int err;
+
+ delta += ADJTIME_FIX;
+
+ ts = ns_to_timespec(delta);
+
+ mutex_lock(&clock->extreg_lock);
+
+ err = tdr_write(1, phydev, &ts, PTP_STEP_CLK);
+
+ mutex_unlock(&clock->extreg_lock);
+
+ return err;
+}
+
+static int ptp_dp83640_gettime(struct ptp_clock_info *ptp, struct timespec *ts)
+{
+ struct dp83640_clock *clock =
+ container_of(ptp, struct dp83640_clock, caps);
+ struct phy_device *phydev = clock->chosen->phydev;
+ unsigned int val[4];
+
+ mutex_lock(&clock->extreg_lock);
+
+ ext_write(0, phydev, PAGE4, PTP_CTL, PTP_RD_CLK);
+
+ val[0] = ext_read(phydev, PAGE4, PTP_TDR); /* ns[15:0] */
+ val[1] = ext_read(phydev, PAGE4, PTP_TDR); /* ns[31:16] */
+ val[2] = ext_read(phydev, PAGE4, PTP_TDR); /* sec[15:0] */
+ val[3] = ext_read(phydev, PAGE4, PTP_TDR); /* sec[31:16] */
+
+ mutex_unlock(&clock->extreg_lock);
+
+ ts->tv_nsec = val[0] | (val[1] << 16);
+ ts->tv_sec = val[2] | (val[3] << 16);
+
+ return 0;
+}
+
+static int ptp_dp83640_settime(struct ptp_clock_info *ptp,
+ const struct timespec *ts)
+{
+ struct dp83640_clock *clock =
+ container_of(ptp, struct dp83640_clock, caps);
+ struct phy_device *phydev = clock->chosen->phydev;
+ int err;
+
+ mutex_lock(&clock->extreg_lock);
+
+ err = tdr_write(1, phydev, ts, PTP_LOAD_CLK);
+
+ mutex_unlock(&clock->extreg_lock);
+
+ return err;
+}
+
+static int ptp_dp83640_enable(struct ptp_clock_info *ptp,
+ struct ptp_clock_request *rq, int on)
+{
+ struct dp83640_clock *clock =
+ container_of(ptp, struct dp83640_clock, caps);
+ struct phy_device *phydev = clock->chosen->phydev;
+ u16 evnt;
+
+ switch (rq->type) {
+ case PTP_CLK_REQ_EXTTS:
+ if (rq->extts.index != 0)
+ return -EINVAL;
+ evnt = EVNT_WR | (EXT_EVENT & EVNT_SEL_MASK) << EVNT_SEL_SHIFT;
+ if (on) {
+ evnt |= (EXT_GPIO & EVNT_GPIO_MASK) << EVNT_GPIO_SHIFT;
+ evnt |= EVNT_RISE;
+ }
+ ext_write(0, phydev, PAGE5, PTP_EVNT, evnt);
+ return 0;
+ default:
+ break;
+ }
+
+ return -EOPNOTSUPP;
+}
+
+static u8 status_frame_dst[6] = { 0x01, 0x1B, 0x19, 0x00, 0x00, 0x00 };
+static u8 status_frame_src[6] = { 0x08, 0x00, 0x17, 0x0B, 0x6B, 0x0F };
+
+static void enable_status_frames(struct phy_device *phydev, bool on)
+{
+ u16 cfg0 = 0, ver;
+
+ if (on)
+ cfg0 = PSF_EVNT_EN | PSF_RXTS_EN | PSF_TXTS_EN | ENDIAN_FLAG;
+
+ ver = (PSF_PTPVER & VERSIONPTP_MASK) << VERSIONPTP_SHIFT;
+
+ ext_write(0, phydev, PAGE5, PSF_CFG0, cfg0);
+ ext_write(0, phydev, PAGE6, PSF_CFG1, ver);
+
+ if (!phydev->attached_dev) {
+ pr_err("expected to find an attached netdevice\n");
+ BUG();
+ }
+
+ if (on) {
+ if (dev_mc_add(phydev->attached_dev, status_frame_dst))
+ pr_warning("dp83640: failed to add mc address\n");
+ } else {
+ if (dev_mc_del(phydev->attached_dev, status_frame_dst))
+ pr_warning("dp83640: failed to delete mc address\n");
+ }
+}
+
+static bool is_status_frame(struct sk_buff *skb, int type)
+{
+ struct ethhdr *h = eth_hdr(skb);
+
+ if (PTP_CLASS_V2_L2 == type &&
+ !memcmp(h->h_source, status_frame_src, sizeof(status_frame_src)))
+ return true;
+ else
+ return false;
+}
+
+static int expired(struct rxts *rxts)
+{
+ return time_after(jiffies, rxts->tmo);
+}
+
+/* Caller must hold rx_lock. */
+static void prune_rx_ts(struct dp83640_private *dp83640)
+{
+ struct list_head *this, *next;
+ struct rxts *rxts;
+
+ list_for_each_safe(this, next, &dp83640->rxts) {
+ rxts = list_entry(this, struct rxts, list);
+ if (expired(rxts)) {
+ list_del_init(&rxts->list);
+ list_add(&rxts->list, &dp83640->rxpool);
+ }
+ }
+}
+
+/* synchronize the phyters so they act as one clock */
+
+static void enable_broadcast(struct phy_device *phydev, int init_page, int on)
+{
+ int val;
+ phy_write(phydev, PAGESEL, 0);
+ val = phy_read(phydev, PHYCR2);
+ if (on)
+ val |= BC_WRITE;
+ else
+ val &= ~BC_WRITE;
+ phy_write(phydev, PHYCR2, val);
+ phy_write(phydev, PAGESEL, init_page);
+}
+
+static void recalibrate(struct dp83640_clock *clock)
+{
+ s64 now, diff;
+ struct phy_txts event_ts;
+ struct timespec ts;
+ struct list_head *this;
+ struct dp83640_private *tmp;
+ struct phy_device *master = clock->chosen->phydev;
+ u16 cfg0, evnt, ptp_trig, trigger, val;
+
+ trigger = CAL_TRIGGER;
+
+ mutex_lock(&clock->extreg_lock);
+
+ /*
+ * enable broadcast, disable status frames, enable ptp clock
+ */
+ list_for_each(this, &clock->phylist) {
+ tmp = list_entry(this, struct dp83640_private, list);
+ enable_broadcast(tmp->phydev, clock->page, 1);
+ tmp->cfg0 = ext_read(tmp->phydev, PAGE5, PSF_CFG0);
+ ext_write(0, tmp->phydev, PAGE5, PSF_CFG0, 0);
+ ext_write(0, tmp->phydev, PAGE4, PTP_CTL, PTP_ENABLE);
+ }
+ enable_broadcast(master, clock->page, 1);
+ cfg0 = ext_read(master, PAGE5, PSF_CFG0);
+ ext_write(0, master, PAGE5, PSF_CFG0, 0);
+ ext_write(0, master, PAGE4, PTP_CTL, PTP_ENABLE);
+
+ /*
+ * enable an event timestamp
+ */
+ evnt = EVNT_WR | EVNT_RISE | EVNT_SINGLE;
+ evnt |= (CAL_EVENT & EVNT_SEL_MASK) << EVNT_SEL_SHIFT;
+ evnt |= (cal_gpio & EVNT_GPIO_MASK) << EVNT_GPIO_SHIFT;
+
+ list_for_each(this, &clock->phylist) {
+ tmp = list_entry(this, struct dp83640_private, list);
+ ext_write(0, tmp->phydev, PAGE5, PTP_EVNT, evnt);
+ }
+ ext_write(0, master, PAGE5, PTP_EVNT, evnt);
+
+ /*
+ * configure a trigger
+ */
+ ptp_trig = TRIG_WR | TRIG_IF_LATE | TRIG_PULSE;
+ ptp_trig |= (trigger & TRIG_CSEL_MASK) << TRIG_CSEL_SHIFT;
+ ptp_trig |= (cal_gpio & TRIG_GPIO_MASK) << TRIG_GPIO_SHIFT;
+ ext_write(0, master, PAGE5, PTP_TRIG, ptp_trig);
+
+ /* load trigger */
+ val = (trigger & TRIG_SEL_MASK) << TRIG_SEL_SHIFT;
+ val |= TRIG_LOAD;
+ ext_write(0, master, PAGE4, PTP_CTL, val);
+
+ /* enable trigger */
+ val &= ~TRIG_LOAD;
+ val |= TRIG_EN;
+ ext_write(0, master, PAGE4, PTP_CTL, val);
+
+ /* disable trigger */
+ val = (trigger & TRIG_SEL_MASK) << TRIG_SEL_SHIFT;
+ val |= TRIG_DIS;
+ ext_write(0, master, PAGE4, PTP_CTL, val);
+
+ /*
+ * read out and correct offsets
+ */
+ val = ext_read(master, PAGE4, PTP_STS);
+ pr_info("master PTP_STS 0x%04hx", val);
+ val = ext_read(master, PAGE4, PTP_ESTS);
+ pr_info("master PTP_ESTS 0x%04hx", val);
+ event_ts.ns_lo = ext_read(master, PAGE4, PTP_EDATA);
+ event_ts.ns_hi = ext_read(master, PAGE4, PTP_EDATA);
+ event_ts.sec_lo = ext_read(master, PAGE4, PTP_EDATA);
+ event_ts.sec_hi = ext_read(master, PAGE4, PTP_EDATA);
+ now = phy2txts(&event_ts);
+
+ list_for_each(this, &clock->phylist) {
+ tmp = list_entry(this, struct dp83640_private, list);
+ val = ext_read(tmp->phydev, PAGE4, PTP_STS);
+ pr_info("slave PTP_STS 0x%04hx", val);
+ val = ext_read(tmp->phydev, PAGE4, PTP_ESTS);
+ pr_info("slave PTP_ESTS 0x%04hx", val);
+ event_ts.ns_lo = ext_read(tmp->phydev, PAGE4, PTP_EDATA);
+ event_ts.ns_hi = ext_read(tmp->phydev, PAGE4, PTP_EDATA);
+ event_ts.sec_lo = ext_read(tmp->phydev, PAGE4, PTP_EDATA);
+ event_ts.sec_hi = ext_read(tmp->phydev, PAGE4, PTP_EDATA);
+ diff = now - (s64) phy2txts(&event_ts);
+ pr_info("slave offset %lld nanoseconds\n", diff);
+ diff += ADJTIME_FIX;
+ ts = ns_to_timespec(diff);
+ tdr_write(0, tmp->phydev, &ts, PTP_STEP_CLK);
+ }
+
+ /*
+ * restore status frames
+ */
+ list_for_each(this, &clock->phylist) {
+ tmp = list_entry(this, struct dp83640_private, list);
+ ext_write(0, tmp->phydev, PAGE5, PSF_CFG0, tmp->cfg0);
+ }
+ ext_write(0, master, PAGE5, PSF_CFG0, cfg0);
+
+ mutex_unlock(&clock->extreg_lock);
+}
+
+/* time stamping methods */
+
+static void decode_evnt(struct dp83640_private *dp83640,
+ struct phy_txts *phy_txts, u16 ests)
+{
+ struct ptp_clock_event event;
+ int words = (ests >> EVNT_TS_LEN_SHIFT) & EVNT_TS_LEN_MASK;
+
+ switch (words) { /* fall through in every case */
+ case 3:
+ dp83640->edata.sec_hi = phy_txts->sec_hi;
+ case 2:
+ dp83640->edata.sec_lo = phy_txts->sec_lo;
+ case 1:
+ dp83640->edata.ns_hi = phy_txts->ns_hi;
+ case 0:
+ dp83640->edata.ns_lo = phy_txts->ns_lo;
+ }
+
+ event.type = PTP_CLOCK_EXTTS;
+ event.index = 0;
+ event.timestamp = phy2txts(&dp83640->edata);
+
+ ptp_clock_event(dp83640->clock->ptp_clock, &event);
+}
+
+static void decode_rxts(struct dp83640_private *dp83640,
+ struct phy_rxts *phy_rxts)
+{
+ struct rxts *rxts;
+ unsigned long flags;
+
+ spin_lock_irqsave(&dp83640->rx_lock, flags);
+
+ prune_rx_ts(dp83640);
+
+ if (list_empty(&dp83640->rxpool)) {
+ pr_warning("dp83640: rx timestamp pool is empty\n");
+ goto out;
+ }
+ rxts = list_first_entry(&dp83640->rxpool, struct rxts, list);
+ list_del_init(&rxts->list);
+ phy2rxts(phy_rxts, rxts);
+ list_add_tail(&rxts->list, &dp83640->rxts);
+out:
+ spin_unlock_irqrestore(&dp83640->rx_lock, flags);
+}
+
+static void decode_txts(struct dp83640_private *dp83640,
+ struct phy_txts *phy_txts)
+{
+ struct skb_shared_hwtstamps shhwtstamps;
+ struct sk_buff *skb;
+ u64 ns;
+
+ /* We must already have the skb that triggered this. */
+
+ skb = skb_dequeue(&dp83640->tx_queue);
+
+ if (!skb) {
+ pr_warning("dp83640: have timestamp but tx_queue empty\n");
+ return;
+ }
+ ns = phy2txts(phy_txts);
+ memset(&shhwtstamps, 0, sizeof(shhwtstamps));
+ shhwtstamps.hwtstamp = ns_to_ktime(ns);
+ skb_complete_tx_timestamp(skb, &shhwtstamps);
+}
+
+static void decode_status_frame(struct dp83640_private *dp83640,
+ struct sk_buff *skb)
+{
+ struct phy_rxts *phy_rxts;
+ struct phy_txts *phy_txts;
+ u8 *ptr;
+ int len, size;
+ u16 ests, type;
+
+ ptr = skb->data + 2;
+
+ for (len = skb_headlen(skb) - 2; len > sizeof(type); len -= size) {
+
+ type = *(u16 *)ptr;
+ ests = type & 0x0fff;
+ type = type & 0xf000;
+ len -= sizeof(type);
+ ptr += sizeof(type);
+
+ if (PSF_RX == type && len >= sizeof(*phy_rxts)) {
+
+ phy_rxts = (struct phy_rxts *) ptr;
+ decode_rxts(dp83640, phy_rxts);
+ size = sizeof(*phy_rxts);
+
+ } else if (PSF_TX == type && len >= sizeof(*phy_txts)) {
+
+ phy_txts = (struct phy_txts *) ptr;
+ decode_txts(dp83640, phy_txts);
+ size = sizeof(*phy_txts);
+
+ } else if (PSF_EVNT == type && len >= sizeof(*phy_txts)) {
+
+ phy_txts = (struct phy_txts *) ptr;
+ decode_evnt(dp83640, phy_txts, ests);
+ size = sizeof(*phy_txts);
+
+ } else {
+ size = 0;
+ break;
+ }
+ ptr += size;
+ }
+}
+
+static int match(struct sk_buff *skb, unsigned int type, struct rxts *rxts)
+{
+ u16 *seqid;
+ u8 *msgtype, *data = skb_mac_header(skb);
+
+ /* check sequenceID, messageType, 12 bit hash of offset 20-29 */
+ /* We assume that the IPv4 header has no options. */
+
+ switch (type) {
+ case PTP_CLASS_V1_IPV4:
+ msgtype = data + 42 + 32;
+ seqid = (u16 *)(data + 42 + 30);
+ break;
+ case PTP_CLASS_V1_IPV6:
+ msgtype = data + 62 + 32;
+ seqid = (u16 *)(data + 62 + 30);
+ break;
+ case PTP_CLASS_V2_IPV4:
+ msgtype = data + 42 + 0;
+ seqid = (u16 *)(data + 42 + 30);
+ break;
+ case PTP_CLASS_V2_IPV6:
+ msgtype = data + 62 + 0;
+ seqid = (u16 *)(data + 62 + 30);
+ break;
+ case PTP_CLASS_V2_L2:
+ msgtype = data + 14 + 0;
+ seqid = (u16 *)(data + 14 + 30);
+ break;
+ case PTP_CLASS_V2_VLAN:
+ msgtype = data + 18 + 0;
+ seqid = (u16 *)(data + 18 + 30);
+ break;
+ default:
+ return 0;
+ }
+
+ return ((*msgtype & 0xf) == rxts->msgtype && *seqid == rxts->seqid);
+}
+
+static struct dp83640_clock phyter_clock = {
+ .extreg_lock = __MUTEX_INITIALIZER(phyter_clock.extreg_lock),
+ .phylist = LIST_HEAD_INIT(phyter_clock.phylist),
+ .caps = {
+ .owner = THIS_MODULE,
+ .name = "dp83640 timer",
+ .max_adj = 1953124,
+ .n_alarm = 0,
+ .n_ext_ts = N_EXT_TS,
+ .n_per_out = 0,
+ .pps = 0,
+ .adjfreq = ptp_dp83640_adjfreq,
+ .adjtime = ptp_dp83640_adjtime,
+ .gettime = ptp_dp83640_gettime,
+ .settime = ptp_dp83640_settime,
+ .enable = ptp_dp83640_enable,
+ },
+};
+DEFINE_MUTEX(clock_lock); /* protects the one and only dp83640_clock */
+
+static int choose_this_phy(struct phy_device *phydev)
+{
+ if (chosen_phy == -1 && !phyter_clock.chosen)
+ return 1;
+
+ if (chosen_phy == phydev->addr)
+ return 1;
+
+ return 0;
+}
+
+static int dp83640_probe(struct phy_device *phydev)
+{
+ struct dp83640_private *dp83640;
+ int err = -ENOMEM, i;
+
+ if (strcmp(phydev->bus->id, mii_busid)) {
+ pr_err("phyter on bus %s but expected bus %s\n",
+ phydev->bus->id, mii_busid);
+ return -EINVAL;
+ }
+
+ if (phydev->addr == BROADCAST_ADDR)
+ return 0;
+
+ dp83640 = kzalloc(sizeof(struct dp83640_private), GFP_KERNEL);
+ if (!dp83640)
+ return err;
+
+ dp83640->phydev = phydev;
+ INIT_WORK(&dp83640->ts_work, rx_timestamp_work);
+
+ INIT_LIST_HEAD(&dp83640->rxts);
+ INIT_LIST_HEAD(&dp83640->rxpool);
+ for (i = 0; i < MAX_RXTS; i++)
+ list_add(&dp83640->rx_pool_data[i].list, &dp83640->rxpool);
+
+ phydev->priv = dp83640;
+
+ spin_lock_init(&dp83640->rx_lock);
+ skb_queue_head_init(&dp83640->rx_queue);
+ skb_queue_head_init(&dp83640->tx_queue);
+
+ mutex_lock(&clock_lock);
+
+ dp83640->clock = &phyter_clock;
+
+ if (choose_this_phy(phydev)) {
+ phyter_clock.chosen = dp83640;
+ phyter_clock.ptp_clock = ptp_clock_register(&phyter_clock.caps);
+ if (IS_ERR(phyter_clock.ptp_clock)) {
+ err = PTR_ERR(phyter_clock.ptp_clock);
+ goto no_register;
+ }
+ } else
+ list_add_tail(&dp83640->list, &phyter_clock.phylist);
+
+ if (phyter_clock.chosen && !list_empty(&phyter_clock.phylist))
+ recalibrate(&phyter_clock);
+ else
+ enable_broadcast(dp83640->phydev, phyter_clock.page, 1);
+
+ mutex_unlock(&clock_lock);
+
+ return 0;
+
+no_register:
+ phyter_clock.chosen = NULL;
+ kfree(dp83640);
+ mutex_unlock(&clock_lock);
+ return err;
+}
+
+static void dp83640_remove(struct phy_device *phydev)
+{
+ struct list_head *this, *next;
+ struct dp83640_private *tmp, *dp83640 = phydev->priv;
+
+ enable_status_frames(phydev, false);
+ cancel_work_sync(&dp83640->ts_work);
+ mutex_lock(&clock_lock);
+
+ if (dp83640 == phyter_clock.chosen) {
+ ptp_clock_unregister(phyter_clock.ptp_clock);
+ phyter_clock.chosen = NULL;
+ } else {
+ list_for_each_safe(this, next, &phyter_clock.phylist) {
+ tmp = list_entry(this, struct dp83640_private, list);
+ if (tmp == dp83640) {
+ list_del_init(&tmp->list);
+ break;
+ }
+ }
+ }
+
+ mutex_unlock(&clock_lock);
+ kfree(dp83640);
+}
+
+static int dp83640_hwtstamp(struct phy_device *phydev, struct ifreq *ifr)
+{
+ struct dp83640_private *dp83640 = phydev->priv;
+ struct hwtstamp_config cfg;
+ u16 txcfg0, rxcfg0;
+
+ if (copy_from_user(&cfg, ifr->ifr_data, sizeof(cfg)))
+ return -EFAULT;
+
+ if (cfg.flags) /* reserved for future extensions */
+ return -EINVAL;
+
+ switch (cfg.tx_type) {
+ case HWTSTAMP_TX_OFF:
+ dp83640->hwts_tx_en = 0;
+ break;
+ case HWTSTAMP_TX_ON:
+ dp83640->hwts_tx_en = 1;
+ break;
+ default:
+ return -ERANGE;
+ }
+
+ switch (cfg.rx_filter) {
+ case HWTSTAMP_FILTER_NONE:
+ dp83640->hwts_rx_en = 0;
+ dp83640->layer = 0;
+ dp83640->version = 0;
+ break;
+ case HWTSTAMP_FILTER_PTP_V1_L4_EVENT:
+ case HWTSTAMP_FILTER_PTP_V1_L4_SYNC:
+ case HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ:
+ dp83640->hwts_rx_en = 1;
+ dp83640->layer = LAYER4;
+ dp83640->version = 1;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_L4_EVENT:
+ case HWTSTAMP_FILTER_PTP_V2_L4_SYNC:
+ case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ:
+ dp83640->hwts_rx_en = 1;
+ dp83640->layer = LAYER4;
+ dp83640->version = 2;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_L2_EVENT:
+ case HWTSTAMP_FILTER_PTP_V2_L2_SYNC:
+ case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ:
+ dp83640->hwts_rx_en = 1;
+ dp83640->layer = LAYER2;
+ dp83640->version = 2;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_EVENT:
+ case HWTSTAMP_FILTER_PTP_V2_SYNC:
+ case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
+ dp83640->hwts_rx_en = 1;
+ dp83640->layer = LAYER4|LAYER2;
+ dp83640->version = 2;
+ break;
+ default:
+ return -ERANGE;
+ }
+
+ txcfg0 = (dp83640->version & TX_PTP_VER_MASK) << TX_PTP_VER_SHIFT;
+ rxcfg0 = (dp83640->version & TX_PTP_VER_MASK) << TX_PTP_VER_SHIFT;
+
+ if (dp83640->layer & LAYER2) {
+ txcfg0 |= TX_L2_EN;
+ rxcfg0 |= RX_L2_EN;
+ }
+ if (dp83640->layer & LAYER4) {
+ txcfg0 |= TX_IPV6_EN | TX_IPV4_EN;
+ rxcfg0 |= RX_IPV6_EN | RX_IPV4_EN;
+ }
+
+ if (dp83640->hwts_tx_en)
+ txcfg0 |= TX_TS_EN;
+
+ if (dp83640->hwts_rx_en)
+ rxcfg0 |= RX_TS_EN;
+
+ mutex_lock(&dp83640->clock->extreg_lock);
+
+ if (dp83640->hwts_tx_en || dp83640->hwts_rx_en) {
+ enable_status_frames(phydev, true);
+ ext_write(0, phydev, PAGE4, PTP_CTL, PTP_ENABLE);
+ }
+
+ ext_write(0, phydev, PAGE5, PTP_TXCFG0, txcfg0);
+ ext_write(0, phydev, PAGE5, PTP_RXCFG0, rxcfg0);
+
+ mutex_unlock(&dp83640->clock->extreg_lock);
+
+ return copy_to_user(ifr->ifr_data, &cfg, sizeof(cfg)) ? -EFAULT : 0;
+}
+
+static void rx_timestamp_work(struct work_struct *work)
+{
+ struct dp83640_private *dp83640 =
+ container_of(work, struct dp83640_private, ts_work);
+ struct list_head *this, *next;
+ struct rxts *rxts;
+ struct skb_shared_hwtstamps *shhwtstamps;
+ struct sk_buff *skb;
+ unsigned int type;
+ unsigned long flags;
+
+ /* Deliver each deferred packet, with or without a time stamp. */
+
+ while ((skb = skb_dequeue(&dp83640->rx_queue)) != NULL) {
+ type = SKB_PTP_TYPE(skb);
+ spin_lock_irqsave(&dp83640->rx_lock, flags);
+ list_for_each_safe(this, next, &dp83640->rxts) {
+ rxts = list_entry(this, struct rxts, list);
+ if (match(skb, type, rxts)) {
+ shhwtstamps = skb_hwtstamps(skb);
+ memset(shhwtstamps, 0, sizeof(*shhwtstamps));
+ shhwtstamps->hwtstamp = ns_to_ktime(rxts->ns);
+ list_del_init(&rxts->list);
+ list_add(&rxts->list, &dp83640->rxpool);
+ break;
+ }
+ }
+ spin_unlock_irqrestore(&dp83640->rx_lock, flags);
+ netif_rx(skb);
+ }
+
+ /* Clear out expired time stamps. */
+
+ spin_lock_irqsave(&dp83640->rx_lock, flags);
+ prune_rx_ts(dp83640);
+ spin_unlock_irqrestore(&dp83640->rx_lock, flags);
+}
+
+static bool dp83640_rxtstamp(struct phy_device *phydev,
+ struct sk_buff *skb, int type)
+{
+ struct dp83640_private *dp83640 = phydev->priv;
+
+ if (!dp83640->hwts_rx_en)
+ return false;
+
+ if (is_status_frame(skb, type)) {
+ decode_status_frame(dp83640, skb);
+ /* Let the stack drop this frame. */
+ return false;
+ }
+
+ SKB_PTP_TYPE(skb) = type;
+ skb_queue_tail(&dp83640->rx_queue, skb);
+ schedule_work(&dp83640->ts_work);
+
+ return true;
+}
+
+static void dp83640_txtstamp(struct phy_device *phydev,
+ struct sk_buff *skb, int type)
+{
+ struct dp83640_private *dp83640 = phydev->priv;
+
+ if (!dp83640->hwts_tx_en) {
+ kfree_skb(skb);
+ return;
+ }
+ skb_queue_tail(&dp83640->tx_queue, skb);
+ schedule_work(&dp83640->ts_work);
+}
+
+static struct phy_driver dp83640_driver = {
+ .phy_id = DP83640_PHY_ID,
+ .phy_id_mask = 0xfffffff0,
+ .name = "NatSemi DP83640",
+ .features = PHY_BASIC_FEATURES,
+ .flags = 0,
+ .probe = dp83640_probe,
+ .remove = dp83640_remove,
+ .config_aneg = genphy_config_aneg,
+ .read_status = genphy_read_status,
+ .hwtstamp = dp83640_hwtstamp,
+ .rxtstamp = dp83640_rxtstamp,
+ .txtstamp = dp83640_txtstamp,
+ .driver = {.owner = THIS_MODULE,}
+};
+
+static int __init dp83640_init(void)
+{
+ return phy_driver_register(&dp83640_driver);
+}
+
+static void __exit dp83640_exit(void)
+{
+ phy_driver_unregister(&dp83640_driver);
+}
+
+MODULE_DESCRIPTION("National Semiconductor DP83640 PHY driver");
+MODULE_AUTHOR("Richard Cochran <richard.cochran-3mrvs1K0uXizZXS1Dc/lvw@public.gmane.org>");
+MODULE_LICENSE("GPL");
+
+module_init(dp83640_init);
+module_exit(dp83640_exit);
+
+static struct mdio_device_id dp83640_tbl[] = {
+ { DP83640_PHY_ID, 0xfffffff0 },
+ { }
+};
+
+MODULE_DEVICE_TABLE(mdio, dp83640_tbl);
diff --git a/drivers/net/phy/dp83640_reg.h b/drivers/net/phy/dp83640_reg.h
new file mode 100644
index 0000000..e7fe411
--- /dev/null
+++ b/drivers/net/phy/dp83640_reg.h
@@ -0,0 +1,267 @@
+/* dp83640_reg.h
+ * Generated by regen.tcl on Thu Feb 17 10:02:48 AM CET 2011
+ */
+#ifndef HAVE_DP83640_REGISTERS
+#define HAVE_DP83640_REGISTERS
+
+#define PAGE0 0x0000
+#define PHYCR2 0x001c /* PHY Control Register 2 */
+
+#define PAGE4 0x0004
+#define PTP_CTL 0x0014 /* PTP Control Register */
+#define PTP_TDR 0x0015 /* PTP Time Data Register */
+#define PTP_STS 0x0016 /* PTP Status Register */
+#define PTP_TSTS 0x0017 /* PTP Trigger Status Register */
+#define PTP_RATEL 0x0018 /* PTP Rate Low Register */
+#define PTP_RATEH 0x0019 /* PTP Rate High Register */
+#define PTP_RDCKSUM 0x001a /* PTP Read Checksum */
+#define PTP_WRCKSUM 0x001b /* PTP Write Checksum */
+#define PTP_TXTS 0x001c /* PTP Transmit Timestamp Register, in four 16-bit reads */
+#define PTP_RXTS 0x001d /* PTP Receive Timestamp Register, in six? 16-bit reads */
+#define PTP_ESTS 0x001e /* PTP Event Status Register */
+#define PTP_EDATA 0x001f /* PTP Event Data Register */
+
+#define PAGE5 0x0005
+#define PTP_TRIG 0x0014 /* PTP Trigger Configuration Register */
+#define PTP_EVNT 0x0015 /* PTP Event Configuration Register */
+#define PTP_TXCFG0 0x0016 /* PTP Transmit Configuration Register 0 */
+#define PTP_TXCFG1 0x0017 /* PTP Transmit Configuration Register 1 */
+#define PSF_CFG0 0x0018 /* PHY Status Frame Configuration Register 0 */
+#define PTP_RXCFG0 0x0019 /* PTP Receive Configuration Register 0 */
+#define PTP_RXCFG1 0x001a /* PTP Receive Configuration Register 1 */
+#define PTP_RXCFG2 0x001b /* PTP Receive Configuration Register 2 */
+#define PTP_RXCFG3 0x001c /* PTP Receive Configuration Register 3 */
+#define PTP_RXCFG4 0x001d /* PTP Receive Configuration Register 4 */
+#define PTP_TRDL 0x001e /* PTP Temporary Rate Duration Low Register */
+#define PTP_TRDH 0x001f /* PTP Temporary Rate Duration High Register */
+
+#define PAGE6 0x0006
+#define PTP_COC 0x0014 /* PTP Clock Output Control Register */
+#define PSF_CFG1 0x0015 /* PHY Status Frame Configuration Register 1 */
+#define PSF_CFG2 0x0016 /* PHY Status Frame Configuration Register 2 */
+#define PSF_CFG3 0x0017 /* PHY Status Frame Configuration Register 3 */
+#define PSF_CFG4 0x0018 /* PHY Status Frame Configuration Register 4 */
+#define PTP_SFDCFG 0x0019 /* PTP SFD Configuration Register */
+#define PTP_INTCTL 0x001a /* PTP Interrupt Control Register */
+#define PTP_CLKSRC 0x001b /* PTP Clock Source Register */
+#define PTP_ETR 0x001c /* PTP Ethernet Type Register */
+#define PTP_OFF 0x001d /* PTP Offset Register */
+#define PTP_GPIOMON 0x001e /* PTP GPIO Monitor Register */
+#define PTP_RXHASH 0x001f /* PTP Receive Hash Register */
+
+/* Bit definitions for the PHYCR2 register */
+#define BC_WRITE (1<<11) /* Broadcast Write Enable */
+
+/* Bit definitions for the PTP_CTL register */
+#define TRIG_SEL_SHIFT (10) /* PTP Trigger Select */
+#define TRIG_SEL_MASK (0x7)
+#define TRIG_DIS (1<<9) /* Disable PTP Trigger */
+#define TRIG_EN (1<<8) /* Enable PTP Trigger */
+#define TRIG_READ (1<<7) /* Read PTP Trigger */
+#define TRIG_LOAD (1<<6) /* Load PTP Trigger */
+#define PTP_RD_CLK (1<<5) /* Read PTP Clock */
+#define PTP_LOAD_CLK (1<<4) /* Load PTP Clock */
+#define PTP_STEP_CLK (1<<3) /* Step PTP Clock */
+#define PTP_ENABLE (1<<2) /* Enable PTP Clock */
+#define PTP_DISABLE (1<<1) /* Disable PTP Clock */
+#define PTP_RESET (1<<0) /* Reset PTP Clock */
+
+/* Bit definitions for the PTP_STS register */
+#define TXTS_RDY (1<<11) /* Transmit Timestamp Ready */
+#define RXTS_RDY (1<<10) /* Receive Timestamp Ready */
+#define TRIG_DONE (1<<9) /* PTP Trigger Done */
+#define EVENT_RDY (1<<8) /* PTP Event Timestamp Ready */
+#define TXTS_IE (1<<3) /* Transmit Timestamp Interrupt Enable */
+#define RXTS_IE (1<<2) /* Receive Timestamp Interrupt Enable */
+#define TRIG_IE (1<<1) /* Trigger Interrupt Enable */
+#define EVENT_IE (1<<0) /* Event Interrupt Enable */
+
+/* Bit definitions for the PTP_TSTS register */
+#define TRIG7_ERROR (1<<15) /* Trigger 7 Error */
+#define TRIG7_ACTIVE (1<<14) /* Trigger 7 Active */
+#define TRIG6_ERROR (1<<13) /* Trigger 6 Error */
+#define TRIG6_ACTIVE (1<<12) /* Trigger 6 Active */
+#define TRIG5_ERROR (1<<11) /* Trigger 5 Error */
+#define TRIG5_ACTIVE (1<<10) /* Trigger 5 Active */
+#define TRIG4_ERROR (1<<9) /* Trigger 4 Error */
+#define TRIG4_ACTIVE (1<<8) /* Trigger 4 Active */
+#define TRIG3_ERROR (1<<7) /* Trigger 3 Error */
+#define TRIG3_ACTIVE (1<<6) /* Trigger 3 Active */
+#define TRIG2_ERROR (1<<5) /* Trigger 2 Error */
+#define TRIG2_ACTIVE (1<<4) /* Trigger 2 Active */
+#define TRIG1_ERROR (1<<3) /* Trigger 1 Error */
+#define TRIG1_ACTIVE (1<<2) /* Trigger 1 Active */
+#define TRIG0_ERROR (1<<1) /* Trigger 0 Error */
+#define TRIG0_ACTIVE (1<<0) /* Trigger 0 Active */
+
+/* Bit definitions for the PTP_RATEH register */
+#define PTP_RATE_DIR (1<<15) /* PTP Rate Direction */
+#define PTP_TMP_RATE (1<<14) /* PTP Temporary Rate */
+#define PTP_RATE_HI_SHIFT (0) /* PTP Rate High 10-bits */
+#define PTP_RATE_HI_MASK (0x3ff)
+
+/* Bit definitions for the PTP_ESTS register */
+#define EVNTS_MISSED_SHIFT (8) /* Indicates number of events missed */
+#define EVNTS_MISSED_MASK (0x7)
+#define EVNT_TS_LEN_SHIFT (6) /* Indicates length of the Timestamp field in 16-bit words minus 1 */
+#define EVNT_TS_LEN_MASK (0x3)
+#define EVNT_RF (1<<5) /* Indicates whether the event is a rise or falling event */
+#define EVNT_NUM_SHIFT (2) /* Indicates Event Timestamp Unit which detected an event */
+#define EVNT_NUM_MASK (0x7)
+#define MULT_EVNT (1<<1) /* Indicates multiple events were detected at the same time */
+#define EVENT_DET (1<<0) /* PTP Event Detected */
+
+/* Bit definitions for the PTP_EDATA register */
+#define E7_RISE (1<<15) /* Indicates direction of Event 7 */
+#define E7_DET (1<<14) /* Indicates Event 7 detected */
+#define E6_RISE (1<<13) /* Indicates direction of Event 6 */
+#define E6_DET (1<<12) /* Indicates Event 6 detected */
+#define E5_RISE (1<<11) /* Indicates direction of Event 5 */
+#define E5_DET (1<<10) /* Indicates Event 5 detected */
+#define E4_RISE (1<<9) /* Indicates direction of Event 4 */
+#define E4_DET (1<<8) /* Indicates Event 4 detected */
+#define E3_RISE (1<<7) /* Indicates direction of Event 3 */
+#define E3_DET (1<<6) /* Indicates Event 3 detected */
+#define E2_RISE (1<<5) /* Indicates direction of Event 2 */
+#define E2_DET (1<<4) /* Indicates Event 2 detected */
+#define E1_RISE (1<<3) /* Indicates direction of Event 1 */
+#define E1_DET (1<<2) /* Indicates Event 1 detected */
+#define E0_RISE (1<<1) /* Indicates direction of Event 0 */
+#define E0_DET (1<<0) /* Indicates Event 0 detected */
+
+/* Bit definitions for the PTP_TRIG register */
+#define TRIG_PULSE (1<<15) /* generate a Pulse rather than a single edge */
+#define TRIG_PER (1<<14) /* generate a periodic signal */
+#define TRIG_IF_LATE (1<<13) /* trigger immediately if already past */
+#define TRIG_NOTIFY (1<<12) /* Trigger Notification Enable */
+#define TRIG_GPIO_SHIFT (8) /* Trigger GPIO Connection, value 1-12 */
+#define TRIG_GPIO_MASK (0xf)
+#define TRIG_TOGGLE (1<<7) /* Trigger Toggle Mode Enable */
+#define TRIG_CSEL_SHIFT (1) /* Trigger Configuration Select */
+#define TRIG_CSEL_MASK (0x7)
+#define TRIG_WR (1<<0) /* Trigger Configuration Write */
+
+/* Bit definitions for the PTP_EVNT register */
+#define EVNT_RISE (1<<14) /* Event Rise Detect Enable */
+#define EVNT_FALL (1<<13) /* Event Fall Detect Enable */
+#define EVNT_SINGLE (1<<12) /* enable single event capture operation */
+#define EVNT_GPIO_SHIFT (8) /* Event GPIO Connection, value 1-12 */
+#define EVNT_GPIO_MASK (0xf)
+#define EVNT_SEL_SHIFT (1) /* Event Select */
+#define EVNT_SEL_MASK (0x7)
+#define EVNT_WR (1<<0) /* Event Configuration Write */
+
+/* Bit definitions for the PTP_TXCFG0 register */
+#define SYNC_1STEP (1<<15) /* insert timestamp into transmit Sync Messages */
+#define DR_INSERT (1<<13) /* Insert Delay_Req Timestamp in Delay_Resp (dangerous) */
+#define NTP_TS_EN (1<<12) /* Enable Timestamping of NTP Packets */
+#define IGNORE_2STEP (1<<11) /* Ignore Two_Step flag for One-Step operation */
+#define CRC_1STEP (1<<10) /* Disable checking of CRC for One-Step operation */
+#define CHK_1STEP (1<<9) /* Enable UDP Checksum correction for One-Step Operation */
+#define IP1588_EN (1<<8) /* Enable IEEE 1588 defined IP address filter */
+#define TX_L2_EN (1<<7) /* Layer2 Timestamp Enable */
+#define TX_IPV6_EN (1<<6) /* IPv6 Timestamp Enable */
+#define TX_IPV4_EN (1<<5) /* IPv4 Timestamp Enable */
+#define TX_PTP_VER_SHIFT (1) /* Enable Timestamp capture for IEEE 1588 version X */
+#define TX_PTP_VER_MASK (0xf)
+#define TX_TS_EN (1<<0) /* Transmit Timestamp Enable */
+
+/* Bit definitions for the PTP_TXCFG1 register */
+#define BYTE0_MASK_SHIFT (8) /* Bit mask to be used for matching Byte0 of the PTP Message */
+#define BYTE0_MASK_MASK (0xff)
+#define BYTE0_DATA_SHIFT (0) /* Data to be used for matching Byte0 of the PTP Message */
+#define BYTE0_DATA_MASK (0xff)
+
+/* Bit definitions for the PSF_CFG0 register */
+#define MAC_SRC_ADD_SHIFT (11) /* Status Frame Mac Source Address */
+#define MAC_SRC_ADD_MASK (0x3)
+#define MIN_PRE_SHIFT (8) /* Status Frame Minimum Preamble */
+#define MIN_PRE_MASK (0x7)
+#define PSF_ENDIAN (1<<7) /* Status Frame Endian Control */
+#define PSF_IPV4 (1<<6) /* Status Frame IPv4 Enable */
+#define PSF_PCF_RD (1<<5) /* Control Frame Read PHY Status Frame Enable */
+#define PSF_ERR_EN (1<<4) /* Error PHY Status Frame Enable */
+#define PSF_TXTS_EN (1<<3) /* Transmit Timestamp PHY Status Frame Enable */
+#define PSF_RXTS_EN (1<<2) /* Receive Timestamp PHY Status Frame Enable */
+#define PSF_TRIG_EN (1<<1) /* Trigger PHY Status Frame Enable */
+#define PSF_EVNT_EN (1<<0) /* Event PHY Status Frame Enable */
+
+/* Bit definitions for the PTP_RXCFG0 register */
+#define DOMAIN_EN (1<<15) /* Domain Match Enable */
+#define ALT_MAST_DIS (1<<14) /* Alternate Master Timestamp Disable */
+#define USER_IP_SEL (1<<13) /* Selects portion of IP address accessible thru PTP_RXCFG2 */
+#define USER_IP_EN (1<<12) /* Enable User-programmed IP address filter */
+#define RX_SLAVE (1<<11) /* Receive Slave Only */
+#define IP1588_EN_SHIFT (8) /* Enable IEEE 1588 defined IP address filters */
+#define IP1588_EN_MASK (0xf)
+#define RX_L2_EN (1<<7) /* Layer2 Timestamp Enable */
+#define RX_IPV6_EN (1<<6) /* IPv6 Timestamp Enable */
+#define RX_IPV4_EN (1<<5) /* IPv4 Timestamp Enable */
+#define RX_PTP_VER_SHIFT (1) /* Enable Timestamp capture for IEEE 1588 version X */
+#define RX_PTP_VER_MASK (0xf)
+#define RX_TS_EN (1<<0) /* Receive Timestamp Enable */
+
+/* Bit definitions for the PTP_RXCFG1 register */
+#define BYTE0_MASK_SHIFT (8) /* Bit mask to be used for matching Byte0 of the PTP Message */
+#define BYTE0_MASK_MASK (0xff)
+#define BYTE0_DATA_SHIFT (0) /* Data to be used for matching Byte0 of the PTP Message */
+#define BYTE0_DATA_MASK (0xff)
+
+/* Bit definitions for the PTP_RXCFG3 register */
+#define TS_MIN_IFG_SHIFT (12) /* Minimum Inter-frame Gap */
+#define TS_MIN_IFG_MASK (0xf)
+#define ACC_UDP (1<<11) /* Record Timestamp if UDP Checksum Error */
+#define ACC_CRC (1<<10) /* Record Timestamp if CRC Error */
+#define TS_APPEND (1<<9) /* Append Timestamp for L2 */
+#define TS_INSERT (1<<8) /* Enable Timestamp Insertion */
+#define PTP_DOMAIN_SHIFT (0) /* PTP Message domainNumber field */
+#define PTP_DOMAIN_MASK (0xff)
+
+/* Bit definitions for the PTP_RXCFG4 register */
+#define IPV4_UDP_MOD (1<<15) /* Enable IPV4 UDP Modification */
+#define TS_SEC_EN (1<<14) /* Enable Timestamp Seconds */
+#define TS_SEC_LEN_SHIFT (12) /* Inserted Timestamp Seconds Length */
+#define TS_SEC_LEN_MASK (0x3)
+#define RXTS_NS_OFF_SHIFT (6) /* Receive Timestamp Nanoseconds offset */
+#define RXTS_NS_OFF_MASK (0x3f)
+#define RXTS_SEC_OFF_SHIFT (0) /* Receive Timestamp Seconds offset */
+#define RXTS_SEC_OFF_MASK (0x3f)
+
+/* Bit definitions for the PTP_COC register */
+#define PTP_CLKOUT_EN (1<<15) /* PTP Clock Output Enable */
+#define PTP_CLKOUT_SEL (1<<14) /* PTP Clock Output Source Select */
+#define PTP_CLKOUT_SPEEDSEL (1<<13) /* PTP Clock Output I/O Speed Select */
+#define PTP_CLKDIV_SHIFT (0) /* PTP Clock Divide-by Value */
+#define PTP_CLKDIV_MASK (0xff)
+
+/* Bit definitions for the PSF_CFG1 register */
+#define PTPRESERVED_SHIFT (12) /* PTP v2 reserved field */
+#define PTPRESERVED_MASK (0xf)
+#define VERSIONPTP_SHIFT (8) /* PTP v2 versionPTP field */
+#define VERSIONPTP_MASK (0xf)
+#define TRANSPORT_SPECIFIC_SHIFT (4) /* PTP v2 Header transportSpecific field */
+#define TRANSPORT_SPECIFIC_MASK (0xf)
+#define MESSAGETYPE_SHIFT (0) /* PTP v2 messageType field */
+#define MESSAGETYPE_MASK (0xf)
+
+/* Bit definitions for the PTP_SFDCFG register */
+#define TX_SFD_GPIO_SHIFT (4) /* TX SFD GPIO Select, value 1-12 */
+#define TX_SFD_GPIO_MASK (0xf)
+#define RX_SFD_GPIO_SHIFT (0) /* RX SFD GPIO Select, value 1-12 */
+#define RX_SFD_GPIO_MASK (0xf)
+
+/* Bit definitions for the PTP_INTCTL register */
+#define PTP_INT_GPIO_SHIFT (0) /* PTP Interrupt GPIO Select */
+#define PTP_INT_GPIO_MASK (0xf)
+
+/* Bit definitions for the PTP_CLKSRC register */
+#define CLK_SRC_SHIFT (14) /* PTP Clock Source Select */
+#define CLK_SRC_MASK (0x3)
+#define CLK_SRC_PER_SHIFT (0) /* PTP Clock Source Period */
+#define CLK_SRC_PER_MASK (0x7f)
+
+/* Bit definitions for the PTP_OFF register */
+#define PTP_OFFSET_SHIFT (0) /* PTP Message offset from preceding header */
+#define PTP_OFFSET_MASK (0xff)
+
+#endif
diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
index fcfafd0..68d7201 100644
--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -53,4 +53,23 @@ config PTP_1588_CLOCK_IXP46X
To compile this driver as a module, choose M here: the module
will be called ptp_ixp46x.
+comment "Enable PHYLIB and NETWORK_PHY_TIMESTAMPING to see the additional clocks."
+ depends on PTP_1588_CLOCK && (PHYLIB=n || NETWORK_PHY_TIMESTAMPING=n)
+
+config DP83640_PHY
+ tristate "Driver for the National Semiconductor DP83640 PHYTER"
+ depends on PTP_1588_CLOCK
+ depends on NETWORK_PHY_TIMESTAMPING
+ depends on PHYLIB
+ ---help---
+ Supports the DP83640 PHYTER with IEEE 1588 features.
+
+ This driver adds support for using the DP83640 as a PTP
+ clock. This clock is only useful if your PTP programs are
+ getting hardware time stamps on the PTP Ethernet packets
+ using the SO_TIMESTAMPING API.
+
+ In order for this to work, your MAC driver must also
+ implement the skb_tx_timetamp() function.
+
endmenu
--
1.7.0.4
^ permalink raw reply related
* [PATCH V12 3/4] ptp: Added a clock driver for the IXP46x.
From: Richard Cochran @ 2011-02-28 7:58 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ, Alan Cox, Arnd Bergmann,
Christoph Lameter, David Miller, John Stultz, Krzysztof Halasa,
Peter Zijlstra, Rodolfo Giometti, Thomas Gleixner,
Benjamin Herrenschmidt, Mike Frysinger, Paul Mackerras,
Russell King
In-Reply-To: <cover.1298878618.git.richard.cochran-3mrvs1K0uXizZXS1Dc/lvw@public.gmane.org>
This patch adds a driver for the hardware time stamping unit found on the
IXP465. The basic clock operations and an external trigger are implemented.
Signed-off-by: Richard Cochran <richard.cochran-3mrvs1K0uXizZXS1Dc/lvw@public.gmane.org>
Acked-by: John Stultz <johnstul-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
arch/arm/mach-ixp4xx/include/mach/ixp46x_ts.h | 78 ++++++
drivers/net/arm/ixp4xx_eth.c | 192 ++++++++++++++-
drivers/ptp/Kconfig | 13 +
drivers/ptp/Makefile | 1 +
drivers/ptp/ptp_ixp46x.c | 332 +++++++++++++++++++++++++
5 files changed, 613 insertions(+), 3 deletions(-)
create mode 100644 arch/arm/mach-ixp4xx/include/mach/ixp46x_ts.h
create mode 100644 drivers/ptp/ptp_ixp46x.c
diff --git a/arch/arm/mach-ixp4xx/include/mach/ixp46x_ts.h b/arch/arm/mach-ixp4xx/include/mach/ixp46x_ts.h
new file mode 100644
index 0000000..292d55e
--- /dev/null
+++ b/arch/arm/mach-ixp4xx/include/mach/ixp46x_ts.h
@@ -0,0 +1,78 @@
+/*
+ * PTP 1588 clock using the IXP46X
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef _IXP46X_TS_H_
+#define _IXP46X_TS_H_
+
+#define DEFAULT_ADDEND 0xF0000029
+#define TICKS_NS_SHIFT 4
+
+struct ixp46x_channel_ctl {
+ u32 ch_control; /* 0x40 Time Synchronization Channel Control */
+ u32 ch_event; /* 0x44 Time Synchronization Channel Event */
+ u32 tx_snap_lo; /* 0x48 Transmit Snapshot Low Register */
+ u32 tx_snap_hi; /* 0x4C Transmit Snapshot High Register */
+ u32 rx_snap_lo; /* 0x50 Receive Snapshot Low Register */
+ u32 rx_snap_hi; /* 0x54 Receive Snapshot High Register */
+ u32 src_uuid_lo; /* 0x58 Source UUID0 Low Register */
+ u32 src_uuid_hi; /* 0x5C Sequence Identifier/Source UUID0 High */
+};
+
+struct ixp46x_ts_regs {
+ u32 control; /* 0x00 Time Sync Control Register */
+ u32 event; /* 0x04 Time Sync Event Register */
+ u32 addend; /* 0x08 Time Sync Addend Register */
+ u32 accum; /* 0x0C Time Sync Accumulator Register */
+ u32 test; /* 0x10 Time Sync Test Register */
+ u32 unused; /* 0x14 */
+ u32 rsystime_lo; /* 0x18 RawSystemTime_Low Register */
+ u32 rsystime_hi; /* 0x1C RawSystemTime_High Register */
+ u32 systime_lo; /* 0x20 SystemTime_Low Register */
+ u32 systime_hi; /* 0x24 SystemTime_High Register */
+ u32 trgt_lo; /* 0x28 TargetTime_Low Register */
+ u32 trgt_hi; /* 0x2C TargetTime_High Register */
+ u32 asms_lo; /* 0x30 Auxiliary Slave Mode Snapshot Low */
+ u32 asms_hi; /* 0x34 Auxiliary Slave Mode Snapshot High */
+ u32 amms_lo; /* 0x38 Auxiliary Master Mode Snapshot Low */
+ u32 amms_hi; /* 0x3C Auxiliary Master Mode Snapshot High */
+
+ struct ixp46x_channel_ctl channel[3];
+};
+
+/* 0x00 Time Sync Control Register Bits */
+#define TSCR_AMM (1<<3)
+#define TSCR_ASM (1<<2)
+#define TSCR_TTM (1<<1)
+#define TSCR_RST (1<<0)
+
+/* 0x04 Time Sync Event Register Bits */
+#define TSER_SNM (1<<3)
+#define TSER_SNS (1<<2)
+#define TTIPEND (1<<1)
+
+/* 0x40 Time Synchronization Channel Control Register Bits */
+#define MASTER_MODE (1<<0)
+#define TIMESTAMP_ALL (1<<1)
+
+/* 0x44 Time Synchronization Channel Event Register Bits */
+#define TX_SNAPSHOT_LOCKED (1<<0)
+#define RX_SNAPSHOT_LOCKED (1<<1)
+
+#endif
diff --git a/drivers/net/arm/ixp4xx_eth.c b/drivers/net/arm/ixp4xx_eth.c
index 9eb9b98..fa08c17 100644
--- a/drivers/net/arm/ixp4xx_eth.c
+++ b/drivers/net/arm/ixp4xx_eth.c
@@ -30,9 +30,12 @@
#include <linux/etherdevice.h>
#include <linux/io.h>
#include <linux/kernel.h>
+#include <linux/net_tstamp.h>
#include <linux/phy.h>
#include <linux/platform_device.h>
+#include <linux/ptp_classify.h>
#include <linux/slab.h>
+#include <mach/ixp46x_ts.h>
#include <mach/npe.h>
#include <mach/qmgr.h>
@@ -67,6 +70,10 @@
#define RXFREE_QUEUE(port_id) (NPE_ID(port_id) + 26)
#define TXDONE_QUEUE 31
+#define PTP_SLAVE_MODE 1
+#define PTP_MASTER_MODE 2
+#define PORT2CHANNEL(p) NPE_ID(p->id)
+
/* TX Control Registers */
#define TX_CNTRL0_TX_EN 0x01
#define TX_CNTRL0_HALFDUPLEX 0x02
@@ -171,6 +178,8 @@ struct port {
int id; /* logical port ID */
int speed, duplex;
u8 firmware[4];
+ int hwts_tx_en;
+ int hwts_rx_en;
};
/* NPE message structure */
@@ -246,6 +255,169 @@ static int ports_open;
static struct port *npe_port_tab[MAX_NPES];
static struct dma_pool *dma_pool;
+static struct sock_filter ptp_filter[] = {
+ PTP_FILTER
+};
+
+static int match(struct sk_buff *skb, u16 uid_hi, u32 uid_lo, u16 seq)
+{
+ unsigned int type;
+ u16 *hi, *id;
+ u8 *lo, *data = skb->data;
+
+ type = sk_run_filter(skb, ptp_filter);
+
+ if (PTP_CLASS_V1_IPV4 == type) {
+
+ id = (u16 *)(data + 42 + 30);
+ hi = (u16 *)(data + 42 + 22);
+ lo = data + 42 + 24;
+
+ return (uid_hi == *hi &&
+ 0 == memcmp(&uid_lo, lo, sizeof(uid_lo)) &&
+ seq == *id);
+ }
+
+ return 0;
+}
+
+static void do_rx_timestamp(struct port *port, struct sk_buff *skb)
+{
+ struct skb_shared_hwtstamps *shhwtstamps;
+ struct ixp46x_ts_regs *regs;
+ u64 ns;
+ u32 ch, hi, lo, val;
+ u16 uid, seq;
+
+ if (!port->hwts_rx_en)
+ return;
+
+ ch = PORT2CHANNEL(port);
+
+ regs = (struct ixp46x_ts_regs __iomem *) IXP4XX_TIMESYNC_BASE_VIRT;
+
+ val = __raw_readl(®s->channel[ch].ch_event);
+
+ if (!(val & RX_SNAPSHOT_LOCKED))
+ return;
+
+ lo = __raw_readl(®s->channel[ch].src_uuid_lo);
+ hi = __raw_readl(®s->channel[ch].src_uuid_hi);
+
+ uid = hi & 0xffff;
+ seq = (hi >> 16) & 0xffff;
+
+ if (!match(skb, htons(uid), htonl(lo), htons(seq)))
+ goto out;
+
+ lo = __raw_readl(®s->channel[ch].rx_snap_lo);
+ hi = __raw_readl(®s->channel[ch].rx_snap_hi);
+ ns = ((u64) hi) << 32;
+ ns |= lo;
+ ns <<= TICKS_NS_SHIFT;
+
+ shhwtstamps = skb_hwtstamps(skb);
+ memset(shhwtstamps, 0, sizeof(*shhwtstamps));
+ shhwtstamps->hwtstamp = ns_to_ktime(ns);
+out:
+ __raw_writel(RX_SNAPSHOT_LOCKED, ®s->channel[ch].ch_event);
+}
+
+static void do_tx_timestamp(struct port *port, struct sk_buff *skb)
+{
+ struct skb_shared_hwtstamps shhwtstamps;
+ struct ixp46x_ts_regs *regs;
+ struct skb_shared_info *shtx;
+ u64 ns;
+ u32 ch, cnt, hi, lo, val;
+
+ shtx = skb_shinfo(skb);
+ if (unlikely(shtx->tx_flags & SKBTX_HW_TSTAMP && port->hwts_tx_en))
+ shtx->tx_flags |= SKBTX_IN_PROGRESS;
+ else
+ return;
+
+ ch = PORT2CHANNEL(port);
+
+ regs = (struct ixp46x_ts_regs __iomem *) IXP4XX_TIMESYNC_BASE_VIRT;
+
+ /*
+ * This really stinks, but we have to poll for the Tx time stamp.
+ * Usually, the time stamp is ready after 4 to 6 microseconds.
+ */
+ for (cnt = 0; cnt < 100; cnt++) {
+ val = __raw_readl(®s->channel[ch].ch_event);
+ if (val & TX_SNAPSHOT_LOCKED)
+ break;
+ udelay(1);
+ }
+ if (!(val & TX_SNAPSHOT_LOCKED)) {
+ shtx->tx_flags &= ~SKBTX_IN_PROGRESS;
+ return;
+ }
+
+ lo = __raw_readl(®s->channel[ch].tx_snap_lo);
+ hi = __raw_readl(®s->channel[ch].tx_snap_hi);
+ ns = ((u64) hi) << 32;
+ ns |= lo;
+ ns <<= TICKS_NS_SHIFT;
+
+ memset(&shhwtstamps, 0, sizeof(shhwtstamps));
+ shhwtstamps.hwtstamp = ns_to_ktime(ns);
+ skb_tstamp_tx(skb, &shhwtstamps);
+
+ __raw_writel(TX_SNAPSHOT_LOCKED, ®s->channel[ch].ch_event);
+}
+
+static int hwtstamp_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
+{
+ struct hwtstamp_config cfg;
+ struct ixp46x_ts_regs *regs;
+ struct port *port = netdev_priv(netdev);
+ int ch;
+
+ if (copy_from_user(&cfg, ifr->ifr_data, sizeof(cfg)))
+ return -EFAULT;
+
+ if (cfg.flags) /* reserved for future extensions */
+ return -EINVAL;
+
+ ch = PORT2CHANNEL(port);
+ regs = (struct ixp46x_ts_regs __iomem *) IXP4XX_TIMESYNC_BASE_VIRT;
+
+ switch (cfg.tx_type) {
+ case HWTSTAMP_TX_OFF:
+ port->hwts_tx_en = 0;
+ break;
+ case HWTSTAMP_TX_ON:
+ port->hwts_tx_en = 1;
+ break;
+ default:
+ return -ERANGE;
+ }
+
+ switch (cfg.rx_filter) {
+ case HWTSTAMP_FILTER_NONE:
+ port->hwts_rx_en = 0;
+ break;
+ case HWTSTAMP_FILTER_PTP_V1_L4_SYNC:
+ port->hwts_rx_en = PTP_SLAVE_MODE;
+ __raw_writel(0, ®s->channel[ch].ch_control);
+ break;
+ case HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ:
+ port->hwts_rx_en = PTP_MASTER_MODE;
+ __raw_writel(MASTER_MODE, ®s->channel[ch].ch_control);
+ break;
+ default:
+ return -ERANGE;
+ }
+
+ /* Clear out any old time stamps. */
+ __raw_writel(TX_SNAPSHOT_LOCKED | RX_SNAPSHOT_LOCKED,
+ ®s->channel[ch].ch_event);
+
+ return copy_to_user(ifr->ifr_data, &cfg, sizeof(cfg)) ? -EFAULT : 0;
+}
static int ixp4xx_mdio_cmd(struct mii_bus *bus, int phy_id, int location,
int write, u16 cmd)
@@ -573,6 +745,7 @@ static int eth_poll(struct napi_struct *napi, int budget)
debug_pkt(dev, "eth_poll", skb->data, skb->len);
+ do_rx_timestamp(port, skb);
skb->protocol = eth_type_trans(skb, dev);
dev->stats.rx_packets++;
dev->stats.rx_bytes += skb->len;
@@ -679,14 +852,12 @@ static int eth_xmit(struct sk_buff *skb, struct net_device *dev)
return NETDEV_TX_OK;
}
memcpy_swab32(mem, (u32 *)((int)skb->data & ~3), bytes / 4);
- dev_kfree_skb(skb);
#endif
phys = dma_map_single(&dev->dev, mem, bytes, DMA_TO_DEVICE);
if (dma_mapping_error(&dev->dev, phys)) {
-#ifdef __ARMEB__
dev_kfree_skb(skb);
-#else
+#ifndef __ARMEB__
kfree(mem);
#endif
dev->stats.tx_dropped++;
@@ -728,6 +899,13 @@ static int eth_xmit(struct sk_buff *skb, struct net_device *dev)
#if DEBUG_TX
printk(KERN_DEBUG "%s: eth_xmit end\n", dev->name);
#endif
+
+ do_tx_timestamp(port, skb);
+ skb_tx_timestamp(skb);
+
+#ifndef __ARMEB__
+ dev_kfree_skb(skb);
+#endif
return NETDEV_TX_OK;
}
@@ -783,6 +961,9 @@ static int eth_ioctl(struct net_device *dev, struct ifreq *req, int cmd)
if (!netif_running(dev))
return -EINVAL;
+ if (cpu_is_ixp46x() && cmd == SIOCSHWTSTAMP)
+ return hwtstamp_ioctl(dev, req, cmd);
+
return phy_mii_ioctl(port->phydev, req, cmd);
}
@@ -1171,6 +1352,11 @@ static int __devinit eth_init_one(struct platform_device *pdev)
char phy_id[MII_BUS_ID_SIZE + 3];
int err;
+ if (ptp_filter_init(ptp_filter, ARRAY_SIZE(ptp_filter))) {
+ pr_err("ixp4xx_eth: bad ptp filter\n");
+ return -EINVAL;
+ }
+
if (!(dev = alloc_etherdev(sizeof(struct port))))
return -ENOMEM;
diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
index 12eb844..fcfafd0 100644
--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -40,4 +40,17 @@ config PTP_1588_CLOCK_GIANFAR
To compile this driver as a module, choose M here: the module
will be called gianfar_ptp.
+config PTP_1588_CLOCK_IXP46X
+ tristate "Intel IXP46x as PTP clock"
+ depends on PTP_1588_CLOCK
+ depends on IXP4XX_ETH
+ help
+ This driver adds support for using the IXP46X as a PTP
+ clock. This clock is only useful if your PTP programs are
+ getting hardware time stamps on the PTP Ethernet packets
+ using the SO_TIMESTAMPING API.
+
+ To compile this driver as a module, choose M here: the module
+ will be called ptp_ixp46x.
+
endmenu
diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile
index 480e2af..f6933e8 100644
--- a/drivers/ptp/Makefile
+++ b/drivers/ptp/Makefile
@@ -4,3 +4,4 @@
ptp-y := ptp_clock.o ptp_chardev.o ptp_sysfs.o
obj-$(CONFIG_PTP_1588_CLOCK) += ptp.o
+obj-$(CONFIG_PTP_1588_CLOCK_IXP46X) += ptp_ixp46x.o
diff --git a/drivers/ptp/ptp_ixp46x.c b/drivers/ptp/ptp_ixp46x.c
new file mode 100644
index 0000000..e78545f
--- /dev/null
+++ b/drivers/ptp/ptp_ixp46x.c
@@ -0,0 +1,332 @@
+/*
+ * PTP 1588 clock using the IXP46X
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/gpio.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+#include <linux/ptp_clock_kernel.h>
+#include <mach/ixp46x_ts.h>
+
+#define DRIVER "ptp_ixp46x"
+#define N_EXT_TS 2
+#define MASTER_GPIO 8
+#define MASTER_IRQ 25
+#define SLAVE_GPIO 7
+#define SLAVE_IRQ 24
+
+struct ixp_clock {
+ struct ixp46x_ts_regs *regs;
+ struct ptp_clock *ptp_clock;
+ struct ptp_clock_info caps;
+ int exts0_enabled;
+ int exts1_enabled;
+};
+
+DEFINE_SPINLOCK(register_lock);
+
+/*
+ * Register access functions
+ */
+
+static u64 sys_time_read(struct ixp46x_ts_regs *regs)
+{
+ u64 ns;
+ u32 lo, hi;
+
+ lo = __raw_readl(®s->systime_lo);
+ hi = __raw_readl(®s->systime_hi);
+
+ ns = ((u64) hi) << 32;
+ ns |= lo;
+ ns <<= TICKS_NS_SHIFT;
+
+ return ns;
+}
+
+static void sys_time_write(struct ixp46x_ts_regs *regs, u64 ns)
+{
+ u32 hi, lo;
+
+ ns >>= TICKS_NS_SHIFT;
+ hi = ns >> 32;
+ lo = ns & 0xffffffff;
+
+ __raw_writel(lo, ®s->systime_lo);
+ __raw_writel(hi, ®s->systime_hi);
+}
+
+/*
+ * Interrupt service routine
+ */
+
+static irqreturn_t isr(int irq, void *priv)
+{
+ struct ixp_clock *ixp_clock = priv;
+ struct ixp46x_ts_regs *regs = ixp_clock->regs;
+ struct ptp_clock_event event;
+ u32 ack = 0, lo, hi, val;
+
+ val = __raw_readl(®s->event);
+
+ if (val & TSER_SNS) {
+ ack |= TSER_SNS;
+ if (ixp_clock->exts0_enabled) {
+ hi = __raw_readl(®s->asms_hi);
+ lo = __raw_readl(®s->asms_lo);
+ event.type = PTP_CLOCK_EXTTS;
+ event.index = 0;
+ event.timestamp = ((u64) hi) << 32;
+ event.timestamp |= lo;
+ event.timestamp <<= TICKS_NS_SHIFT;
+ ptp_clock_event(ixp_clock->ptp_clock, &event);
+ }
+ }
+
+ if (val & TSER_SNM) {
+ ack |= TSER_SNM;
+ if (ixp_clock->exts1_enabled) {
+ hi = __raw_readl(®s->amms_hi);
+ lo = __raw_readl(®s->amms_lo);
+ event.type = PTP_CLOCK_EXTTS;
+ event.index = 1;
+ event.timestamp = ((u64) hi) << 32;
+ event.timestamp |= lo;
+ event.timestamp <<= TICKS_NS_SHIFT;
+ ptp_clock_event(ixp_clock->ptp_clock, &event);
+ }
+ }
+
+ if (val & TTIPEND)
+ ack |= TTIPEND; /* this bit seems to be always set */
+
+ if (ack) {
+ __raw_writel(ack, ®s->event);
+ return IRQ_HANDLED;
+ } else
+ return IRQ_NONE;
+}
+
+/*
+ * PTP clock operations
+ */
+
+static int ptp_ixp_adjfreq(struct ptp_clock_info *ptp, s32 ppb)
+{
+ u64 adj;
+ u32 diff, addend;
+ int neg_adj = 0;
+ struct ixp_clock *ixp_clock = container_of(ptp, struct ixp_clock, caps);
+ struct ixp46x_ts_regs *regs = ixp_clock->regs;
+
+ if (ppb < 0) {
+ neg_adj = 1;
+ ppb = -ppb;
+ }
+ addend = DEFAULT_ADDEND;
+ adj = addend;
+ adj *= ppb;
+ diff = div_u64(adj, 1000000000ULL);
+
+ addend = neg_adj ? addend - diff : addend + diff;
+
+ __raw_writel(addend, ®s->addend);
+
+ return 0;
+}
+
+static int ptp_ixp_adjtime(struct ptp_clock_info *ptp, s64 delta)
+{
+ s64 now;
+ unsigned long flags;
+ struct ixp_clock *ixp_clock = container_of(ptp, struct ixp_clock, caps);
+ struct ixp46x_ts_regs *regs = ixp_clock->regs;
+
+ spin_lock_irqsave(®ister_lock, flags);
+
+ now = sys_time_read(regs);
+ now += delta;
+ sys_time_write(regs, now);
+
+ spin_unlock_irqrestore(®ister_lock, flags);
+
+ return 0;
+}
+
+static int ptp_ixp_gettime(struct ptp_clock_info *ptp, struct timespec *ts)
+{
+ u64 ns;
+ u32 remainder;
+ unsigned long flags;
+ struct ixp_clock *ixp_clock = container_of(ptp, struct ixp_clock, caps);
+ struct ixp46x_ts_regs *regs = ixp_clock->regs;
+
+ spin_lock_irqsave(®ister_lock, flags);
+
+ ns = sys_time_read(regs);
+
+ spin_unlock_irqrestore(®ister_lock, flags);
+
+ ts->tv_sec = div_u64_rem(ns, 1000000000, &remainder);
+ ts->tv_nsec = remainder;
+ return 0;
+}
+
+static int ptp_ixp_settime(struct ptp_clock_info *ptp,
+ const struct timespec *ts)
+{
+ u64 ns;
+ unsigned long flags;
+ struct ixp_clock *ixp_clock = container_of(ptp, struct ixp_clock, caps);
+ struct ixp46x_ts_regs *regs = ixp_clock->regs;
+
+ ns = ts->tv_sec * 1000000000ULL;
+ ns += ts->tv_nsec;
+
+ spin_lock_irqsave(®ister_lock, flags);
+
+ sys_time_write(regs, ns);
+
+ spin_unlock_irqrestore(®ister_lock, flags);
+
+ return 0;
+}
+
+static int ptp_ixp_enable(struct ptp_clock_info *ptp,
+ struct ptp_clock_request *rq, int on)
+{
+ struct ixp_clock *ixp_clock = container_of(ptp, struct ixp_clock, caps);
+
+ switch (rq->type) {
+ case PTP_CLK_REQ_EXTTS:
+ switch (rq->extts.index) {
+ case 0:
+ ixp_clock->exts0_enabled = on ? 1 : 0;
+ break;
+ case 1:
+ ixp_clock->exts1_enabled = on ? 1 : 0;
+ break;
+ default:
+ return -EINVAL;
+ }
+ return 0;
+ default:
+ break;
+ }
+
+ return -EOPNOTSUPP;
+}
+
+static struct ptp_clock_info ptp_ixp_caps = {
+ .owner = THIS_MODULE,
+ .name = "IXP46X timer",
+ .max_adj = 66666655,
+ .n_ext_ts = N_EXT_TS,
+ .pps = 0,
+ .adjfreq = ptp_ixp_adjfreq,
+ .adjtime = ptp_ixp_adjtime,
+ .gettime = ptp_ixp_gettime,
+ .settime = ptp_ixp_settime,
+ .enable = ptp_ixp_enable,
+};
+
+/* module operations */
+
+static struct ixp_clock ixp_clock;
+
+static int setup_interrupt(int gpio)
+{
+ int irq;
+
+ gpio_line_config(gpio, IXP4XX_GPIO_IN);
+
+ irq = gpio_to_irq(gpio);
+
+ if (NO_IRQ == irq)
+ return NO_IRQ;
+
+ if (set_irq_type(irq, IRQF_TRIGGER_FALLING)) {
+ pr_err("cannot set trigger type for irq %d\n", irq);
+ return NO_IRQ;
+ }
+
+ if (request_irq(irq, isr, 0, DRIVER, &ixp_clock)) {
+ pr_err("request_irq failed for irq %d\n", irq);
+ return NO_IRQ;
+ }
+
+ return irq;
+}
+
+static void __exit ptp_ixp_exit(void)
+{
+ free_irq(MASTER_IRQ, &ixp_clock);
+ free_irq(SLAVE_IRQ, &ixp_clock);
+ ptp_clock_unregister(ixp_clock.ptp_clock);
+}
+
+static int __init ptp_ixp_init(void)
+{
+ if (!cpu_is_ixp46x())
+ return -ENODEV;
+
+ ixp_clock.regs =
+ (struct ixp46x_ts_regs __iomem *) IXP4XX_TIMESYNC_BASE_VIRT;
+
+ ixp_clock.caps = ptp_ixp_caps;
+
+ ixp_clock.ptp_clock = ptp_clock_register(&ixp_clock.caps);
+
+ if (IS_ERR(ixp_clock.ptp_clock))
+ return PTR_ERR(ixp_clock.ptp_clock);
+
+ __raw_writel(DEFAULT_ADDEND, &ixp_clock.regs->addend);
+ __raw_writel(1, &ixp_clock.regs->trgt_lo);
+ __raw_writel(0, &ixp_clock.regs->trgt_hi);
+ __raw_writel(TTIPEND, &ixp_clock.regs->event);
+
+ if (MASTER_IRQ != setup_interrupt(MASTER_GPIO)) {
+ pr_err("failed to setup gpio %d as irq\n", MASTER_GPIO);
+ goto no_master;
+ }
+ if (SLAVE_IRQ != setup_interrupt(SLAVE_GPIO)) {
+ pr_err("failed to setup gpio %d as irq\n", SLAVE_GPIO);
+ goto no_slave;
+ }
+
+ return 0;
+no_slave:
+ free_irq(MASTER_IRQ, &ixp_clock);
+no_master:
+ ptp_clock_unregister(ixp_clock.ptp_clock);
+ return -ENODEV;
+}
+
+module_init(ptp_ixp_init);
+module_exit(ptp_ixp_exit);
+
+MODULE_AUTHOR("Richard Cochran <richard.cochran-3mrvs1K0uXizZXS1Dc/lvw@public.gmane.org>");
+MODULE_DESCRIPTION("PTP clock using the IXP46X timer");
+MODULE_LICENSE("GPL");
--
1.7.0.4
^ permalink raw reply related
* [PATCH V12 2/4] ptp: Added a clock that uses the eTSEC found on the MPC85xx.
From: Richard Cochran @ 2011-02-28 7:57 UTC (permalink / raw)
To: linux-kernel
Cc: linux-api, netdev, devicetree-discuss, linux-arm-kernel,
linuxppc-dev, Alan Cox, Arnd Bergmann, Christoph Lameter,
David Miller, John Stultz, Krzysztof Halasa, Peter Zijlstra,
Rodolfo Giometti, Thomas Gleixner, Benjamin Herrenschmidt,
Mike Frysinger, Paul Mackerras, Russell King
In-Reply-To: <cover.1298878618.git.richard.cochran@omicron.at>
The eTSEC includes a PTP clock with quite a few features. This patch adds
support for the basic clock adjustment functions, plus two external time
stamps, one alarm, and the PPS callback.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
---
.../devicetree/bindings/net/fsl-tsec-phy.txt | 54 ++
arch/powerpc/boot/dts/mpc8313erdb.dts | 13 +
arch/powerpc/boot/dts/mpc8572ds.dts | 13 +
arch/powerpc/boot/dts/p2020ds.dts | 13 +
arch/powerpc/boot/dts/p2020rdb.dts | 13 +
drivers/net/Makefile | 1 +
drivers/net/gianfar_ptp.c | 579 ++++++++++++++++++++
drivers/ptp/Kconfig | 13 +
8 files changed, 699 insertions(+), 0 deletions(-)
create mode 100644 drivers/net/gianfar_ptp.c
diff --git a/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt b/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
index edb7ae1..2c6be03 100644
--- a/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
+++ b/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
@@ -74,3 +74,57 @@ Example:
interrupt-parent = <&mpic>;
phy-handle = <&phy0>
};
+
+* Gianfar PTP clock nodes
+
+General Properties:
+
+ - compatible Should be "fsl,etsec-ptp"
+ - reg Offset and length of the register set for the device
+ - interrupts There should be at least two interrupts. Some devices
+ have as many as four PTP related interrupts.
+
+Clock Properties:
+
+ - fsl,tclk-period Timer reference clock period in nanoseconds.
+ - fsl,tmr-prsc Prescaler, divides the output clock.
+ - fsl,tmr-add Frequency compensation value.
+ - fsl,tmr-fiper1 Fixed interval period pulse generator.
+ - fsl,tmr-fiper2 Fixed interval period pulse generator.
+ - fsl,max-adj Maximum frequency adjustment in parts per billion.
+
+ These properties set the operational parameters for the PTP
+ clock. You must choose these carefully for the clock to work right.
+ Here is how to figure good values:
+
+ TimerOsc = system clock MHz
+ tclk_period = desired clock period nanoseconds
+ NominalFreq = 1000 / tclk_period MHz
+ FreqDivRatio = TimerOsc / NominalFreq (must be greater that 1.0)
+ tmr_add = ceil(2^32 / FreqDivRatio)
+ OutputClock = NominalFreq / tmr_prsc MHz
+ PulseWidth = 1 / OutputClock microseconds
+ FiperFreq1 = desired frequency in Hz
+ FiperDiv1 = 1000000 * OutputClock / FiperFreq1
+ tmr_fiper1 = tmr_prsc * tclk_period * FiperDiv1 - tclk_period
+ max_adj = 1000000000 * (FreqDivRatio - 1.0) - 1
+
+ The calculation for tmr_fiper2 is the same as for tmr_fiper1. The
+ driver expects that tmr_fiper1 will be correctly set to produce a 1
+ Pulse Per Second (PPS) signal, since this will be offered to the PPS
+ subsystem to synchronize the Linux clock.
+
+Example:
+
+ ptp_clock@24E00 {
+ compatible = "fsl,etsec-ptp";
+ reg = <0x24E00 0xB0>;
+ interrupts = <12 0x8 13 0x8>;
+ interrupt-parent = < &ipic >;
+ fsl,tclk-period = <10>;
+ fsl,tmr-prsc = <100>;
+ fsl,tmr-add = <0x999999A4>;
+ fsl,tmr-fiper1 = <0x3B9AC9F6>;
+ fsl,tmr-fiper2 = <0x00018696>;
+ fsl,max-adj = <659999998>;
+ };
diff --git a/arch/powerpc/boot/dts/mpc8313erdb.dts b/arch/powerpc/boot/dts/mpc8313erdb.dts
index 183f2aa..502e17c 100644
--- a/arch/powerpc/boot/dts/mpc8313erdb.dts
+++ b/arch/powerpc/boot/dts/mpc8313erdb.dts
@@ -208,6 +208,19 @@
sleep = <&pmc 0x00300000>;
};
+ ptp_clock@24E00 {
+ compatible = "fsl,etsec-ptp";
+ reg = <0x24E00 0xB0>;
+ interrupts = <12 0x8 13 0x8>;
+ interrupt-parent = < &ipic >;
+ fsl,tclk-period = <10>;
+ fsl,tmr-prsc = <100>;
+ fsl,tmr-add = <0x999999A4>;
+ fsl,tmr-fiper1 = <0x3B9AC9F6>;
+ fsl,tmr-fiper2 = <0x00018696>;
+ fsl,max-adj = <659999998>;
+ };
+
enet0: ethernet@24000 {
#address-cells = <1>;
#size-cells = <1>;
diff --git a/arch/powerpc/boot/dts/mpc8572ds.dts b/arch/powerpc/boot/dts/mpc8572ds.dts
index cafc128..f6c04d2 100644
--- a/arch/powerpc/boot/dts/mpc8572ds.dts
+++ b/arch/powerpc/boot/dts/mpc8572ds.dts
@@ -324,6 +324,19 @@
};
};
+ ptp_clock@24E00 {
+ compatible = "fsl,etsec-ptp";
+ reg = <0x24E00 0xB0>;
+ interrupts = <68 2 69 2 70 2 71 2>;
+ interrupt-parent = < &mpic >;
+ fsl,tclk-period = <5>;
+ fsl,tmr-prsc = <200>;
+ fsl,tmr-add = <0xAAAAAAAB>;
+ fsl,tmr-fiper1 = <0x3B9AC9FB>;
+ fsl,tmr-fiper2 = <0x3B9AC9FB>;
+ fsl,max-adj = <499999999>;
+ };
+
enet0: ethernet@24000 {
#address-cells = <1>;
#size-cells = <1>;
diff --git a/arch/powerpc/boot/dts/p2020ds.dts b/arch/powerpc/boot/dts/p2020ds.dts
index 1101914..3956daf 100644
--- a/arch/powerpc/boot/dts/p2020ds.dts
+++ b/arch/powerpc/boot/dts/p2020ds.dts
@@ -336,6 +336,19 @@
phy_type = "ulpi";
};
+ ptp_clock@24E00 {
+ compatible = "fsl,etsec-ptp";
+ reg = <0x24E00 0xB0>;
+ interrupts = <68 2 69 2 70 2>;
+ interrupt-parent = < &mpic >;
+ fsl,tclk-period = <5>;
+ fsl,tmr-prsc = <200>;
+ fsl,tmr-add = <0xCCCCCCCD>;
+ fsl,tmr-fiper1 = <0x3B9AC9FB>;
+ fsl,tmr-fiper2 = <0x0001869B>;
+ fsl,max-adj = <249999999>;
+ };
+
enet0: ethernet@24000 {
#address-cells = <1>;
#size-cells = <1>;
diff --git a/arch/powerpc/boot/dts/p2020rdb.dts b/arch/powerpc/boot/dts/p2020rdb.dts
index da4cb0d..a6229cb 100644
--- a/arch/powerpc/boot/dts/p2020rdb.dts
+++ b/arch/powerpc/boot/dts/p2020rdb.dts
@@ -396,6 +396,19 @@
phy_type = "ulpi";
};
+ ptp_clock@24E00 {
+ compatible = "fsl,etsec-ptp";
+ reg = <0x24E00 0xB0>;
+ interrupts = <68 2 69 2 70 2>;
+ interrupt-parent = < &mpic >;
+ fsl,tclk-period = <5>;
+ fsl,tmr-prsc = <200>;
+ fsl,tmr-add = <0xCCCCCCCD>;
+ fsl,tmr-fiper1 = <0x3B9AC9FB>;
+ fsl,tmr-fiper2 = <0x0001869B>;
+ fsl,max-adj = <249999999>;
+ };
+
enet0: ethernet@24000 {
#address-cells = <1>;
#size-cells = <1>;
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index b90738d..c303f5f 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -31,6 +31,7 @@ obj-$(CONFIG_ATL2) += atlx/
obj-$(CONFIG_ATL1E) += atl1e/
obj-$(CONFIG_ATL1C) += atl1c/
obj-$(CONFIG_GIANFAR) += gianfar_driver.o
+obj-$(CONFIG_PTP_1588_CLOCK_GIANFAR) += gianfar_ptp.o
obj-$(CONFIG_TEHUTI) += tehuti.o
obj-$(CONFIG_ENIC) += enic/
obj-$(CONFIG_JME) += jme.o
diff --git a/drivers/net/gianfar_ptp.c b/drivers/net/gianfar_ptp.c
new file mode 100644
index 0000000..23d2eb5
--- /dev/null
+++ b/drivers/net/gianfar_ptp.c
@@ -0,0 +1,579 @@
+/*
+ * PTP 1588 clock using the eTSEC
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+#include <linux/device.h>
+#include <linux/hrtimer.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <linux/timex.h>
+#include <linux/io.h>
+
+#include <linux/ptp_clock_kernel.h>
+
+#include "gianfar.h"
+
+/*
+ * gianfar ptp registers
+ * Generated by regen.tcl on Thu May 13 01:38:57 PM CEST 2010
+ */
+struct gianfar_ptp_registers {
+ u32 tmr_ctrl; /* Timer control register */
+ u32 tmr_tevent; /* Timestamp event register */
+ u32 tmr_temask; /* Timer event mask register */
+ u32 tmr_pevent; /* Timestamp event register */
+ u32 tmr_pemask; /* Timer event mask register */
+ u32 tmr_stat; /* Timestamp status register */
+ u32 tmr_cnt_h; /* Timer counter high register */
+ u32 tmr_cnt_l; /* Timer counter low register */
+ u32 tmr_add; /* Timer drift compensation addend register */
+ u32 tmr_acc; /* Timer accumulator register */
+ u32 tmr_prsc; /* Timer prescale */
+ u8 res1[4];
+ u32 tmroff_h; /* Timer offset high */
+ u32 tmroff_l; /* Timer offset low */
+ u8 res2[8];
+ u32 tmr_alarm1_h; /* Timer alarm 1 high register */
+ u32 tmr_alarm1_l; /* Timer alarm 1 high register */
+ u32 tmr_alarm2_h; /* Timer alarm 2 high register */
+ u32 tmr_alarm2_l; /* Timer alarm 2 high register */
+ u8 res3[48];
+ u32 tmr_fiper1; /* Timer fixed period interval */
+ u32 tmr_fiper2; /* Timer fixed period interval */
+ u32 tmr_fiper3; /* Timer fixed period interval */
+ u8 res4[20];
+ u32 tmr_etts1_h; /* Timestamp of general purpose external trigger */
+ u32 tmr_etts1_l; /* Timestamp of general purpose external trigger */
+ u32 tmr_etts2_h; /* Timestamp of general purpose external trigger */
+ u32 tmr_etts2_l; /* Timestamp of general purpose external trigger */
+};
+
+/* Bit definitions for the TMR_CTRL register */
+#define ALM1P (1<<31) /* Alarm1 output polarity */
+#define ALM2P (1<<30) /* Alarm2 output polarity */
+#define FS (1<<28) /* FIPER start indication */
+#define PP1L (1<<27) /* Fiper1 pulse loopback mode enabled. */
+#define PP2L (1<<26) /* Fiper2 pulse loopback mode enabled. */
+#define TCLK_PERIOD_SHIFT (16) /* 1588 timer reference clock period. */
+#define TCLK_PERIOD_MASK (0x3ff)
+#define RTPE (1<<15) /* Record Tx Timestamp to PAL Enable. */
+#define FRD (1<<14) /* FIPER Realignment Disable */
+#define ESFDP (1<<11) /* External Tx/Rx SFD Polarity. */
+#define ESFDE (1<<10) /* External Tx/Rx SFD Enable. */
+#define ETEP2 (1<<9) /* External trigger 2 edge polarity */
+#define ETEP1 (1<<8) /* External trigger 1 edge polarity */
+#define COPH (1<<7) /* Generated clock output phase. */
+#define CIPH (1<<6) /* External oscillator input clock phase */
+#define TMSR (1<<5) /* Timer soft reset. */
+#define BYP (1<<3) /* Bypass drift compensated clock */
+#define TE (1<<2) /* 1588 timer enable. */
+#define CKSEL_SHIFT (0) /* 1588 Timer reference clock source */
+#define CKSEL_MASK (0x3)
+
+/* Bit definitions for the TMR_TEVENT register */
+#define ETS2 (1<<25) /* External trigger 2 timestamp sampled */
+#define ETS1 (1<<24) /* External trigger 1 timestamp sampled */
+#define ALM2 (1<<17) /* Current time = alarm time register 2 */
+#define ALM1 (1<<16) /* Current time = alarm time register 1 */
+#define PP1 (1<<7) /* periodic pulse generated on FIPER1 */
+#define PP2 (1<<6) /* periodic pulse generated on FIPER2 */
+#define PP3 (1<<5) /* periodic pulse generated on FIPER3 */
+
+/* Bit definitions for the TMR_TEMASK register */
+#define ETS2EN (1<<25) /* External trigger 2 timestamp enable */
+#define ETS1EN (1<<24) /* External trigger 1 timestamp enable */
+#define ALM2EN (1<<17) /* Timer ALM2 event enable */
+#define ALM1EN (1<<16) /* Timer ALM1 event enable */
+#define PP1EN (1<<7) /* Periodic pulse event 1 enable */
+#define PP2EN (1<<6) /* Periodic pulse event 2 enable */
+
+/* Bit definitions for the TMR_PEVENT register */
+#define TXP2 (1<<9) /* PTP transmitted timestamp im TXTS2 */
+#define TXP1 (1<<8) /* PTP transmitted timestamp in TXTS1 */
+#define RXP (1<<0) /* PTP frame has been received */
+
+/* Bit definitions for the TMR_PEMASK register */
+#define TXP2EN (1<<9) /* Transmit PTP packet event 2 enable */
+#define TXP1EN (1<<8) /* Transmit PTP packet event 1 enable */
+#define RXPEN (1<<0) /* Receive PTP packet event enable */
+
+/* Bit definitions for the TMR_STAT register */
+#define STAT_VEC_SHIFT (0) /* Timer general purpose status vector */
+#define STAT_VEC_MASK (0x3f)
+
+/* Bit definitions for the TMR_PRSC register */
+#define PRSC_OCK_SHIFT (0) /* Output clock division/prescale factor. */
+#define PRSC_OCK_MASK (0xffff)
+
+
+#define DRIVER "gianfar_ptp"
+#define DEFAULT_CKSEL 1
+#define N_ALARM 1 /* first alarm is used internally to reset fipers */
+#define N_EXT_TS 2
+#define REG_SIZE sizeof(struct gianfar_ptp_registers)
+
+struct etsects {
+ struct gianfar_ptp_registers *regs;
+ spinlock_t lock; /* protects regs */
+ struct ptp_clock *clock;
+ struct ptp_clock_info caps;
+ struct resource *rsrc;
+ int irq;
+ u64 alarm_interval; /* for periodic alarm */
+ u64 alarm_value;
+ u32 tclk_period; /* nanoseconds */
+ u32 tmr_prsc;
+ u32 tmr_add;
+ u32 cksel;
+ u32 tmr_fiper1;
+ u32 tmr_fiper2;
+};
+
+/*
+ * Register access functions
+ */
+
+static u64 tmr_cnt_read(struct etsects *etsects)
+{
+ u64 ns;
+ u32 lo, hi;
+
+ lo = gfar_read(&etsects->regs->tmr_cnt_l);
+ hi = gfar_read(&etsects->regs->tmr_cnt_h);
+ ns = ((u64) hi) << 32;
+ ns |= lo;
+ return ns;
+}
+
+static void tmr_cnt_write(struct etsects *etsects, u64 ns)
+{
+ u32 hi = ns >> 32;
+ u32 lo = ns & 0xffffffff;
+
+ gfar_write(&etsects->regs->tmr_cnt_l, lo);
+ gfar_write(&etsects->regs->tmr_cnt_h, hi);
+}
+
+static void set_alarm(struct etsects *etsects)
+{
+ u64 ns;
+ u32 lo, hi;
+
+ ns = tmr_cnt_read(etsects) + 1500000000ULL;
+ ns = div_u64(ns, 1000000000UL) * 1000000000ULL;
+ ns -= etsects->tclk_period;
+ hi = ns >> 32;
+ lo = ns & 0xffffffff;
+ gfar_write(&etsects->regs->tmr_alarm1_l, lo);
+ gfar_write(&etsects->regs->tmr_alarm1_h, hi);
+}
+
+static void set_fipers(struct etsects *etsects)
+{
+ u32 tmr_ctrl = gfar_read(&etsects->regs->tmr_ctrl);
+
+ gfar_write(&etsects->regs->tmr_ctrl, tmr_ctrl & (~TE));
+ gfar_write(&etsects->regs->tmr_prsc, etsects->tmr_prsc);
+ gfar_write(&etsects->regs->tmr_fiper1, etsects->tmr_fiper1);
+ gfar_write(&etsects->regs->tmr_fiper2, etsects->tmr_fiper2);
+ set_alarm(etsects);
+ gfar_write(&etsects->regs->tmr_ctrl, tmr_ctrl|TE);
+}
+
+/*
+ * Interrupt service routine
+ */
+
+static irqreturn_t isr(int irq, void *priv)
+{
+ struct etsects *etsects = priv;
+ struct ptp_clock_event event;
+ u64 ns;
+ u32 ack = 0, lo, hi, mask, val;
+
+ val = gfar_read(&etsects->regs->tmr_tevent);
+
+ if (val & ETS1) {
+ ack |= ETS1;
+ hi = gfar_read(&etsects->regs->tmr_etts1_h);
+ lo = gfar_read(&etsects->regs->tmr_etts1_l);
+ event.type = PTP_CLOCK_EXTTS;
+ event.index = 0;
+ event.timestamp = ((u64) hi) << 32;
+ event.timestamp |= lo;
+ ptp_clock_event(etsects->clock, &event);
+ }
+
+ if (val & ETS2) {
+ ack |= ETS2;
+ hi = gfar_read(&etsects->regs->tmr_etts2_h);
+ lo = gfar_read(&etsects->regs->tmr_etts2_l);
+ event.type = PTP_CLOCK_EXTTS;
+ event.index = 1;
+ event.timestamp = ((u64) hi) << 32;
+ event.timestamp |= lo;
+ ptp_clock_event(etsects->clock, &event);
+ }
+
+ if (val & ALM2) {
+ ack |= ALM2;
+ if (etsects->alarm_value) {
+ event.type = PTP_CLOCK_ALARM;
+ event.index = 0;
+ event.timestamp = etsects->alarm_value;
+ ptp_clock_event(etsects->clock, &event);
+ }
+ if (etsects->alarm_interval) {
+ ns = etsects->alarm_value + etsects->alarm_interval;
+ hi = ns >> 32;
+ lo = ns & 0xffffffff;
+ spin_lock(&etsects->lock);
+ gfar_write(&etsects->regs->tmr_alarm2_l, lo);
+ gfar_write(&etsects->regs->tmr_alarm2_h, hi);
+ spin_unlock(&etsects->lock);
+ etsects->alarm_value = ns;
+ } else {
+ gfar_write(&etsects->regs->tmr_tevent, ALM2);
+ spin_lock(&etsects->lock);
+ mask = gfar_read(&etsects->regs->tmr_temask);
+ mask &= ~ALM2EN;
+ gfar_write(&etsects->regs->tmr_temask, mask);
+ spin_unlock(&etsects->lock);
+ etsects->alarm_value = 0;
+ etsects->alarm_interval = 0;
+ }
+ }
+
+ if (val & PP1) {
+ ack |= PP1;
+ event.type = PTP_CLOCK_PPS;
+ ptp_clock_event(etsects->clock, &event);
+ }
+
+ if (ack) {
+ gfar_write(&etsects->regs->tmr_tevent, ack);
+ return IRQ_HANDLED;
+ } else
+ return IRQ_NONE;
+}
+
+/*
+ * PTP clock operations
+ */
+
+static int ptp_gianfar_adjfreq(struct ptp_clock_info *ptp, s32 ppb)
+{
+ u64 adj;
+ u32 diff, tmr_add;
+ int neg_adj = 0;
+ struct etsects *etsects = container_of(ptp, struct etsects, caps);
+
+ if (ppb < 0) {
+ neg_adj = 1;
+ ppb = -ppb;
+ }
+ tmr_add = etsects->tmr_add;
+ adj = tmr_add;
+ adj *= ppb;
+ diff = div_u64(adj, 1000000000ULL);
+
+ tmr_add = neg_adj ? tmr_add - diff : tmr_add + diff;
+
+ gfar_write(&etsects->regs->tmr_add, tmr_add);
+
+ return 0;
+}
+
+static int ptp_gianfar_adjtime(struct ptp_clock_info *ptp, s64 delta)
+{
+ s64 now;
+ unsigned long flags;
+ struct etsects *etsects = container_of(ptp, struct etsects, caps);
+
+ spin_lock_irqsave(&etsects->lock, flags);
+
+ now = tmr_cnt_read(etsects);
+ now += delta;
+ tmr_cnt_write(etsects, now);
+
+ spin_unlock_irqrestore(&etsects->lock, flags);
+
+ set_fipers(etsects);
+
+ return 0;
+}
+
+static int ptp_gianfar_gettime(struct ptp_clock_info *ptp, struct timespec *ts)
+{
+ u64 ns;
+ u32 remainder;
+ unsigned long flags;
+ struct etsects *etsects = container_of(ptp, struct etsects, caps);
+
+ spin_lock_irqsave(&etsects->lock, flags);
+
+ ns = tmr_cnt_read(etsects);
+
+ spin_unlock_irqrestore(&etsects->lock, flags);
+
+ ts->tv_sec = div_u64_rem(ns, 1000000000, &remainder);
+ ts->tv_nsec = remainder;
+ return 0;
+}
+
+static int ptp_gianfar_settime(struct ptp_clock_info *ptp,
+ const struct timespec *ts)
+{
+ u64 ns;
+ unsigned long flags;
+ struct etsects *etsects = container_of(ptp, struct etsects, caps);
+
+ ns = ts->tv_sec * 1000000000ULL;
+ ns += ts->tv_nsec;
+
+ spin_lock_irqsave(&etsects->lock, flags);
+
+ tmr_cnt_write(etsects, ns);
+ set_fipers(etsects);
+
+ spin_unlock_irqrestore(&etsects->lock, flags);
+
+ return 0;
+}
+
+static int ptp_gianfar_enable(struct ptp_clock_info *ptp,
+ struct ptp_clock_request *rq, int on)
+{
+ struct etsects *etsects = container_of(ptp, struct etsects, caps);
+ unsigned long flags;
+ u32 bit, mask;
+
+ switch (rq->type) {
+ case PTP_CLK_REQ_EXTTS:
+ switch (rq->extts.index) {
+ case 0:
+ bit = ETS1EN;
+ break;
+ case 1:
+ bit = ETS2EN;
+ break;
+ default:
+ return -EINVAL;
+ }
+ spin_lock_irqsave(&etsects->lock, flags);
+ mask = gfar_read(&etsects->regs->tmr_temask);
+ if (on)
+ mask |= bit;
+ else
+ mask &= ~bit;
+ gfar_write(&etsects->regs->tmr_temask, mask);
+ spin_unlock_irqrestore(&etsects->lock, flags);
+ return 0;
+
+ case PTP_CLK_REQ_PPS:
+ spin_lock_irqsave(&etsects->lock, flags);
+ mask = gfar_read(&etsects->regs->tmr_temask);
+ if (on)
+ mask |= PP1EN;
+ else
+ mask &= ~PP1EN;
+ gfar_write(&etsects->regs->tmr_temask, mask);
+ spin_unlock_irqrestore(&etsects->lock, flags);
+ return 0;
+
+ default:
+ break;
+ }
+
+ return -EOPNOTSUPP;
+}
+
+static struct ptp_clock_info ptp_gianfar_caps = {
+ .owner = THIS_MODULE,
+ .name = "gianfar clock",
+ .max_adj = 512000,
+ .n_alarm = N_ALARM,
+ .n_ext_ts = N_EXT_TS,
+ .n_per_out = 0,
+ .pps = 1,
+ .adjfreq = ptp_gianfar_adjfreq,
+ .adjtime = ptp_gianfar_adjtime,
+ .gettime = ptp_gianfar_gettime,
+ .settime = ptp_gianfar_settime,
+ .enable = ptp_gianfar_enable,
+};
+
+/* OF device tree */
+
+static int get_of_u32(struct device_node *node, char *str, u32 *val)
+{
+ int plen;
+ const u32 *prop = of_get_property(node, str, &plen);
+
+ if (!prop || plen != sizeof(*prop))
+ return -1;
+ *val = *prop;
+ return 0;
+}
+
+static int gianfar_ptp_probe(struct platform_device *dev)
+{
+ struct device_node *node = dev->dev.of_node;
+ struct etsects *etsects;
+ struct timespec now;
+ int err = -ENOMEM;
+ u32 tmr_ctrl;
+
+ etsects = kzalloc(sizeof(*etsects), GFP_KERNEL);
+ if (!etsects)
+ goto no_memory;
+
+ err = -ENODEV;
+
+ etsects->caps = ptp_gianfar_caps;
+ etsects->cksel = DEFAULT_CKSEL;
+
+ if (get_of_u32(node, "fsl,tclk-period", &etsects->tclk_period) ||
+ get_of_u32(node, "fsl,tmr-prsc", &etsects->tmr_prsc) ||
+ get_of_u32(node, "fsl,tmr-add", &etsects->tmr_add) ||
+ get_of_u32(node, "fsl,tmr-fiper1", &etsects->tmr_fiper1) ||
+ get_of_u32(node, "fsl,tmr-fiper2", &etsects->tmr_fiper2) ||
+ get_of_u32(node, "fsl,max-adj", &etsects->caps.max_adj)) {
+ pr_err("device tree node missing required elements\n");
+ goto no_node;
+ }
+
+ etsects->irq = platform_get_irq(dev, 0);
+
+ if (etsects->irq == NO_IRQ) {
+ pr_err("irq not in device tree\n");
+ goto no_node;
+ }
+ if (request_irq(etsects->irq, isr, 0, DRIVER, etsects)) {
+ pr_err("request_irq failed\n");
+ goto no_node;
+ }
+
+ etsects->rsrc = platform_get_resource(dev, IORESOURCE_MEM, 0);
+ if (!etsects->rsrc) {
+ pr_err("no resource\n");
+ goto no_resource;
+ }
+ if (request_resource(&ioport_resource, etsects->rsrc)) {
+ pr_err("resource busy\n");
+ goto no_resource;
+ }
+
+ spin_lock_init(&etsects->lock);
+
+ etsects->regs = ioremap(etsects->rsrc->start,
+ 1 + etsects->rsrc->end - etsects->rsrc->start);
+ if (!etsects->regs) {
+ pr_err("ioremap ptp registers failed\n");
+ goto no_ioremap;
+ }
+ getnstimeofday(&now);
+ ptp_gianfar_settime(&etsects->caps, &now);
+
+ tmr_ctrl =
+ (etsects->tclk_period & TCLK_PERIOD_MASK) << TCLK_PERIOD_SHIFT |
+ (etsects->cksel & CKSEL_MASK) << CKSEL_SHIFT;
+
+ gfar_write(&etsects->regs->tmr_ctrl, tmr_ctrl);
+ gfar_write(&etsects->regs->tmr_add, etsects->tmr_add);
+ gfar_write(&etsects->regs->tmr_prsc, etsects->tmr_prsc);
+ gfar_write(&etsects->regs->tmr_fiper1, etsects->tmr_fiper1);
+ gfar_write(&etsects->regs->tmr_fiper2, etsects->tmr_fiper2);
+ set_alarm(etsects);
+ gfar_write(&etsects->regs->tmr_ctrl, tmr_ctrl|FS|RTPE|TE);
+
+ etsects->clock = ptp_clock_register(&etsects->caps);
+ if (IS_ERR(etsects->clock)) {
+ err = PTR_ERR(etsects->clock);
+ goto no_clock;
+ }
+
+ dev_set_drvdata(&dev->dev, etsects);
+
+ return 0;
+
+no_clock:
+no_ioremap:
+ release_resource(etsects->rsrc);
+no_resource:
+ free_irq(etsects->irq, etsects);
+no_node:
+ kfree(etsects);
+no_memory:
+ return err;
+}
+
+static int gianfar_ptp_remove(struct platform_device *dev)
+{
+ struct etsects *etsects = dev_get_drvdata(&dev->dev);
+
+ gfar_write(&etsects->regs->tmr_temask, 0);
+ gfar_write(&etsects->regs->tmr_ctrl, 0);
+
+ ptp_clock_unregister(etsects->clock);
+ iounmap(etsects->regs);
+ release_resource(etsects->rsrc);
+ free_irq(etsects->irq, etsects);
+ kfree(etsects);
+
+ return 0;
+}
+
+static struct of_device_id match_table[] = {
+ { .compatible = "fsl,etsec-ptp" },
+ {},
+};
+
+static struct platform_driver gianfar_ptp_driver = {
+ .driver = {
+ .name = "gianfar_ptp",
+ .of_match_table = match_table,
+ .owner = THIS_MODULE,
+ },
+ .probe = gianfar_ptp_probe,
+ .remove = gianfar_ptp_remove,
+};
+
+/* module operations */
+
+static int __init ptp_gianfar_init(void)
+{
+ return platform_driver_register(&gianfar_ptp_driver);
+}
+
+module_init(ptp_gianfar_init);
+
+static void __exit ptp_gianfar_exit(void)
+{
+ platform_driver_unregister(&gianfar_ptp_driver);
+}
+
+module_exit(ptp_gianfar_exit);
+
+MODULE_AUTHOR("Richard Cochran <richard.cochran@omicron.at>");
+MODULE_DESCRIPTION("PTP clock using the eTSEC");
+MODULE_LICENSE("GPL");
diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
index 70d4bb1..12eb844 100644
--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -27,4 +27,17 @@ config PTP_1588_CLOCK
To compile this driver as a module, choose M here: the module
will be called ptp.
+config PTP_1588_CLOCK_GIANFAR
+ tristate "Freescale eTSEC as PTP clock"
+ depends on PTP_1588_CLOCK
+ depends on GIANFAR
+ help
+ This driver adds support for using the eTSEC as a PTP
+ clock. This clock is only useful if your PTP programs are
+ getting hardware time stamps on the PTP Ethernet packets
+ using the SO_TIMESTAMPING API.
+
+ To compile this driver as a module, choose M here: the module
+ will be called gianfar_ptp.
+
endmenu
--
1.7.0.4
^ permalink raw reply related
* [PATCH V12 1/4] ptp: Added a brand new class driver for ptp clocks.
From: Richard Cochran @ 2011-02-28 7:57 UTC (permalink / raw)
To: linux-kernel
Cc: linux-api, netdev, devicetree-discuss, linux-arm-kernel,
linuxppc-dev, Alan Cox, Arnd Bergmann, Christoph Lameter,
David Miller, John Stultz, Krzysztof Halasa, Peter Zijlstra,
Rodolfo Giometti, Thomas Gleixner, Benjamin Herrenschmidt,
Mike Frysinger, Paul Mackerras, Russell King
In-Reply-To: <cover.1298878618.git.richard.cochran@omicron.at>
This patch adds an infrastructure for hardware clocks that implement
IEEE 1588, the Precision Time Protocol (PTP). A class driver offers a
registration method to particular hardware clock drivers. Each clock is
presented as a standard POSIX clock.
The ancillary clock features are exposed in two different ways, via
the sysfs and by a character device.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
---
Documentation/ABI/testing/sysfs-ptp | 98 ++++++++++
Documentation/ptp/ptp.txt | 89 +++++++++
Documentation/ptp/testptp.c | 352 +++++++++++++++++++++++++++++++++++
Documentation/ptp/testptp.mk | 33 ++++
drivers/Kconfig | 2 +
drivers/Makefile | 1 +
drivers/ptp/Kconfig | 30 +++
drivers/ptp/Makefile | 6 +
drivers/ptp/ptp_chardev.c | 156 ++++++++++++++++
drivers/ptp/ptp_clock.c | 320 +++++++++++++++++++++++++++++++
drivers/ptp/ptp_private.h | 86 +++++++++
drivers/ptp/ptp_sysfs.c | 230 +++++++++++++++++++++++
include/linux/Kbuild | 1 +
include/linux/ptp_clock.h | 84 +++++++++
include/linux/ptp_clock_kernel.h | 139 ++++++++++++++
15 files changed, 1627 insertions(+), 0 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-ptp
create mode 100644 Documentation/ptp/ptp.txt
create mode 100644 Documentation/ptp/testptp.c
create mode 100644 Documentation/ptp/testptp.mk
create mode 100644 drivers/ptp/Kconfig
create mode 100644 drivers/ptp/Makefile
create mode 100644 drivers/ptp/ptp_chardev.c
create mode 100644 drivers/ptp/ptp_clock.c
create mode 100644 drivers/ptp/ptp_private.h
create mode 100644 drivers/ptp/ptp_sysfs.c
create mode 100644 include/linux/ptp_clock.h
create mode 100644 include/linux/ptp_clock_kernel.h
diff --git a/Documentation/ABI/testing/sysfs-ptp b/Documentation/ABI/testing/sysfs-ptp
new file mode 100644
index 0000000..d40d2b5
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-ptp
@@ -0,0 +1,98 @@
+What: /sys/class/ptp/
+Date: September 2010
+Contact: Richard Cochran <richardcochran@gmail.com>
+Description:
+ This directory contains files and directories
+ providing a standardized interface to the ancillary
+ features of PTP hardware clocks.
+
+What: /sys/class/ptp/ptpN/
+Date: September 2010
+Contact: Richard Cochran <richardcochran@gmail.com>
+Description:
+ This directory contains the attributes of the Nth PTP
+ hardware clock registered into the PTP class driver
+ subsystem.
+
+What: /sys/class/ptp/ptpN/clock_name
+Date: September 2010
+Contact: Richard Cochran <richardcochran@gmail.com>
+Description:
+ This file contains the name of the PTP hardware clock
+ as a human readable string.
+
+What: /sys/class/ptp/ptpN/max_adjustment
+Date: September 2010
+Contact: Richard Cochran <richardcochran@gmail.com>
+Description:
+ This file contains the PTP hardware clock's maximum
+ frequency adjustment value (a positive integer) in
+ parts per billion.
+
+What: /sys/class/ptp/ptpN/n_alarms
+Date: September 2010
+Contact: Richard Cochran <richardcochran@gmail.com>
+Description:
+ This file contains the number of periodic or one shot
+ alarms offer by the PTP hardware clock.
+
+What: /sys/class/ptp/ptpN/n_external_timestamps
+Date: September 2010
+Contact: Richard Cochran <richardcochran@gmail.com>
+Description:
+ This file contains the number of external timestamp
+ channels offered by the PTP hardware clock.
+
+What: /sys/class/ptp/ptpN/n_periodic_outputs
+Date: September 2010
+Contact: Richard Cochran <richardcochran@gmail.com>
+Description:
+ This file contains the number of programmable periodic
+ output channels offered by the PTP hardware clock.
+
+What: /sys/class/ptp/ptpN/pps_avaiable
+Date: September 2010
+Contact: Richard Cochran <richardcochran@gmail.com>
+Description:
+ This file indicates whether the PTP hardware clock
+ supports a Pulse Per Second to the host CPU. Reading
+ "1" means that the PPS is supported, while "0" means
+ not supported.
+
+What: /sys/class/ptp/ptpN/extts_enable
+Date: September 2010
+Contact: Richard Cochran <richardcochran@gmail.com>
+Description:
+ This write-only file enables or disables external
+ timestamps. To enable external timestamps, write the
+ channel index followed by a "1" into the file.
+ To disable external timestamps, write the channel
+ index followed by a "0" into the file.
+
+What: /sys/class/ptp/ptpN/fifo
+Date: September 2010
+Contact: Richard Cochran <richardcochran@gmail.com>
+Description:
+ This file provides timestamps on external events, in
+ the form of three integers: channel index, seconds,
+ and nanoseconds.
+
+What: /sys/class/ptp/ptpN/period
+Date: September 2010
+Contact: Richard Cochran <richardcochran@gmail.com>
+Description:
+ This write-only file enables or disables periodic
+ outputs. To enable a periodic output, write five
+ integers into the file: channel index, start time
+ seconds, start time nanoseconds, period seconds, and
+ period nanoseconds. To disable a periodic output, set
+ all the seconds and nanoseconds values to zero.
+
+What: /sys/class/ptp/ptpN/pps_enable
+Date: September 2010
+Contact: Richard Cochran <richardcochran@gmail.com>
+Description:
+ This write-only file enables or disables delivery of
+ PPS events to the Linux PPS subsystem. To enable PPS
+ events, write a "1" into the file. To disable events,
+ write a "0" into the file.
diff --git a/Documentation/ptp/ptp.txt b/Documentation/ptp/ptp.txt
new file mode 100644
index 0000000..ae8fef8
--- /dev/null
+++ b/Documentation/ptp/ptp.txt
@@ -0,0 +1,89 @@
+
+* PTP hardware clock infrastructure for Linux
+
+ This patch set introduces support for IEEE 1588 PTP clocks in
+ Linux. Together with the SO_TIMESTAMPING socket options, this
+ presents a standardized method for developing PTP user space
+ programs, synchronizing Linux with external clocks, and using the
+ ancillary features of PTP hardware clocks.
+
+ A new class driver exports a kernel interface for specific clock
+ drivers and a user space interface. The infrastructure supports a
+ complete set of PTP hardware clock functionality.
+
+ + Basic clock operations
+ - Set time
+ - Get time
+ - Shift the clock by a given offset atomically
+ - Adjust clock frequency
+
+ + Ancillary clock features
+ - One short or periodic alarms, with signal delivery to user program
+ - Time stamp external events
+ - Period output signals configurable from user space
+ - Synchronization of the Linux system time via the PPS subsystem
+
+** PTP hardware clock kernel API
+
+ A PTP clock driver registers itself with the class driver. The
+ class driver handles all of the dealings with user space. The
+ author of a clock driver need only implement the details of
+ programming the clock hardware. The clock driver notifies the class
+ driver of asynchronous events (alarms and external time stamps) via
+ a simple message passing interface.
+
+ The class driver supports multiple PTP clock drivers. In normal use
+ cases, only one PTP clock is needed. However, for testing and
+ development, it can be useful to have more than one clock in a
+ single system, in order to allow performance comparisons.
+
+** PTP hardware clock user space API
+
+ The class driver also creates a character device for each
+ registered clock. User space can use an open file descriptor from
+ the character device as a POSIX clock id and may call
+ clock_gettime, clock_settime, and clock_adjtime. These calls
+ implement the basic clock operations.
+
+ User space programs may control the clock using standardized
+ ioctls. A program may query, enable, configure, and disable the
+ ancillary clock features. User space can receive time stamped
+ events via blocking read() and poll(). One shot and periodic
+ signals may be configured via the POSIX timer_settime() system
+ call.
+
+** Writing clock drivers
+
+ Clock drivers include include/linux/ptp_clock_kernel.h and register
+ themselves by presenting a 'struct ptp_clock_info' to the
+ registration method. Clock drivers must implement all of the
+ functions in the interface. If a clock does not offer a particular
+ ancillary feature, then the driver should just return -EOPNOTSUPP
+ from those functions.
+
+ Drivers must ensure that all of the methods in interface are
+ reentrant. Since most hardware implementations treat the time value
+ as a 64 bit integer accessed as two 32 bit registers, drivers
+ should use spin_lock_irqsave/spin_unlock_irqrestore to protect
+ against concurrent access. This locking cannot be accomplished in
+ class driver, since the lock may also be needed by the clock
+ driver's interrupt service routine.
+
+** Supported hardware
+
+ + Freescale eTSEC gianfar
+ - 2 Time stamp external triggers, programmable polarity (opt. interrupt)
+ - 2 Alarm registers (optional interrupt)
+ - 3 Periodic signals (optional interrupt)
+
+ + National DP83640
+ - 6 GPIOs programmable as inputs or outputs
+ - 6 GPIOs with dedicated functions (LED/JTAG/clock) can also be
+ used as general inputs or outputs
+ - GPIO inputs can time stamp external triggers
+ - GPIO outputs can produce periodic signals
+ - 1 interrupt pin
+
+ + Intel IXP465
+ - Auxiliary Slave/Master Mode Snapshot (optional interrupt)
+ - Target Time (optional interrupt)
diff --git a/Documentation/ptp/testptp.c b/Documentation/ptp/testptp.c
new file mode 100644
index 0000000..a1d2bf1
--- /dev/null
+++ b/Documentation/ptp/testptp.c
@@ -0,0 +1,352 @@
+/*
+ * PTP 1588 clock support - User space test program
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+#include <errno.h>
+#include <fcntl.h>
+#include <math.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/timex.h>
+#include <sys/types.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <linux/ptp_clock.h>
+
+#define DEVICE "/dev/ptp0"
+
+#ifndef ADJ_SETOFFSET
+#define ADJ_SETOFFSET 0x0100
+#endif
+
+#ifndef CLOCK_INVALID
+#define CLOCK_INVALID -1
+#endif
+
+/* When glibc offers the syscall, this will go away. */
+#include <sys/syscall.h>
+static int clock_adjtime(clockid_t id, struct timex *tx)
+{
+ return syscall(__NR_clock_adjtime, id, tx);
+}
+
+static clockid_t get_clockid(int fd)
+{
+#define CLOCKFD 3
+#define FD_TO_CLOCKID(fd) ((~(clockid_t) (fd) << 3) | CLOCKFD)
+
+ return FD_TO_CLOCKID(fd);
+}
+
+static void handle_alarm(int s)
+{
+ printf("received signal %d\n", s);
+}
+
+static int install_handler(int signum, void (*handler)(int))
+{
+ struct sigaction action;
+ sigset_t mask;
+
+ /* Unblock the signal. */
+ sigemptyset(&mask);
+ sigaddset(&mask, signum);
+ sigprocmask(SIG_UNBLOCK, &mask, NULL);
+
+ /* Install the signal handler. */
+ action.sa_handler = handler;
+ action.sa_flags = 0;
+ sigemptyset(&action.sa_mask);
+ sigaction(signum, &action, NULL);
+
+ return 0;
+}
+
+static void usage(char *progname)
+{
+ fprintf(stderr,
+ "usage: %s [options]\n"
+ " -a val request a one-shot alarm after 'val' seconds\n"
+ " -A val request a periodic alarm every 'val' seconds\n"
+ " -c query the ptp clock's capabilities\n"
+ " -d name device to open\n"
+ " -e val read 'val' external time stamp events\n"
+ " -f val adjust the ptp clock frequency by 'val' ppb\n"
+ " -g get the ptp clock time\n"
+ " -h prints this message\n"
+ " -p val enable output with a period of 'val' nanoseconds\n"
+ " -P val enable or disable (val=1|0) the system clock PPS\n"
+ " -s set the ptp clock time from the system time\n"
+ " -t val shift the ptp clock time by 'val' seconds\n",
+ progname);
+}
+
+int main(int argc, char *argv[])
+{
+ struct ptp_clock_caps caps;
+ struct ptp_extts_event event;
+ struct ptp_extts_request extts_request;
+ struct ptp_perout_request perout_request;
+ struct timespec ts;
+ struct timex tx;
+
+ static timer_t timerid;
+ struct itimerspec timeout;
+ struct sigevent sigevent;
+
+ char *progname;
+ int c, cnt, fd;
+
+ char *device = DEVICE;
+ clockid_t clkid;
+ int adjfreq = 0x7fffffff;
+ int adjtime = 0;
+ int capabilities = 0;
+ int extts = 0;
+ int gettime = 0;
+ int oneshot = 0;
+ int periodic = 0;
+ int perout = -1;
+ int pps = -1;
+ int settime = 0;
+
+ progname = strrchr(argv[0], '/');
+ progname = progname ? 1+progname : argv[0];
+ while (EOF != (c = getopt(argc, argv, "a:A:cd:e:f:ghp:P:st:v"))) {
+ switch (c) {
+ case 'a':
+ oneshot = atoi(optarg);
+ break;
+ case 'A':
+ periodic = atoi(optarg);
+ break;
+ case 'c':
+ capabilities = 1;
+ break;
+ case 'd':
+ device = optarg;
+ break;
+ case 'e':
+ extts = atoi(optarg);
+ break;
+ case 'f':
+ adjfreq = atoi(optarg);
+ break;
+ case 'g':
+ gettime = 1;
+ break;
+ case 'p':
+ perout = atoi(optarg);
+ break;
+ case 'P':
+ pps = atoi(optarg);
+ break;
+ case 's':
+ settime = 1;
+ break;
+ case 't':
+ adjtime = atoi(optarg);
+ break;
+ case 'h':
+ usage(progname);
+ return 0;
+ case '?':
+ default:
+ usage(progname);
+ return -1;
+ }
+ }
+
+ fd = open(device, O_RDWR);
+ if (fd < 0) {
+ fprintf(stderr, "opening %s: %s\n", device, strerror(errno));
+ return -1;
+ }
+
+ clkid = get_clockid(fd);
+ if (CLOCK_INVALID == clkid) {
+ fprintf(stderr, "failed to read clock id\n");
+ return -1;
+ }
+
+ if (capabilities) {
+ if (ioctl(fd, PTP_CLOCK_GETCAPS, &caps)) {
+ perror("PTP_CLOCK_GETCAPS");
+ } else {
+ printf("capabilities:\n"
+ " %d maximum frequency adjustment (ppb)\n"
+ " %d programmable alarms\n"
+ " %d external time stamp channels\n"
+ " %d programmable periodic signals\n"
+ " %d pulse per second\n",
+ caps.max_adj,
+ caps.n_alarm,
+ caps.n_ext_ts,
+ caps.n_per_out,
+ caps.pps);
+ }
+ }
+
+ if (0x7fffffff != adjfreq) {
+ memset(&tx, 0, sizeof(tx));
+ tx.modes = ADJ_FREQUENCY;
+ tx.freq = (long) (adjfreq * 65.536);
+ if (clock_adjtime(clkid, &tx)) {
+ perror("clock_adjtime");
+ } else {
+ puts("frequency adjustment okay");
+ }
+ }
+
+ if (adjtime) {
+ memset(&tx, 0, sizeof(tx));
+ tx.modes = ADJ_SETOFFSET;
+ tx.time.tv_sec = adjtime;
+ tx.time.tv_usec = 0;
+ if (clock_adjtime(clkid, &tx) < 0) {
+ perror("clock_adjtime");
+ } else {
+ puts("time shift okay");
+ }
+ }
+
+ if (gettime) {
+ if (clock_gettime(clkid, &ts)) {
+ perror("clock_gettime");
+ } else {
+ printf("clock time: %ld.%09ld or %s",
+ ts.tv_sec, ts.tv_nsec, ctime(&ts.tv_sec));
+ }
+ }
+
+ if (settime) {
+ clock_gettime(CLOCK_REALTIME, &ts);
+ if (clock_settime(clkid, &ts)) {
+ perror("clock_settime");
+ } else {
+ puts("set time okay");
+ }
+ }
+
+ if (extts) {
+ memset(&extts_request, 0, sizeof(extts_request));
+ extts_request.index = 0;
+ extts_request.flags = PTP_ENABLE_FEATURE;
+ if (ioctl(fd, PTP_EXTTS_REQUEST, &extts_request)) {
+ perror("PTP_EXTTS_REQUEST");
+ extts = 0;
+ } else {
+ puts("external time stamp request okay");
+ }
+ for (; extts; extts--) {
+ cnt = read(fd, &event, sizeof(event));
+ if (cnt != sizeof(event)) {
+ perror("read");
+ break;
+ }
+ printf("event index %u at %lld.%09u\n", event.index,
+ event.t.sec, event.t.nsec);
+ fflush(stdout);
+ }
+ /* Disable the feature again. */
+ extts_request.flags = 0;
+ if (ioctl(fd, PTP_EXTTS_REQUEST, &extts_request)) {
+ perror("PTP_EXTTS_REQUEST");
+ }
+ }
+
+ if (oneshot) {
+ install_handler(SIGALRM, handle_alarm);
+ /* Create a timer. */
+ sigevent.sigev_notify = SIGEV_SIGNAL;
+ sigevent.sigev_signo = SIGALRM;
+ if (timer_create(clkid, &sigevent, &timerid)) {
+ perror("timer_create");
+ return -1;
+ }
+ /* Start the timer. */
+ memset(&timeout, 0, sizeof(timeout));
+ timeout.it_value.tv_sec = oneshot;
+ if (timer_settime(timerid, 0, &timeout, NULL)) {
+ perror("timer_settime");
+ return -1;
+ }
+ pause();
+ timer_delete(timerid);
+ }
+
+ if (periodic) {
+ install_handler(SIGALRM, handle_alarm);
+ /* Create a timer. */
+ sigevent.sigev_notify = SIGEV_SIGNAL;
+ sigevent.sigev_signo = SIGALRM;
+ if (timer_create(clkid, &sigevent, &timerid)) {
+ perror("timer_create");
+ return -1;
+ }
+ /* Start the timer. */
+ memset(&timeout, 0, sizeof(timeout));
+ timeout.it_interval.tv_sec = periodic;
+ timeout.it_value.tv_sec = periodic;
+ if (timer_settime(timerid, 0, &timeout, NULL)) {
+ perror("timer_settime");
+ return -1;
+ }
+ while (1) {
+ pause();
+ }
+ timer_delete(timerid);
+ }
+
+ if (perout >= 0) {
+ if (clock_gettime(clkid, &ts)) {
+ perror("clock_gettime");
+ return -1;
+ }
+ memset(&perout_request, 0, sizeof(perout_request));
+ perout_request.index = 0;
+ perout_request.start.sec = ts.tv_sec + 2;
+ perout_request.start.nsec = 0;
+ perout_request.period.sec = 0;
+ perout_request.period.nsec = perout;
+ if (ioctl(fd, PTP_PEROUT_REQUEST, &perout_request)) {
+ perror("PTP_PEROUT_REQUEST");
+ } else {
+ puts("periodic output request okay");
+ }
+ }
+
+ if (pps != -1) {
+ int enable = pps ? 1 : 0;
+ if (ioctl(fd, PTP_ENABLE_PPS, enable)) {
+ perror("PTP_ENABLE_PPS");
+ } else {
+ puts("pps for system time request okay");
+ }
+ }
+
+ close(fd);
+ return 0;
+}
diff --git a/Documentation/ptp/testptp.mk b/Documentation/ptp/testptp.mk
new file mode 100644
index 0000000..4ef2d97
--- /dev/null
+++ b/Documentation/ptp/testptp.mk
@@ -0,0 +1,33 @@
+# PTP 1588 clock support - User space test program
+#
+# Copyright (C) 2010 OMICRON electronics GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+
+CC = $(CROSS_COMPILE)gcc
+INC = -I$(KBUILD_OUTPUT)/usr/include
+CFLAGS = -Wall $(INC)
+LDLIBS = -lrt
+PROGS = testptp
+
+all: $(PROGS)
+
+testptp: testptp.o
+
+clean:
+ rm -f testptp.o
+
+distclean: clean
+ rm -f $(PROGS)
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 9bfb71f..82ddceb 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -54,6 +54,8 @@ source "drivers/spi/Kconfig"
source "drivers/pps/Kconfig"
+source "drivers/ptp/Kconfig"
+
source "drivers/gpio/Kconfig"
source "drivers/w1/Kconfig"
diff --git a/drivers/Makefile b/drivers/Makefile
index b423bb1..f9b3087 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -76,6 +76,7 @@ obj-$(CONFIG_I2O) += message/
obj-$(CONFIG_RTC_LIB) += rtc/
obj-y += i2c/ media/
obj-$(CONFIG_PPS) += pps/
+obj-$(CONFIG_PTP_1588_CLOCK) += ptp/
obj-$(CONFIG_W1) += w1/
obj-$(CONFIG_POWER_SUPPLY) += power/
obj-$(CONFIG_HWMON) += hwmon/
diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
new file mode 100644
index 0000000..70d4bb1
--- /dev/null
+++ b/drivers/ptp/Kconfig
@@ -0,0 +1,30 @@
+#
+# PTP clock support configuration
+#
+
+menu "PTP clock support"
+
+comment "Enable Device Drivers -> PPS to see the PTP clock options."
+ depends on PPS=n
+
+config PTP_1588_CLOCK
+ tristate "PTP clock support"
+ depends on EXPERIMENTAL
+ depends on PPS
+ help
+ The IEEE 1588 standard defines a method to precisely
+ synchronize distributed clocks over Ethernet networks. The
+ standard defines a Precision Time Protocol (PTP), which can
+ be used to achieve synchronization within a few dozen
+ microseconds. In addition, with the help of special hardware
+ time stamping units, it can be possible to achieve
+ synchronization to within a few hundred nanoseconds.
+
+ This driver adds support for PTP clocks as character
+ devices. If you want to use a PTP clock, then you should
+ also enable at least one clock driver as well.
+
+ To compile this driver as a module, choose M here: the module
+ will be called ptp.
+
+endmenu
diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile
new file mode 100644
index 0000000..480e2af
--- /dev/null
+++ b/drivers/ptp/Makefile
@@ -0,0 +1,6 @@
+#
+# Makefile for PTP 1588 clock support.
+#
+
+ptp-y := ptp_clock.o ptp_chardev.o ptp_sysfs.o
+obj-$(CONFIG_PTP_1588_CLOCK) += ptp.o
diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
new file mode 100644
index 0000000..c806192
--- /dev/null
+++ b/drivers/ptp/ptp_chardev.c
@@ -0,0 +1,156 @@
+/*
+ * PTP 1588 clock support - character device implementation.
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+#include <linux/module.h>
+#include <linux/posix-clock.h>
+#include <linux/poll.h>
+#include <linux/sched.h>
+
+#include "ptp_private.h"
+
+int ptp_open(struct posix_clock *pc, fmode_t fmode)
+{
+ return 0;
+}
+
+long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
+{
+ struct ptp_clock_caps caps;
+ struct ptp_clock_request req;
+ struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
+ struct ptp_clock_info *ops = ptp->info;
+ int enable, err = 0;
+
+ switch (cmd) {
+
+ case PTP_CLOCK_GETCAPS:
+ memset(&caps, 0, sizeof(caps));
+ caps.max_adj = ptp->info->max_adj;
+ caps.n_alarm = ptp->info->n_alarm;
+ caps.n_ext_ts = ptp->info->n_ext_ts;
+ caps.n_per_out = ptp->info->n_per_out;
+ caps.pps = ptp->info->pps;
+ err = copy_to_user((void __user *)arg, &caps, sizeof(caps));
+ break;
+
+ case PTP_EXTTS_REQUEST:
+ if (copy_from_user(&req.extts, (void __user *)arg,
+ sizeof(req.extts))) {
+ err = -EFAULT;
+ break;
+ }
+ if (req.extts.index >= ops->n_ext_ts) {
+ err = -EINVAL;
+ break;
+ }
+ req.type = PTP_CLK_REQ_EXTTS;
+ enable = req.extts.flags & PTP_ENABLE_FEATURE ? 1 : 0;
+ err = ops->enable(ops, &req, enable);
+ break;
+
+ case PTP_PEROUT_REQUEST:
+ if (copy_from_user(&req.perout, (void __user *)arg,
+ sizeof(req.perout))) {
+ err = -EFAULT;
+ break;
+ }
+ if (req.perout.index >= ops->n_per_out) {
+ err = -EINVAL;
+ break;
+ }
+ req.type = PTP_CLK_REQ_PEROUT;
+ enable = req.perout.period.sec || req.perout.period.nsec;
+ err = ops->enable(ops, &req, enable);
+ break;
+
+ case PTP_ENABLE_PPS:
+ if (!capable(CAP_SYS_TIME))
+ return -EPERM;
+ req.type = PTP_CLK_REQ_PPS;
+ enable = arg ? 1 : 0;
+ err = ops->enable(ops, &req, enable);
+ break;
+
+ default:
+ err = -ENOTTY;
+ break;
+ }
+ return err;
+}
+
+uint ptp_poll(struct posix_clock *pc, struct file *fp, poll_table *wait)
+{
+ struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
+
+ poll_wait(fp, &ptp->tsev_wq, wait);
+
+ return queue_cnt(&ptp->tsevq) ? POLLIN : 0;
+}
+
+ssize_t ptp_read(struct posix_clock *pc,
+ uint rdflags, char __user *buf, size_t cnt)
+{
+ struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
+ struct timestamp_event_queue *queue = &ptp->tsevq;
+ struct ptp_extts_event event[PTP_BUF_TIMESTAMPS];
+ unsigned long flags;
+ size_t qcnt, i;
+
+ if (cnt > sizeof(event))
+ cnt = sizeof(event);
+
+ cnt = cnt / sizeof(struct ptp_extts_event);
+
+ if (mutex_lock_interruptible(&ptp->tsevq_mux))
+ return -ERESTARTSYS;
+
+ if (wait_event_interruptible(ptp->tsev_wq, queue_cnt(queue))) {
+ mutex_unlock(&ptp->tsevq_mux);
+ return -ERESTARTSYS;
+ }
+
+ spin_lock_irqsave(&queue->lock, flags);
+
+ qcnt = queue_cnt(queue);
+
+ if (cnt > qcnt)
+ cnt = qcnt;
+
+ for (i = 0; i < cnt; i++) {
+ event[i] = queue->buf[queue->head];
+ queue->head = (queue->head + 1) % PTP_MAX_TIMESTAMPS;
+ }
+
+ spin_unlock_irqrestore(&queue->lock, flags);
+
+ cnt = cnt * sizeof(struct ptp_extts_event);
+
+ if (copy_to_user(buf, event, cnt)) {
+ mutex_unlock(&ptp->tsevq_mux);
+ return -EFAULT;
+ }
+
+ mutex_unlock(&ptp->tsevq_mux);
+ return cnt;
+}
+
+int ptp_release(struct posix_clock *pc)
+{
+ return 0;
+}
diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
new file mode 100644
index 0000000..1786a74
--- /dev/null
+++ b/drivers/ptp/ptp_clock.c
@@ -0,0 +1,320 @@
+/*
+ * PTP 1588 clock support
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+#include <linux/bitops.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/posix-clock.h>
+#include <linux/pps_kernel.h>
+#include <linux/slab.h>
+#include <linux/syscalls.h>
+#include <linux/uaccess.h>
+
+#include "ptp_private.h"
+
+#define PTP_MAX_ALARMS 4
+#define PTP_MAX_CLOCKS 8
+#define PTP_PPS_DEFAULTS (PPS_CAPTUREASSERT | PPS_OFFSETASSERT)
+#define PTP_PPS_EVENT PPS_CAPTUREASSERT
+#define PTP_PPS_MODE (PTP_PPS_DEFAULTS | PPS_CANWAIT | PPS_TSFMT_TSPEC)
+
+/* private globals */
+
+static dev_t ptp_devt;
+static struct class *ptp_class;
+
+static DECLARE_BITMAP(ptp_clocks_map, PTP_MAX_CLOCKS);
+static DEFINE_MUTEX(ptp_clocks_mutex); /* protects 'ptp_clocks_map' */
+
+/* time stamp event queue operations */
+
+static inline int queue_free(struct timestamp_event_queue *q)
+{
+ return PTP_MAX_TIMESTAMPS - queue_cnt(q) - 1;
+}
+
+static void enqueue_external_timestamp(struct timestamp_event_queue *queue,
+ struct ptp_clock_event *src)
+{
+ struct ptp_extts_event *dst;
+ unsigned long flags;
+ u32 remainder;
+
+ dst = &queue->buf[queue->tail];
+
+ dst->index = src->index;
+ dst->t.sec = div_u64_rem(src->timestamp, 1000000000, &remainder);
+ dst->t.nsec = remainder;
+
+ spin_lock_irqsave(&queue->lock, flags);
+
+ if (!queue_free(queue))
+ queue->head = (queue->head + 1) % PTP_MAX_TIMESTAMPS;
+
+ queue->tail = (queue->tail + 1) % PTP_MAX_TIMESTAMPS;
+
+ spin_unlock_irqrestore(&queue->lock, flags);
+}
+
+/* posix clock implementation */
+
+static int ptp_clock_getres(struct posix_clock *pc, struct timespec *tp)
+{
+ return 1; /* always round timer functions to one nanosecond */
+}
+
+static int ptp_clock_settime(struct posix_clock *pc, const struct timespec *tp)
+{
+ struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
+ return ptp->info->settime(ptp->info, tp);
+}
+
+static int ptp_clock_gettime(struct posix_clock *pc, struct timespec *tp)
+{
+ struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
+ return ptp->info->gettime(ptp->info, tp);
+}
+
+static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
+{
+ struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
+ struct ptp_clock_info *ops;
+ int err = -EOPNOTSUPP;
+
+ ops = ptp->info;
+
+ if (tx->modes & ADJ_SETOFFSET) {
+ struct timespec ts;
+ ktime_t kt;
+ s64 delta;
+
+ ts.tv_sec = tx->time.tv_sec;
+ ts.tv_nsec = tx->time.tv_usec;
+
+ if (!(tx->modes & ADJ_NANO))
+ ts.tv_nsec *= 1000;
+
+ if ((unsigned long) ts.tv_nsec >= NSEC_PER_SEC)
+ return -EINVAL;
+
+ kt = timespec_to_ktime(ts);
+ delta = ktime_to_ns(kt);
+ err = ops->adjtime(ops, delta);
+
+ } else if (tx->modes & ADJ_FREQUENCY) {
+ s64 ppb = 1 + tx->freq;
+ ppb *= 125;
+ ppb >>= 13;
+ err = ops->adjfreq(ops, (s32)ppb);
+ }
+
+ return err;
+}
+
+static struct posix_clock_operations ptp_clock_ops = {
+ .owner = THIS_MODULE,
+ .clock_adjtime = ptp_clock_adjtime,
+ .clock_gettime = ptp_clock_gettime,
+ .clock_getres = ptp_clock_getres,
+ .clock_settime = ptp_clock_settime,
+ .ioctl = ptp_ioctl,
+ .open = ptp_open,
+ .poll = ptp_poll,
+ .read = ptp_read,
+ .release = ptp_release,
+};
+
+static void delete_ptp_clock(struct posix_clock *pc)
+{
+ struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
+
+ /* Release the clock's resources. */
+ if (ptp->info->pps)
+ pps_unregister_source(ptp->pps_source);
+ ptp_cleanup_sysfs(ptp);
+ device_destroy(ptp_class, ptp->devid);
+ mutex_destroy(&ptp->tsevq_mux);
+
+ /* Remove the clock from the bit map. */
+ mutex_lock(&ptp_clocks_mutex);
+ clear_bit(ptp->index, ptp_clocks_map);
+ mutex_unlock(&ptp_clocks_mutex);
+
+ kfree(ptp);
+}
+
+/* public interface */
+
+struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info)
+{
+ struct ptp_clock *ptp;
+ int err = 0, index, major = MAJOR(ptp_devt);
+
+ if (info->n_alarm > PTP_MAX_ALARMS)
+ return ERR_PTR(-EINVAL);
+
+ /* Find a free clock slot and reserve it. */
+ err = -EBUSY;
+ mutex_lock(&ptp_clocks_mutex);
+ index = find_first_zero_bit(ptp_clocks_map, PTP_MAX_CLOCKS);
+ if (index < PTP_MAX_CLOCKS)
+ set_bit(index, ptp_clocks_map);
+ else
+ goto no_slot;
+
+ /* Initialize a clock structure. */
+ err = -ENOMEM;
+ ptp = kzalloc(sizeof(struct ptp_clock), GFP_KERNEL);
+ if (ptp == NULL)
+ goto no_memory;
+
+ ptp->clock.ops = ptp_clock_ops;
+ ptp->clock.release = delete_ptp_clock;
+ ptp->info = info;
+ ptp->devid = MKDEV(major, index);
+ ptp->index = index;
+ spin_lock_init(&ptp->tsevq.lock);
+ mutex_init(&ptp->tsevq_mux);
+ init_waitqueue_head(&ptp->tsev_wq);
+
+ /* Create a new device in our class. */
+ ptp->dev = device_create(ptp_class, NULL, ptp->devid, ptp,
+ "ptp%d", ptp->index);
+ if (IS_ERR(ptp->dev))
+ goto no_device;
+
+ dev_set_drvdata(ptp->dev, ptp);
+
+ err = ptp_populate_sysfs(ptp);
+ if (err)
+ goto no_sysfs;
+
+ /* Register a new PPS source. */
+ if (info->pps) {
+ struct pps_source_info pps;
+ memset(&pps, 0, sizeof(pps));
+ snprintf(pps.name, PPS_MAX_NAME_LEN, "ptp%d", index);
+ pps.mode = PTP_PPS_MODE;
+ pps.owner = info->owner;
+ ptp->pps_source = pps_register_source(&pps, PTP_PPS_DEFAULTS);
+ if (!ptp->pps_source) {
+ pr_err("failed to register pps source\n");
+ goto no_pps;
+ }
+ }
+
+ /* Create a posix clock. */
+ err = posix_clock_register(&ptp->clock, ptp->devid);
+ if (err) {
+ pr_err("failed to create posix clock\n");
+ goto no_clock;
+ }
+
+ mutex_unlock(&ptp_clocks_mutex);
+ return ptp;
+
+no_clock:
+ if (ptp->info->pps)
+ pps_unregister_source(ptp->pps_source);
+no_pps:
+ ptp_cleanup_sysfs(ptp);
+no_sysfs:
+ device_destroy(ptp_class, ptp->devid);
+no_device:
+ mutex_destroy(&ptp->tsevq_mux);
+ kfree(ptp);
+no_memory:
+ clear_bit(index, ptp_clocks_map);
+no_slot:
+ mutex_unlock(&ptp_clocks_mutex);
+ return ERR_PTR(err);
+}
+EXPORT_SYMBOL(ptp_clock_register);
+
+int ptp_clock_unregister(struct ptp_clock *ptp)
+{
+ posix_clock_unregister(&ptp->clock);
+ return 0;
+}
+EXPORT_SYMBOL(ptp_clock_unregister);
+
+void ptp_clock_event(struct ptp_clock *ptp, struct ptp_clock_event *event)
+{
+ struct pps_event_time evt;
+
+ switch (event->type) {
+
+ case PTP_CLOCK_ALARM:
+ break;
+
+ case PTP_CLOCK_EXTTS:
+ enqueue_external_timestamp(&ptp->tsevq, event);
+ wake_up_interruptible(&ptp->tsev_wq);
+ break;
+
+ case PTP_CLOCK_PPS:
+ pps_get_ts(&evt);
+ pps_event(ptp->pps_source, &evt, PTP_PPS_EVENT, NULL);
+ break;
+ }
+}
+EXPORT_SYMBOL(ptp_clock_event);
+
+/* module operations */
+
+static void __exit ptp_exit(void)
+{
+ class_destroy(ptp_class);
+ unregister_chrdev_region(ptp_devt, PTP_MAX_CLOCKS);
+}
+
+static int __init ptp_init(void)
+{
+ int err;
+
+ ptp_class = class_create(THIS_MODULE, "ptp");
+ if (IS_ERR(ptp_class)) {
+ pr_err("ptp: failed to allocate class\n");
+ return PTR_ERR(ptp_class);
+ }
+
+ err = alloc_chrdev_region(&ptp_devt, 0, PTP_MAX_CLOCKS, "ptp");
+ if (err < 0) {
+ pr_err("ptp: failed to allocate device region\n");
+ goto no_region;
+ }
+
+ ptp_class->dev_attrs = ptp_dev_attrs;
+ pr_info("PTP clock support registered\n");
+ return 0;
+
+no_region:
+ class_destroy(ptp_class);
+ return err;
+}
+
+subsys_initcall(ptp_init);
+module_exit(ptp_exit);
+
+MODULE_AUTHOR("Richard Cochran <richard.cochran@omicron.at>");
+MODULE_DESCRIPTION("PTP clocks support");
+MODULE_LICENSE("GPL");
diff --git a/drivers/ptp/ptp_private.h b/drivers/ptp/ptp_private.h
new file mode 100644
index 0000000..2f76266
--- /dev/null
+++ b/drivers/ptp/ptp_private.h
@@ -0,0 +1,86 @@
+/*
+ * PTP 1588 clock support - private declarations for the core module.
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+#ifndef _PTP_PRIVATE_H_
+#define _PTP_PRIVATE_H_
+
+#include <linux/cdev.h>
+#include <linux/device.h>
+#include <linux/mutex.h>
+#include <linux/posix-clock.h>
+#include <linux/ptp_clock.h>
+#include <linux/ptp_clock_kernel.h>
+#include <linux/time.h>
+
+#define PTP_MAX_TIMESTAMPS 128
+#define PTP_BUF_TIMESTAMPS 30
+
+struct timestamp_event_queue {
+ struct ptp_extts_event buf[PTP_MAX_TIMESTAMPS];
+ int head;
+ int tail;
+ spinlock_t lock;
+};
+
+struct ptp_clock {
+ struct posix_clock clock;
+ struct device *dev;
+ struct ptp_clock_info *info;
+ dev_t devid;
+ int index; /* index into clocks.map */
+ struct pps_device *pps_source;
+ struct timestamp_event_queue tsevq; /* simple fifo for time stamps */
+ struct mutex tsevq_mux; /* one process at a time reading the fifo */
+ wait_queue_head_t tsev_wq;
+};
+
+static inline int queue_cnt(struct timestamp_event_queue *q)
+{
+ int cnt = q->tail - q->head;
+ return cnt < 0 ? PTP_MAX_TIMESTAMPS + cnt : cnt;
+}
+
+/*
+ * see ptp_chardev.c
+ */
+
+long ptp_ioctl(struct posix_clock *pc,
+ unsigned int cmd, unsigned long arg);
+
+int ptp_open(struct posix_clock *pc, fmode_t fmode);
+
+ssize_t ptp_read(struct posix_clock *pc,
+ uint flags, char __user *buf, size_t cnt);
+
+int ptp_release(struct posix_clock *pc);
+
+uint ptp_poll(struct posix_clock *pc,
+ struct file *fp, poll_table *wait);
+
+/*
+ * see ptp_sysfs.c
+ */
+
+extern struct device_attribute ptp_dev_attrs[];
+
+int ptp_cleanup_sysfs(struct ptp_clock *ptp);
+
+int ptp_populate_sysfs(struct ptp_clock *ptp);
+
+#endif
diff --git a/drivers/ptp/ptp_sysfs.c b/drivers/ptp/ptp_sysfs.c
new file mode 100644
index 0000000..2f93926
--- /dev/null
+++ b/drivers/ptp/ptp_sysfs.c
@@ -0,0 +1,230 @@
+/*
+ * PTP 1588 clock support - sysfs interface.
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+#include <linux/capability.h>
+
+#include "ptp_private.h"
+
+static ssize_t clock_name_show(struct device *dev,
+ struct device_attribute *attr, char *page)
+{
+ struct ptp_clock *ptp = dev_get_drvdata(dev);
+ return snprintf(page, PAGE_SIZE-1, "%s\n", ptp->info->name);
+}
+
+#define PTP_SHOW_INT(name) \
+static ssize_t name##_show(struct device *dev, \
+ struct device_attribute *attr, char *page) \
+{ \
+ struct ptp_clock *ptp = dev_get_drvdata(dev); \
+ return snprintf(page, PAGE_SIZE-1, "%d\n", ptp->info->name); \
+}
+
+PTP_SHOW_INT(max_adj);
+PTP_SHOW_INT(n_alarm);
+PTP_SHOW_INT(n_ext_ts);
+PTP_SHOW_INT(n_per_out);
+PTP_SHOW_INT(pps);
+
+#define PTP_RO_ATTR(_var, _name) { \
+ .attr = { .name = __stringify(_name), .mode = 0444 }, \
+ .show = _var##_show, \
+}
+
+struct device_attribute ptp_dev_attrs[] = {
+ PTP_RO_ATTR(clock_name, clock_name),
+ PTP_RO_ATTR(max_adj, max_adjustment),
+ PTP_RO_ATTR(n_alarm, n_alarms),
+ PTP_RO_ATTR(n_ext_ts, n_external_timestamps),
+ PTP_RO_ATTR(n_per_out, n_periodic_outputs),
+ PTP_RO_ATTR(pps, pps_available),
+ __ATTR_NULL,
+};
+
+static ssize_t extts_enable_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct ptp_clock *ptp = dev_get_drvdata(dev);
+ struct ptp_clock_info *ops = ptp->info;
+ struct ptp_clock_request req = { .type = PTP_CLK_REQ_EXTTS };
+ int cnt, enable;
+ int err = -EINVAL;
+
+ cnt = sscanf(buf, "%u %d", &req.extts.index, &enable);
+ if (cnt != 2)
+ goto out;
+ if (req.extts.index >= ops->n_ext_ts)
+ goto out;
+
+ err = ops->enable(ops, &req, enable ? 1 : 0);
+ if (err)
+ goto out;
+
+ return count;
+out:
+ return err;
+}
+
+static ssize_t extts_fifo_show(struct device *dev,
+ struct device_attribute *attr, char *page)
+{
+ struct ptp_clock *ptp = dev_get_drvdata(dev);
+ struct timestamp_event_queue *queue = &ptp->tsevq;
+ struct ptp_extts_event event;
+ unsigned long flags;
+ size_t qcnt;
+ int cnt = 0;
+
+ memset(&event, 0, sizeof(event));
+
+ if (mutex_lock_interruptible(&ptp->tsevq_mux))
+ return -ERESTARTSYS;
+
+ spin_lock_irqsave(&queue->lock, flags);
+ qcnt = queue_cnt(queue);
+ if (qcnt) {
+ event = queue->buf[queue->head];
+ queue->head = (queue->head + 1) % PTP_MAX_TIMESTAMPS;
+ }
+ spin_unlock_irqrestore(&queue->lock, flags);
+
+ if (!qcnt)
+ goto out;
+
+ cnt = snprintf(page, PAGE_SIZE, "%u %lld %u\n",
+ event.index, event.t.sec, event.t.nsec);
+out:
+ mutex_unlock(&ptp->tsevq_mux);
+ return cnt;
+}
+
+static ssize_t period_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct ptp_clock *ptp = dev_get_drvdata(dev);
+ struct ptp_clock_info *ops = ptp->info;
+ struct ptp_clock_request req = { .type = PTP_CLK_REQ_PEROUT };
+ int cnt, enable, err = -EINVAL;
+
+ cnt = sscanf(buf, "%u %lld %u %lld %u", &req.perout.index,
+ &req.perout.start.sec, &req.perout.start.nsec,
+ &req.perout.period.sec, &req.perout.period.nsec);
+ if (cnt != 5)
+ goto out;
+ if (req.perout.index >= ops->n_per_out)
+ goto out;
+
+ enable = req.perout.period.sec || req.perout.period.nsec;
+ err = ops->enable(ops, &req, enable);
+ if (err)
+ goto out;
+
+ return count;
+out:
+ return err;
+}
+
+static ssize_t pps_enable_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct ptp_clock *ptp = dev_get_drvdata(dev);
+ struct ptp_clock_info *ops = ptp->info;
+ struct ptp_clock_request req = { .type = PTP_CLK_REQ_PPS };
+ int cnt, enable;
+ int err = -EINVAL;
+
+ if (!capable(CAP_SYS_TIME))
+ return -EPERM;
+
+ cnt = sscanf(buf, "%d", &enable);
+ if (cnt != 1)
+ goto out;
+
+ err = ops->enable(ops, &req, enable ? 1 : 0);
+ if (err)
+ goto out;
+
+ return count;
+out:
+ return err;
+}
+
+static DEVICE_ATTR(extts_enable, 0220, NULL, extts_enable_store);
+static DEVICE_ATTR(fifo, 0444, extts_fifo_show, NULL);
+static DEVICE_ATTR(period, 0220, NULL, period_store);
+static DEVICE_ATTR(pps_enable, 0220, NULL, pps_enable_store);
+
+int ptp_cleanup_sysfs(struct ptp_clock *ptp)
+{
+ struct device *dev = ptp->dev;
+ struct ptp_clock_info *info = ptp->info;
+
+ if (info->n_ext_ts) {
+ device_remove_file(dev, &dev_attr_extts_enable);
+ device_remove_file(dev, &dev_attr_fifo);
+ }
+ if (info->n_per_out)
+ device_remove_file(dev, &dev_attr_period);
+
+ if (info->pps)
+ device_remove_file(dev, &dev_attr_pps_enable);
+
+ return 0;
+}
+
+int ptp_populate_sysfs(struct ptp_clock *ptp)
+{
+ struct device *dev = ptp->dev;
+ struct ptp_clock_info *info = ptp->info;
+ int err;
+
+ if (info->n_ext_ts) {
+ err = device_create_file(dev, &dev_attr_extts_enable);
+ if (err)
+ goto out1;
+ err = device_create_file(dev, &dev_attr_fifo);
+ if (err)
+ goto out2;
+ }
+ if (info->n_per_out) {
+ err = device_create_file(dev, &dev_attr_period);
+ if (err)
+ goto out3;
+ }
+ if (info->pps) {
+ err = device_create_file(dev, &dev_attr_pps_enable);
+ if (err)
+ goto out4;
+ }
+ return 0;
+out4:
+ if (info->n_per_out)
+ device_remove_file(dev, &dev_attr_period);
+out3:
+ if (info->n_ext_ts)
+ device_remove_file(dev, &dev_attr_fifo);
+out2:
+ if (info->n_ext_ts)
+ device_remove_file(dev, &dev_attr_extts_enable);
+out1:
+ return err;
+}
diff --git a/include/linux/Kbuild b/include/linux/Kbuild
index b0ada6f..985ab2a 100644
--- a/include/linux/Kbuild
+++ b/include/linux/Kbuild
@@ -300,6 +300,7 @@ header-y += ppp-comp.h
header-y += ppp_defs.h
header-y += pps.h
header-y += prctl.h
+header-y += ptp_clock.h
header-y += ptrace.h
header-y += qnx4_fs.h
header-y += qnxtypes.h
diff --git a/include/linux/ptp_clock.h b/include/linux/ptp_clock.h
new file mode 100644
index 0000000..94e981f
--- /dev/null
+++ b/include/linux/ptp_clock.h
@@ -0,0 +1,84 @@
+/*
+ * PTP 1588 clock support - user space interface
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef _PTP_CLOCK_H_
+#define _PTP_CLOCK_H_
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+/* PTP_xxx bits, for the flags field within the request structures. */
+#define PTP_ENABLE_FEATURE (1<<0)
+#define PTP_RISING_EDGE (1<<1)
+#define PTP_FALLING_EDGE (1<<2)
+
+/*
+ * struct ptp_clock_time - represents a time value
+ *
+ * The sign of the seconds field applies to the whole value. The
+ * nanoseconds field is always unsigned. The reserved field is
+ * included for sub-nanosecond resolution, should the demand for
+ * this ever appear.
+ *
+ */
+struct ptp_clock_time {
+ __s64 sec; /* seconds */
+ __u32 nsec; /* nanoseconds */
+ __u32 reserved;
+};
+
+struct ptp_clock_caps {
+ int max_adj; /* Maximum frequency adjustment in parts per billon. */
+ int n_alarm; /* Number of programmable alarms. */
+ int n_ext_ts; /* Number of external time stamp channels. */
+ int n_per_out; /* Number of programmable periodic signals. */
+ int pps; /* Whether the clock supports a PPS callback. */
+ int rsv[15]; /* Reserved for future use. */
+};
+
+struct ptp_extts_request {
+ unsigned int index; /* Which channel to configure. */
+ unsigned int flags; /* Bit field for PTP_xxx flags. */
+ unsigned int rsv[2]; /* Reserved for future use. */
+};
+
+struct ptp_perout_request {
+ struct ptp_clock_time start; /* Absolute start time. */
+ struct ptp_clock_time period; /* Desired period, zero means disable. */
+ unsigned int index; /* Which channel to configure. */
+ unsigned int flags; /* Reserved for future use. */
+ unsigned int rsv[4]; /* Reserved for future use. */
+};
+
+#define PTP_CLK_MAGIC '='
+
+#define PTP_CLOCK_GETCAPS _IOR(PTP_CLK_MAGIC, 1, struct ptp_clock_caps)
+#define PTP_EXTTS_REQUEST _IOW(PTP_CLK_MAGIC, 2, struct ptp_extts_request)
+#define PTP_PEROUT_REQUEST _IOW(PTP_CLK_MAGIC, 3, struct ptp_perout_request)
+#define PTP_ENABLE_PPS _IOW(PTP_CLK_MAGIC, 4, int)
+
+struct ptp_extts_event {
+ struct ptp_clock_time t; /* Time event occured. */
+ unsigned int index; /* Which channel produced the event. */
+ unsigned int flags; /* Reserved for future use. */
+ unsigned int rsv[2]; /* Reserved for future use. */
+};
+
+#endif
diff --git a/include/linux/ptp_clock_kernel.h b/include/linux/ptp_clock_kernel.h
new file mode 100644
index 0000000..dd2e44f
--- /dev/null
+++ b/include/linux/ptp_clock_kernel.h
@@ -0,0 +1,139 @@
+/*
+ * PTP 1588 clock support
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef _PTP_CLOCK_KERNEL_H_
+#define _PTP_CLOCK_KERNEL_H_
+
+#include <linux/ptp_clock.h>
+
+
+struct ptp_clock_request {
+ enum {
+ PTP_CLK_REQ_EXTTS,
+ PTP_CLK_REQ_PEROUT,
+ PTP_CLK_REQ_PPS,
+ } type;
+ union {
+ struct ptp_extts_request extts;
+ struct ptp_perout_request perout;
+ };
+};
+
+/**
+ * struct ptp_clock_info - decribes a PTP hardware clock
+ *
+ * @owner: The clock driver should set to THIS_MODULE.
+ * @name: A short name to identify the clock.
+ * @max_adj: The maximum possible frequency adjustment, in parts per billon.
+ * @n_alarm: The number of programmable alarms.
+ * @n_ext_ts: The number of external time stamp channels.
+ * @n_per_out: The number of programmable periodic signals.
+ * @pps: Indicates whether the clock supports a PPS callback.
+ *
+ * clock operations
+ *
+ * @adjfreq: Adjusts the frequency of the hardware clock.
+ * parameter delta: Desired period change in parts per billion.
+ *
+ * @adjtime: Shifts the time of the hardware clock.
+ * parameter delta: Desired change in nanoseconds.
+ *
+ * @gettime: Reads the current time from the hardware clock.
+ * parameter ts: Holds the result.
+ *
+ * @settime: Set the current time on the hardware clock.
+ * parameter ts: Time value to set.
+ *
+ * @enable: Request driver to enable or disable an ancillary feature.
+ * parameter request: Desired resource to enable or disable.
+ * parameter on: Caller passes one to enable or zero to disable.
+ *
+ * Drivers should embed their ptp_clock_info within a private
+ * structure, obtaining a reference to it using container_of().
+ *
+ * The callbacks must all return zero on success, non-zero otherwise.
+ */
+
+struct ptp_clock_info {
+ struct module *owner;
+ char name[16];
+ s32 max_adj;
+ int n_alarm;
+ int n_ext_ts;
+ int n_per_out;
+ int pps;
+ int (*adjfreq)(struct ptp_clock_info *ptp, s32 delta);
+ int (*adjtime)(struct ptp_clock_info *ptp, s64 delta);
+ int (*gettime)(struct ptp_clock_info *ptp, struct timespec *ts);
+ int (*settime)(struct ptp_clock_info *ptp, const struct timespec *ts);
+ int (*enable)(struct ptp_clock_info *ptp,
+ struct ptp_clock_request *request, int on);
+};
+
+struct ptp_clock;
+
+/**
+ * ptp_clock_register() - register a PTP hardware clock driver
+ *
+ * @info: Structure describing the new clock.
+ */
+
+extern struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info);
+
+/**
+ * ptp_clock_unregister() - unregister a PTP hardware clock driver
+ *
+ * @ptp: The clock to remove from service.
+ */
+
+extern int ptp_clock_unregister(struct ptp_clock *ptp);
+
+
+enum ptp_clock_events {
+ PTP_CLOCK_ALARM,
+ PTP_CLOCK_EXTTS,
+ PTP_CLOCK_PPS,
+};
+
+/**
+ * struct ptp_clock_event - decribes a PTP hardware clock event
+ *
+ * @type: One of the ptp_clock_events enumeration values.
+ * @index: Identifies the source of the event.
+ * @timestamp: When the event occured.
+ */
+
+struct ptp_clock_event {
+ int type;
+ int index;
+ u64 timestamp;
+};
+
+/**
+ * ptp_clock_event() - notify the PTP layer about an event
+ *
+ * @ptp: The clock obtained from ptp_clock_register().
+ * @event: Message structure describing the event.
+ */
+
+extern void ptp_clock_event(struct ptp_clock *ptp,
+ struct ptp_clock_event *event);
+
+#endif
--
1.7.0.4
^ permalink raw reply related
* [PATCH V12 0/4] ptp: IEEE 1588 hardware clock support
From: Richard Cochran @ 2011-02-28 7:57 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ, Alan Cox, Arnd Bergmann,
Christoph Lameter, David Miller, John Stultz, Krzysztof Halasa,
Peter Zijlstra, Rodolfo Giometti, Thomas Gleixner,
Benjamin Herrenschmidt, Mike Frysinger, Paul Mackerras,
Russell King
This really might be the last review of the PTP hardware clock patch
series. These patches apply on top of the timers/core branch in the
tip tree.
* Why all the CCs?
- One driver is for PowerPC, and adds device tree stuff.
- One driver is for the ARM Xscale IXP465.
* PHC Patch ChangeLog
** v12
*** gianfar_ptp
- fixed up device tree
- inlined the header file
- use platform_ calls instead of deprecated of_ calls
- removed static global single instance
- removed John Stultz's ack from this patch
* Previous Discussions
- [V11] http://lkml.org/lkml/2011/2/23/107
- [V10] http://lkml.org/lkml/2011/1/27/71
- [V9] http://lkml.org/lkml/2011/1/13/65
- [V8] http://lkml.org/lkml/2010/12/31/128
- [V7] http://lkml.org/lkml/2010/12/16/195
- [V6] http://lkml.org/lkml/2010/9/23/310
- [V5] http://lkml.org/lkml/2010/8/16/90
- Thomas Gleixner: Rework of the PTP support series core code
http://lkml.org/lkml/2011/2/1/137
- Dynamic clock devices [RFC]
http://lkml.org/lkml/2010/11/4/290
- POSIX clock tuning syscall with dynamic clock ids
http://lkml.org/lkml/2010/9/3/119
- POSIX clock tuning syscall with static clock ids
http://lkml.org/lkml/2010/8/23/49
- Versions 1-4 appeared on the netdev list.
Richard Cochran (4):
ptp: Added a brand new class driver for ptp clocks.
ptp: Added a clock that uses the eTSEC found on the MPC85xx.
ptp: Added a clock driver for the IXP46x.
ptp: Added a clock driver for the National Semiconductor PHYTER.
Documentation/ABI/testing/sysfs-ptp | 98 ++
.../devicetree/bindings/net/fsl-tsec-phy.txt | 54 +
Documentation/ptp/ptp.txt | 89 ++
Documentation/ptp/testptp.c | 352 +++++++
Documentation/ptp/testptp.mk | 33 +
arch/arm/mach-ixp4xx/include/mach/ixp46x_ts.h | 78 ++
arch/powerpc/boot/dts/mpc8313erdb.dts | 13 +
arch/powerpc/boot/dts/mpc8572ds.dts | 13 +
arch/powerpc/boot/dts/p2020ds.dts | 13 +
arch/powerpc/boot/dts/p2020rdb.dts | 13 +
drivers/Kconfig | 2 +
drivers/Makefile | 1 +
drivers/net/Makefile | 1 +
drivers/net/arm/ixp4xx_eth.c | 192 ++++-
drivers/net/gianfar_ptp.c | 579 +++++++++++
drivers/net/phy/Makefile | 1 +
drivers/net/phy/dp83640.c | 1012 ++++++++++++++++++++
drivers/net/phy/dp83640_reg.h | 267 +++++
drivers/ptp/Kconfig | 75 ++
drivers/ptp/Makefile | 7 +
drivers/ptp/ptp_chardev.c | 156 +++
drivers/ptp/ptp_clock.c | 320 ++++++
drivers/ptp/ptp_ixp46x.c | 332 +++++++
drivers/ptp/ptp_private.h | 86 ++
drivers/ptp/ptp_sysfs.c | 230 +++++
include/linux/Kbuild | 1 +
include/linux/ptp_clock.h | 84 ++
include/linux/ptp_clock_kernel.h | 139 +++
28 files changed, 4238 insertions(+), 3 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-ptp
create mode 100644 Documentation/ptp/ptp.txt
create mode 100644 Documentation/ptp/testptp.c
create mode 100644 Documentation/ptp/testptp.mk
create mode 100644 arch/arm/mach-ixp4xx/include/mach/ixp46x_ts.h
create mode 100644 drivers/net/gianfar_ptp.c
create mode 100644 drivers/net/phy/dp83640.c
create mode 100644 drivers/net/phy/dp83640_reg.h
create mode 100644 drivers/ptp/Kconfig
create mode 100644 drivers/ptp/Makefile
create mode 100644 drivers/ptp/ptp_chardev.c
create mode 100644 drivers/ptp/ptp_clock.c
create mode 100644 drivers/ptp/ptp_ixp46x.c
create mode 100644 drivers/ptp/ptp_private.h
create mode 100644 drivers/ptp/ptp_sysfs.c
create mode 100644 include/linux/ptp_clock.h
create mode 100644 include/linux/ptp_clock_kernel.h
^ permalink raw reply
* Re: [Lxc-users] Bad checksums and lost packets with macvlan on dummy
From: Eric Dumazet @ 2011-02-28 7:45 UTC (permalink / raw)
To: Daniel Lezcano
Cc: Andrian Nord, lxc-users, Patrick McHardy, Linux Netdev List
In-Reply-To: <4D6AB578.3010704@free.fr>
Le dimanche 27 février 2011 à 21:35 +0100, Daniel Lezcano a écrit :
> On 02/27/2011 08:50 PM, Eric Dumazet wrote:
> > Le dimanche 27 février 2011 à 16:14 +0100, Daniel Lezcano a écrit :
> >> On 02/23/2011 06:13 PM, Andrian Nord wrote:
> >>> On Mon, Feb 21, 2011 at 05:07:31PM +0100, Daniel Lezcano wrote:
> >>>> I Cc'ed the netdev mailing list and Patrick in case my analysis is wrong
> >>>> or incomplete.
> >>> I'm confirming, that this happens only when macvlan's are onto dummy net
> >>> device. In case of some physical interface under macvlan there is no lost
> >>> packages and no broken checksums.
> >> I did some tests with a 2.6.35 kernel version and it seems the checksum
> >> errors do not appear.
> >> I noticed there are some changes in the dummy setup function:
> >>
> >> dev->features |= NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_TSO;
> >> dev->features |= NETIF_F_NO_CSUM | NETIF_F_HIGHDMA | NETIF_F_LLTX;
> >>
> >>
> >> May be that was introduced by commit:
> >>
> >> commit 6d81f41c58c69ddde497e9e640ba5805aa26e78c
> >> Author: Eric Dumazet<eric.dumazet@gmail.com>
> >> Date: Mon Sep 27 20:50:33 2010 +0000
> >>
> >> dummy: percpu stats and lockless xmit
> >>
> >> Converts dummy network device driver to :
> >>
> >> - percpu stats
> >>
> >> - 64bit stats
> >>
> >> - lockless xmit (NETIF_F_LLTX)
> >>
> >> - performance features added (NETIF_F_SG | NETIF_F_FRAGLIST |
> >> NETIF_F_TSO | NETIF_F_NO_CSUM | NETIF_F_HIGHDMA)
> >>
> >> Signed-off-by: Eric Dumazet<eric.dumazet@gmail.com>
> >> Signed-off-by: David S. Miller<davem@davemloft.net>
> >>
> >>
> >> Eric,
> >>
> >> Andrian is observing, with a couple of macvlan (in bridge mode) on top
> >> of a dummy interface, a lot of checksums error and packets drop.
> >> Each macvlan is in a different network namespace and the dummy interface
> >> is in the init_net.
> >>
> >> Any ideas ?
> > Not sure I understand... I thought dummy was dropping all frames
> > anyway ?
> >
> > static netdev_tx_t dummy_xmit(struct sk_buff *skb, struct net_device *dev)
> > {
> > struct pcpu_dstats *dstats = this_cpu_ptr(dev->dstats);
> >
> > u64_stats_update_begin(&dstats->syncp);
> > dstats->tx_packets++;
> > dstats->tx_bytes += skb->len;
> > u64_stats_update_end(&dstats->syncp);
> >
> > dev_kfree_skb(skb);
> > return NETDEV_TX_OK;
> > }
> >
> >
> > Maybe you could describe the setup ?
>
> Yes, it is very simple.
>
> There are two network namespaces.
>
> macvlan1 is in network namespace 1
> macvlan2 is in network namespace 2
>
> Both are in "bridge" mode, so they can communicate together.
> The lower device is dummy0 in the init network namespace.
>
> IMO the problem is coming from the macvlan driver:
>
> dev->features = lowerdev->features & MACVLAN_FEATURES
>
> As dummy0 has the offloading capabilities set on, the macvlan driver
> inherit these features.
>
> In the normal case, dummy0 is supposed to drop the packets. But with
> macvlan these packets are broadcasted to the other macvlan ports, so no
> checksum is computed when the packets are transmitted between macvlan1
> and macvlan2.
So where frames get bad checksums ?
In this "bridge" mode, I suspect the broadcast is done _before_ sending
frame to dummy, so maybe macvlan should not inherit from lowerdev in
this particular case ?
^ permalink raw reply
* Re: [PATCH 0/3] [RFC] Implement multiqueue (RX & TX) virtio-net
From: Michael S. Tsirkin @ 2011-02-28 7:35 UTC (permalink / raw)
To: Krishna Kumar
Cc: rusty, davem, eric.dumazet, arnd, netdev, horms, avi, anthony,
kvm
In-Reply-To: <20110228063427.24908.63561.sendpatchset@krkumar2.in.ibm.com>
On Mon, Feb 28, 2011 at 12:04:27PM +0530, Krishna Kumar wrote:
> This patch series is a continuation of an earlier one that
> implemented guest MQ TX functionality. This new patchset
> implements both RX and TX MQ. Qemu changes are not being
> included at this time solely to aid in easier review.
> Compatibility testing with old/new combinations of qemu/guest
> and vhost was done without any issues.
>
> Some early TCP/UDP test results are at the bottom of this
> post, I plan to submit more test results in the coming days.
>
> Please review and provide feedback on what can improve.
>
> Thanks!
>
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
To help testing, could you post the qemu changes separately please?
> ---
>
>
> Test configuration:
> Host: 8 Intel Xeon, 8 GB memory
> Guest: 4 cpus, 2 GB memory
>
> Each test case runs for 60 secs, results below are average over
> two runs. Bandwidth numbers are in gbps. I have used default
> netperf, and no testing/system tuning other than taskset each
> vhost to 0xf (cpus 0-3). Comparison is testing original kernel
> vs new kernel with #txqs=8 ("#" refers to number of netperf
> sessions).
>
> _______________________________________________________________________
> TCP: Guest -> Local Host (TCP_STREAM)
> # BW1 BW2 (%) SD1 SD2 (%) RSD1 RSD2 (%)
> _______________________________________________________________________
> 1 7190 7170 (-.2) 0 0 (0) 3 4 (33.3)
> 2 8774 11235 (28.0) 3 3 (0) 16 14 (-12.5)
> 4 9753 15195 (55.7) 17 21 (23.5) 65 59 (-9.2)
> 8 10224 18265 (78.6) 71 115 (61.9) 251 240 (-4.3)
> 16 10749 18123 (68.6) 277 456 (64.6) 985 925 (-6.0)
> 32 11133 17351 (55.8) 1132 1947 (71.9) 3935 3831 (-2.6)
> 64 11223 17115 (52.4) 4682 7836 (67.3) 15949 15373 (-3.6)
> 128 11269 16986 (50.7) 19783 31505 (59.2) 66799 61759 (-7.5)
> _______________________________________________________________________
> Summary: BW: 37.6 SD: 61.2 RSD: -6.5
>
>
> _______________________________________________________________________
> TCP: Local Host -> Guest (TCP_MAERTS)
> # BW1 BW2 (%) SD1 SD2 (%) RSD1 RSD2 (%)
> _______________________________________________________________________
> 1 11490 10870 (-5.3) 0 0 (0) 2 2 (0)
> 2 10612 10554 (-.5) 2 3 (50.0) 12 12 (0)
> 4 10047 14320 (42.5) 13 16 (23.0) 53 53 (0)
> 8 9273 15182 (63.7) 56 84 (50.0) 228 233 (2.1)
> 16 9291 15853 (70.6) 235 390 (65.9) 934 965 (3.3)
> 32 9382 15741 (67.7) 969 1823 (88.1) 3868 4037 (4.3)
> 64 9270 14585 (57.3) 3966 8836 (122.7) 15415 17818 (15.5)
> 128 8997 14678 (63.1) 17024 36649 (115.2) 64933 72677 (11.9)
> _______________________________________________________________________
> SUM: BW: 24.8 SD: 114.6 RSD: 12.1
>
> ______________________________________________________
> UDP: Local Host -> Guest (UDP_STREAM)
> # BW1 BW2 (%) SD1 SD2 (%)
> ______________________________________________________
> 1 17236 16585 (-3.7) 1 1 (0)
> 2 16795 22693 (35.1) 5 6 (20.0)
> 4 13390 21502 (60.5) 37 36 (-2.7)
> 8 13261 24361 (83.7) 163 175 (7.3)
> 16 12772 23796 (86.3) 692 826 (19.3)
> 32 12832 23880 (86.0) 2812 2871 (2.0)
> 64 12779 24293 (90.1) 11299 11237 (-.5)
> 128 13006 24857 (91.1) 44778 43884 (-1.9)
> ______________________________________________________
> Summary: BW: 37.1 SD: -1.2
^ permalink raw reply
* Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
From: David Miller @ 2011-02-28 7:30 UTC (permalink / raw)
To: jpirko
Cc: fubar, nicolas.2p.debian, kaber, eric.dumazet, netdev, shemminger,
andy, anna.fischer
In-Reply-To: <20110228070732.GA2793@psychotron.redhat.com>
From: Jiri Pirko <jpirko@redhat.com>
Date: Mon, 28 Feb 2011 08:07:33 +0100
> Mon, Feb 28, 2011 at 12:22:08AM CET, davem@davemloft.net wrote:
>>From: Jiri Pirko <jpirko@redhat.com>
>>Date: Sun, 27 Feb 2011 13:58:17 +0100
>>
>>> That is very true. And given that af_packet uses orig_dev to obtain
>>> ifindex, it can be replaced by skb->skb_iif. That way we can get rid of
>>> orig_dev parameter for good.
>>
>>I would rather see a complete patch set submitting at a unit, thanks.
>>
>>I've already marked your V3 last night as "changes requested" in
>>patchwork for this reason.
>
> That's a pity. V3 is complete patch. The changes we talk about in
> discussion are just taking advantage of changes in patch V3. Would be in
> my opinion better to apply V3 now and followup with the rest after that.
Fair enough, I'll aply it now, thanks.
^ permalink raw reply
* Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
From: Jiri Pirko @ 2011-02-28 7:07 UTC (permalink / raw)
To: David Miller
Cc: fubar, nicolas.2p.debian, kaber, eric.dumazet, netdev, shemminger,
andy, anna.fischer
In-Reply-To: <20110227.152208.102543719.davem@davemloft.net>
Mon, Feb 28, 2011 at 12:22:08AM CET, davem@davemloft.net wrote:
>From: Jiri Pirko <jpirko@redhat.com>
>Date: Sun, 27 Feb 2011 13:58:17 +0100
>
>> That is very true. And given that af_packet uses orig_dev to obtain
>> ifindex, it can be replaced by skb->skb_iif. That way we can get rid of
>> orig_dev parameter for good.
>
>I would rather see a complete patch set submitting at a unit, thanks.
>
>I've already marked your V3 last night as "changes requested" in
>patchwork for this reason.
That's a pity. V3 is complete patch. The changes we talk about in
discussion are just taking advantage of changes in patch V3. Would be in
my opinion better to apply V3 now and followup with the rest after that.
^ permalink raw reply
* Re:[ANNOUNCE]: Release of iptables-1.4.9.1
From: xuhainanjing @ 2011-02-28 6:47 UTC (permalink / raw)
To: Patrick McHardy
Cc: Netfilter Development Mailinglist, NetDev, netfilter-announce,
'netfilter@vger.kernel.org'
In-Reply-To: <4C5C136D.3040108@trash.net>
unsubscribe
At 2010-08-06 21:51:41,"Patrick McHardy" <kaber@trash.net> wrote:
>The netfilter coreteam presents:
>
> iptables version 1.4.9.1
>
>the iptables release for the 2.6.35 kernel. This release fixes
>a compilation problem with static linking in the 1.4.9 release.
>
>Version 1.4.9.1 can be obtained from:
>
>http://www.netfilter.org/projects/iptables/downloads.html
>ftp://ftp.netfilter.org/pub/iptables/
>git://git.netfilter.org/iptables.git
>
>On behalf of the Netfilter Core Team.
>Happy firewalling!
>
^ permalink raw reply
* [PATCH 3/3] [RFC] Changes for MQ vhost
From: Krishna Kumar @ 2011-02-28 6:34 UTC (permalink / raw)
To: rusty, davem, mst
Cc: eric.dumazet, arnd, netdev, horms, avi, anthony, kvm,
Krishna Kumar
In-Reply-To: <20110228063427.24908.63561.sendpatchset@krkumar2.in.ibm.com>
Changes for mq vhost.
vhost_net_open is changed to allocate a vhost_net and return.
The remaining initializations are delayed till SET_OWNER.
SET_OWNER is changed so that the argument is used to determine
how many txqs to use. Unmodified qemu's will pass NULL, so
this is recognized and handled as numtxqs=1.
The number of vhost threads is <= #txqs. Threads handle more
than one txq when #txqs is more than MAX_VHOST_THREADS (4).
The same thread handles both RX and TX - tested with tap/bridge
so far (TBD: needs some changes in macvtap driver to support
the same).
I had to convert space->tab in vhost_attach_cgroups* to avoid
checkpatch errors.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
drivers/vhost/net.c | 295 ++++++++++++++++++++++++++--------------
drivers/vhost/vhost.c | 225 +++++++++++++++++++-----------
drivers/vhost/vhost.h | 39 ++++-
3 files changed, 378 insertions(+), 181 deletions(-)
diff -ruNp org/drivers/vhost/vhost.h new/drivers/vhost/vhost.h
--- org/drivers/vhost/vhost.h 2011-02-08 09:05:09.000000000 +0530
+++ new/drivers/vhost/vhost.h 2011-02-28 11:48:06.000000000 +0530
@@ -35,11 +35,11 @@ struct vhost_poll {
wait_queue_t wait;
struct vhost_work work;
unsigned long mask;
- struct vhost_dev *dev;
+ struct vhost_virtqueue *vq; /* points back to vq */
};
void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
- unsigned long mask, struct vhost_dev *dev);
+ unsigned long mask, struct vhost_virtqueue *vq);
void vhost_poll_start(struct vhost_poll *poll, struct file *file);
void vhost_poll_stop(struct vhost_poll *poll);
void vhost_poll_flush(struct vhost_poll *poll);
@@ -108,8 +108,14 @@ struct vhost_virtqueue {
/* Log write descriptors */
void __user *log_base;
struct vhost_log *log;
+ struct task_struct *worker; /* worker for this vq */
+ spinlock_t *work_lock; /* points to a dev->work_lock[] entry */
+ struct list_head *work_list; /* points to a dev->work_list[] entry */
+ int qnum; /* 0 for RX, 1 -> n-1 for TX */
};
+#define MAX_VHOST_THREADS 4
+
struct vhost_dev {
/* Readers use RCU to access memory table pointer
* log base pointer and features.
@@ -122,12 +128,33 @@ struct vhost_dev {
int nvqs;
struct file *log_file;
struct eventfd_ctx *log_ctx;
- spinlock_t work_lock;
- struct list_head work_list;
- struct task_struct *worker;
+ spinlock_t *work_lock[MAX_VHOST_THREADS];
+ struct list_head *work_list[MAX_VHOST_THREADS];
};
-long vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue *vqs, int nvqs);
+/*
+ * Return maximum number of vhost threads needed to handle RX & TX.
+ * Upto MAX_VHOST_THREADS are started, and threads can be shared
+ * among different vq's if numtxqs > MAX_VHOST_THREADS.
+ */
+static inline int get_nvhosts(int nvqs)
+{
+ return min_t(int, nvqs / 2, MAX_VHOST_THREADS);
+}
+
+/*
+ * Get index of an existing thread that will handle this txq/rxq.
+ * The same thread handles both rx[index] and tx[index].
+ */
+static inline int vhost_get_thread_index(int index, int numtxqs, int nvhosts)
+{
+ return (index % numtxqs) % nvhosts;
+}
+
+int vhost_setup_vqs(struct vhost_dev *dev, int numtxqs);
+void vhost_free_vqs(struct vhost_dev *dev);
+long vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue *vqs, int nvqs,
+ int nvhosts);
long vhost_dev_check_owner(struct vhost_dev *);
long vhost_dev_reset_owner(struct vhost_dev *);
void vhost_dev_cleanup(struct vhost_dev *);
diff -ruNp org/drivers/vhost/net.c new/drivers/vhost/net.c
--- org/drivers/vhost/net.c 2011-02-08 09:05:09.000000000 +0530
+++ new/drivers/vhost/net.c 2011-02-28 11:48:53.000000000 +0530
@@ -32,12 +32,6 @@
* Using this limit prevents one virtqueue from starving others. */
#define VHOST_NET_WEIGHT 0x80000
-enum {
- VHOST_NET_VQ_RX = 0,
- VHOST_NET_VQ_TX = 1,
- VHOST_NET_VQ_MAX = 2,
-};
-
enum vhost_net_poll_state {
VHOST_NET_POLL_DISABLED = 0,
VHOST_NET_POLL_STARTED = 1,
@@ -46,12 +40,13 @@ enum vhost_net_poll_state {
struct vhost_net {
struct vhost_dev dev;
- struct vhost_virtqueue vqs[VHOST_NET_VQ_MAX];
- struct vhost_poll poll[VHOST_NET_VQ_MAX];
+ struct vhost_virtqueue *vqs;
+ struct vhost_poll *poll;
+ struct socket **socks;
/* Tells us whether we are polling a socket for TX.
* We only do this when socket buffer fills up.
* Protected by tx vq lock. */
- enum vhost_net_poll_state tx_poll_state;
+ enum vhost_net_poll_state *tx_poll_state;
};
/* Pop first len bytes from iovec. Return number of segments used. */
@@ -91,28 +86,28 @@ static void copy_iovec_hdr(const struct
}
/* Caller must have TX VQ lock */
-static void tx_poll_stop(struct vhost_net *net)
+static void tx_poll_stop(struct vhost_net *net, int qnum)
{
- if (likely(net->tx_poll_state != VHOST_NET_POLL_STARTED))
+ if (likely(net->tx_poll_state[qnum] != VHOST_NET_POLL_STARTED))
return;
- vhost_poll_stop(net->poll + VHOST_NET_VQ_TX);
- net->tx_poll_state = VHOST_NET_POLL_STOPPED;
+ vhost_poll_stop(&net->poll[qnum]);
+ net->tx_poll_state[qnum] = VHOST_NET_POLL_STOPPED;
}
/* Caller must have TX VQ lock */
-static void tx_poll_start(struct vhost_net *net, struct socket *sock)
+static void tx_poll_start(struct vhost_net *net, struct socket *sock, int qnum)
{
- if (unlikely(net->tx_poll_state != VHOST_NET_POLL_STOPPED))
+ if (unlikely(net->tx_poll_state[qnum] != VHOST_NET_POLL_STOPPED))
return;
- vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
- net->tx_poll_state = VHOST_NET_POLL_STARTED;
+ vhost_poll_start(&net->poll[qnum], sock->file);
+ net->tx_poll_state[qnum] = VHOST_NET_POLL_STARTED;
}
/* Expects to be always run from workqueue - which acts as
* read-size critical section for our kind of RCU. */
-static void handle_tx(struct vhost_net *net)
+static void handle_tx(struct vhost_virtqueue *vq)
{
- struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
+ struct vhost_net *net = container_of(vq->dev, struct vhost_net, dev);
unsigned out, in, s;
int head;
struct msghdr msg = {
@@ -136,7 +131,7 @@ static void handle_tx(struct vhost_net *
wmem = atomic_read(&sock->sk->sk_wmem_alloc);
if (wmem >= sock->sk->sk_sndbuf) {
mutex_lock(&vq->mutex);
- tx_poll_start(net, sock);
+ tx_poll_start(net, sock, vq->qnum);
mutex_unlock(&vq->mutex);
return;
}
@@ -145,7 +140,7 @@ static void handle_tx(struct vhost_net *
vhost_disable_notify(vq);
if (wmem < sock->sk->sk_sndbuf / 2)
- tx_poll_stop(net);
+ tx_poll_stop(net, vq->qnum);
hdr_size = vq->vhost_hlen;
for (;;) {
@@ -160,7 +155,7 @@ static void handle_tx(struct vhost_net *
if (head == vq->num) {
wmem = atomic_read(&sock->sk->sk_wmem_alloc);
if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
- tx_poll_start(net, sock);
+ tx_poll_start(net, sock, vq->qnum);
set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
break;
}
@@ -190,7 +185,7 @@ static void handle_tx(struct vhost_net *
err = sock->ops->sendmsg(NULL, sock, &msg, len);
if (unlikely(err < 0)) {
vhost_discard_vq_desc(vq, 1);
- tx_poll_start(net, sock);
+ tx_poll_start(net, sock, vq->qnum);
break;
}
if (err != len)
@@ -282,9 +277,9 @@ err:
/* Expects to be always run from workqueue - which acts as
* read-size critical section for our kind of RCU. */
-static void handle_rx_big(struct vhost_net *net)
+static void handle_rx_big(struct vhost_virtqueue *vq,
+ struct vhost_net *net)
{
- struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
unsigned out, in, log, s;
int head;
struct vhost_log *vq_log;
@@ -392,9 +387,9 @@ static void handle_rx_big(struct vhost_n
/* Expects to be always run from workqueue - which acts as
* read-size critical section for our kind of RCU. */
-static void handle_rx_mergeable(struct vhost_net *net)
+static void handle_rx_mergeable(struct vhost_virtqueue *vq,
+ struct vhost_net *net)
{
- struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
unsigned uninitialized_var(in), log;
struct vhost_log *vq_log;
struct msghdr msg = {
@@ -498,99 +493,196 @@ static void handle_rx_mergeable(struct v
mutex_unlock(&vq->mutex);
}
-static void handle_rx(struct vhost_net *net)
+static void handle_rx(struct vhost_virtqueue *vq)
{
+ struct vhost_net *net = container_of(vq->dev, struct vhost_net, dev);
+
if (vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF))
- handle_rx_mergeable(net);
+ handle_rx_mergeable(vq, net);
else
- handle_rx_big(net);
+ handle_rx_big(vq, net);
}
static void handle_tx_kick(struct vhost_work *work)
{
struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
poll.work);
- struct vhost_net *net = container_of(vq->dev, struct vhost_net, dev);
- handle_tx(net);
+ handle_tx(vq);
}
static void handle_rx_kick(struct vhost_work *work)
{
struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
poll.work);
- struct vhost_net *net = container_of(vq->dev, struct vhost_net, dev);
- handle_rx(net);
+ handle_rx(vq);
}
static void handle_tx_net(struct vhost_work *work)
{
- struct vhost_net *net = container_of(work, struct vhost_net,
- poll[VHOST_NET_VQ_TX].work);
- handle_tx(net);
+ struct vhost_virtqueue *vq = container_of(work, struct vhost_poll,
+ work)->vq;
+
+ handle_tx(vq);
}
static void handle_rx_net(struct vhost_work *work)
{
- struct vhost_net *net = container_of(work, struct vhost_net,
- poll[VHOST_NET_VQ_RX].work);
- handle_rx(net);
+ struct vhost_virtqueue *vq = container_of(work, struct vhost_poll,
+ work)->vq;
+
+ handle_rx(vq);
}
-static int vhost_net_open(struct inode *inode, struct file *f)
+void vhost_free_vqs(struct vhost_dev *dev)
{
- struct vhost_net *n = kmalloc(sizeof *n, GFP_KERNEL);
- struct vhost_dev *dev;
- int r;
+ struct vhost_net *n = container_of(dev, struct vhost_net, dev);
+ int i;
- if (!n)
- return -ENOMEM;
+ if (!n->vqs)
+ return;
- dev = &n->dev;
- n->vqs[VHOST_NET_VQ_TX].handle_kick = handle_tx_kick;
- n->vqs[VHOST_NET_VQ_RX].handle_kick = handle_rx_kick;
- r = vhost_dev_init(dev, n->vqs, VHOST_NET_VQ_MAX);
- if (r < 0) {
- kfree(n);
- return r;
+ /* vhost_net_open does kzalloc, so this loop will not panic */
+ for (i = 0; i < get_nvhosts(dev->nvqs); i++) {
+ kfree(dev->work_list[i]);
+ kfree(dev->work_lock[i]);
}
- vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
- vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
- n->tx_poll_state = VHOST_NET_POLL_DISABLED;
+ kfree(n->socks);
+ kfree(n->tx_poll_state);
+ kfree(n->poll);
+ kfree(n->vqs);
+
+ /*
+ * Reset so that vhost_net_release (which gets called when
+ * vhost_dev_set_owner() call fails) will notice.
+ */
+ n->vqs = NULL;
+}
- f->private_data = n;
+int vhost_setup_vqs(struct vhost_dev *dev, int numtxqs)
+{
+ struct vhost_net *n = container_of(dev, struct vhost_net, dev);
+ int nvhosts;
+ int i, nvqs;
+ int ret = -ENOMEM;
+
+ if (numtxqs < 0 || numtxqs > VIRTIO_MAX_TXQS)
+ return -EINVAL;
+
+ if (numtxqs == 0) {
+ /* Old qemu doesn't pass arguments to set_owner, use 1 txq */
+ numtxqs = 1;
+ }
+
+ /* Get total number of virtqueues */
+ nvqs = numtxqs * 2;
+
+ n->vqs = kmalloc(nvqs * sizeof(*n->vqs), GFP_KERNEL);
+ n->poll = kmalloc(nvqs * sizeof(*n->poll), GFP_KERNEL);
+ n->socks = kmalloc(nvqs * sizeof(*n->socks), GFP_KERNEL);
+ n->tx_poll_state = kmalloc(nvqs * sizeof(*n->tx_poll_state),
+ GFP_KERNEL);
+ if (!n->vqs || !n->poll || !n->socks || !n->tx_poll_state)
+ goto err;
+
+ /* Get total number of vhost threads */
+ nvhosts = get_nvhosts(nvqs);
+
+ for (i = 0; i < nvhosts; i++) {
+ dev->work_lock[i] = kmalloc(sizeof(*dev->work_lock[i]),
+ GFP_KERNEL);
+ dev->work_list[i] = kmalloc(sizeof(*dev->work_list[i]),
+ GFP_KERNEL);
+ if (!dev->work_lock[i] || !dev->work_list[i])
+ goto err;
+ if (((unsigned long) dev->work_lock[i] & (SMP_CACHE_BYTES - 1))
+ ||
+ ((unsigned long) dev->work_list[i] & SMP_CACHE_BYTES - 1))
+ pr_debug("Unaligned buffer @ %d - Lock: %p List: %p\n",
+ i, dev->work_lock[i], dev->work_list[i]);
+ }
+
+ /* 'numtxqs' RX followed by 'numtxqs' TX queues */
+ for (i = 0; i < numtxqs; i++)
+ n->vqs[i].handle_kick = handle_rx_kick;
+ for (; i < nvqs; i++)
+ n->vqs[i].handle_kick = handle_tx_kick;
+
+ ret = vhost_dev_init(dev, n->vqs, nvqs, nvhosts);
+ if (ret < 0)
+ goto err;
+
+ for (i = 0; i < numtxqs; i++)
+ vhost_poll_init(&n->poll[i], handle_rx_net, POLLIN, &n->vqs[i]);
+
+ for (; i < nvqs; i++) {
+ vhost_poll_init(&n->poll[i], handle_tx_net, POLLOUT,
+ &n->vqs[i]);
+ n->tx_poll_state[i] = VHOST_NET_POLL_DISABLED;
+ }
return 0;
+
+err:
+ /* Free all pointers that may have been allocated */
+ vhost_free_vqs(dev);
+
+ return ret;
+}
+
+static int vhost_net_open(struct inode *inode, struct file *f)
+{
+ struct vhost_net *n = kzalloc(sizeof *n, GFP_KERNEL);
+ int ret = -ENOMEM;
+
+ if (n) {
+ struct vhost_dev *dev = &n->dev;
+
+ f->private_data = n;
+ mutex_init(&dev->mutex);
+
+ /* Defer all other initialization till user does SET_OWNER */
+ ret = 0;
+ }
+
+ return ret;
}
static void vhost_net_disable_vq(struct vhost_net *n,
struct vhost_virtqueue *vq)
{
+ int qnum = vq->qnum;
+
if (!vq->private_data)
return;
- if (vq == n->vqs + VHOST_NET_VQ_TX) {
- tx_poll_stop(n);
- n->tx_poll_state = VHOST_NET_POLL_DISABLED;
- } else
- vhost_poll_stop(n->poll + VHOST_NET_VQ_RX);
+ if (qnum < n->dev.nvqs / 2) {
+ /* qnum is less than half, we are RX */
+ vhost_poll_stop(&n->poll[qnum]);
+ } else { /* otherwise we are TX */
+ tx_poll_stop(n, qnum);
+ n->tx_poll_state[qnum] = VHOST_NET_POLL_DISABLED;
+ }
}
static void vhost_net_enable_vq(struct vhost_net *n,
struct vhost_virtqueue *vq)
{
struct socket *sock;
+ int qnum = vq->qnum;
sock = rcu_dereference_protected(vq->private_data,
lockdep_is_held(&vq->mutex));
if (!sock)
return;
- if (vq == n->vqs + VHOST_NET_VQ_TX) {
- n->tx_poll_state = VHOST_NET_POLL_STOPPED;
- tx_poll_start(n, sock);
- } else
- vhost_poll_start(n->poll + VHOST_NET_VQ_RX, sock->file);
+ if (qnum < n->dev.nvqs / 2) {
+ /* qnum is less than half, we are RX */
+ vhost_poll_start(&n->poll[qnum], sock->file);
+ } else {
+ n->tx_poll_state[qnum] = VHOST_NET_POLL_STOPPED;
+ tx_poll_start(n, sock, qnum);
+ }
}
static struct socket *vhost_net_stop_vq(struct vhost_net *n,
@@ -607,11 +699,12 @@ static struct socket *vhost_net_stop_vq(
return sock;
}
-static void vhost_net_stop(struct vhost_net *n, struct socket **tx_sock,
- struct socket **rx_sock)
+static void vhost_net_stop(struct vhost_net *n)
{
- *tx_sock = vhost_net_stop_vq(n, n->vqs + VHOST_NET_VQ_TX);
- *rx_sock = vhost_net_stop_vq(n, n->vqs + VHOST_NET_VQ_RX);
+ int i;
+
+ for (i = n->dev.nvqs - 1; i >= 0; i--)
+ n->socks[i] = vhost_net_stop_vq(n, &n->vqs[i]);
}
static void vhost_net_flush_vq(struct vhost_net *n, int index)
@@ -622,26 +715,33 @@ static void vhost_net_flush_vq(struct vh
static void vhost_net_flush(struct vhost_net *n)
{
- vhost_net_flush_vq(n, VHOST_NET_VQ_TX);
- vhost_net_flush_vq(n, VHOST_NET_VQ_RX);
+ int i;
+
+ for (i = n->dev.nvqs - 1; i >= 0; i--)
+ vhost_net_flush_vq(n, i);
}
static int vhost_net_release(struct inode *inode, struct file *f)
{
struct vhost_net *n = f->private_data;
- struct socket *tx_sock;
- struct socket *rx_sock;
+ struct vhost_dev *dev = &n->dev;
+ int i;
- vhost_net_stop(n, &tx_sock, &rx_sock);
+ vhost_net_stop(n);
vhost_net_flush(n);
- vhost_dev_cleanup(&n->dev);
- if (tx_sock)
- fput(tx_sock->file);
- if (rx_sock)
- fput(rx_sock->file);
+ vhost_dev_cleanup(dev);
+
+ for (i = n->dev.nvqs - 1; i >= 0; i--)
+ if (n->socks[i])
+ fput(n->socks[i]->file);
+
/* We do an extra flush before freeing memory,
* since jobs can re-queue themselves. */
vhost_net_flush(n);
+
+ /* Free all old pointers */
+ vhost_free_vqs(dev);
+
kfree(n);
return 0;
}
@@ -719,7 +819,7 @@ static long vhost_net_set_backend(struct
if (r)
goto err;
- if (index >= VHOST_NET_VQ_MAX) {
+ if (index >= n->dev.nvqs) {
r = -ENOBUFS;
goto err;
}
@@ -741,9 +841,9 @@ static long vhost_net_set_backend(struct
oldsock = rcu_dereference_protected(vq->private_data,
lockdep_is_held(&vq->mutex));
if (sock != oldsock) {
- vhost_net_disable_vq(n, vq);
- rcu_assign_pointer(vq->private_data, sock);
- vhost_net_enable_vq(n, vq);
+ vhost_net_disable_vq(n, vq);
+ rcu_assign_pointer(vq->private_data, sock);
+ vhost_net_enable_vq(n, vq);
}
mutex_unlock(&vq->mutex);
@@ -765,22 +865,25 @@ err:
static long vhost_net_reset_owner(struct vhost_net *n)
{
- struct socket *tx_sock = NULL;
- struct socket *rx_sock = NULL;
long err;
+ int i;
+
mutex_lock(&n->dev.mutex);
err = vhost_dev_check_owner(&n->dev);
- if (err)
- goto done;
- vhost_net_stop(n, &tx_sock, &rx_sock);
+ if (err) {
+ mutex_unlock(&n->dev.mutex);
+ return err;
+ }
+
+ vhost_net_stop(n);
vhost_net_flush(n);
err = vhost_dev_reset_owner(&n->dev);
-done:
mutex_unlock(&n->dev.mutex);
- if (tx_sock)
- fput(tx_sock->file);
- if (rx_sock)
- fput(rx_sock->file);
+
+ for (i = n->dev.nvqs - 1; i >= 0; i--)
+ if (n->socks[i])
+ fput(n->socks[i]->file);
+
return err;
}
@@ -809,7 +912,7 @@ static int vhost_net_set_features(struct
}
n->dev.acked_features = features;
smp_wmb();
- for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
+ for (i = 0; i < n->dev.nvqs; ++i) {
mutex_lock(&n->vqs[i].mutex);
n->vqs[i].vhost_hlen = vhost_hlen;
n->vqs[i].sock_hlen = sock_hlen;
diff -ruNp org/drivers/vhost/vhost.c new/drivers/vhost/vhost.c
--- org/drivers/vhost/vhost.c 2011-01-19 20:01:29.000000000 +0530
+++ new/drivers/vhost/vhost.c 2011-02-25 21:18:14.000000000 +0530
@@ -70,12 +70,12 @@ static void vhost_work_init(struct vhost
/* Init poll structure */
void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
- unsigned long mask, struct vhost_dev *dev)
+ unsigned long mask, struct vhost_virtqueue *vq)
{
init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
init_poll_funcptr(&poll->table, vhost_poll_func);
poll->mask = mask;
- poll->dev = dev;
+ poll->vq = vq;
vhost_work_init(&poll->work, fn);
}
@@ -97,29 +97,30 @@ void vhost_poll_stop(struct vhost_poll *
remove_wait_queue(poll->wqh, &poll->wait);
}
-static bool vhost_work_seq_done(struct vhost_dev *dev, struct vhost_work *work,
- unsigned seq)
+static bool vhost_work_seq_done(struct vhost_virtqueue *vq,
+ struct vhost_work *work, unsigned seq)
{
int left;
- spin_lock_irq(&dev->work_lock);
+ spin_lock_irq(vq->work_lock);
left = seq - work->done_seq;
- spin_unlock_irq(&dev->work_lock);
+ spin_unlock_irq(vq->work_lock);
return left <= 0;
}
-static void vhost_work_flush(struct vhost_dev *dev, struct vhost_work *work)
+static void vhost_work_flush(struct vhost_virtqueue *vq,
+ struct vhost_work *work)
{
unsigned seq;
int flushing;
- spin_lock_irq(&dev->work_lock);
+ spin_lock_irq(vq->work_lock);
seq = work->queue_seq;
work->flushing++;
- spin_unlock_irq(&dev->work_lock);
- wait_event(work->done, vhost_work_seq_done(dev, work, seq));
- spin_lock_irq(&dev->work_lock);
+ spin_unlock_irq(vq->work_lock);
+ wait_event(work->done, vhost_work_seq_done(vq, work, seq));
+ spin_lock_irq(vq->work_lock);
flushing = --work->flushing;
- spin_unlock_irq(&dev->work_lock);
+ spin_unlock_irq(vq->work_lock);
BUG_ON(flushing < 0);
}
@@ -127,26 +128,26 @@ static void vhost_work_flush(struct vhos
* locks that are also used by the callback. */
void vhost_poll_flush(struct vhost_poll *poll)
{
- vhost_work_flush(poll->dev, &poll->work);
+ vhost_work_flush(poll->vq, &poll->work);
}
-static inline void vhost_work_queue(struct vhost_dev *dev,
+static inline void vhost_work_queue(struct vhost_virtqueue *vq,
struct vhost_work *work)
{
unsigned long flags;
- spin_lock_irqsave(&dev->work_lock, flags);
+ spin_lock_irqsave(vq->work_lock, flags);
if (list_empty(&work->node)) {
- list_add_tail(&work->node, &dev->work_list);
+ list_add_tail(&work->node, vq->work_list);
work->queue_seq++;
- wake_up_process(dev->worker);
+ wake_up_process(vq->worker);
}
- spin_unlock_irqrestore(&dev->work_lock, flags);
+ spin_unlock_irqrestore(vq->work_lock, flags);
}
void vhost_poll_queue(struct vhost_poll *poll)
{
- vhost_work_queue(poll->dev, &poll->work);
+ vhost_work_queue(poll->vq, &poll->work);
}
static void vhost_vq_reset(struct vhost_dev *dev,
@@ -176,17 +177,17 @@ static void vhost_vq_reset(struct vhost_
static int vhost_worker(void *data)
{
- struct vhost_dev *dev = data;
+ struct vhost_virtqueue *vq = data;
struct vhost_work *work = NULL;
unsigned uninitialized_var(seq);
- use_mm(dev->mm);
+ use_mm(vq->dev->mm);
for (;;) {
/* mb paired w/ kthread_stop */
set_current_state(TASK_INTERRUPTIBLE);
- spin_lock_irq(&dev->work_lock);
+ spin_lock_irq(vq->work_lock);
if (work) {
work->done_seq = seq;
if (work->flushing)
@@ -194,18 +195,18 @@ static int vhost_worker(void *data)
}
if (kthread_should_stop()) {
- spin_unlock_irq(&dev->work_lock);
+ spin_unlock_irq(vq->work_lock);
__set_current_state(TASK_RUNNING);
break;
}
- if (!list_empty(&dev->work_list)) {
- work = list_first_entry(&dev->work_list,
+ if (!list_empty(vq->work_list)) {
+ work = list_first_entry(vq->work_list,
struct vhost_work, node);
list_del_init(&work->node);
seq = work->queue_seq;
} else
work = NULL;
- spin_unlock_irq(&dev->work_lock);
+ spin_unlock_irq(vq->work_lock);
if (work) {
__set_current_state(TASK_RUNNING);
@@ -214,7 +215,7 @@ static int vhost_worker(void *data)
schedule();
}
- unuse_mm(dev->mm);
+ unuse_mm(vq->dev->mm);
return 0;
}
@@ -258,7 +259,7 @@ static void vhost_dev_free_iovecs(struct
}
long vhost_dev_init(struct vhost_dev *dev,
- struct vhost_virtqueue *vqs, int nvqs)
+ struct vhost_virtqueue *vqs, int nvqs, int nvhosts)
{
int i;
@@ -269,20 +270,34 @@ long vhost_dev_init(struct vhost_dev *de
dev->log_file = NULL;
dev->memory = NULL;
dev->mm = NULL;
- spin_lock_init(&dev->work_lock);
- INIT_LIST_HEAD(&dev->work_list);
- dev->worker = NULL;
for (i = 0; i < dev->nvqs; ++i) {
- dev->vqs[i].log = NULL;
- dev->vqs[i].indirect = NULL;
- dev->vqs[i].heads = NULL;
- dev->vqs[i].dev = dev;
- mutex_init(&dev->vqs[i].mutex);
+ struct vhost_virtqueue *vq = &dev->vqs[i];
+ int j;
+
+ if (i < nvhosts) {
+ spin_lock_init(dev->work_lock[i]);
+ INIT_LIST_HEAD(dev->work_list[i]);
+ j = i;
+ } else {
+ /* Share work with another thread */
+ j = vhost_get_thread_index(i, nvqs / 2, nvhosts);
+ }
+
+ vq->work_lock = dev->work_lock[j];
+ vq->work_list = dev->work_list[j];
+
+ vq->worker = NULL;
+ vq->qnum = i;
+ vq->log = NULL;
+ vq->indirect = NULL;
+ vq->heads = NULL;
+ vq->dev = dev;
+ mutex_init(&vq->mutex);
vhost_vq_reset(dev, dev->vqs + i);
- if (dev->vqs[i].handle_kick)
- vhost_poll_init(&dev->vqs[i].poll,
- dev->vqs[i].handle_kick, POLLIN, dev);
+ if (vq->handle_kick)
+ vhost_poll_init(&vq->poll,
+ vq->handle_kick, POLLIN, vq);
}
return 0;
@@ -296,65 +311,124 @@ long vhost_dev_check_owner(struct vhost_
}
struct vhost_attach_cgroups_struct {
- struct vhost_work work;
- struct task_struct *owner;
- int ret;
+ struct vhost_work work;
+ struct task_struct *owner;
+ int ret;
};
static void vhost_attach_cgroups_work(struct vhost_work *work)
{
- struct vhost_attach_cgroups_struct *s;
- s = container_of(work, struct vhost_attach_cgroups_struct, work);
- s->ret = cgroup_attach_task_all(s->owner, current);
+ struct vhost_attach_cgroups_struct *s;
+ s = container_of(work, struct vhost_attach_cgroups_struct, work);
+ s->ret = cgroup_attach_task_all(s->owner, current);
+}
+
+static int vhost_attach_cgroups(struct vhost_virtqueue *vq)
+{
+ struct vhost_attach_cgroups_struct attach;
+ attach.owner = current;
+ vhost_work_init(&attach.work, vhost_attach_cgroups_work);
+ vhost_work_queue(vq, &attach.work);
+ vhost_work_flush(vq, &attach.work);
+ return attach.ret;
+}
+
+static void __vhost_stop_workers(struct vhost_dev *dev, int nvhosts)
+{
+ int i;
+
+ for (i = 0; i < nvhosts; i++) {
+ WARN_ON(!list_empty(dev->vqs[i].work_list));
+ if (dev->vqs[i].worker) {
+ kthread_stop(dev->vqs[i].worker);
+ dev->vqs[i].worker = NULL;
+ }
+ }
+
+ if (dev->mm)
+ mmput(dev->mm);
+ dev->mm = NULL;
+}
+
+static void vhost_stop_workers(struct vhost_dev *dev)
+{
+ __vhost_stop_workers(dev, get_nvhosts(dev->nvqs));
}
-static int vhost_attach_cgroups(struct vhost_dev *dev)
-{
- struct vhost_attach_cgroups_struct attach;
- attach.owner = current;
- vhost_work_init(&attach.work, vhost_attach_cgroups_work);
- vhost_work_queue(dev, &attach.work);
- vhost_work_flush(dev, &attach.work);
- return attach.ret;
+static int vhost_start_workers(struct vhost_dev *dev)
+{
+ int nvhosts = get_nvhosts(dev->nvqs);
+ int i, err;
+
+ for (i = 0; i < dev->nvqs; ++i) {
+ struct vhost_virtqueue *vq = &dev->vqs[i];
+
+ if (i < nvhosts) {
+ /* Start a new thread */
+ vq->worker = kthread_create(vhost_worker, vq,
+ "vhost-%d-%d",
+ current->pid, i);
+ if (IS_ERR(vq->worker)) {
+ i--; /* no thread to clean at this index */
+ err = PTR_ERR(vq->worker);
+ goto err;
+ }
+
+ wake_up_process(vq->worker);
+
+ /* avoid contributing to loadavg */
+ err = vhost_attach_cgroups(vq);
+ if (err)
+ goto err;
+ } else {
+ /* Share work with an existing thread */
+ int j = vhost_get_thread_index(i, dev->nvqs / 2,
+ nvhosts);
+
+ vq->worker = dev->vqs[j].worker;
+ }
+ }
+ return 0;
+
+err:
+ __vhost_stop_workers(dev, i);
+ return err;
}
/* Caller should have device mutex */
-static long vhost_dev_set_owner(struct vhost_dev *dev)
+static long vhost_dev_set_owner(struct vhost_dev *dev, int numtxqs)
{
- struct task_struct *worker;
int err;
/* Is there an owner already? */
if (dev->mm) {
err = -EBUSY;
goto err_mm;
}
+
+ err = vhost_setup_vqs(dev, numtxqs);
+ if (err)
+ goto err_mm;
+
/* No owner, become one */
dev->mm = get_task_mm(current);
- worker = kthread_create(vhost_worker, dev, "vhost-%d", current->pid);
- if (IS_ERR(worker)) {
- err = PTR_ERR(worker);
- goto err_worker;
- }
-
- dev->worker = worker;
- wake_up_process(worker); /* avoid contributing to loadavg */
- err = vhost_attach_cgroups(dev);
+ /* Start threads */
+ err = vhost_start_workers(dev);
if (err)
- goto err_cgroup;
+ goto free_vqs;
err = vhost_dev_alloc_iovecs(dev);
if (err)
- goto err_cgroup;
+ goto clean_workers;
return 0;
-err_cgroup:
- kthread_stop(worker);
- dev->worker = NULL;
-err_worker:
+clean_workers:
+ vhost_stop_workers(dev);
+free_vqs:
if (dev->mm)
mmput(dev->mm);
dev->mm = NULL;
+ vhost_free_vqs(dev);
err_mm:
return err;
}
@@ -408,14 +482,7 @@ void vhost_dev_cleanup(struct vhost_dev
kfree(rcu_dereference_protected(dev->memory,
lockdep_is_held(&dev->mutex)));
RCU_INIT_POINTER(dev->memory, NULL);
- WARN_ON(!list_empty(&dev->work_list));
- if (dev->worker) {
- kthread_stop(dev->worker);
- dev->worker = NULL;
- }
- if (dev->mm)
- mmput(dev->mm);
- dev->mm = NULL;
+ vhost_stop_workers(dev);
}
static int log_access_ok(void __user *log_base, u64 addr, unsigned long sz)
@@ -775,7 +842,7 @@ long vhost_dev_ioctl(struct vhost_dev *d
/* If you are not the owner, you can become one */
if (ioctl == VHOST_SET_OWNER) {
- r = vhost_dev_set_owner(d);
+ r = vhost_dev_set_owner(d, arg);
goto done;
}
^ permalink raw reply
* [PATCH 2/3] [RFC] Changes for MQ virtio-net
From: Krishna Kumar @ 2011-02-28 6:34 UTC (permalink / raw)
To: rusty, davem, mst
Cc: eric.dumazet, arnd, netdev, horms, avi, anthony, kvm,
Krishna Kumar
In-Reply-To: <20110228063427.24908.63561.sendpatchset@krkumar2.in.ibm.com>
Implement mq virtio-net driver.
Though struct virtio_net_config changes, it works with the old
qemu since the last element is not accessed unless qemu sets
VIRTIO_NET_F_NUMTXQS. Patch also adds a macro for the maximum
number of TX vq's (VIRTIO_MAX_TXQS) that the user can specify.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
drivers/net/virtio_net.c | 543 ++++++++++++++++++++++++-----------
include/linux/virtio_net.h | 6
2 files changed, 386 insertions(+), 163 deletions(-)
diff -ruNp org/include/linux/virtio_net.h new/include/linux/virtio_net.h
--- org/include/linux/virtio_net.h 2010-10-11 10:20:22.000000000 +0530
+++ new/include/linux/virtio_net.h 2011-02-25 16:24:15.000000000 +0530
@@ -7,6 +7,9 @@
#include <linux/virtio_config.h>
#include <linux/if_ether.h>
+/* Maximum number of individual RX/TX queues supported */
+#define VIRTIO_MAX_TXQS 16
+
/* The feature bitmap for virtio net */
#define VIRTIO_NET_F_CSUM 0 /* Host handles pkts w/ partial csum */
#define VIRTIO_NET_F_GUEST_CSUM 1 /* Guest handles pkts w/ partial csum */
@@ -26,6 +29,7 @@
#define VIRTIO_NET_F_CTRL_RX 18 /* Control channel RX mode support */
#define VIRTIO_NET_F_CTRL_VLAN 19 /* Control channel VLAN filtering */
#define VIRTIO_NET_F_CTRL_RX_EXTRA 20 /* Extra RX mode control support */
+#define VIRTIO_NET_F_NUMTXQS 21 /* Device supports multiple TX queue */
#define VIRTIO_NET_S_LINK_UP 1 /* Link is up */
@@ -34,6 +38,8 @@ struct virtio_net_config {
__u8 mac[6];
/* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
__u16 status;
+ /* number of RX/TX queues */
+ __u16 numtxqs;
} __attribute__((packed));
/* This is the first element of the scatter-gather list. If you don't
diff -ruNp org/drivers/net/virtio_net.c new/drivers/net/virtio_net.c
--- org/drivers/net/virtio_net.c 2011-02-21 17:55:42.000000000 +0530
+++ new/drivers/net/virtio_net.c 2011-02-25 16:23:41.000000000 +0530
@@ -40,31 +40,53 @@ module_param(gso, bool, 0444);
#define VIRTNET_SEND_COMMAND_SG_MAX 2
-struct virtnet_info {
- struct virtio_device *vdev;
- struct virtqueue *rvq, *svq, *cvq;
- struct net_device *dev;
+/* Internal representation of a send virtqueue */
+struct send_queue {
+ struct virtqueue *svq;
+
+ /* TX: fragments + linear part + virtio header */
+ struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
+};
+
+/* Internal representation of a receive virtqueue */
+struct receive_queue {
+ /* Virtqueue associated with this receive_queue */
+ struct virtqueue *rvq;
+
+ /* Back pointer to the virtnet_info */
+ struct virtnet_info *vi;
+
struct napi_struct napi;
- unsigned int status;
/* Number of input buffers, and max we've ever had. */
unsigned int num, max;
- /* I like... big packets and I cannot lie! */
- bool big_packets;
-
- /* Host will merge rx buffers for big packets (shake it! shake it!) */
- bool mergeable_rx_bufs;
-
/* Work struct for refilling if we run low on memory. */
struct delayed_work refill;
/* Chain pages by the private ptr. */
struct page *pages;
- /* fragments + linear part + virtio header */
+ /* RX: fragments + linear part + virtio header */
struct scatterlist rx_sg[MAX_SKB_FRAGS + 2];
- struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
+};
+
+struct virtnet_info {
+ struct send_queue **sq;
+ struct receive_queue **rq;
+
+ /* read-mostly variables */
+ int numtxqs ____cacheline_aligned_in_smp; /* # of rxqs/txqs */
+ struct virtio_device *vdev;
+ struct virtqueue *cvq;
+ struct net_device *dev;
+ unsigned int status;
+
+ /* I like... big packets and I cannot lie! */
+ bool big_packets;
+
+ /* Host will merge rx buffers for big packets (shake it! shake it!) */
+ bool mergeable_rx_bufs;
};
struct skb_vnet_hdr {
@@ -94,22 +116,22 @@ static inline struct skb_vnet_hdr *skb_v
* private is used to chain pages for big packets, put the whole
* most recent used list in the beginning for reuse
*/
-static void give_pages(struct virtnet_info *vi, struct page *page)
+static void give_pages(struct receive_queue *rq, struct page *page)
{
struct page *end;
/* Find end of list, sew whole thing into vi->pages. */
for (end = page; end->private; end = (struct page *)end->private);
- end->private = (unsigned long)vi->pages;
- vi->pages = page;
+ end->private = (unsigned long)rq->pages;
+ rq->pages = page;
}
-static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
+static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
{
- struct page *p = vi->pages;
+ struct page *p = rq->pages;
if (p) {
- vi->pages = (struct page *)p->private;
+ rq->pages = (struct page *)p->private;
/* clear private here, it is used to chain pages */
p->private = 0;
} else
@@ -117,15 +139,20 @@ static struct page *get_a_page(struct vi
return p;
}
+/*
+ * Note for 'qnum' below:
+ * first 'numtxqs' vqs are RX, next 'numtxqs' vqs are TX.
+ */
static void skb_xmit_done(struct virtqueue *svq)
{
struct virtnet_info *vi = svq->vdev->priv;
+ int qnum = svq->queue_index - vi->numtxqs;
/* Suppress further interrupts. */
virtqueue_disable_cb(svq);
/* We were probably waiting for more output buffers. */
- netif_wake_queue(vi->dev);
+ netif_wake_subqueue(vi->dev, qnum);
}
static void set_skb_frag(struct sk_buff *skb, struct page *page,
@@ -145,9 +172,10 @@ static void set_skb_frag(struct sk_buff
*len -= f->size;
}
-static struct sk_buff *page_to_skb(struct virtnet_info *vi,
+static struct sk_buff *page_to_skb(struct receive_queue *rq,
struct page *page, unsigned int len)
{
+ struct virtnet_info *vi = rq->vi;
struct sk_buff *skb;
struct skb_vnet_hdr *hdr;
unsigned int copy, hdr_len, offset;
@@ -190,12 +218,12 @@ static struct sk_buff *page_to_skb(struc
}
if (page)
- give_pages(vi, page);
+ give_pages(rq, page);
return skb;
}
-static int receive_mergeable(struct virtnet_info *vi, struct sk_buff *skb)
+static int receive_mergeable(struct receive_queue *rq, struct sk_buff *skb)
{
struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb);
struct page *page;
@@ -210,7 +238,7 @@ static int receive_mergeable(struct virt
return -EINVAL;
}
- page = virtqueue_get_buf(vi->rvq, &len);
+ page = virtqueue_get_buf(rq->rvq, &len);
if (!page) {
pr_debug("%s: rx error: %d buffers missing\n",
skb->dev->name, hdr->mhdr.num_buffers);
@@ -222,13 +250,14 @@ static int receive_mergeable(struct virt
set_skb_frag(skb, page, 0, &len);
- --vi->num;
+ --rq->num;
}
return 0;
}
-static void receive_buf(struct net_device *dev, void *buf, unsigned int len)
+static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
{
+ struct net_device *dev = rq->vi->dev;
struct virtnet_info *vi = netdev_priv(dev);
struct sk_buff *skb;
struct page *page;
@@ -238,7 +267,7 @@ static void receive_buf(struct net_devic
pr_debug("%s: short packet %i\n", dev->name, len);
dev->stats.rx_length_errors++;
if (vi->mergeable_rx_bufs || vi->big_packets)
- give_pages(vi, buf);
+ give_pages(rq, buf);
else
dev_kfree_skb(buf);
return;
@@ -250,14 +279,14 @@ static void receive_buf(struct net_devic
skb_trim(skb, len);
} else {
page = buf;
- skb = page_to_skb(vi, page, len);
+ skb = page_to_skb(rq, page, len);
if (unlikely(!skb)) {
dev->stats.rx_dropped++;
- give_pages(vi, page);
+ give_pages(rq, page);
return;
}
if (vi->mergeable_rx_bufs)
- if (receive_mergeable(vi, skb)) {
+ if (receive_mergeable(rq, skb)) {
dev_kfree_skb(skb);
return;
}
@@ -323,184 +352,200 @@ frame_err:
dev_kfree_skb(skb);
}
-static int add_recvbuf_small(struct virtnet_info *vi, gfp_t gfp)
+static int add_recvbuf_small(struct receive_queue *rq, gfp_t gfp)
{
struct sk_buff *skb;
struct skb_vnet_hdr *hdr;
int err;
- skb = netdev_alloc_skb_ip_align(vi->dev, MAX_PACKET_LEN);
+ skb = netdev_alloc_skb_ip_align(rq->vi->dev, MAX_PACKET_LEN);
if (unlikely(!skb))
return -ENOMEM;
skb_put(skb, MAX_PACKET_LEN);
hdr = skb_vnet_hdr(skb);
- sg_set_buf(vi->rx_sg, &hdr->hdr, sizeof hdr->hdr);
+ sg_set_buf(rq->rx_sg, &hdr->hdr, sizeof hdr->hdr);
- skb_to_sgvec(skb, vi->rx_sg + 1, 0, skb->len);
+ skb_to_sgvec(skb, rq->rx_sg + 1, 0, skb->len);
- err = virtqueue_add_buf_gfp(vi->rvq, vi->rx_sg, 0, 2, skb, gfp);
+ err = virtqueue_add_buf_gfp(rq->rvq, rq->rx_sg, 0, 2, skb, gfp);
if (err < 0)
dev_kfree_skb(skb);
return err;
}
-static int add_recvbuf_big(struct virtnet_info *vi, gfp_t gfp)
+static int add_recvbuf_big(struct receive_queue *rq, gfp_t gfp)
{
struct page *first, *list = NULL;
char *p;
int i, err, offset;
- /* page in vi->rx_sg[MAX_SKB_FRAGS + 1] is list tail */
+ /* page in rq->rx_sg[MAX_SKB_FRAGS + 1] is list tail */
for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
- first = get_a_page(vi, gfp);
+ first = get_a_page(rq, gfp);
if (!first) {
if (list)
- give_pages(vi, list);
+ give_pages(rq, list);
return -ENOMEM;
}
- sg_set_buf(&vi->rx_sg[i], page_address(first), PAGE_SIZE);
+ sg_set_buf(&rq->rx_sg[i], page_address(first), PAGE_SIZE);
/* chain new page in list head to match sg */
first->private = (unsigned long)list;
list = first;
}
- first = get_a_page(vi, gfp);
+ first = get_a_page(rq, gfp);
if (!first) {
- give_pages(vi, list);
+ give_pages(rq, list);
return -ENOMEM;
}
p = page_address(first);
- /* vi->rx_sg[0], vi->rx_sg[1] share the same page */
- /* a separated vi->rx_sg[0] for virtio_net_hdr only due to QEMU bug */
- sg_set_buf(&vi->rx_sg[0], p, sizeof(struct virtio_net_hdr));
+ /* rq->rx_sg[0], rq->rx_sg[1] share the same page */
+ /* a separated rq->rx_sg[0] for virtio_net_hdr only due to QEMU bug */
+ sg_set_buf(&rq->rx_sg[0], p, sizeof(struct virtio_net_hdr));
- /* vi->rx_sg[1] for data packet, from offset */
+ /* rq->rx_sg[1] for data packet, from offset */
offset = sizeof(struct padded_vnet_hdr);
- sg_set_buf(&vi->rx_sg[1], p + offset, PAGE_SIZE - offset);
+ sg_set_buf(&rq->rx_sg[1], p + offset, PAGE_SIZE - offset);
/* chain first in list head */
first->private = (unsigned long)list;
- err = virtqueue_add_buf_gfp(vi->rvq, vi->rx_sg, 0, MAX_SKB_FRAGS + 2,
+ err = virtqueue_add_buf_gfp(rq->rvq, rq->rx_sg, 0, MAX_SKB_FRAGS + 2,
first, gfp);
if (err < 0)
- give_pages(vi, first);
+ give_pages(rq, first);
return err;
}
-static int add_recvbuf_mergeable(struct virtnet_info *vi, gfp_t gfp)
+static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
{
struct page *page;
int err;
- page = get_a_page(vi, gfp);
+ page = get_a_page(rq, gfp);
if (!page)
return -ENOMEM;
- sg_init_one(vi->rx_sg, page_address(page), PAGE_SIZE);
+ sg_init_one(rq->rx_sg, page_address(page), PAGE_SIZE);
- err = virtqueue_add_buf_gfp(vi->rvq, vi->rx_sg, 0, 1, page, gfp);
+ err = virtqueue_add_buf_gfp(rq->rvq, rq->rx_sg, 0, 1, page, gfp);
if (err < 0)
- give_pages(vi, page);
+ give_pages(rq, page);
return err;
}
/* Returns false if we couldn't fill entirely (OOM). */
-static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
+static bool try_fill_recv(struct receive_queue *rq, gfp_t gfp)
{
+ struct virtnet_info *vi = rq->vi;
int err;
bool oom;
do {
if (vi->mergeable_rx_bufs)
- err = add_recvbuf_mergeable(vi, gfp);
+ err = add_recvbuf_mergeable(rq, gfp);
else if (vi->big_packets)
- err = add_recvbuf_big(vi, gfp);
+ err = add_recvbuf_big(rq, gfp);
else
- err = add_recvbuf_small(vi, gfp);
+ err = add_recvbuf_small(rq, gfp);
oom = err == -ENOMEM;
if (err < 0)
break;
- ++vi->num;
+ ++rq->num;
} while (err > 0);
- if (unlikely(vi->num > vi->max))
- vi->max = vi->num;
- virtqueue_kick(vi->rvq);
+ if (unlikely(rq->num > rq->max))
+ rq->max = rq->num;
+ virtqueue_kick(rq->rvq);
return !oom;
}
static void skb_recv_done(struct virtqueue *rvq)
{
+ int qnum = rvq->queue_index;
struct virtnet_info *vi = rvq->vdev->priv;
+ struct napi_struct *napi = &vi->rq[qnum]->napi;
+
/* Schedule NAPI, Suppress further interrupts if successful. */
- if (napi_schedule_prep(&vi->napi)) {
+ if (napi_schedule_prep(napi)) {
virtqueue_disable_cb(rvq);
- __napi_schedule(&vi->napi);
+ __napi_schedule(napi);
}
}
-static void virtnet_napi_enable(struct virtnet_info *vi)
+static void virtnet_napi_enable(struct receive_queue *rq)
{
- napi_enable(&vi->napi);
+ napi_enable(&rq->napi);
/* If all buffers were filled by other side before we napi_enabled, we
* won't get another interrupt, so process any outstanding packets
* now. virtnet_poll wants re-enable the queue, so we disable here.
* We synchronize against interrupts via NAPI_STATE_SCHED */
- if (napi_schedule_prep(&vi->napi)) {
- virtqueue_disable_cb(vi->rvq);
- __napi_schedule(&vi->napi);
+ if (napi_schedule_prep(&rq->napi)) {
+ virtqueue_disable_cb(rq->rvq);
+ __napi_schedule(&rq->napi);
}
}
+static void virtnet_napi_enable_all_queues(struct virtnet_info *vi)
+{
+ int i;
+
+ for (i = 0; i < vi->numtxqs; i++)
+ virtnet_napi_enable(vi->rq[i]);
+}
+
static void refill_work(struct work_struct *work)
{
- struct virtnet_info *vi;
+ struct napi_struct *napi;
+ struct receive_queue *rq;
bool still_empty;
- vi = container_of(work, struct virtnet_info, refill.work);
- napi_disable(&vi->napi);
- still_empty = !try_fill_recv(vi, GFP_KERNEL);
- virtnet_napi_enable(vi);
+ rq = container_of(work, struct receive_queue, refill.work);
+ napi = &rq->napi;
+
+ napi_disable(napi);
+ still_empty = !try_fill_recv(rq, GFP_KERNEL);
+ virtnet_napi_enable(rq);
/* In theory, this can happen: if we don't get any buffers in
* we will *never* try to fill again. */
if (still_empty)
- schedule_delayed_work(&vi->refill, HZ/2);
+ schedule_delayed_work(&rq->refill, HZ/2);
}
static int virtnet_poll(struct napi_struct *napi, int budget)
{
- struct virtnet_info *vi = container_of(napi, struct virtnet_info, napi);
+ struct receive_queue *rq = container_of(napi, struct receive_queue,
+ napi);
void *buf;
unsigned int len, received = 0;
again:
while (received < budget &&
- (buf = virtqueue_get_buf(vi->rvq, &len)) != NULL) {
- receive_buf(vi->dev, buf, len);
- --vi->num;
+ (buf = virtqueue_get_buf(rq->rvq, &len)) != NULL) {
+ receive_buf(rq, buf, len);
+ --rq->num;
received++;
}
- if (vi->num < vi->max / 2) {
- if (!try_fill_recv(vi, GFP_ATOMIC))
- schedule_delayed_work(&vi->refill, 0);
+ if (rq->num < rq->max / 2) {
+ if (!try_fill_recv(rq, GFP_ATOMIC))
+ schedule_delayed_work(&rq->refill, 0);
}
/* Out of packets? */
if (received < budget) {
napi_complete(napi);
- if (unlikely(!virtqueue_enable_cb(vi->rvq)) &&
+ if (unlikely(!virtqueue_enable_cb(rq->rvq)) &&
napi_schedule_prep(napi)) {
- virtqueue_disable_cb(vi->rvq);
+ virtqueue_disable_cb(rq->rvq);
__napi_schedule(napi);
goto again;
}
@@ -509,12 +554,13 @@ again:
return received;
}
-static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
+static unsigned int free_old_xmit_skbs(struct virtnet_info *vi,
+ struct virtqueue *svq)
{
struct sk_buff *skb;
unsigned int len, tot_sgs = 0;
- while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
+ while ((skb = virtqueue_get_buf(svq, &len)) != NULL) {
pr_debug("Sent skb %p\n", skb);
vi->dev->stats.tx_bytes += skb->len;
vi->dev->stats.tx_packets++;
@@ -524,7 +570,8 @@ static unsigned int free_old_xmit_skbs(s
return tot_sgs;
}
-static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
+static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb,
+ struct virtqueue *svq, struct scatterlist *tx_sg)
{
struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb);
const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
@@ -562,12 +609,12 @@ static int xmit_skb(struct virtnet_info
/* Encode metadata header at front. */
if (vi->mergeable_rx_bufs)
- sg_set_buf(vi->tx_sg, &hdr->mhdr, sizeof hdr->mhdr);
+ sg_set_buf(tx_sg, &hdr->mhdr, sizeof hdr->mhdr);
else
- sg_set_buf(vi->tx_sg, &hdr->hdr, sizeof hdr->hdr);
+ sg_set_buf(tx_sg, &hdr->hdr, sizeof hdr->hdr);
- hdr->num_sg = skb_to_sgvec(skb, vi->tx_sg + 1, 0, skb->len) + 1;
- return virtqueue_add_buf(vi->svq, vi->tx_sg, hdr->num_sg,
+ hdr->num_sg = skb_to_sgvec(skb, tx_sg + 1, 0, skb->len) + 1;
+ return virtqueue_add_buf(svq, tx_sg, hdr->num_sg,
0, skb);
}
@@ -575,31 +622,34 @@ static netdev_tx_t start_xmit(struct sk_
{
struct virtnet_info *vi = netdev_priv(dev);
int capacity;
+ int qnum = skb_get_queue_mapping(skb);
+ struct virtqueue *svq = vi->sq[qnum]->svq;
/* Free up any pending old buffers before queueing new ones. */
- free_old_xmit_skbs(vi);
+ free_old_xmit_skbs(vi, svq);
/* Try to transmit */
- capacity = xmit_skb(vi, skb);
+ capacity = xmit_skb(vi, skb, svq, vi->sq[qnum]->tx_sg);
/* This can happen with OOM and indirect buffers. */
if (unlikely(capacity < 0)) {
if (net_ratelimit()) {
if (likely(capacity == -ENOMEM)) {
dev_warn(&dev->dev,
- "TX queue failure: out of memory\n");
+ "TXQ (%d) failure: out of memory\n",
+ qnum);
} else {
dev->stats.tx_fifo_errors++;
dev_warn(&dev->dev,
- "Unexpected TX queue failure: %d\n",
- capacity);
+ "Unexpected TXQ (%d) failure: %d\n",
+ qnum, capacity);
}
}
dev->stats.tx_dropped++;
kfree_skb(skb);
return NETDEV_TX_OK;
}
- virtqueue_kick(vi->svq);
+ virtqueue_kick(svq);
/* Don't wait up for transmitted skbs to be freed. */
skb_orphan(skb);
@@ -608,13 +658,13 @@ static netdev_tx_t start_xmit(struct sk_
/* Apparently nice girls don't return TX_BUSY; stop the queue
* before it gets out of hand. Naturally, this wastes entries. */
if (capacity < 2+MAX_SKB_FRAGS) {
- netif_stop_queue(dev);
- if (unlikely(!virtqueue_enable_cb(vi->svq))) {
+ netif_stop_subqueue(dev, qnum);
+ if (unlikely(!virtqueue_enable_cb(svq))) {
/* More just got used, free them then recheck. */
- capacity += free_old_xmit_skbs(vi);
+ capacity += free_old_xmit_skbs(vi, svq);
if (capacity >= 2+MAX_SKB_FRAGS) {
- netif_start_queue(dev);
- virtqueue_disable_cb(vi->svq);
+ netif_start_subqueue(dev, qnum);
+ virtqueue_disable_cb(svq);
}
}
}
@@ -643,8 +693,10 @@ static int virtnet_set_mac_address(struc
static void virtnet_netpoll(struct net_device *dev)
{
struct virtnet_info *vi = netdev_priv(dev);
+ int i;
- napi_schedule(&vi->napi);
+ for (i = 0; i < vi->numtxqs; i++)
+ napi_schedule(&vi->rq[i]->napi);
}
#endif
@@ -652,7 +704,7 @@ static int virtnet_open(struct net_devic
{
struct virtnet_info *vi = netdev_priv(dev);
- virtnet_napi_enable(vi);
+ virtnet_napi_enable_all_queues(vi);
return 0;
}
@@ -704,8 +756,10 @@ static bool virtnet_send_command(struct
static int virtnet_close(struct net_device *dev)
{
struct virtnet_info *vi = netdev_priv(dev);
+ int i;
- napi_disable(&vi->napi);
+ for (i = 0; i < vi->numtxqs; i++)
+ napi_disable(&vi->rq[i]->napi);
return 0;
}
@@ -876,10 +930,10 @@ static void virtnet_update_status(struct
if (vi->status & VIRTIO_NET_S_LINK_UP) {
netif_carrier_on(vi->dev);
- netif_wake_queue(vi->dev);
+ netif_tx_wake_all_queues(vi->dev);
} else {
netif_carrier_off(vi->dev);
- netif_stop_queue(vi->dev);
+ netif_tx_stop_all_queues(vi->dev);
}
}
@@ -890,18 +944,173 @@ static void virtnet_config_changed(struc
virtnet_update_status(vi);
}
+static void free_receive_bufs(struct virtnet_info *vi)
+{
+ int i;
+
+ for (i = 0; i < vi->numtxqs; i++) {
+ BUG_ON(vi->rq[i] == NULL);
+ while (vi->rq[i]->pages)
+ __free_pages(get_a_page(vi->rq[i], GFP_KERNEL), 0);
+ }
+}
+
+static void free_rq_sq(struct virtnet_info *vi)
+{
+ int i;
+
+ if (vi->rq) {
+ for (i = 0; i < vi->numtxqs; i++)
+ kfree(vi->rq[i]);
+ kfree(vi->rq);
+ }
+
+ if (vi->sq) {
+ for (i = 0; i < vi->numtxqs; i++)
+ kfree(vi->sq[i]);
+ kfree(vi->sq);
+ }
+}
+
+#define MAX_DEVICE_NAME 16
+static int initialize_vqs(struct virtnet_info *vi, int numtxqs)
+{
+ vq_callback_t **callbacks;
+ struct virtqueue **vqs;
+ int i, err = -ENOMEM;
+ int totalvqs;
+ char **names;
+
+ /* Allocate receive queues */
+ vi->rq = kzalloc(numtxqs * sizeof(*vi->rq), GFP_KERNEL);
+ if (!vi->rq)
+ goto out;
+ for (i = 0; i < numtxqs; i++) {
+ vi->rq[i] = kzalloc(sizeof(*vi->rq[i]), GFP_KERNEL);
+ if (!vi->rq[i])
+ goto out;
+ }
+
+ /* Allocate send queues */
+ vi->sq = kzalloc(numtxqs * sizeof(*vi->sq), GFP_KERNEL);
+ if (!vi->sq)
+ goto out;
+ for (i = 0; i < numtxqs; i++) {
+ vi->sq[i] = kzalloc(sizeof(*vi->sq[i]), GFP_KERNEL);
+ if (!vi->sq[i])
+ goto out;
+ }
+
+ /* setup initial receive and send queue parameters */
+ for (i = 0; i < numtxqs; i++) {
+ vi->rq[i]->vi = vi;
+ vi->rq[i]->pages = NULL;
+ INIT_DELAYED_WORK(&vi->rq[i]->refill, refill_work);
+ netif_napi_add(vi->dev, &vi->rq[i]->napi, virtnet_poll,
+ napi_weight);
+
+ sg_init_table(vi->rq[i]->rx_sg, ARRAY_SIZE(vi->rq[i]->rx_sg));
+ sg_init_table(vi->sq[i]->tx_sg, ARRAY_SIZE(vi->sq[i]->tx_sg));
+ }
+
+ /*
+ * We expect 'numtxqs' RX virtqueue followed by 'numtxqs' TX
+ * virtqueues, and optionally one control virtqueue.
+ */
+ totalvqs = numtxqs * 2 +
+ virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ);
+
+ /* Allocate space for find_vqs parameters */
+ vqs = kmalloc(totalvqs * sizeof(*vqs), GFP_KERNEL);
+ callbacks = kmalloc(totalvqs * sizeof(*callbacks), GFP_KERNEL);
+ names = kzalloc(totalvqs * sizeof(*names), GFP_KERNEL);
+ if (!vqs || !callbacks || !names)
+ goto free_params;
+
+ /* Allocate/initialize parameters for recv virtqueues */
+ for (i = 0; i < numtxqs; i++) {
+ callbacks[i] = skb_recv_done;
+ names[i] = kmalloc(MAX_DEVICE_NAME * sizeof(*names[i]),
+ GFP_KERNEL);
+ if (!names[i])
+ goto free_params;
+ sprintf(names[i], "input.%d", i);
+ }
+
+ /* Allocate/initialize parameters for send virtqueues */
+ for (; i < numtxqs * 2; i++) {
+ callbacks[i] = skb_xmit_done;
+ names[i] = kmalloc(MAX_DEVICE_NAME * sizeof(*names[i]),
+ GFP_KERNEL);
+ if (!names[i])
+ goto free_params;
+ sprintf(names[i], "output.%d", i - numtxqs);
+ }
+
+ /* Parameters for control virtqueue, if any */
+ if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)) {
+ callbacks[i] = NULL;
+ names[i] = "control";
+ }
+
+ err = vi->vdev->config->find_vqs(vi->vdev, totalvqs, vqs, callbacks,
+ (const char **)names);
+ if (err)
+ goto free_params;
+
+ for (i = 0; i < numtxqs; i++) {
+ vi->rq[i]->rvq = vqs[i];
+ vi->sq[i]->svq = vqs[i + numtxqs];
+ }
+
+ if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)) {
+ vi->cvq = vqs[i + numtxqs];
+
+ if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VLAN))
+ vi->dev->features |= NETIF_F_HW_VLAN_FILTER;
+ }
+
+free_params:
+ if (names) {
+ for (i = 0; i < numtxqs * 2; i++)
+ kfree(names[i]);
+ kfree(names);
+ }
+
+ kfree(callbacks);
+ kfree(vqs);
+
+out:
+ if (err)
+ free_rq_sq(vi);
+
+ return err;
+}
+
static int virtnet_probe(struct virtio_device *vdev)
{
- int err;
+ int i, err;
+ u16 numtxqs;
struct net_device *dev;
struct virtnet_info *vi;
- struct virtqueue *vqs[3];
- vq_callback_t *callbacks[] = { skb_recv_done, skb_xmit_done, NULL};
- const char *names[] = { "input", "output", "control" };
- int nvqs;
+
+ /*
+ * Find if host passed the number of RX/TX queues supported
+ * by the device
+ */
+ err = virtio_config_val(vdev, VIRTIO_NET_F_NUMTXQS,
+ offsetof(struct virtio_net_config, numtxqs),
+ &numtxqs);
+
+ /* We need atleast one txq */
+ if (err || !numtxqs)
+ numtxqs = 1;
+
+ if (numtxqs > VIRTIO_MAX_TXQS)
+ return -EINVAL;
/* Allocate ourselves a network device with room for our info */
- dev = alloc_etherdev(sizeof(struct virtnet_info));
+ dev = alloc_etherdev_mq(sizeof(struct virtnet_info), numtxqs);
if (!dev)
return -ENOMEM;
@@ -940,14 +1149,10 @@ static int virtnet_probe(struct virtio_d
/* Set up our device-specific information */
vi = netdev_priv(dev);
- netif_napi_add(dev, &vi->napi, virtnet_poll, napi_weight);
vi->dev = dev;
vi->vdev = vdev;
vdev->priv = vi;
- vi->pages = NULL;
- INIT_DELAYED_WORK(&vi->refill, refill_work);
- sg_init_table(vi->rx_sg, ARRAY_SIZE(vi->rx_sg));
- sg_init_table(vi->tx_sg, ARRAY_SIZE(vi->tx_sg));
+ vi->numtxqs = numtxqs;
/* If we can receive ANY GSO packets, we must allocate large ones. */
if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
@@ -958,23 +1163,10 @@ static int virtnet_probe(struct virtio_d
if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
vi->mergeable_rx_bufs = true;
- /* We expect two virtqueues, receive then send,
- * and optionally control. */
- nvqs = virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ? 3 : 2;
-
- err = vdev->config->find_vqs(vdev, nvqs, vqs, callbacks, names);
+ /* Initialize our rx/tx queue parameters, and invoke find_vqs */
+ err = initialize_vqs(vi, numtxqs);
if (err)
- goto free;
-
- vi->rvq = vqs[0];
- vi->svq = vqs[1];
-
- if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)) {
- vi->cvq = vqs[2];
-
- if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VLAN))
- dev->features |= NETIF_F_HW_VLAN_FILTER;
- }
+ goto free_netdev;
err = register_netdev(dev);
if (err) {
@@ -983,14 +1175,19 @@ static int virtnet_probe(struct virtio_d
}
/* Last of all, set up some receive buffers. */
- try_fill_recv(vi, GFP_KERNEL);
+ for (i = 0; i < numtxqs; i++) {
+ try_fill_recv(vi->rq[i], GFP_KERNEL);
- /* If we didn't even get one input buffer, we're useless. */
- if (vi->num == 0) {
- err = -ENOMEM;
- goto unregister;
+ /* If we didn't even get one input buffer, we're useless. */
+ if (vi->rq[i]->num == 0) {
+ err = -ENOMEM;
+ goto free_recv_bufs;
+ }
}
+ dev_info(&dev->dev, "(virtio-net) Allocated %d RX & TX vq's\n",
+ numtxqs);
+
/* Assume link up if device can't report link status,
otherwise get link status from config. */
if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
@@ -1001,15 +1198,21 @@ static int virtnet_probe(struct virtio_d
netif_carrier_on(dev);
}
- pr_debug("virtnet: registered device %s\n", dev->name);
+ pr_debug("virtnet: registered device %s with %d RX and TX vq's\n",
+ dev->name, numtxqs);
return 0;
-unregister:
+free_recv_bufs:
+ free_receive_bufs(vi);
unregister_netdev(dev);
- cancel_delayed_work_sync(&vi->refill);
+
free_vqs:
+ for (i = 0; i < numtxqs; i++)
+ cancel_delayed_work_sync(&vi->rq[i]->refill);
vdev->config->del_vqs(vdev);
-free:
+ free_rq_sq(vi);
+
+free_netdev:
free_netdev(dev);
return err;
}
@@ -1017,44 +1220,58 @@ free:
static void free_unused_bufs(struct virtnet_info *vi)
{
void *buf;
- while (1) {
- buf = virtqueue_detach_unused_buf(vi->svq);
- if (!buf)
- break;
- dev_kfree_skb(buf);
- }
- while (1) {
- buf = virtqueue_detach_unused_buf(vi->rvq);
- if (!buf)
- break;
- if (vi->mergeable_rx_bufs || vi->big_packets)
- give_pages(vi, buf);
- else
+ int i;
+
+ for (i = 0; i < vi->numtxqs; i++) {
+ struct virtqueue *svq = vi->sq[i]->svq;
+
+ while (1) {
+ buf = virtqueue_detach_unused_buf(svq);
+ if (!buf)
+ break;
dev_kfree_skb(buf);
- --vi->num;
+ }
+ }
+
+ for (i = 0; i < vi->numtxqs; i++) {
+ struct virtqueue *rvq = vi->rq[i]->rvq;
+
+ while (1) {
+ buf = virtqueue_detach_unused_buf(rvq);
+ if (!buf)
+ break;
+ if (vi->mergeable_rx_bufs || vi->big_packets)
+ give_pages(vi->rq[i], buf);
+ else
+ dev_kfree_skb(buf);
+ --vi->rq[i]->num;
+ }
+ BUG_ON(vi->rq[i]->num != 0);
}
- BUG_ON(vi->num != 0);
+
+ free_rq_sq(vi);
}
static void __devexit virtnet_remove(struct virtio_device *vdev)
{
struct virtnet_info *vi = vdev->priv;
+ int i;
/* Stop all the virtqueues. */
vdev->config->reset(vdev);
unregister_netdev(vi->dev);
- cancel_delayed_work_sync(&vi->refill);
+
+ for (i = 0; i < vi->numtxqs; i++)
+ cancel_delayed_work_sync(&vi->rq[i]->refill);
/* Free unused buffers in both send and recv, if any. */
free_unused_bufs(vi);
vdev->config->del_vqs(vi->vdev);
- while (vi->pages)
- __free_pages(get_a_page(vi, GFP_KERNEL), 0);
-
+ free_receive_bufs(vi);
free_netdev(vi->dev);
}
@@ -1070,7 +1287,7 @@ static unsigned int features[] = {
VIRTIO_NET_F_HOST_ECN, VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
- VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
+ VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, VIRTIO_NET_F_NUMTXQS,
};
static struct virtio_driver virtio_net_driver = {
^ permalink raw reply
* [PATCH 1/3] [RFC] Change virtqueue structure
From: Krishna Kumar @ 2011-02-28 6:34 UTC (permalink / raw)
To: rusty, davem, mst
Cc: eric.dumazet, arnd, netdev, horms, avi, anthony, kvm,
Krishna Kumar
In-Reply-To: <20110228063427.24908.63561.sendpatchset@krkumar2.in.ibm.com>
Move queue_index from virtio_pci_vq_info to virtqueue. This
allows callback handlers to figure out the queue number for
the vq that needs attention.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
drivers/virtio/virtio_pci.c | 10 +++-------
include/linux/virtio.h | 1 +
2 files changed, 4 insertions(+), 7 deletions(-)
diff -ruNp org/include/linux/virtio.h new/include/linux/virtio.h
--- org/include/linux/virtio.h 2010-10-11 10:20:22.000000000 +0530
+++ new/include/linux/virtio.h 2011-02-23 16:26:18.000000000 +0530
@@ -22,6 +22,7 @@ struct virtqueue {
void (*callback)(struct virtqueue *vq);
const char *name;
struct virtio_device *vdev;
+ int queue_index; /* the index of the queue */
void *priv;
};
diff -ruNp org/drivers/virtio/virtio_pci.c new/drivers/virtio/virtio_pci.c
--- org/drivers/virtio/virtio_pci.c 2011-01-28 11:38:24.000000000 +0530
+++ new/drivers/virtio/virtio_pci.c 2011-02-25 10:11:22.000000000 +0530
@@ -75,9 +75,6 @@ struct virtio_pci_vq_info
/* the number of entries in the queue */
int num;
- /* the index of the queue */
- int queue_index;
-
/* the virtual address of the ring queue */
void *queue;
@@ -180,11 +177,10 @@ static void vp_reset(struct virtio_devic
static void vp_notify(struct virtqueue *vq)
{
struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
- struct virtio_pci_vq_info *info = vq->priv;
/* we write the queue's selector into the notification register to
* signal the other end */
- iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
+ iowrite16(vq->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
}
/* Handle a configuration change: Tell driver if it wants to know. */
@@ -380,7 +376,6 @@ static struct virtqueue *setup_vq(struct
if (!info)
return ERR_PTR(-ENOMEM);
- info->queue_index = index;
info->num = num;
info->msix_vector = msix_vec;
@@ -403,6 +398,7 @@ static struct virtqueue *setup_vq(struct
goto out_activate_queue;
}
+ vq->queue_index = index;
vq->priv = info;
info->vq = vq;
@@ -441,7 +437,7 @@ static void vp_del_vq(struct virtqueue *
list_del(&info->node);
spin_unlock_irqrestore(&vp_dev->lock, flags);
- iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
+ iowrite16(vq->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
if (vp_dev->msix_enabled) {
iowrite16(VIRTIO_MSI_NO_VECTOR,
^ permalink raw reply
* [PATCH 0/3] [RFC] Implement multiqueue (RX & TX) virtio-net
From: Krishna Kumar @ 2011-02-28 6:34 UTC (permalink / raw)
To: rusty, davem, mst
Cc: eric.dumazet, arnd, netdev, horms, avi, anthony, kvm,
Krishna Kumar
This patch series is a continuation of an earlier one that
implemented guest MQ TX functionality. This new patchset
implements both RX and TX MQ. Qemu changes are not being
included at this time solely to aid in easier review.
Compatibility testing with old/new combinations of qemu/guest
and vhost was done without any issues.
Some early TCP/UDP test results are at the bottom of this
post, I plan to submit more test results in the coming days.
Please review and provide feedback on what can improve.
Thanks!
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
Test configuration:
Host: 8 Intel Xeon, 8 GB memory
Guest: 4 cpus, 2 GB memory
Each test case runs for 60 secs, results below are average over
two runs. Bandwidth numbers are in gbps. I have used default
netperf, and no testing/system tuning other than taskset each
vhost to 0xf (cpus 0-3). Comparison is testing original kernel
vs new kernel with #txqs=8 ("#" refers to number of netperf
sessions).
_______________________________________________________________________
TCP: Guest -> Local Host (TCP_STREAM)
# BW1 BW2 (%) SD1 SD2 (%) RSD1 RSD2 (%)
_______________________________________________________________________
1 7190 7170 (-.2) 0 0 (0) 3 4 (33.3)
2 8774 11235 (28.0) 3 3 (0) 16 14 (-12.5)
4 9753 15195 (55.7) 17 21 (23.5) 65 59 (-9.2)
8 10224 18265 (78.6) 71 115 (61.9) 251 240 (-4.3)
16 10749 18123 (68.6) 277 456 (64.6) 985 925 (-6.0)
32 11133 17351 (55.8) 1132 1947 (71.9) 3935 3831 (-2.6)
64 11223 17115 (52.4) 4682 7836 (67.3) 15949 15373 (-3.6)
128 11269 16986 (50.7) 19783 31505 (59.2) 66799 61759 (-7.5)
_______________________________________________________________________
Summary: BW: 37.6 SD: 61.2 RSD: -6.5
_______________________________________________________________________
TCP: Local Host -> Guest (TCP_MAERTS)
# BW1 BW2 (%) SD1 SD2 (%) RSD1 RSD2 (%)
_______________________________________________________________________
1 11490 10870 (-5.3) 0 0 (0) 2 2 (0)
2 10612 10554 (-.5) 2 3 (50.0) 12 12 (0)
4 10047 14320 (42.5) 13 16 (23.0) 53 53 (0)
8 9273 15182 (63.7) 56 84 (50.0) 228 233 (2.1)
16 9291 15853 (70.6) 235 390 (65.9) 934 965 (3.3)
32 9382 15741 (67.7) 969 1823 (88.1) 3868 4037 (4.3)
64 9270 14585 (57.3) 3966 8836 (122.7) 15415 17818 (15.5)
128 8997 14678 (63.1) 17024 36649 (115.2) 64933 72677 (11.9)
_______________________________________________________________________
SUM: BW: 24.8 SD: 114.6 RSD: 12.1
______________________________________________________
UDP: Local Host -> Guest (UDP_STREAM)
# BW1 BW2 (%) SD1 SD2 (%)
______________________________________________________
1 17236 16585 (-3.7) 1 1 (0)
2 16795 22693 (35.1) 5 6 (20.0)
4 13390 21502 (60.5) 37 36 (-2.7)
8 13261 24361 (83.7) 163 175 (7.3)
16 12772 23796 (86.3) 692 826 (19.3)
32 12832 23880 (86.0) 2812 2871 (2.0)
64 12779 24293 (90.1) 11299 11237 (-.5)
128 13006 24857 (91.1) 44778 43884 (-1.9)
______________________________________________________
Summary: BW: 37.1 SD: -1.2
^ permalink raw reply
* Re: KIND
From: Mr.David Gurupatham @ 2011-02-28 1:58 UTC (permalink / raw)
To: netdev@vger.kernel.org
I am Mr.David Gurupatham, an attorney at law. A deceased client of mine,
that shares the same last name as yours,died as the result of a heart-
related condition on March 12th 2005.His heart condition was due to the
death of allthe members of his family in the tsunami disaster on the 26th
December2004 in Sumatra
Indonesia .http://en.wikipedia.org/wiki/2004_Indian_Ocean_earthquake
I can be reached on (davidgurupatham9@gmail.com) for
Dollars (US$19 Million Dollars) left behind.
more information.My late Client has a deposit of Nineteen Million
Best regards,
Mr.David Gurupatham(Esq.)
Attorney at Law
^ permalink raw reply
* Re: SO_REUSEPORT - can it be done in kernel?
From: Herbert Xu @ 2011-02-28 4:26 UTC (permalink / raw)
To: Tom Herbert; +Cc: David Miller, rick.jones2, wsommerfeld, daniel.baluta, netdev
In-Reply-To: <AANLkTinxv4JC20YepcMvWi8SL_VTX2sXcRMydD62TT6x@mail.gmail.com>
On Sun, Feb 27, 2011 at 07:45:55PM -0800, Tom Herbert wrote:
> That sounds promising, but receive side will still have problems.
> There is lock contention on the queue as well as cache line bouncing
> on the sock structures. Also multiple threads sleeping on same socket
> typically leads to asymmetric load across the threads (and
> degenerative cases where receiving thread is woken up and other
> threads have already processed all the packets). TCP listener threads
> suffer from these same problems.
IOW this is something that we have to solve anyway. I'm just
being overly cautious here because user-space API changes are
something that we should not enter into lightly.
If this patch was completely internal to the kernel, then I would
have much less of an objection as we can always revert/change it
later on. With a user-space API we don't have that flexibility.
Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [V4 PATCH 3/3] bond: service netpoll arp queue on master device
From: David Miller @ 2011-02-28 4:12 UTC (permalink / raw)
To: amwang; +Cc: linux-kernel, nhorman, herbert, nhorman, eric.dumazet, netdev
In-Reply-To: <4D6B1742.6030108@redhat.com>
From: Cong Wang <amwang@redhat.com>
Date: Mon, 28 Feb 2011 11:32:18 +0800
> 于 2011年02月28日 08:12, David Miller 写道:
>> From: Amerigo Wang<amwang@redhat.com>
>> Date: Fri, 18 Feb 2011 17:43:34 +0800
>>
>>> Neil pointed out that we can't send ARP reply on behalf of slaves,
>>> we need to move the arp queue to their bond device.
>>>
>>> Signed-off-by: WANG Cong<amwang@redhat.com>
>>
>> Applied.
>
> Oops! Just found that this one I sent was not a refreshed patch.
> Please discard this one, and use the one below, that is the
> correct one in my git tree and the one that I tested.
Done.
^ permalink raw reply
* Re: SO_REUSEPORT - can it be done in kernel?
From: Tom Herbert @ 2011-02-28 3:45 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, rick.jones2, wsommerfeld, daniel.baluta, netdev
In-Reply-To: <20110227110614.GA6246@gondor.apana.org.au>
> I disagree completely.
>
> This patch adds a user-space API that we will have to carry
> with us for perpetuity. I would only support this if we had
> no other way around the problem.
>
> If this does turn out to be mostly due to sendmsg contention
> then fixing it is going to be much simpler than making the UDP
> stack multiqueue capable.
>
That sounds promising, but receive side will still have problems.
There is lock contention on the queue as well as cache line bouncing
on the sock structures. Also multiple threads sleeping on same socket
typically leads to asymmetric load across the threads (and
degenerative cases where receiving thread is woken up and other
threads have already processed all the packets). TCP listener threads
suffer from these same problems.
Tom
> I'm working on this right now.
>
> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>
^ permalink raw reply
* Re: [V4 PATCH 3/3] bond: service netpoll arp queue on master device
From: Cong Wang @ 2011-02-28 3:32 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, nhorman, herbert, nhorman, eric.dumazet, netdev
In-Reply-To: <20110227.161225.200373200.davem@davemloft.net>
[-- Attachment #1: Type: text/plain, Size: 569 bytes --]
于 2011年02月28日 08:12, David Miller 写道:
> From: Amerigo Wang<amwang@redhat.com>
> Date: Fri, 18 Feb 2011 17:43:34 +0800
>
>> Neil pointed out that we can't send ARP reply on behalf of slaves,
>> we need to move the arp queue to their bond device.
>>
>> Signed-off-by: WANG Cong<amwang@redhat.com>
>
> Applied.
Oops! Just found that this one I sent was not a refreshed patch.
Please discard this one, and use the one below, that is the
correct one in my git tree and the one that I tested.
Sorry for this.
----
Signed-off-by: WANG Cong <amwang@redhat.com>
[-- Attachment #2: bond-move-arp-queue-to-master.diff --]
[-- Type: text/plain, Size: 567 bytes --]
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index f68e694..06be243 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -193,6 +193,17 @@ void netpoll_poll_dev(struct net_device *dev)
poll_napi(dev);
+ if (dev->priv_flags & IFF_SLAVE) {
+ if (dev->npinfo) {
+ struct net_device *bond_dev = dev->master;
+ struct sk_buff *skb;
+ while ((skb = skb_dequeue(&dev->npinfo->arp_tx))) {
+ skb->dev = bond_dev;
+ skb_queue_tail(&bond_dev->npinfo->arp_tx, skb);
+ }
+ }
+ }
+
service_arp_queue(dev->npinfo);
zap_completion_queue();
^ permalink raw reply related
* Re: net-next: warnings from sysctl_net_exit
From: Lucian Adrian Grijincu @ 2011-02-28 1:11 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David S. Miller, netdev
In-Reply-To: <20110227160852.4f748c1f@nehalam>
On Mon, Feb 28, 2011 at 2:08 AM, Stephen Hemminger
<shemminger@vyatta.com> wrote:
>> The check is triggered at network namespace deletion, so a moment
>> before deleting the netns should be fine.
>
> Although the kernel was compiled with netns, I never use net namespaces.
This can be triggered at shutdown when the init_net network namespace
is dismantled.
Are you getting this at shutdown time? If not, is it easily reproducible?
Can you post a .config?
--
.
..: Lucian
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox