* [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo @ 2009-02-12 14:57 Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo Patrick Ohly 0 siblings, 2 replies; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 14:57 UTC (permalink / raw) To: David Miller Cc: John Stultz, Ronciak, John, linux-kernel@vger.kernel.org, netdev@vger.kernel.org Hello! This revision of the patch series transports hardware time stamps from the hardware into user space via additional fields in struct skb_shared_info. It's based on net-next-2.6 as of this morning. This is the solution that emerged from the discussion of the other approaches suggested before (reuse tstamp, extend skbuff, optional structs): it has the advantage of not touching skbuff and also has the simplest implementation. The clocksource and timecompare patches have been reviewed by John Stultz. He doesn't mind merging them via the net subtree. John Ronciak reviewed the igb driver patches. He suggested to merge the patches as they; after all, PTPd already works fine. I just tested again on 32 and 64 bit x86, both with the timestamping example programs as well as with PTPd. Latest patched PTPd is here: http://github.com/pohly/ptpd/tree/master The open TODOs in the igb driver will be fixed. We think that this will be easier with the infrastructure and the driver in a regular kernel tree. David, I hope you consider the patches acceptable now. If not, then as before: please let me know what I can do to make them acceptable. -- Best Regards, Patrick Ohly The content of this message is my personal opinion only and although I am an employee of Intel, the statements I make here in no way represent Intel's position on the issue, nor am I authorized to speak on behalf of Intel on this matter. ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c 2009-02-12 14:57 [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo Patrick Ohly @ 2009-02-12 15:00 ` Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 02/10] timecompare: generic infrastructure to map between two time bases Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo Patrick Ohly 1 sibling, 1 reply; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:00 UTC (permalink / raw) To: David Miller Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly So far struct clocksource acted as the interface between time/timekeeping.c and hardware. This patch generalizes the concept so that a similar interface can also be used in other contexts. For that it introduces new structures and related functions *without* touching the existing struct clocksource. The reasons for adding these new structures to clocksource.[ch] are * the APIs are clearly related * struct clocksource could be cleaned up to use the new structs * avoids proliferation of files with similar names (timesource.h? timecounter.h?) As outlined in the discussion with John Stultz, this patch adds * struct cyclecounter: stateless API to hardware which counts clock cycles * struct timecounter: stateful utility code built on a cyclecounter which provides a nanosecond counter * only the function to read the nanosecond counter; deltas are used internally and not exposed to users of timecounter The code does no locking of the shared state. It must be called at least as often as the cycle counter wraps around to detect these wrap arounds. Both is the responsibility of the timecounter user. Acked-by: John Stultz <johnstul@us.ibm.com> --- include/linux/clocksource.h | 101 +++++++++++++++++++++++++++++++++++++++++++ kernel/time/clocksource.c | 76 ++++++++++++++++++++++++++++++++ 2 files changed, 177 insertions(+), 0 deletions(-) diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index f88d32f..573819e 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -22,8 +22,109 @@ typedef u64 cycle_t; struct clocksource; /** + * struct cyclecounter - hardware abstraction for a free running counter + * Provides completely state-free accessors to the underlying hardware. + * Depending on which hardware it reads, the cycle counter may wrap + * around quickly. Locking rules (if necessary) have to be defined + * by the implementor and user of specific instances of this API. + * + * @read: returns the current cycle value + * @mask: bitmask for two's complement + * subtraction of non 64 bit counters, + * see CLOCKSOURCE_MASK() helper macro + * @mult: cycle to nanosecond multiplier + * @shift: cycle to nanosecond divisor (power of two) + */ +struct cyclecounter { + cycle_t (*read)(const struct cyclecounter *cc); + cycle_t mask; + u32 mult; + u32 shift; +}; + +/** + * struct timecounter - layer above a %struct cyclecounter which counts nanoseconds + * Contains the state needed by timecounter_read() to detect + * cycle counter wrap around. Initialize with + * timecounter_init(). Also used to convert cycle counts into the + * corresponding nanosecond counts with timecounter_cyc2time(). Users + * of this code are responsible for initializing the underlying + * cycle counter hardware, locking issues and reading the time + * more often than the cycle counter wraps around. The nanosecond + * counter will only wrap around after ~585 years. + * + * @cc: the cycle counter used by this instance + * @cycle_last: most recent cycle counter value seen by + * timecounter_read() + * @nsec: continuously increasing count + */ +struct timecounter { + const struct cyclecounter *cc; + cycle_t cycle_last; + u64 nsec; +}; + +/** + * cyclecounter_cyc2ns - converts cycle counter cycles to nanoseconds + * @tc: Pointer to cycle counter. + * @cycles: Cycles + * + * XXX - This could use some mult_lxl_ll() asm optimization. Same code + * as in cyc2ns, but with unsigned result. + */ +static inline u64 cyclecounter_cyc2ns(const struct cyclecounter *cc, + cycle_t cycles) +{ + u64 ret = (u64)cycles; + ret = (ret * cc->mult) >> cc->shift; + return ret; +} + +/** + * timecounter_init - initialize a time counter + * @tc: Pointer to time counter which is to be initialized/reset + * @cc: A cycle counter, ready to be used. + * @start_tstamp: Arbitrary initial time stamp. + * + * After this call the current cycle register (roughly) corresponds to + * the initial time stamp. Every call to timecounter_read() increments + * the time stamp counter by the number of elapsed nanoseconds. + */ +extern void timecounter_init(struct timecounter *tc, + const struct cyclecounter *cc, + u64 start_tstamp); + +/** + * timecounter_read - return nanoseconds elapsed since timecounter_init() + * plus the initial time stamp + * @tc: Pointer to time counter. + * + * In other words, keeps track of time since the same epoch as + * the function which generated the initial time stamp. + */ +extern u64 timecounter_read(struct timecounter *tc); + +/** + * timecounter_cyc2time - convert a cycle counter to same + * time base as values returned by + * timecounter_read() + * @tc: Pointer to time counter. + * @cycle: a value returned by tc->cc->read() + * + * Cycle counts that are converted correctly as long as they + * fall into the interval [-1/2 max cycle count, +1/2 max cycle count], + * with "max cycle count" == cs->mask+1. + * + * This allows conversion of cycle counter values which were generated + * in the past. + */ +extern u64 timecounter_cyc2time(struct timecounter *tc, + cycle_t cycle_tstamp); + +/** * struct clocksource - hardware abstraction for a free running counter * Provides mostly state-free accessors to the underlying hardware. + * This is the structure used for system time. * * @name: ptr to clocksource name * @list: list head for registration diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index ca89e15..c46c931 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -31,6 +31,82 @@ #include <linux/sched.h> /* for spin_unlock_irq() using preempt_count() m68k */ #include <linux/tick.h> +void timecounter_init(struct timecounter *tc, + const struct cyclecounter *cc, + u64 start_tstamp) +{ + tc->cc = cc; + tc->cycle_last = cc->read(cc); + tc->nsec = start_tstamp; +} +EXPORT_SYMBOL(timecounter_init); + +/** + * timecounter_read_delta - get nanoseconds since last call of this function + * @tc: Pointer to time counter + * + * When the underlying cycle counter runs over, this will be handled + * correctly as long as it does not run over more than once between + * calls. + * + * The first call to this function for a new time counter initializes + * the time tracking and returns an undefined result. + */ +static u64 timecounter_read_delta(struct timecounter *tc) +{ + cycle_t cycle_now, cycle_delta; + u64 ns_offset; + + /* read cycle counter: */ + cycle_now = tc->cc->read(tc->cc); + + /* calculate the delta since the last timecounter_read_delta(): */ + cycle_delta = (cycle_now - tc->cycle_last) & tc->cc->mask; + + /* convert to nanoseconds: */ + ns_offset = cyclecounter_cyc2ns(tc->cc, cycle_delta); + + /* update time stamp of timecounter_read_delta() call: */ + tc->cycle_last = cycle_now; + + return ns_offset; +} + +u64 timecounter_read(struct timecounter *tc) +{ + u64 nsec; + + /* increment time by nanoseconds since last call */ + nsec = timecounter_read_delta(tc); + nsec += tc->nsec; + tc->nsec = nsec; + + return nsec; +} +EXPORT_SYMBOL(timecounter_read); + +u64 timecounter_cyc2time(struct timecounter *tc, + cycle_t cycle_tstamp) +{ + u64 cycle_delta = (cycle_tstamp - tc->cycle_last) & tc->cc->mask; + u64 nsec; + + /* + * Instead of always treating cycle_tstamp as more recent + * than tc->cycle_last, detect when it is too far in the + * future and treat it as old time stamp instead. + */ + if (cycle_delta > tc->cc->mask / 2) { + cycle_delta = (tc->cycle_last - cycle_tstamp) & tc->cc->mask; + nsec = tc->nsec - cyclecounter_cyc2ns(tc->cc, cycle_delta); + } else { + nsec = cyclecounter_cyc2ns(tc->cc, cycle_delta) + tc->nsec; + } + + return nsec; +} +EXPORT_SYMBOL(timecounter_cyc2time); + /* XXX - Would like a better way for initializing curr_clocksource */ extern struct clocksource clocksource_jiffies; -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH NET-NEXT 02/10] timecompare: generic infrastructure to map between two time bases 2009-02-12 15:00 ` [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c Patrick Ohly @ 2009-02-12 15:00 ` Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 03/10] net: new user space API for time stamping of incoming and outgoing packets Patrick Ohly 0 siblings, 1 reply; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:00 UTC (permalink / raw) To: David Miller Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly Mapping from a struct timecounter to a time returned by functions like ktime_get_real() is implemented. This is sufficient to use this code in a network device driver which wants to support hardware time stamping and transformation of hardware time stamps to system time. The interface could have been made more versatile by not depending on a time counter, but this wasn't done to avoid writing glue code elsewhere. The method implemented here is the one used and analyzed under the name "assisted PTP" in the LCI PTP paper: http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf Acked-by: John Stultz <johnstul@us.ibm.com> --- include/linux/timecompare.h | 125 ++++++++++++++++++++++++++++ kernel/time/Makefile | 2 +- kernel/time/timecompare.c | 191 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 317 insertions(+), 1 deletions(-) create mode 100644 include/linux/timecompare.h create mode 100644 kernel/time/timecompare.c diff --git a/include/linux/timecompare.h b/include/linux/timecompare.h new file mode 100644 index 0000000..546e223 --- /dev/null +++ b/include/linux/timecompare.h @@ -0,0 +1,125 @@ +/* + * Utility code which helps transforming between two different time + * bases, called "source" and "target" time in this code. + * + * Source time has to be provided via the timecounter API while target + * time is accessed via a function callback whose prototype + * intentionally matches ktime_get() and ktime_get_real(). These + * interfaces where chosen like this so that the code serves its + * initial purpose without additional glue code. + * + * This purpose is synchronizing a hardware clock in a NIC with system + * time, in order to implement the Precision Time Protocol (PTP, + * IEEE1588) with more accurate hardware assisted time stamping. In + * that context only synchronization against system time (= + * ktime_get_real()) is currently needed. But this utility code might + * become useful in other situations, which is why it was written as + * general purpose utility code. + * + * The source timecounter is assumed to return monotonically + * increasing time (but this code does its best to compensate if that + * is not the case) whereas target time may jump. + * + * The target time corresponding to a source time is determined by + * reading target time, reading source time, reading target time + * again, then assuming that average target time corresponds to source + * time. In other words, the assumption is that reading the source + * time is slow and involves equal time for sending the request and + * receiving the reply, whereas reading target time is assumed to be + * fast. + * + * Copyright (C) 2009 Intel Corporation. + * Author: Patrick Ohly <patrick.ohly@intel.com> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. + */ +#ifndef _LINUX_TIMECOMPARE_H +#define _LINUX_TIMECOMPARE_H + +#include <linux/clocksource.h> +#include <linux/ktime.h> + +/** + * struct timecompare - stores state and configuration for the two clocks + * + * Initialize to zero, then set source/target/num_samples. + * + * Transformation between source time and target time is done with: + * target_time = source_time + offset + + * (source_time - last_update) * skew / + * TIMECOMPARE_SKEW_RESOLUTION + * + * @source: used to get source time stamps via timecounter_read() + * @target: function returning target time (for example, ktime_get + * for monotonic time, or ktime_get_real for wall clock) + * @num_samples: number of times that source time and target time are to + * be compared when determining their offset + * @offset: (target time - source time) at the time of the last update + * @skew: average (target time - source time) / delta source time * + * TIMECOMPARE_SKEW_RESOLUTION + * @last_update: last source time stamp when time offset was measured + */ +struct timecompare { + struct timecounter *source; + ktime_t (*target)(void); + int num_samples; + + s64 offset; + s64 skew; + u64 last_update; +}; + +/** + * timecompare_transform - transform source time stamp into target time base + * @sync: context for time sync + * @source_tstamp: the result of timecounter_read() or + * timecounter_cyc2time() + */ +extern ktime_t timecompare_transform(struct timecompare *sync, + u64 source_tstamp); + +/** + * timecompare_offset - measure current (target time - source time) offset + * @sync: context for time sync + * @offset: average offset during sample period returned here + * @source_tstamp: average source time during sample period returned here + * + * Returns number of samples used. Might be zero (= no result) in the + * unlikely case that target time was monotonically decreasing for all + * samples (= broken). + */ +extern int timecompare_offset(struct timecompare *sync, + s64 *offset, + u64 *source_tstamp); + +extern void __timecompare_update(struct timecompare *sync, + u64 source_tstamp); + +/** + * timecompare_update - update offset and skew by measuring current offset + * @sync: context for time sync + * @source_tstamp: the result of timecounter_read() or + * timecounter_cyc2time(), pass zero to force update + * + * Updates are only done at most once per second. + */ +static inline void timecompare_update(struct timecompare *sync, + u64 source_tstamp) +{ + if (!source_tstamp || + (s64)(source_tstamp - sync->last_update) >= NSEC_PER_SEC) + __timecompare_update(sync, source_tstamp); +} + +#endif /* _LINUX_TIMECOMPARE_H */ diff --git a/kernel/time/Makefile b/kernel/time/Makefile index 905b0b5..0b0a636 100644 --- a/kernel/time/Makefile +++ b/kernel/time/Makefile @@ -1,4 +1,4 @@ -obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o +obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o timecompare.o obj-$(CONFIG_GENERIC_CLOCKEVENTS_BUILD) += clockevents.o obj-$(CONFIG_GENERIC_CLOCKEVENTS) += tick-common.o diff --git a/kernel/time/timecompare.c b/kernel/time/timecompare.c new file mode 100644 index 0000000..71e7f1a --- /dev/null +++ b/kernel/time/timecompare.c @@ -0,0 +1,191 @@ +/* + * Copyright (C) 2009 Intel Corporation. + * Author: Patrick Ohly <patrick.ohly@intel.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include <linux/timecompare.h> +#include <linux/module.h> +#include <linux/math64.h> + +/* + * fixed point arithmetic scale factor for skew + * + * Usually one would measure skew in ppb (parts per billion, 1e9), but + * using a factor of 2 simplifies the math. + */ +#define TIMECOMPARE_SKEW_RESOLUTION (((s64)1)<<30) + +ktime_t timecompare_transform(struct timecompare *sync, + u64 source_tstamp) +{ + u64 nsec; + + nsec = source_tstamp + sync->offset; + nsec += (s64)(source_tstamp - sync->last_update) * sync->skew / + TIMECOMPARE_SKEW_RESOLUTION; + + return ns_to_ktime(nsec); +} +EXPORT_SYMBOL(timecompare_transform); + +int timecompare_offset(struct timecompare *sync, + s64 *offset, + u64 *source_tstamp) +{ + u64 start_source = 0, end_source = 0; + struct { + s64 offset; + s64 duration_target; + } buffer[10], sample, *samples; + int counter = 0, i; + int used; + int index; + int num_samples = sync->num_samples; + + if (num_samples > sizeof(buffer)/sizeof(buffer[0])) { + samples = kmalloc(sizeof(*samples) * num_samples, GFP_ATOMIC); + if (!samples) { + samples = buffer; + num_samples = sizeof(buffer)/sizeof(buffer[0]); + } + } else { + samples = buffer; + } + + /* run until we have enough valid samples, but do not try forever */ + i = 0; + counter = 0; + while (1) { + u64 ts; + ktime_t start, end; + + start = sync->target(); + ts = timecounter_read(sync->source); + end = sync->target(); + + if (!i) + start_source = ts; + + /* ignore negative durations */ + sample.duration_target = ktime_to_ns(ktime_sub(end, start)); + if (sample.duration_target >= 0) { + /* + * assume symetric delay to and from source: + * average target time corresponds to measured + * source time + */ + sample.offset = + ktime_to_ns(ktime_add(end, start)) / 2 - + ts; + + /* simple insertion sort based on duration */ + index = counter - 1; + while (index >= 0) { + if (samples[index].duration_target < + sample.duration_target) + break; + samples[index + 1] = samples[index]; + index--; + } + samples[index + 1] = sample; + counter++; + } + + i++; + if (counter >= num_samples || i >= 100000) { + end_source = ts; + break; + } + } + + *source_tstamp = (end_source + start_source) / 2; + + /* remove outliers by only using 75% of the samples */ + used = counter * 3 / 4; + if (!used) + used = counter; + if (used) { + /* calculate average */ + s64 off = 0; + for (index = 0; index < used; index++) + off += samples[index].offset; + *offset = div_s64(off, used); + } + + if (samples && samples != buffer) + kfree(samples); + + return used; +} +EXPORT_SYMBOL(timecompare_offset); + +void __timecompare_update(struct timecompare *sync, + u64 source_tstamp) +{ + s64 offset; + u64 average_time; + + if (!timecompare_offset(sync, &offset, &average_time)) + return; + + if (!sync->last_update) { + sync->last_update = average_time; + sync->offset = offset; + sync->skew = 0; + } else { + s64 delta_nsec = average_time - sync->last_update; + + /* avoid division by negative or small deltas */ + if (delta_nsec >= 10000) { + s64 delta_offset_nsec = offset - sync->offset; + s64 skew; /* delta_offset_nsec * + TIMECOMPARE_SKEW_RESOLUTION / + delta_nsec */ + u64 divisor; + + /* div_s64() is limited to 32 bit divisor */ + skew = delta_offset_nsec * TIMECOMPARE_SKEW_RESOLUTION; + divisor = delta_nsec; + while (unlikely(divisor >= ((s64)1) << 32)) { + /* divide both by 2; beware, right shift + of negative value has undefined + behavior and can only be used for + the positive divisor */ + skew = div_s64(skew, 2); + divisor >>= 1; + } + skew = div_s64(skew, divisor); + + /* + * Calculate new overall skew as 4/16 the + * old value and 12/16 the new one. This is + * a rather arbitrary tradeoff between + * only using the latest measurement (0/16 and + * 16/16) and even more weight on past measurements. + */ +#define TIMECOMPARE_NEW_SKEW_PER_16 12 + sync->skew = + div_s64((16 - TIMECOMPARE_NEW_SKEW_PER_16) * + sync->skew + + TIMECOMPARE_NEW_SKEW_PER_16 * skew, + 16); + sync->last_update = average_time; + sync->offset = offset; + } + } +} +EXPORT_SYMBOL(__timecompare_update); -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH NET-NEXT 03/10] net: new user space API for time stamping of incoming and outgoing packets 2009-02-12 15:00 ` [PATCH NET-NEXT 02/10] timecompare: generic infrastructure to map between two time bases Patrick Ohly @ 2009-02-12 15:00 ` Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 04/10] net: infrastructure for hardware time stamping Patrick Ohly 0 siblings, 1 reply; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:00 UTC (permalink / raw) To: David Miller Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly User space can request hardware and/or software time stamping. Reporting of the result(s) via a new control message is enabled separately for each field in the message because some of the fields may require additional computation and thus cause overhead. User space can tell the different kinds of time stamps apart and choose what suits its needs. When a TX timestamp operation is requested, the TX skb will be cloned and the clone will be time stamped (in hardware or software) and added to the socket error queue of the skb, if the skb has a socket associated with it. The actual TX timestamp will reach userspace as a RX timestamp on the cloned packet. If timestamping is requested and no timestamping is done in the device driver (potentially this may use hardware timestamping), it will be done in software after the device's start_hard_xmit routine. --- Documentation/networking/timestamping.txt | 178 +++++++ Documentation/networking/timestamping/.gitignore | 1 + Documentation/networking/timestamping/Makefile | 6 + .../networking/timestamping/timestamping.c | 533 ++++++++++++++++++++ arch/alpha/include/asm/socket.h | 3 + arch/arm/include/asm/socket.h | 3 + arch/avr32/include/asm/socket.h | 3 + arch/blackfin/include/asm/socket.h | 3 + arch/cris/include/asm/socket.h | 3 + arch/h8300/include/asm/socket.h | 3 + arch/ia64/include/asm/socket.h | 3 + arch/m68k/include/asm/socket.h | 3 + arch/mips/include/asm/socket.h | 3 + arch/parisc/include/asm/socket.h | 3 + arch/powerpc/include/asm/socket.h | 3 + arch/s390/include/asm/socket.h | 3 + arch/sh/include/asm/socket.h | 3 + arch/sparc/include/asm/socket.h | 3 + arch/x86/include/asm/socket.h | 3 + arch/xtensa/include/asm/socket.h | 3 + include/asm-frv/socket.h | 3 + include/asm-m32r/socket.h | 3 + include/asm-mn10300/socket.h | 3 + include/linux/errqueue.h | 1 + include/linux/net_tstamp.h | 104 ++++ include/linux/sockios.h | 3 + 26 files changed, 883 insertions(+), 0 deletions(-) create mode 100644 Documentation/networking/timestamping.txt create mode 100644 Documentation/networking/timestamping/.gitignore create mode 100644 Documentation/networking/timestamping/Makefile create mode 100644 Documentation/networking/timestamping/timestamping.c create mode 100644 include/linux/net_tstamp.h diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt new file mode 100644 index 0000000..a681a65 --- /dev/null +++ b/Documentation/networking/timestamping.txt @@ -0,0 +1,178 @@ +The existing interfaces for getting network packages time stamped are: + +* SO_TIMESTAMP + Generate time stamp for each incoming packet using the (not necessarily + monotonous!) system time. Result is returned via recv_msg() in a + control message as timeval (usec resolution). + +* SO_TIMESTAMPNS + Same time stamping mechanism as SO_TIMESTAMP, but returns result as + timespec (nsec resolution). + +* IP_MULTICAST_LOOP + SO_TIMESTAMP[NS] + Only for multicasts: approximate send time stamp by receiving the looped + packet and using its receive time stamp. + +The following interface complements the existing ones: receive time +stamps can be generated and returned for arbitrary packets and much +closer to the point where the packet is really sent. Time stamps can +be generated in software (as before) or in hardware (if the hardware +has such a feature). + +SO_TIMESTAMPING: + +Instructs the socket layer which kind of information is wanted. The +parameter is an integer with some of the following bits set. Setting +other bits is an error and doesn't change the current state. + +SOF_TIMESTAMPING_TX_HARDWARE: try to obtain send time stamp in hardware +SOF_TIMESTAMPING_TX_SOFTWARE: if SOF_TIMESTAMPING_TX_HARDWARE is off or + fails, then do it in software +SOF_TIMESTAMPING_RX_HARDWARE: return the original, unmodified time stamp + as generated by the hardware +SOF_TIMESTAMPING_RX_SOFTWARE: if SOF_TIMESTAMPING_RX_HARDWARE is off or + fails, then do it in software +SOF_TIMESTAMPING_RAW_HARDWARE: return original raw hardware time stamp +SOF_TIMESTAMPING_SYS_HARDWARE: return hardware time stamp transformed to + the system time base +SOF_TIMESTAMPING_SOFTWARE: return system time stamp generated in + software + +SOF_TIMESTAMPING_TX/RX determine how time stamps are generated. +SOF_TIMESTAMPING_RAW/SYS determine how they are reported in the +following control message: + struct scm_timestamping { + struct timespec systime; + struct timespec hwtimetrans; + struct timespec hwtimeraw; + }; + +recvmsg() can be used to get this control message for regular incoming +packets. For send time stamps the outgoing packet is looped back to +the socket's error queue with the send time stamp(s) attached. It can +be received with recvmsg(flags=MSG_ERRQUEUE). The call returns the +original outgoing packet data including all headers preprended down to +and including the link layer, the scm_timestamping control message and +a sock_extended_err control message with ee_errno==ENOMSG and +ee_origin==SO_EE_ORIGIN_TIMESTAMPING. A socket with such a pending +bounced packet is ready for reading as far as select() is concerned. + +All three values correspond to the same event in time, but were +generated in different ways. Each of these values may be empty (= all +zero), in which case no such value was available. If the application +is not interested in some of these values, they can be left blank to +avoid the potential overhead of calculating them. + +systime is the value of the system time at that moment. This +corresponds to the value also returned via SO_TIMESTAMP[NS]. If the +time stamp was generated by hardware, then this field is +empty. Otherwise it is filled in if SOF_TIMESTAMPING_SOFTWARE is +set. + +hwtimeraw is the original hardware time stamp. Filled in if +SOF_TIMESTAMPING_RAW_HARDWARE is set. No assumptions about its +relation to system time should be made. + +hwtimetrans is the hardware time stamp transformed so that it +corresponds as good as possible to system time. This correlation is +not perfect; as a consequence, sorting packets received via different +NICs by their hwtimetrans may differ from the order in which they were +received. hwtimetrans may be non-monotonic even for the same NIC. +Filled in if SOF_TIMESTAMPING_SYS_HARDWARE is set. Requires support +by the network device and will be empty without that support. + + +SIOCSHWTSTAMP: + +Hardware time stamping must also be initialized for each device driver +that is expected to do hardware time stamping. The parameter is: + +struct hwtstamp_config { + int flags; /* no flags defined right now, must be zero */ + int tx_type; /* HWTSTAMP_TX_* */ + int rx_filter; /* HWTSTAMP_FILTER_* */ +}; + +Desired behavior is passed into the kernel and to a specific device by +calling ioctl(SIOCSHWTSTAMP) with a pointer to a struct ifreq whose +ifr_data points to a struct hwtstamp_config. The tx_type and +rx_filter are hints to the driver what it is expected to do. If +the requested fine-grained filtering for incoming packets is not +supported, the driver may time stamp more than just the requested types +of packets. + +A driver which supports hardware time stamping shall update the struct +with the actual, possibly more permissive configuration. If the +requested packets cannot be time stamped, then nothing should be +changed and ERANGE shall be returned (in contrast to EINVAL, which +indicates that SIOCSHWTSTAMP is not supported at all). + +Only a processes with admin rights may change the configuration. User +space is responsible to ensure that multiple processes don't interfere +with each other and that the settings are reset. + +/* possible values for hwtstamp_config->tx_type */ +enum { + /* + * no outgoing packet will need hardware time stamping; + * should a packet arrive which asks for it, no hardware + * time stamping will be done + */ + HWTSTAMP_TX_OFF, + + /* + * enables hardware time stamping for outgoing packets; + * the sender of the packet decides which are to be + * time stamped by setting SOF_TIMESTAMPING_TX_SOFTWARE + * before sending the packet + */ + HWTSTAMP_TX_ON, +}; + +/* possible values for hwtstamp_config->rx_filter */ +enum { + /* time stamp no incoming packet at all */ + HWTSTAMP_FILTER_NONE, + + /* time stamp any incoming packet */ + HWTSTAMP_FILTER_ALL, + + /* return value: time stamp all packets requested plus some others */ + HWTSTAMP_FILTER_SOME, + + /* PTP v1, UDP, any kind of event packet */ + HWTSTAMP_FILTER_PTP_V1_L4_EVENT, + + ... +}; + + +DEVICE IMPLEMENTATION + +A driver which supports hardware time stamping must support the +SIOCSHWTSTAMP ioctl. Time stamps for received packets must be stored +in the skb with skb_hwtstamp_set(). + +Time stamps for outgoing packets are to be generated as follows: +- In hard_start_xmit(), check if skb_hwtstamp_check_tx_hardware() + returns non-zero. If yes, then the driver is expected + to do hardware time stamping. +- If this is possible for the skb and requested, then declare + that the driver is doing the time stamping by calling + skb_hwtstamp_tx_in_progress(). A driver not supporting + hardware time stamping doesn't do that. A driver must never + touch sk_buff::tstamp! It is used to store how time stamping + for an outgoing packets is to be done. +- As soon as the driver has sent the packet and/or obtained a + hardware time stamp for it, it passes the time stamp back by + calling skb_hwtstamp_tx() with the original skb, the raw + hardware time stamp and a handle to the device (necessary + to convert the hardware time stamp to system time). If obtaining + the hardware time stamp somehow fails, then the driver should + not fall back to software time stamping. The rationale is that + this would occur at a later time in the processing pipeline + than other software time stamping and therefore could lead + to unexpected deltas between time stamps. +- If the driver did not call skb_hwtstamp_tx_in_progress(), then + dev_hard_start_xmit() checks whether software time stamping + is wanted as fallback and potentially generates the time stamp. diff --git a/Documentation/networking/timestamping/.gitignore b/Documentation/networking/timestamping/.gitignore new file mode 100644 index 0000000..71e81eb --- /dev/null +++ b/Documentation/networking/timestamping/.gitignore @@ -0,0 +1 @@ +timestamping diff --git a/Documentation/networking/timestamping/Makefile b/Documentation/networking/timestamping/Makefile new file mode 100644 index 0000000..2a1489f --- /dev/null +++ b/Documentation/networking/timestamping/Makefile @@ -0,0 +1,6 @@ +CPPFLAGS = -I../../../include + +timestamping: timestamping.c + +clean: + rm -f timestamping diff --git a/Documentation/networking/timestamping/timestamping.c b/Documentation/networking/timestamping/timestamping.c new file mode 100644 index 0000000..43d1431 --- /dev/null +++ b/Documentation/networking/timestamping/timestamping.c @@ -0,0 +1,533 @@ +/* + * This program demonstrates how the various time stamping features in + * the Linux kernel work. It emulates the behavior of a PTP + * implementation in stand-alone master mode by sending PTPv1 Sync + * multicasts once every second. It looks for similar packets, but + * beyond that doesn't actually implement PTP. + * + * Outgoing packets are time stamped with SO_TIMESTAMPING with or + * without hardware support. + * + * Incoming packets are time stamped with SO_TIMESTAMPING with or + * without hardware support, SIOCGSTAMP[NS] (per-socket time stamp) and + * SO_TIMESTAMP[NS]. + * + * Copyright (C) 2009 Intel Corporation. + * Author: Patrick Ohly <patrick.ohly@intel.com> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include <stdio.h> +#include <stdlib.h> +#include <errno.h> +#include <string.h> + +#include <sys/time.h> +#include <sys/socket.h> +#include <sys/select.h> +#include <sys/ioctl.h> +#include <arpa/inet.h> +#include <net/if.h> + +#include "asm/types.h" +#include "linux/net_tstamp.h" +#include "linux/errqueue.h" + +#ifndef SO_TIMESTAMPING +# define SO_TIMESTAMPING 37 +# define SCM_TIMESTAMPING SO_TIMESTAMPING +#endif + +#ifndef SO_TIMESTAMPNS +# define SO_TIMESTAMPNS 35 +#endif + +#ifndef SIOCGSTAMPNS +# define SIOCGSTAMPNS 0x8907 +#endif + +#ifndef SIOCSHWTSTAMP +# define SIOCSHWTSTAMP 0x89b0 +#endif + +static void usage(const char *error) +{ + if (error) + printf("invalid option: %s\n", error); + printf("timestamping interface option*\n\n" + "Options:\n" + " IP_MULTICAST_LOOP - looping outgoing multicasts\n" + " SO_TIMESTAMP - normal software time stamping, ms resolution\n" + " SO_TIMESTAMPNS - more accurate software time stamping\n" + " SOF_TIMESTAMPING_TX_HARDWARE - hardware time stamping of outgoing packets\n" + " SOF_TIMESTAMPING_TX_SOFTWARE - software fallback for outgoing packets\n" + " SOF_TIMESTAMPING_RX_HARDWARE - hardware time stamping of incoming packets\n" + " SOF_TIMESTAMPING_RX_SOFTWARE - software fallback for incoming packets\n" + " SOF_TIMESTAMPING_SOFTWARE - request reporting of software time stamps\n" + " SOF_TIMESTAMPING_SYS_HARDWARE - request reporting of transformed HW time stamps\n" + " SOF_TIMESTAMPING_RAW_HARDWARE - request reporting of raw HW time stamps\n" + " SIOCGSTAMP - check last socket time stamp\n" + " SIOCGSTAMPNS - more accurate socket time stamp\n"); + exit(1); +} + +static void bail(const char *error) +{ + printf("%s: %s\n", error, strerror(errno)); + exit(1); +} + +static const unsigned char sync[] = { + 0x00, 0x01, 0x00, 0x01, + 0x5f, 0x44, 0x46, 0x4c, + 0x54, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, + 0x01, 0x01, + + /* fake uuid */ + 0x00, 0x01, + 0x02, 0x03, 0x04, 0x05, + + 0x00, 0x01, 0x00, 0x37, + 0x00, 0x00, 0x00, 0x08, + 0x00, 0x00, 0x00, 0x00, + 0x49, 0x05, 0xcd, 0x01, + 0x29, 0xb1, 0x8d, 0xb0, + 0x00, 0x00, 0x00, 0x00, + 0x00, 0x01, + + /* fake uuid */ + 0x00, 0x01, + 0x02, 0x03, 0x04, 0x05, + + 0x00, 0x00, 0x00, 0x37, + 0x00, 0x00, 0x00, 0x04, + 0x44, 0x46, 0x4c, 0x54, + 0x00, 0x00, 0xf0, 0x60, + 0x00, 0x01, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x01, + 0x00, 0x00, 0xf0, 0x60, + 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x04, + 0x44, 0x46, 0x4c, 0x54, + 0x00, 0x01, + + /* fake uuid */ + 0x00, 0x01, + 0x02, 0x03, 0x04, 0x05, + + 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00 +}; + +static void sendpacket(int sock, struct sockaddr *addr, socklen_t addr_len) +{ + struct timeval now; + int res; + + res = sendto(sock, sync, sizeof(sync), 0, + addr, addr_len); + gettimeofday(&now, 0); + if (res < 0) + printf("%s: %s\n", "send", strerror(errno)); + else + printf("%ld.%06ld: sent %d bytes\n", + (long)now.tv_sec, (long)now.tv_usec, + res); +} + +static void printpacket(struct msghdr *msg, int res, + char *data, + int sock, int recvmsg_flags, + int siocgstamp, int siocgstampns) +{ + struct sockaddr_in *from_addr = (struct sockaddr_in *)msg->msg_name; + struct cmsghdr *cmsg; + struct timeval tv; + struct timespec ts; + struct timeval now; + + gettimeofday(&now, 0); + + printf("%ld.%06ld: received %s data, %d bytes from %s, %d bytes control messages\n", + (long)now.tv_sec, (long)now.tv_usec, + (recvmsg_flags & MSG_ERRQUEUE) ? "error" : "regular", + res, + inet_ntoa(from_addr->sin_addr), + msg->msg_controllen); + for (cmsg = CMSG_FIRSTHDR(msg); + cmsg; + cmsg = CMSG_NXTHDR(msg, cmsg)) { + printf(" cmsg len %d: ", cmsg->cmsg_len); + switch (cmsg->cmsg_level) { + case SOL_SOCKET: + printf("SOL_SOCKET "); + switch (cmsg->cmsg_type) { + case SO_TIMESTAMP: { + struct timeval *stamp = + (struct timeval *)CMSG_DATA(cmsg); + printf("SO_TIMESTAMP %ld.%06ld", + (long)stamp->tv_sec, + (long)stamp->tv_usec); + break; + } + case SO_TIMESTAMPNS: { + struct timespec *stamp = + (struct timespec *)CMSG_DATA(cmsg); + printf("SO_TIMESTAMPNS %ld.%09ld", + (long)stamp->tv_sec, + (long)stamp->tv_nsec); + break; + } + case SO_TIMESTAMPING: { + struct timespec *stamp = + (struct timespec *)CMSG_DATA(cmsg); + printf("SO_TIMESTAMPING "); + printf("SW %ld.%09ld ", + (long)stamp->tv_sec, + (long)stamp->tv_nsec); + stamp++; + printf("HW transformed %ld.%09ld ", + (long)stamp->tv_sec, + (long)stamp->tv_nsec); + stamp++; + printf("HW raw %ld.%09ld", + (long)stamp->tv_sec, + (long)stamp->tv_nsec); + break; + } + default: + printf("type %d", cmsg->cmsg_type); + break; + } + break; + case IPPROTO_IP: + printf("IPPROTO_IP "); + switch (cmsg->cmsg_type) { + case IP_RECVERR: { + struct sock_extended_err *err = + (struct sock_extended_err *)CMSG_DATA(cmsg); + printf("IP_RECVERR ee_errno '%s' ee_origin %d => %s", + strerror(err->ee_errno), + err->ee_origin, +#ifdef SO_EE_ORIGIN_TIMESTAMPING + err->ee_origin == SO_EE_ORIGIN_TIMESTAMPING ? + "bounced packet" : "unexpected origin" +#else + "probably SO_EE_ORIGIN_TIMESTAMPING" +#endif + ); + if (res < sizeof(sync)) + printf(" => truncated data?!"); + else if (!memcmp(sync, data + res - sizeof(sync), + sizeof(sync))) + printf(" => GOT OUR DATA BACK (HURRAY!)"); + break; + } + case IP_PKTINFO: { + struct in_pktinfo *pktinfo = + (struct in_pktinfo *)CMSG_DATA(cmsg); + printf("IP_PKTINFO interface index %u", + pktinfo->ipi_ifindex); + break; + } + default: + printf("type %d", cmsg->cmsg_type); + break; + } + break; + default: + printf("level %d type %d", + cmsg->cmsg_level, + cmsg->cmsg_type); + break; + } + printf("\n"); + } + + if (siocgstamp) { + if (ioctl(sock, SIOCGSTAMP, &tv)) + printf(" %s: %s\n", "SIOCGSTAMP", strerror(errno)); + else + printf("SIOCGSTAMP %ld.%06ld\n", + (long)tv.tv_sec, + (long)tv.tv_usec); + } + if (siocgstampns) { + if (ioctl(sock, SIOCGSTAMPNS, &ts)) + printf(" %s: %s\n", "SIOCGSTAMPNS", strerror(errno)); + else + printf("SIOCGSTAMPNS %ld.%09ld\n", + (long)ts.tv_sec, + (long)ts.tv_nsec); + } +} + +static void recvpacket(int sock, int recvmsg_flags, + int siocgstamp, int siocgstampns) +{ + char data[256]; + struct msghdr msg; + struct iovec entry; + struct sockaddr_in from_addr; + struct { + struct cmsghdr cm; + char control[512]; + } control; + int res; + + memset(&msg, 0, sizeof(msg)); + msg.msg_iov = &entry; + msg.msg_iovlen = 1; + entry.iov_base = data; + entry.iov_len = sizeof(data); + msg.msg_name = (caddr_t)&from_addr; + msg.msg_namelen = sizeof(from_addr); + msg.msg_control = &control; + msg.msg_controllen = sizeof(control); + + res = recvmsg(sock, &msg, recvmsg_flags|MSG_DONTWAIT); + if (res < 0) { + printf("%s %s: %s\n", + "recvmsg", + (recvmsg_flags & MSG_ERRQUEUE) ? "error" : "regular", + strerror(errno)); + } else { + printpacket(&msg, res, data, + sock, recvmsg_flags, + siocgstamp, siocgstampns); + } +} + +int main(int argc, char **argv) +{ + int so_timestamping_flags = 0; + int so_timestamp = 0; + int so_timestampns = 0; + int siocgstamp = 0; + int siocgstampns = 0; + int ip_multicast_loop = 0; + char *interface; + int i; + int enabled = 1; + int sock; + struct ifreq device; + struct ifreq hwtstamp; + struct hwtstamp_config hwconfig, hwconfig_requested; + struct sockaddr_in addr; + struct ip_mreq imr; + struct in_addr iaddr; + int val; + socklen_t len; + struct timeval next; + + if (argc < 2) + usage(0); + interface = argv[1]; + + for (i = 2; i < argc; i++) { + if (!strcasecmp(argv[i], "SO_TIMESTAMP")) + so_timestamp = 1; + else if (!strcasecmp(argv[i], "SO_TIMESTAMPNS")) + so_timestampns = 1; + else if (!strcasecmp(argv[i], "SIOCGSTAMP")) + siocgstamp = 1; + else if (!strcasecmp(argv[i], "SIOCGSTAMPNS")) + siocgstampns = 1; + else if (!strcasecmp(argv[i], "IP_MULTICAST_LOOP")) + ip_multicast_loop = 1; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_TX_HARDWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_TX_HARDWARE; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_TX_SOFTWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_TX_SOFTWARE; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RX_HARDWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_RX_HARDWARE; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RX_SOFTWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_RX_SOFTWARE; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_SOFTWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_SOFTWARE; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_SYS_HARDWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_SYS_HARDWARE; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RAW_HARDWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_RAW_HARDWARE; + else + usage(argv[i]); + } + + sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP); + if (socket < 0) + bail("socket"); + + memset(&device, 0, sizeof(device)); + strncpy(device.ifr_name, interface, sizeof(device.ifr_name)); + if (ioctl(sock, SIOCGIFADDR, &device) < 0) + bail("getting interface IP address"); + + memset(&hwtstamp, 0, sizeof(hwtstamp)); + strncpy(hwtstamp.ifr_name, interface, sizeof(hwtstamp.ifr_name)); + hwtstamp.ifr_data = (void *)&hwconfig; + memset(&hwconfig, 0, sizeof(&hwconfig)); + hwconfig.tx_type = + (so_timestamping_flags & SOF_TIMESTAMPING_TX_HARDWARE) ? + HWTSTAMP_TX_ON : HWTSTAMP_TX_OFF; + hwconfig.rx_filter = + (so_timestamping_flags & SOF_TIMESTAMPING_RX_HARDWARE) ? + HWTSTAMP_FILTER_PTP_V1_L4_SYNC : HWTSTAMP_FILTER_NONE; + hwconfig_requested = hwconfig; + if (ioctl(sock, SIOCSHWTSTAMP, &hwtstamp) < 0) { + if ((errno == EINVAL || errno == ENOTSUP) && + hwconfig_requested.tx_type == HWTSTAMP_TX_OFF && + hwconfig_requested.rx_filter == HWTSTAMP_FILTER_NONE) + printf("SIOCSHWTSTAMP: disabling hardware time stamping not possible\n"); + else + bail("SIOCSHWTSTAMP"); + } + printf("SIOCSHWTSTAMP: tx_type %d requested, got %d; rx_filter %d requested, got %d\n", + hwconfig_requested.tx_type, hwconfig.tx_type, + hwconfig_requested.rx_filter, hwconfig.rx_filter); + + /* bind to PTP port */ + addr.sin_family = AF_INET; + addr.sin_addr.s_addr = htonl(INADDR_ANY); + addr.sin_port = htons(319 /* PTP event port */); + if (bind(sock, + (struct sockaddr *)&addr, + sizeof(struct sockaddr_in)) < 0) + bail("bind"); + + /* set multicast group for outgoing packets */ + inet_aton("224.0.1.130", &iaddr); /* alternate PTP domain 1 */ + addr.sin_addr = iaddr; + imr.imr_multiaddr.s_addr = iaddr.s_addr; + imr.imr_interface.s_addr = + ((struct sockaddr_in *)&device.ifr_addr)->sin_addr.s_addr; + if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_IF, + &imr.imr_interface.s_addr, sizeof(struct in_addr)) < 0) + bail("set multicast"); + + /* join multicast group, loop our own packet */ + if (setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, + &imr, sizeof(struct ip_mreq)) < 0) + bail("join multicast group"); + + if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_LOOP, + &ip_multicast_loop, sizeof(enabled)) < 0) { + bail("loop multicast"); + } + + /* set socket options for time stamping */ + if (so_timestamp && + setsockopt(sock, SOL_SOCKET, SO_TIMESTAMP, + &enabled, sizeof(enabled)) < 0) + bail("setsockopt SO_TIMESTAMP"); + + if (so_timestampns && + setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPNS, + &enabled, sizeof(enabled)) < 0) + bail("setsockopt SO_TIMESTAMPNS"); + + if (so_timestamping_flags && + setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPING, + &so_timestamping_flags, + sizeof(so_timestamping_flags)) < 0) + bail("setsockopt SO_TIMESTAMPING"); + + /* request IP_PKTINFO for debugging purposes */ + if (setsockopt(sock, SOL_IP, IP_PKTINFO, + &enabled, sizeof(enabled)) < 0) + printf("%s: %s\n", "setsockopt IP_PKTINFO", strerror(errno)); + + /* verify socket options */ + len = sizeof(val); + if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMP, &val, &len) < 0) + printf("%s: %s\n", "getsockopt SO_TIMESTAMP", strerror(errno)); + else + printf("SO_TIMESTAMP %d\n", val); + + if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMPNS, &val, &len) < 0) + printf("%s: %s\n", "getsockopt SO_TIMESTAMPNS", + strerror(errno)); + else + printf("SO_TIMESTAMPNS %d\n", val); + + if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMPING, &val, &len) < 0) { + printf("%s: %s\n", "getsockopt SO_TIMESTAMPING", + strerror(errno)); + } else { + printf("SO_TIMESTAMPING %d\n", val); + if (val != so_timestamping_flags) + printf(" not the expected value %d\n", + so_timestamping_flags); + } + + /* send packets forever every five seconds */ + gettimeofday(&next, 0); + next.tv_sec = (next.tv_sec + 1) / 5 * 5; + next.tv_usec = 0; + while (1) { + struct timeval now; + struct timeval delta; + long delta_us; + int res; + fd_set readfs, errorfs; + + gettimeofday(&now, 0); + delta_us = (long)(next.tv_sec - now.tv_sec) * 1000000 + + (long)(next.tv_usec - now.tv_usec); + if (delta_us > 0) { + /* continue waiting for timeout or data */ + delta.tv_sec = delta_us / 1000000; + delta.tv_usec = delta_us % 1000000; + + FD_ZERO(&readfs); + FD_ZERO(&errorfs); + FD_SET(sock, &readfs); + FD_SET(sock, &errorfs); + printf("%ld.%06ld: select %ldus\n", + (long)now.tv_sec, (long)now.tv_usec, + delta_us); + res = select(sock + 1, &readfs, 0, &errorfs, &delta); + gettimeofday(&now, 0); + printf("%ld.%06ld: select returned: %d, %s\n", + (long)now.tv_sec, (long)now.tv_usec, + res, + res < 0 ? strerror(errno) : "success"); + if (res > 0) { + if (FD_ISSET(sock, &readfs)) + printf("ready for reading\n"); + if (FD_ISSET(sock, &errorfs)) + printf("has error\n"); + recvpacket(sock, 0, + siocgstamp, + siocgstampns); + recvpacket(sock, MSG_ERRQUEUE, + siocgstamp, + siocgstampns); + } + } else { + /* write one packet */ + sendpacket(sock, + (struct sockaddr *)&addr, + sizeof(addr)); + next.tv_sec += 5; + continue; + } + } + + return 0; +} diff --git a/arch/alpha/include/asm/socket.h b/arch/alpha/include/asm/socket.h index a1057c2..3641ec1 100644 --- a/arch/alpha/include/asm/socket.h +++ b/arch/alpha/include/asm/socket.h @@ -62,6 +62,9 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + /* O_NONBLOCK clashes with the bits used for socket types. Therefore we * have to define SOCK_NONBLOCK to a different value here. */ diff --git a/arch/arm/include/asm/socket.h b/arch/arm/include/asm/socket.h index 6817be9..537de4e 100644 --- a/arch/arm/include/asm/socket.h +++ b/arch/arm/include/asm/socket.h @@ -54,4 +54,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_SOCKET_H */ diff --git a/arch/avr32/include/asm/socket.h b/arch/avr32/include/asm/socket.h index 35863f2..04c8606 100644 --- a/arch/avr32/include/asm/socket.h +++ b/arch/avr32/include/asm/socket.h @@ -54,4 +54,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* __ASM_AVR32_SOCKET_H */ diff --git a/arch/blackfin/include/asm/socket.h b/arch/blackfin/include/asm/socket.h index 2ca702e..fac7fe9 100644 --- a/arch/blackfin/include/asm/socket.h +++ b/arch/blackfin/include/asm/socket.h @@ -53,4 +53,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_SOCKET_H */ diff --git a/arch/cris/include/asm/socket.h b/arch/cris/include/asm/socket.h index 9df0ca8..d5cf740 100644 --- a/arch/cris/include/asm/socket.h +++ b/arch/cris/include/asm/socket.h @@ -56,6 +56,9 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_SOCKET_H */ diff --git a/arch/h8300/include/asm/socket.h b/arch/h8300/include/asm/socket.h index da2520d..602518a 100644 --- a/arch/h8300/include/asm/socket.h +++ b/arch/h8300/include/asm/socket.h @@ -54,4 +54,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_SOCKET_H */ diff --git a/arch/ia64/include/asm/socket.h b/arch/ia64/include/asm/socket.h index d5ef0aa..7454212 100644 --- a/arch/ia64/include/asm/socket.h +++ b/arch/ia64/include/asm/socket.h @@ -63,4 +63,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_IA64_SOCKET_H */ diff --git a/arch/m68k/include/asm/socket.h b/arch/m68k/include/asm/socket.h index dbc64e9..ca87f93 100644 --- a/arch/m68k/include/asm/socket.h +++ b/arch/m68k/include/asm/socket.h @@ -54,4 +54,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_SOCKET_H */ diff --git a/arch/mips/include/asm/socket.h b/arch/mips/include/asm/socket.h index facc2d7..2abca17 100644 --- a/arch/mips/include/asm/socket.h +++ b/arch/mips/include/asm/socket.h @@ -75,6 +75,9 @@ To add: #define SO_REUSEPORT 0x0200 /* Allow local address and port reuse. */ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #ifdef __KERNEL__ /** sock_type - Socket types diff --git a/arch/parisc/include/asm/socket.h b/arch/parisc/include/asm/socket.h index fba402c..885472b 100644 --- a/arch/parisc/include/asm/socket.h +++ b/arch/parisc/include/asm/socket.h @@ -54,6 +54,9 @@ #define SO_MARK 0x401f +#define SO_TIMESTAMPING 0x4020 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + /* O_NONBLOCK clashes with the bits used for socket types. Therefore we * have to define SOCK_NONBLOCK to a different value here. */ diff --git a/arch/powerpc/include/asm/socket.h b/arch/powerpc/include/asm/socket.h index f5a4e16..1e5cfad 100644 --- a/arch/powerpc/include/asm/socket.h +++ b/arch/powerpc/include/asm/socket.h @@ -61,4 +61,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_POWERPC_SOCKET_H */ diff --git a/arch/s390/include/asm/socket.h b/arch/s390/include/asm/socket.h index c786ab6..02330c5 100644 --- a/arch/s390/include/asm/socket.h +++ b/arch/s390/include/asm/socket.h @@ -62,4 +62,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_SOCKET_H */ diff --git a/arch/sh/include/asm/socket.h b/arch/sh/include/asm/socket.h index 6d4bf65..345653b 100644 --- a/arch/sh/include/asm/socket.h +++ b/arch/sh/include/asm/socket.h @@ -54,4 +54,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* __ASM_SH_SOCKET_H */ diff --git a/arch/sparc/include/asm/socket.h b/arch/sparc/include/asm/socket.h index bf50d0c..982a12f 100644 --- a/arch/sparc/include/asm/socket.h +++ b/arch/sparc/include/asm/socket.h @@ -50,6 +50,9 @@ #define SO_MARK 0x0022 +#define SO_TIMESTAMPING 0x0023 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + /* Security levels - as per NRL IPv6 - don't actually do anything */ #define SO_SECURITY_AUTHENTICATION 0x5001 #define SO_SECURITY_ENCRYPTION_TRANSPORT 0x5002 diff --git a/arch/x86/include/asm/socket.h b/arch/x86/include/asm/socket.h index 8ab9cc8..ca8bf2c 100644 --- a/arch/x86/include/asm/socket.h +++ b/arch/x86/include/asm/socket.h @@ -54,4 +54,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_X86_SOCKET_H */ diff --git a/arch/xtensa/include/asm/socket.h b/arch/xtensa/include/asm/socket.h index 6100682..dd1a7a4 100644 --- a/arch/xtensa/include/asm/socket.h +++ b/arch/xtensa/include/asm/socket.h @@ -65,4 +65,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _XTENSA_SOCKET_H */ diff --git a/include/asm-frv/socket.h b/include/asm-frv/socket.h index e51ca67..57c3d40 100644 --- a/include/asm-frv/socket.h +++ b/include/asm-frv/socket.h @@ -54,5 +54,8 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_SOCKET_H */ diff --git a/include/asm-m32r/socket.h b/include/asm-m32r/socket.h index 9a0e200..be7ed58 100644 --- a/include/asm-m32r/socket.h +++ b/include/asm-m32r/socket.h @@ -54,4 +54,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_M32R_SOCKET_H */ diff --git a/include/asm-mn10300/socket.h b/include/asm-mn10300/socket.h index 80af9c4..fb5daf4 100644 --- a/include/asm-mn10300/socket.h +++ b/include/asm-mn10300/socket.h @@ -54,4 +54,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_SOCKET_H */ diff --git a/include/linux/errqueue.h b/include/linux/errqueue.h index ceb1454..ec12cc7 100644 --- a/include/linux/errqueue.h +++ b/include/linux/errqueue.h @@ -18,6 +18,7 @@ struct sock_extended_err #define SO_EE_ORIGIN_LOCAL 1 #define SO_EE_ORIGIN_ICMP 2 #define SO_EE_ORIGIN_ICMP6 3 +#define SO_EE_ORIGIN_TIMESTAMPING 4 #define SO_EE_OFFENDER(ee) ((struct sockaddr*)((ee)+1)) diff --git a/include/linux/net_tstamp.h b/include/linux/net_tstamp.h new file mode 100644 index 0000000..a3b8546 --- /dev/null +++ b/include/linux/net_tstamp.h @@ -0,0 +1,104 @@ +/* + * Userspace API for hardware time stamping of network packets + * + * Copyright (C) 2008,2009 Intel Corporation + * Author: Patrick Ohly <patrick.ohly@intel.com> + * + */ + +#ifndef _NET_TIMESTAMPING_H +#define _NET_TIMESTAMPING_H + +#include <linux/socket.h> /* for SO_TIMESTAMPING */ + +/* SO_TIMESTAMPING gets an integer bit field comprised of these values */ +enum { + SOF_TIMESTAMPING_TX_HARDWARE = (1<<0), + SOF_TIMESTAMPING_TX_SOFTWARE = (1<<1), + SOF_TIMESTAMPING_RX_HARDWARE = (1<<2), + SOF_TIMESTAMPING_RX_SOFTWARE = (1<<3), + SOF_TIMESTAMPING_SOFTWARE = (1<<4), + SOF_TIMESTAMPING_SYS_HARDWARE = (1<<5), + SOF_TIMESTAMPING_RAW_HARDWARE = (1<<6), + SOF_TIMESTAMPING_MASK = + (SOF_TIMESTAMPING_RAW_HARDWARE - 1) | + SOF_TIMESTAMPING_RAW_HARDWARE +}; + +/** + * struct hwtstamp_config - %SIOCSHWTSTAMP parameter + * + * @flags: no flags defined right now, must be zero + * @tx_type: one of HWTSTAMP_TX_* + * @rx_type: one of one of HWTSTAMP_FILTER_* + * + * %SIOCSHWTSTAMP expects a &struct ifreq with a ifr_data pointer to + * this structure. dev_ifsioc() in the kernel takes care of the + * translation between 32 bit userspace and 64 bit kernel. The + * structure is intentionally chosen so that it has the same layout on + * 32 and 64 bit systems, don't break this! + */ +struct hwtstamp_config { + int flags; + int tx_type; + int rx_filter; +}; + +/* possible values for hwtstamp_config->tx_type */ +enum { + /* + * No outgoing packet will need hardware time stamping; + * should a packet arrive which asks for it, no hardware + * time stamping will be done. + */ + HWTSTAMP_TX_OFF, + + /* + * Enables hardware time stamping for outgoing packets; + * the sender of the packet decides which are to be + * time stamped by setting %SOF_TIMESTAMPING_TX_SOFTWARE + * before sending the packet. + */ + HWTSTAMP_TX_ON, +}; + +/* possible values for hwtstamp_config->rx_filter */ +enum { + /* time stamp no incoming packet at all */ + HWTSTAMP_FILTER_NONE, + + /* time stamp any incoming packet */ + HWTSTAMP_FILTER_ALL, + + /* return value: time stamp all packets requested plus some others */ + HWTSTAMP_FILTER_SOME, + + /* PTP v1, UDP, any kind of event packet */ + HWTSTAMP_FILTER_PTP_V1_L4_EVENT, + /* PTP v1, UDP, Sync packet */ + HWTSTAMP_FILTER_PTP_V1_L4_SYNC, + /* PTP v1, UDP, Delay_req packet */ + HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ, + /* PTP v2, UDP, any kind of event packet */ + HWTSTAMP_FILTER_PTP_V2_L4_EVENT, + /* PTP v2, UDP, Sync packet */ + HWTSTAMP_FILTER_PTP_V2_L4_SYNC, + /* PTP v2, UDP, Delay_req packet */ + HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ, + + /* 802.AS1, Ethernet, any kind of event packet */ + HWTSTAMP_FILTER_PTP_V2_L2_EVENT, + /* 802.AS1, Ethernet, Sync packet */ + HWTSTAMP_FILTER_PTP_V2_L2_SYNC, + /* 802.AS1, Ethernet, Delay_req packet */ + HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ, + + /* PTP v2/802.AS1, any layer, any kind of event packet */ + HWTSTAMP_FILTER_PTP_V2_EVENT, + /* PTP v2/802.AS1, any layer, Sync packet */ + HWTSTAMP_FILTER_PTP_V2_SYNC, + /* PTP v2/802.AS1, any layer, Delay_req packet */ + HWTSTAMP_FILTER_PTP_V2_DELAY_REQ, +}; + +#endif /* _NET_TIMESTAMPING_H */ diff --git a/include/linux/sockios.h b/include/linux/sockios.h index abef759..241f179 100644 --- a/include/linux/sockios.h +++ b/include/linux/sockios.h @@ -122,6 +122,9 @@ #define SIOCBRADDIF 0x89a2 /* add interface to bridge */ #define SIOCBRDELIF 0x89a3 /* remove interface from bridge */ +/* hardware time stamping: parameters in linux/net_tstamp.h */ +#define SIOCSHWTSTAMP 0x89b0 + /* Device private ioctl calls */ /* -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH NET-NEXT 04/10] net: infrastructure for hardware time stamping 2009-02-12 15:00 ` [PATCH NET-NEXT 03/10] net: new user space API for time stamping of incoming and outgoing packets Patrick Ohly @ 2009-02-12 15:00 ` Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 05/10] net: socket infrastructure for SO_TIMESTAMPING Patrick Ohly 0 siblings, 1 reply; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:00 UTC (permalink / raw) To: David Miller Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly The additional per-packet information (16 bytes for time stamps, 1 byte for flags) is stored for all packets in the skb_shared_info struct. This implementation detail is hidden from users of that information via skb_* accessor functions. A separate struct resp. union is used for the additional information so that it can be stored/copied easily outside of skb_shared_info. Compared to previous implementations (reusing the tstamp field depending on the context, optional additional structures) this is the simplest solution. It does not extend sk_buff itself. TX time stamping is implemented in software if the device driver doesn't support hardware time stamping. The new semantic for hardware/software time stamping around ndo_start_xmit() is based on two assumptions about existing network device drivers which don't support hardware time stamping and know nothing about it: - they leave the new skb_shared_tx unmodified - the keep the connection to the originating socket in skb->sk alive, i.e., don't call skb_orphan() Given that skb_shared_tx is new, the first assumption is safe. The second is only true for some drivers. As a result, software TX time stamping currently works with the bnx2 driver, but not with the unmodified igb driver (the two drivers this patch series was tested with). --- include/linux/skbuff.h | 91 +++++++++++++++++++++++++++++++++++++++++++++++- net/core/dev.c | 32 ++++++++++++++++- net/core/skbuff.c | 41 +++++++++++++++++++++ 3 files changed, 161 insertions(+), 3 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 9247008..f96bc91 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -132,6 +132,57 @@ struct skb_frag_struct { __u32 size; }; +#define HAVE_HW_TIME_STAMP + +/** + * skb_shared_hwtstamps - hardware time stamps + * + * @hwtstamp: hardware time stamp transformed into duration + * since arbitrary point in time + * @syststamp: hwtstamp transformed to system time base + * + * Software time stamps generated by ktime_get_real() are stored in + * skb->tstamp. The relation between the different kinds of time + * stamps is as follows: + * + * syststamp and tstamp can be compared against each other in + * arbitrary combinations. The accuracy of a + * syststamp/tstamp/"syststamp from other device" comparison is + * limited by the accuracy of the transformation into system time + * base. This depends on the device driver and its underlying + * hardware. + * + * hwtstamps can only be compared against other hwtstamps from + * the same device. + * + * This structure is attached to packets as part of the + * &skb_shared_info. Use skb_hwtstamps() to get a pointer. + */ +struct skb_shared_hwtstamps { + ktime_t hwtstamp; + ktime_t syststamp; +}; + +/** + * skb_shared_tx - instructions for time stamping of outgoing packets + * + * @hardware: generate hardware time stamp + * @software: generate software time stamp + * @in_progress: device driver is going to provide + * hardware time stamp + * + * These flags are attached to packets as part of the + * &skb_shared_info. Use skb_tx() to get a pointer. + */ +union skb_shared_tx { + struct { + __u8 hardware:1, + software:1, + in_progress:1; + }; + __u8 flags; +}; + /* This data is invariant across clones and lives at * the end of the header data, ie. at skb->end. */ @@ -143,10 +194,12 @@ struct skb_shared_info { unsigned short gso_segs; unsigned short gso_type; __be32 ip6_frag_id; + union skb_shared_tx tx_flags; #ifdef CONFIG_HAS_DMA unsigned int num_dma_maps; #endif struct sk_buff *frag_list; + struct skb_shared_hwtstamps hwtstamps; skb_frag_t frags[MAX_SKB_FRAGS]; #ifdef CONFIG_HAS_DMA dma_addr_t dma_maps[MAX_SKB_FRAGS + 1]; @@ -465,6 +518,16 @@ static inline unsigned char *skb_end_pointer(const struct sk_buff *skb) /* Internal */ #define skb_shinfo(SKB) ((struct skb_shared_info *)(skb_end_pointer(SKB))) +static inline struct skb_shared_hwtstamps *skb_hwtstamps(struct sk_buff *skb) +{ + return &skb_shinfo(skb)->hwtstamps; +} + +static inline union skb_shared_tx *skb_tx(struct sk_buff *skb) +{ + return &skb_shinfo(skb)->tx_flags; +} + /** * skb_queue_empty - check if a queue is empty * @list: queue head @@ -1730,6 +1793,11 @@ static inline void skb_copy_to_linear_data_offset(struct sk_buff *skb, extern void skb_init(void); +static inline ktime_t skb_get_ktime(const struct sk_buff *skb) +{ + return skb->tstamp; +} + /** * skb_get_timestamp - get timestamp from a skb * @skb: skb to get stamp from @@ -1739,11 +1807,18 @@ extern void skb_init(void); * This function converts the offset back to a struct timeval and stores * it in stamp. */ -static inline void skb_get_timestamp(const struct sk_buff *skb, struct timeval *stamp) +static inline void skb_get_timestamp(const struct sk_buff *skb, + struct timeval *stamp) { *stamp = ktime_to_timeval(skb->tstamp); } +static inline void skb_get_timestampns(const struct sk_buff *skb, + struct timespec *stamp) +{ + *stamp = ktime_to_timespec(skb->tstamp); +} + static inline void __net_timestamp(struct sk_buff *skb) { skb->tstamp = ktime_get_real(); @@ -1759,6 +1834,20 @@ static inline ktime_t net_invalid_timestamp(void) return ktime_set(0, 0); } +/** + * skb_tstamp_tx - queue clone of skb with send time stamps + * @orig_skb: the original outgoing packet + * @hwtstamps: hardware time stamps, may be NULL if not available + * + * If the skb has a socket associated, then this function clones the + * skb (thus sharing the actual data and optional structures), stores + * the optional hardware time stamping information (if non NULL) or + * generates a software time stamp (otherwise), then queues the clone + * to the error queue of the socket. Errors are silently ignored. + */ +extern void skb_tstamp_tx(struct sk_buff *orig_skb, + struct skb_shared_hwtstamps *hwtstamps); + extern __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len); extern __sum16 __skb_checksum_complete(struct sk_buff *skb); diff --git a/net/core/dev.c b/net/core/dev.c index 1e27a67..d20c28e 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1672,10 +1672,21 @@ static int dev_gso_segment(struct sk_buff *skb) return 0; } +static void tstamp_tx(struct sk_buff *skb) +{ + union skb_shared_tx *shtx = + skb_tx(skb); + if (unlikely(shtx->software && + !shtx->in_progress)) { + skb_tstamp_tx(skb, NULL); + } +} + int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, struct netdev_queue *txq) { const struct net_device_ops *ops = dev->netdev_ops; + int rc; prefetch(&dev->netdev_ops->ndo_start_xmit); if (likely(!skb->next)) { @@ -1689,13 +1700,29 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, goto gso; } - return ops->ndo_start_xmit(skb, dev); + rc = ops->ndo_start_xmit(skb, dev); + /* + * TODO: if skb_orphan() was called by + * dev->hard_start_xmit() (for example, the unmodified + * igb driver does that; bnx2 doesn't), then + * skb_tx_software_timestamp() will be unable to send + * back the time stamp. + * + * How can this be prevented? Always create another + * reference to the socket before calling + * dev->hard_start_xmit()? Prevent that skb_orphan() + * does anything in dev->hard_start_xmit() by clearing + * the skb destructor before the call and restoring it + * afterwards, then doing the skb_orphan() ourselves? + */ + if (likely(!rc)) + tstamp_tx(skb); + return rc; } gso: do { struct sk_buff *nskb = skb->next; - int rc; skb->next = nskb->next; nskb->next = NULL; @@ -1705,6 +1732,7 @@ gso: skb->next = nskb; return rc; } + tstamp_tx(skb); if (unlikely(netif_tx_queue_stopped(txq) && skb->next)) return NETDEV_TX_BUSY; } while (skb->next); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 7657cec..7b831a7 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -55,6 +55,7 @@ #include <linux/rtnetlink.h> #include <linux/init.h> #include <linux/scatterlist.h> +#include <linux/errqueue.h> #include <net/protocol.h> #include <net/dst.h> @@ -215,7 +216,9 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, shinfo->gso_segs = 0; shinfo->gso_type = 0; shinfo->ip6_frag_id = 0; + shinfo->tx_flags.flags = 0; shinfo->frag_list = NULL; + memset(&shinfo->hwtstamps, 0, sizeof(shinfo->hwtstamps)); if (fclone) { struct sk_buff *child = skb + 1; @@ -2940,6 +2943,44 @@ int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer) } EXPORT_SYMBOL_GPL(skb_cow_data); +void skb_tstamp_tx(struct sk_buff *orig_skb, + struct skb_shared_hwtstamps *hwtstamps) +{ + struct sock *sk = orig_skb->sk; + struct sock_exterr_skb *serr; + struct sk_buff *skb; + int err; + + if (!sk) + return; + + skb = skb_clone(orig_skb, GFP_ATOMIC); + if (!skb) + return; + + if (hwtstamps) { + *skb_hwtstamps(skb) = + *hwtstamps; + } else { + /* + * no hardware time stamps available, + * so keep the skb_shared_tx and only + * store software time stamp + */ + skb->tstamp = ktime_get_real(); + } + + serr = SKB_EXT_ERR(skb); + memset(serr, 0, sizeof(*serr)); + serr->ee.ee_errno = ENOMSG; + serr->ee.ee_origin = SO_EE_ORIGIN_TIMESTAMPING; + err = sock_queue_err_skb(sk, skb); + if (err) + kfree_skb(skb); +} +EXPORT_SYMBOL_GPL(skb_tstamp_tx); + + /** * skb_partial_csum_set - set up and verify partial csum values for packet * @skb: the skb to set -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH NET-NEXT 05/10] net: socket infrastructure for SO_TIMESTAMPING 2009-02-12 15:00 ` [PATCH NET-NEXT 04/10] net: infrastructure for hardware time stamping Patrick Ohly @ 2009-02-12 15:00 ` Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 06/10] ip: support for TX timestamps on UDP and RAW sockets Patrick Ohly 0 siblings, 1 reply; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:00 UTC (permalink / raw) To: David Miller Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly The overlap with the old SO_TIMESTAMP[NS] options is handled so that time stamping in software (net_enable_timestamp()) is enabled when SO_TIMESTAMP[NS] and/or SO_TIMESTAMPING_RX_SOFTWARE is set. It's disabled if all of these are off. --- include/net/sock.h | 44 +++++++++++++++++++++++++-- net/compat.c | 19 +++++++---- net/core/sock.c | 81 ++++++++++++++++++++++++++++++++++++++++++------- net/socket.c | 84 +++++++++++++++++++++++++++++++++++++++------------ 4 files changed, 186 insertions(+), 42 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index c047def..0a54d9b 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -158,7 +158,7 @@ struct sock_common { * @sk_allocation: allocation mode * @sk_sndbuf: size of send buffer in bytes * @sk_flags: %SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE, - * %SO_OOBINLINE settings + * %SO_OOBINLINE settings, %SO_TIMESTAMPING settings * @sk_no_check: %SO_NO_CHECK setting, wether or not checkup packets * @sk_route_caps: route capabilities (e.g. %NETIF_F_TSO) * @sk_gso_type: GSO type (e.g. %SKB_GSO_TCPV4) @@ -488,6 +488,13 @@ enum sock_flags { SOCK_RCVTSTAMPNS, /* %SO_TIMESTAMPNS setting */ SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */ SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */ + SOCK_TIMESTAMPING_TX_HARDWARE, /* %SOF_TIMESTAMPING_TX_HARDWARE */ + SOCK_TIMESTAMPING_TX_SOFTWARE, /* %SOF_TIMESTAMPING_TX_SOFTWARE */ + SOCK_TIMESTAMPING_RX_HARDWARE, /* %SOF_TIMESTAMPING_RX_HARDWARE */ + SOCK_TIMESTAMPING_RX_SOFTWARE, /* %SOF_TIMESTAMPING_RX_SOFTWARE */ + SOCK_TIMESTAMPING_SOFTWARE, /* %SOF_TIMESTAMPING_SOFTWARE */ + SOCK_TIMESTAMPING_RAW_HARDWARE, /* %SOF_TIMESTAMPING_RAW_HARDWARE */ + SOCK_TIMESTAMPING_SYS_HARDWARE, /* %SOF_TIMESTAMPING_SYS_HARDWARE */ }; static inline void sock_copy_flags(struct sock *nsk, struct sock *osk) @@ -1346,14 +1353,45 @@ static __inline__ void sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb) { ktime_t kt = skb->tstamp; + struct skb_shared_hwtstamps *hwtstamps = skb_hwtstamps(skb); - if (sock_flag(sk, SOCK_RCVTSTAMP)) + /* + * generate control messages if + * - receive time stamping in software requested (SOCK_RCVTSTAMP + * or SOCK_TIMESTAMPING_RX_SOFTWARE) + * - software time stamp available and wanted + * (SOCK_TIMESTAMPING_SOFTWARE) + * - hardware time stamps available and wanted + * (SOCK_TIMESTAMPING_SYS_HARDWARE or + * SOCK_TIMESTAMPING_RAW_HARDWARE) + */ + if (sock_flag(sk, SOCK_RCVTSTAMP) || + sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE) || + (kt.tv64 && sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE)) || + (hwtstamps->hwtstamp.tv64 && + sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE)) || + (hwtstamps->syststamp.tv64 && + sock_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE))) __sock_recv_timestamp(msg, sk, skb); else sk->sk_stamp = kt; } /** + * sock_tx_timestamp - checks whether the outgoing packet is to be time stamped + * @msg: outgoing packet + * @sk: socket sending this packet + * @shtx: filled with instructions for time stamping + * + * Currently only depends on SOCK_TIMESTAMPING* flags. Returns error code if + * parameters are invalid. + */ +extern int sock_tx_timestamp(struct msghdr *msg, + struct sock *sk, + union skb_shared_tx *shtx); + + +/** * sk_eat_skb - Release a skb if it is no longer needed * @sk: socket to eat this skb from * @skb: socket buffer to eat @@ -1421,7 +1459,7 @@ static inline struct sock *skb_steal_sock(struct sk_buff *skb) return NULL; } -extern void sock_enable_timestamp(struct sock *sk); +extern void sock_enable_timestamp(struct sock *sk, int flag); extern int sock_get_timestamp(struct sock *, struct timeval __user *); extern int sock_get_timestampns(struct sock *, struct timespec __user *); diff --git a/net/compat.c b/net/compat.c index a3a2ba0..8d73905 100644 --- a/net/compat.c +++ b/net/compat.c @@ -216,7 +216,7 @@ Efault: int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *data) { struct compat_timeval ctv; - struct compat_timespec cts; + struct compat_timespec cts[3]; struct compat_cmsghdr __user *cm = (struct compat_cmsghdr __user *) kmsg->msg_control; struct compat_cmsghdr cmhdr; int cmlen; @@ -233,12 +233,17 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat data = &ctv; len = sizeof(ctv); } - if (level == SOL_SOCKET && type == SCM_TIMESTAMPNS) { + if (level == SOL_SOCKET && + (type == SCM_TIMESTAMPNS || type == SCM_TIMESTAMPING)) { + int count = type == SCM_TIMESTAMPNS ? 1 : 3; + int i; struct timespec *ts = (struct timespec *)data; - cts.tv_sec = ts->tv_sec; - cts.tv_nsec = ts->tv_nsec; + for (i = 0; i < count; i++) { + cts[i].tv_sec = ts[i].tv_sec; + cts[i].tv_nsec = ts[i].tv_nsec; + } data = &cts; - len = sizeof(cts); + len = sizeof(cts[0]) * count; } cmlen = CMSG_COMPAT_LEN(len); @@ -455,7 +460,7 @@ int compat_sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp) struct timeval tv; if (!sock_flag(sk, SOCK_TIMESTAMP)) - sock_enable_timestamp(sk); + sock_enable_timestamp(sk, SOCK_TIMESTAMP); tv = ktime_to_timeval(sk->sk_stamp); if (tv.tv_sec == -1) return err; @@ -479,7 +484,7 @@ int compat_sock_get_timestampns(struct sock *sk, struct timespec __user *usersta struct timespec ts; if (!sock_flag(sk, SOCK_TIMESTAMP)) - sock_enable_timestamp(sk); + sock_enable_timestamp(sk, SOCK_TIMESTAMP); ts = ktime_to_timespec(sk->sk_stamp); if (ts.tv_sec == -1) return err; diff --git a/net/core/sock.c b/net/core/sock.c index c64996f..d22933a 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -120,6 +120,7 @@ #include <net/net_namespace.h> #include <net/request_sock.h> #include <net/sock.h> +#include <linux/net_tstamp.h> #include <net/xfrm.h> #include <linux/ipsec.h> @@ -255,11 +256,14 @@ static void sock_warn_obsolete_bsdism(const char *name) } } -static void sock_disable_timestamp(struct sock *sk) +static void sock_disable_timestamp(struct sock *sk, int flag) { - if (sock_flag(sk, SOCK_TIMESTAMP)) { - sock_reset_flag(sk, SOCK_TIMESTAMP); - net_disable_timestamp(); + if (sock_flag(sk, flag)) { + sock_reset_flag(sk, flag); + if (!sock_flag(sk, SOCK_TIMESTAMP) && + !sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE)) { + net_disable_timestamp(); + } } } @@ -614,13 +618,38 @@ set_rcvbuf: else sock_set_flag(sk, SOCK_RCVTSTAMPNS); sock_set_flag(sk, SOCK_RCVTSTAMP); - sock_enable_timestamp(sk); + sock_enable_timestamp(sk, SOCK_TIMESTAMP); } else { sock_reset_flag(sk, SOCK_RCVTSTAMP); sock_reset_flag(sk, SOCK_RCVTSTAMPNS); } break; + case SO_TIMESTAMPING: + if (val & ~SOF_TIMESTAMPING_MASK) { + ret = EINVAL; + break; + } + sock_valbool_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE, + val & SOF_TIMESTAMPING_TX_HARDWARE); + sock_valbool_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE, + val & SOF_TIMESTAMPING_TX_SOFTWARE); + sock_valbool_flag(sk, SOCK_TIMESTAMPING_RX_HARDWARE, + val & SOF_TIMESTAMPING_RX_HARDWARE); + if (val & SOF_TIMESTAMPING_RX_SOFTWARE) + sock_enable_timestamp(sk, + SOCK_TIMESTAMPING_RX_SOFTWARE); + else + sock_disable_timestamp(sk, + SOCK_TIMESTAMPING_RX_SOFTWARE); + sock_valbool_flag(sk, SOCK_TIMESTAMPING_SOFTWARE, + val & SOF_TIMESTAMPING_SOFTWARE); + sock_valbool_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE, + val & SOF_TIMESTAMPING_SYS_HARDWARE); + sock_valbool_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE, + val & SOF_TIMESTAMPING_RAW_HARDWARE); + break; + case SO_RCVLOWAT: if (val < 0) val = INT_MAX; @@ -766,6 +795,24 @@ int sock_getsockopt(struct socket *sock, int level, int optname, v.val = sock_flag(sk, SOCK_RCVTSTAMPNS); break; + case SO_TIMESTAMPING: + v.val = 0; + if (sock_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE)) + v.val |= SOF_TIMESTAMPING_TX_HARDWARE; + if (sock_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE)) + v.val |= SOF_TIMESTAMPING_TX_SOFTWARE; + if (sock_flag(sk, SOCK_TIMESTAMPING_RX_HARDWARE)) + v.val |= SOF_TIMESTAMPING_RX_HARDWARE; + if (sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE)) + v.val |= SOF_TIMESTAMPING_RX_SOFTWARE; + if (sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE)) + v.val |= SOF_TIMESTAMPING_SOFTWARE; + if (sock_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE)) + v.val |= SOF_TIMESTAMPING_SYS_HARDWARE; + if (sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE)) + v.val |= SOF_TIMESTAMPING_RAW_HARDWARE; + break; + case SO_RCVTIMEO: lv=sizeof(struct timeval); if (sk->sk_rcvtimeo == MAX_SCHEDULE_TIMEOUT) { @@ -967,7 +1014,8 @@ void sk_free(struct sock *sk) rcu_assign_pointer(sk->sk_filter, NULL); } - sock_disable_timestamp(sk); + sock_disable_timestamp(sk, SOCK_TIMESTAMP); + sock_disable_timestamp(sk, SOCK_TIMESTAMPING_RX_SOFTWARE); if (atomic_read(&sk->sk_omem_alloc)) printk(KERN_DEBUG "%s: optmem leakage (%d bytes) detected.\n", @@ -1785,7 +1833,7 @@ int sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp) { struct timeval tv; if (!sock_flag(sk, SOCK_TIMESTAMP)) - sock_enable_timestamp(sk); + sock_enable_timestamp(sk, SOCK_TIMESTAMP); tv = ktime_to_timeval(sk->sk_stamp); if (tv.tv_sec == -1) return -ENOENT; @@ -1801,7 +1849,7 @@ int sock_get_timestampns(struct sock *sk, struct timespec __user *userstamp) { struct timespec ts; if (!sock_flag(sk, SOCK_TIMESTAMP)) - sock_enable_timestamp(sk); + sock_enable_timestamp(sk, SOCK_TIMESTAMP); ts = ktime_to_timespec(sk->sk_stamp); if (ts.tv_sec == -1) return -ENOENT; @@ -1813,11 +1861,20 @@ int sock_get_timestampns(struct sock *sk, struct timespec __user *userstamp) } EXPORT_SYMBOL(sock_get_timestampns); -void sock_enable_timestamp(struct sock *sk) +void sock_enable_timestamp(struct sock *sk, int flag) { - if (!sock_flag(sk, SOCK_TIMESTAMP)) { - sock_set_flag(sk, SOCK_TIMESTAMP); - net_enable_timestamp(); + if (!sock_flag(sk, flag)) { + sock_set_flag(sk, flag); + /* + * we just set one of the two flags which require net + * time stamping, but time stamping might have been on + * already because of the other one + */ + if (!sock_flag(sk, + flag == SOCK_TIMESTAMP ? + SOCK_TIMESTAMPING_RX_SOFTWARE : + SOCK_TIMESTAMP)) + net_enable_timestamp(); } } diff --git a/net/socket.c b/net/socket.c index 35dd737..47a3dc0 100644 --- a/net/socket.c +++ b/net/socket.c @@ -545,6 +545,18 @@ void sock_release(struct socket *sock) sock->file = NULL; } +int sock_tx_timestamp(struct msghdr *msg, struct sock *sk, + union skb_shared_tx *shtx) +{ + shtx->flags = 0; + if (sock_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE)) + shtx->hardware = 1; + if (sock_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE)) + shtx->software = 1; + return 0; +} +EXPORT_SYMBOL(sock_tx_timestamp); + static inline int __sock_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg, size_t size) { @@ -595,33 +607,65 @@ int kernel_sendmsg(struct socket *sock, struct msghdr *msg, return result; } +static int ktime2ts(ktime_t kt, struct timespec *ts) +{ + if (kt.tv64) { + *ts = ktime_to_timespec(kt); + return 1; + } else { + return 0; + } +} + /* * called from sock_recv_timestamp() if sock_flag(sk, SOCK_RCVTSTAMP) */ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb) { - ktime_t kt = skb->tstamp; - - if (!sock_flag(sk, SOCK_RCVTSTAMPNS)) { - struct timeval tv; - /* Race occurred between timestamp enabling and packet - receiving. Fill in the current time for now. */ - if (kt.tv64 == 0) - kt = ktime_get_real(); - skb->tstamp = kt; - tv = ktime_to_timeval(kt); - put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP, sizeof(tv), &tv); - } else { - struct timespec ts; - /* Race occurred between timestamp enabling and packet - receiving. Fill in the current time for now. */ - if (kt.tv64 == 0) - kt = ktime_get_real(); - skb->tstamp = kt; - ts = ktime_to_timespec(kt); - put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPNS, sizeof(ts), &ts); + int need_software_tstamp = sock_flag(sk, SOCK_RCVTSTAMP); + struct timespec ts[3]; + int empty = 1; + struct skb_shared_hwtstamps *shhwtstamps = + skb_hwtstamps(skb); + + /* Race occurred between timestamp enabling and packet + receiving. Fill in the current time for now. */ + if (need_software_tstamp && skb->tstamp.tv64 == 0) + __net_timestamp(skb); + + if (need_software_tstamp) { + if (!sock_flag(sk, SOCK_RCVTSTAMPNS)) { + struct timeval tv; + skb_get_timestamp(skb, &tv); + put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP, + sizeof(tv), &tv); + } else { + struct timespec ts; + skb_get_timestampns(skb, &ts); + put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPNS, + sizeof(ts), &ts); + } + } + + + memset(ts, 0, sizeof(ts)); + if (skb->tstamp.tv64 && + sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE)) { + skb_get_timestampns(skb, ts + 0); + empty = 0; + } + if (shhwtstamps) { + if (sock_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE) && + ktime2ts(shhwtstamps->syststamp, ts + 1)) + empty = 0; + if (sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE) && + ktime2ts(shhwtstamps->hwtstamp, ts + 2)) + empty = 0; } + if (!empty) + put_cmsg(msg, SOL_SOCKET, + SCM_TIMESTAMPING, sizeof(ts), &ts); } EXPORT_SYMBOL_GPL(__sock_recv_timestamp); -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH NET-NEXT 06/10] ip: support for TX timestamps on UDP and RAW sockets 2009-02-12 15:00 ` [PATCH NET-NEXT 05/10] net: socket infrastructure for SO_TIMESTAMPING Patrick Ohly @ 2009-02-12 15:00 ` Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 07/10] net: pass new SIOCSHWTSTAMP through to device drivers Patrick Ohly 0 siblings, 1 reply; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:00 UTC (permalink / raw) To: David Miller Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly Instructions for time stamping outgoing packets are take from the socket layer and later copied into the new skb. --- Documentation/networking/timestamping.txt | 2 ++ include/net/ip.h | 1 + net/can/raw.c | 3 +++ net/ipv4/icmp.c | 2 ++ net/ipv4/ip_output.c | 6 ++++++ net/ipv4/raw.c | 1 + net/ipv4/udp.c | 4 ++++ 7 files changed, 19 insertions(+), 0 deletions(-) diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt index a681a65..0e58b45 100644 --- a/Documentation/networking/timestamping.txt +++ b/Documentation/networking/timestamping.txt @@ -56,6 +56,8 @@ and including the link layer, the scm_timestamping control message and a sock_extended_err control message with ee_errno==ENOMSG and ee_origin==SO_EE_ORIGIN_TIMESTAMPING. A socket with such a pending bounced packet is ready for reading as far as select() is concerned. +If the outgoing packet has to be fragmented, then only the first +fragment is time stamped and returned to the sending socket. All three values correspond to the same event in time, but were generated in different ways. Each of these values may be empty (= all diff --git a/include/net/ip.h b/include/net/ip.h index 1086813..4ac7577 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -55,6 +55,7 @@ struct ipcm_cookie __be32 addr; int oif; struct ip_options *opt; + union skb_shared_tx shtx; }; #define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb)) diff --git a/net/can/raw.c b/net/can/raw.c index 0703cba..6aa154e 100644 --- a/net/can/raw.c +++ b/net/can/raw.c @@ -648,6 +648,9 @@ static int raw_sendmsg(struct kiocb *iocb, struct socket *sock, err = memcpy_fromiovec(skb_put(skb, size), msg->msg_iov, size); if (err < 0) goto free_skb; + err = sock_tx_timestamp(msg, sk, skb_tx(skb)); + if (err < 0) + goto free_skb; skb->dev = dev; skb->sk = sk; diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 705b33b..382800a 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -375,6 +375,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb) inet->tos = ip_hdr(skb)->tos; daddr = ipc.addr = rt->rt_src; ipc.opt = NULL; + ipc.shtx.flags = 0; if (icmp_param->replyopts.optlen) { ipc.opt = &icmp_param->replyopts; if (ipc.opt->srr) @@ -532,6 +533,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info) inet_sk(sk)->tos = tos; ipc.addr = iph->saddr; ipc.opt = &icmp_param.replyopts; + ipc.shtx.flags = 0; { struct flowi fl = { diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 8ebe86d..3e7e910 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -935,6 +935,10 @@ alloc_new_skb: sk->sk_allocation); if (unlikely(skb == NULL)) err = -ENOBUFS; + else + /* only the initial fragment is + time stamped */ + ipc->shtx.flags = 0; } if (skb == NULL) goto error; @@ -945,6 +949,7 @@ alloc_new_skb: skb->ip_summed = csummode; skb->csum = 0; skb_reserve(skb, hh_len); + *skb_tx(skb) = ipc->shtx; /* * Find where to start putting bytes. @@ -1364,6 +1369,7 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *ar daddr = ipc.addr = rt->rt_src; ipc.opt = NULL; + ipc.shtx.flags = 0; if (replyopts.opt.optlen) { ipc.opt = &replyopts.opt; diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index dff8bc4..f774651 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -493,6 +493,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, ipc.addr = inet->saddr; ipc.opt = NULL; + ipc.shtx.flags = 0; ipc.oif = sk->sk_bound_dev_if; if (msg->msg_controllen) { diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index c47c989..4bd178a 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -596,6 +596,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, return -EOPNOTSUPP; ipc.opt = NULL; + ipc.shtx.flags = 0; if (up->pending) { /* @@ -643,6 +644,9 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, ipc.addr = inet->saddr; ipc.oif = sk->sk_bound_dev_if; + err = sock_tx_timestamp(msg, sk, &ipc.shtx); + if (err) + return err; if (msg->msg_controllen) { err = ip_cmsg_send(sock_net(sk), msg, &ipc); if (err) -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH NET-NEXT 07/10] net: pass new SIOCSHWTSTAMP through to device drivers 2009-02-12 15:00 ` [PATCH NET-NEXT 06/10] ip: support for TX timestamps on UDP and RAW sockets Patrick Ohly @ 2009-02-12 15:00 ` Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 08/10] igb: access to NIC time Patrick Ohly 0 siblings, 1 reply; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:00 UTC (permalink / raw) To: David Miller Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly --- fs/compat_ioctl.c | 6 ++++++ net/core/dev.c | 2 ++ 2 files changed, 8 insertions(+), 0 deletions(-) diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c index c03c10d..ab7585a 100644 --- a/fs/compat_ioctl.c +++ b/fs/compat_ioctl.c @@ -522,6 +522,11 @@ static int dev_ifsioc(unsigned int fd, unsigned int cmd, unsigned long arg) if (err) return -EFAULT; break; + case SIOCSHWTSTAMP: + if (copy_from_user(&ifr, uifr32, sizeof(*uifr32))) + return -EFAULT; + ifr.ifr_data = compat_ptr(uifr32->ifr_ifru.ifru_data); + break; default: if (copy_from_user(&ifr, uifr32, sizeof(*uifr32))) return -EFAULT; @@ -2563,6 +2568,7 @@ HANDLE_IOCTL(SIOCSIFMAP, dev_ifsioc) HANDLE_IOCTL(SIOCGIFADDR, dev_ifsioc) HANDLE_IOCTL(SIOCSIFADDR, dev_ifsioc) HANDLE_IOCTL(SIOCSIFHWBROADCAST, dev_ifsioc) +HANDLE_IOCTL(SIOCSHWTSTAMP, dev_ifsioc) /* ioctls used by appletalk ddp.c */ HANDLE_IOCTL(SIOCATALKDIFADDR, dev_ifsioc) diff --git a/net/core/dev.c b/net/core/dev.c index d20c28e..d393fc9 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4019,6 +4019,7 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, unsigned int cmd) cmd == SIOCSMIIREG || cmd == SIOCBRADDIF || cmd == SIOCBRDELIF || + cmd == SIOCSHWTSTAMP || cmd == SIOCWANDEV) { err = -EOPNOTSUPP; if (ops->ndo_do_ioctl) { @@ -4173,6 +4174,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg) case SIOCBONDCHANGEACTIVE: case SIOCBRADDIF: case SIOCBRDELIF: + case SIOCSHWTSTAMP: if (!capable(CAP_NET_ADMIN)) return -EPERM; /* fall through */ -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH NET-NEXT 08/10] igb: access to NIC time 2009-02-12 15:00 ` [PATCH NET-NEXT 07/10] net: pass new SIOCSHWTSTAMP through to device drivers Patrick Ohly @ 2009-02-12 15:00 ` Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 09/10] igb: stub support for SIOCSHWTSTAMP Patrick Ohly 0 siblings, 1 reply; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:00 UTC (permalink / raw) To: David Miller Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly Adds the register definitions and code to read the time register. Signed-off-by: John Ronciak <john.ronciak@intel.com> --- drivers/net/igb/e1000_regs.h | 28 +++++++++++ drivers/net/igb/igb.h | 4 ++ drivers/net/igb/igb_main.c | 111 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 143 insertions(+), 0 deletions(-) diff --git a/drivers/net/igb/e1000_regs.h b/drivers/net/igb/e1000_regs.h index 5038b73..64d95cd 100644 --- a/drivers/net/igb/e1000_regs.h +++ b/drivers/net/igb/e1000_regs.h @@ -75,6 +75,34 @@ #define E1000_FCRTH 0x02168 /* Flow Control Receive Threshold High - RW */ #define E1000_RDFPCQ(_n) (0x02430 + (0x4 * (_n))) #define E1000_FCRTV 0x02460 /* Flow Control Refresh Timer Value - RW */ + +/* IEEE 1588 TIMESYNCH */ +#define E1000_TSYNCTXCTL 0x0B614 +#define E1000_TSYNCRXCTL 0x0B620 +#define E1000_TSYNCRXCFG 0x05F50 + +#define E1000_SYSTIML 0x0B600 +#define E1000_SYSTIMH 0x0B604 +#define E1000_TIMINCA 0x0B608 + +#define E1000_RXMTRL 0x0B634 +#define E1000_RXSTMPL 0x0B624 +#define E1000_RXSTMPH 0x0B628 +#define E1000_RXSATRL 0x0B62C +#define E1000_RXSATRH 0x0B630 + +#define E1000_TXSTMPL 0x0B618 +#define E1000_TXSTMPH 0x0B61C + +#define E1000_ETQF0 0x05CB0 +#define E1000_ETQF1 0x05CB4 +#define E1000_ETQF2 0x05CB8 +#define E1000_ETQF3 0x05CBC +#define E1000_ETQF4 0x05CC0 +#define E1000_ETQF5 0x05CC4 +#define E1000_ETQF6 0x05CC8 +#define E1000_ETQF7 0x05CCC + /* Split and Replication RX Control - RW */ /* * Convenience macros diff --git a/drivers/net/igb/igb.h b/drivers/net/igb/igb.h index e507449..797a9fe 100644 --- a/drivers/net/igb/igb.h +++ b/drivers/net/igb/igb.h @@ -34,6 +34,8 @@ #include "e1000_mac.h" #include "e1000_82575.h" +#include <linux/clocksource.h> + struct igb_adapter; /* Interrupt defines */ @@ -251,6 +253,8 @@ struct igb_adapter { struct napi_struct napi; struct pci_dev *pdev; struct net_device_stats net_stats; + struct cyclecounter cycles; + struct timecounter clock; /* structs defined in e1000_hw.h */ struct e1000_hw hw; diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c index f8c2919..8b2ba42 100644 --- a/drivers/net/igb/igb_main.c +++ b/drivers/net/igb/igb_main.c @@ -175,6 +175,54 @@ MODULE_DESCRIPTION("Intel(R) Gigabit Ethernet Network Driver"); MODULE_LICENSE("GPL"); MODULE_VERSION(DRV_VERSION); +/** + * Scale the NIC clock cycle by a large factor so that + * relatively small clock corrections can be added or + * substracted at each clock tick. The drawbacks of a + * large factor are a) that the clock register overflows + * more quickly (not such a big deal) and b) that the + * increment per tick has to fit into 24 bits. + * + * Note that + * TIMINCA = IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS * + * IGB_TSYNC_SCALE + * TIMINCA += TIMINCA * adjustment [ppm] / 1e9 + * + * The base scale factor is intentionally a power of two + * so that the division in %struct timecounter can be done with + * a shift. + */ +#define IGB_TSYNC_SHIFT (19) +#define IGB_TSYNC_SCALE (1<<IGB_TSYNC_SHIFT) + +/** + * The duration of one clock cycle of the NIC. + * + * @todo This hard-coded value is part of the specification and might change + * in future hardware revisions. Add revision check. + */ +#define IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS 16 + +#if (IGB_TSYNC_SCALE * IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS) >= (1<<24) +# error IGB_TSYNC_SCALE and/or IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS are too large to fit into TIMINCA +#endif + +/** + * igb_read_clock - read raw cycle counter (to be used by time counter) + */ +static cycle_t igb_read_clock(const struct cyclecounter *tc) +{ + struct igb_adapter *adapter = + container_of(tc, struct igb_adapter, cycles); + struct e1000_hw *hw = &adapter->hw; + u64 stamp; + + stamp = rd32(E1000_SYSTIML); + stamp |= (u64)rd32(E1000_SYSTIMH) << 32ULL; + + return stamp; +} + #ifdef DEBUG /** * igb_get_hw_dev_name - return device name string @@ -185,6 +233,29 @@ char *igb_get_hw_dev_name(struct e1000_hw *hw) struct igb_adapter *adapter = hw->back; return adapter->netdev->name; } + +/** + * igb_get_time_str - format current NIC and system time as string + */ +static char *igb_get_time_str(struct igb_adapter *adapter, + char buffer[160]) +{ + cycle_t hw = adapter->cycles.read(&adapter->cycles); + struct timespec nic = ns_to_timespec(timecounter_read(&adapter->clock)); + struct timespec sys; + struct timespec delta; + getnstimeofday(&sys); + + delta = timespec_sub(nic, sys); + + sprintf(buffer, + "NIC %ld.%09lus, SYS %ld.%09lus, NIC-SYS %lds + %09luns", + (long)nic.tv_sec, nic.tv_nsec, + (long)sys.tv_sec, sys.tv_nsec, + (long)delta.tv_sec, delta.tv_nsec); + + return buffer; +} #endif /** @@ -1298,6 +1369,46 @@ static int __devinit igb_probe(struct pci_dev *pdev, } #endif + /* + * Initialize hardware timer: we keep it running just in case + * that some program needs it later on. + */ + memset(&adapter->cycles, 0, sizeof(adapter->cycles)); + adapter->cycles.read = igb_read_clock; + adapter->cycles.mask = CLOCKSOURCE_MASK(64); + adapter->cycles.mult = 1; + adapter->cycles.shift = IGB_TSYNC_SHIFT; + wr32(E1000_TIMINCA, + (1<<24) | + IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS * IGB_TSYNC_SCALE); +#if 0 + /* + * Avoid rollover while we initialize by resetting the time counter. + */ + wr32(E1000_SYSTIML, 0x00000000); + wr32(E1000_SYSTIMH, 0x00000000); +#else + /* + * Set registers so that rollover occurs soon to test this. + */ + wr32(E1000_SYSTIML, 0x00000000); + wr32(E1000_SYSTIMH, 0xFF800000); +#endif + wrfl(); + timecounter_init(&adapter->clock, + &adapter->cycles, + ktime_to_ns(ktime_get_real())); + +#ifdef DEBUG + { + char buffer[160]; + printk(KERN_DEBUG + "igb: %s: hw %p initialized timer\n", + igb_get_time_str(adapter, buffer), + &adapter->hw); + } +#endif + dev_info(&pdev->dev, "Intel(R) Gigabit Ethernet Network Connection\n"); /* print bus type/speed/width info */ dev_info(&pdev->dev, "%s: (PCIe:%s:%s) %pM\n", -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH NET-NEXT 09/10] igb: stub support for SIOCSHWTSTAMP 2009-02-12 15:00 ` [PATCH NET-NEXT 08/10] igb: access to NIC time Patrick Ohly @ 2009-02-12 15:00 ` Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 10/10] igb: use timecompare to implement hardware time stamping Patrick Ohly 0 siblings, 1 reply; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:00 UTC (permalink / raw) To: David Miller Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly Signed-off-by: John Ronciak <john.ronciak@intel.com> --- drivers/net/igb/igb_main.c | 31 +++++++++++++++++++++++++++++++ 1 files changed, 31 insertions(+), 0 deletions(-) diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c index 8b2ba42..90090bb 100644 --- a/drivers/net/igb/igb_main.c +++ b/drivers/net/igb/igb_main.c @@ -34,6 +34,7 @@ #include <linux/ipv6.h> #include <net/checksum.h> #include <net/ip6_checksum.h> +#include <linux/net_tstamp.h> #include <linux/mii.h> #include <linux/ethtool.h> #include <linux/if_vlan.h> @@ -4182,6 +4183,34 @@ static int igb_mii_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd) } /** + * igb_hwtstamp_ioctl - control hardware time stamping + * @netdev: + * @ifreq: + * @cmd: + * + * Currently cannot enable any kind of hardware time stamping, but + * supports SIOCSHWTSTAMP in general. + **/ +static int igb_hwtstamp_ioctl(struct net_device *netdev, + struct ifreq *ifr, int cmd) +{ + struct hwtstamp_config config; + + if (copy_from_user(&config, ifr->ifr_data, sizeof(config))) + return -EFAULT; + + /* reserved for future extensions */ + if (config.flags) + return -EINVAL; + + if (config.tx_type == HWTSTAMP_TX_OFF && + config.rx_filter == HWTSTAMP_FILTER_NONE) + return 0; + + return -ERANGE; +} + +/** * igb_ioctl - * @netdev: * @ifreq: @@ -4194,6 +4223,8 @@ static int igb_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd) case SIOCGMIIREG: case SIOCSMIIREG: return igb_mii_ioctl(netdev, ifr, cmd); + case SIOCSHWTSTAMP: + return igb_hwtstamp_ioctl(netdev, ifr, cmd); default: return -EOPNOTSUPP; } -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH NET-NEXT 10/10] igb: use timecompare to implement hardware time stamping 2009-02-12 15:00 ` [PATCH NET-NEXT 09/10] igb: stub support for SIOCSHWTSTAMP Patrick Ohly @ 2009-02-12 15:00 ` Patrick Ohly 0 siblings, 0 replies; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:00 UTC (permalink / raw) To: David Miller Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly Both TX and RX hardware time stamping are implemented. Due to hardware limitations it is not possible to verify reliably which packet was time stamped when multiple were pending for sending; this could be solved by only allowing one packet marked for hardware time stamping into the queue (not implemented yet). RX time stamping relies on the flag in the packet descriptor which marks packets that were time stamped. In "all packet" mode this flag is not set. TODO: also support that mode (even though it'll suffer from race conditions). Signed-off-by: John Ronciak <john.ronciak@intel.com> --- drivers/net/igb/e1000_82575.h | 1 + drivers/net/igb/e1000_defines.h | 1 + drivers/net/igb/e1000_regs.h | 40 ++++++ drivers/net/igb/igb.h | 4 + drivers/net/igb/igb_main.c | 266 +++++++++++++++++++++++++++++++++++++-- 5 files changed, 304 insertions(+), 8 deletions(-) diff --git a/drivers/net/igb/e1000_82575.h b/drivers/net/igb/e1000_82575.h index dd50237..e613d5a 100644 --- a/drivers/net/igb/e1000_82575.h +++ b/drivers/net/igb/e1000_82575.h @@ -116,6 +116,7 @@ union e1000_adv_tx_desc { }; /* Adv Transmit Descriptor Config Masks */ +#define E1000_ADVTXD_MAC_TSTAMP 0x00080000 /* IEEE1588 Timestamp packet */ #define E1000_ADVTXD_DTYP_CTXT 0x00200000 /* Advanced Context Descriptor */ #define E1000_ADVTXD_DTYP_DATA 0x00300000 /* Advanced Data Descriptor */ #define E1000_ADVTXD_DCMD_IFCS 0x02000000 /* Insert FCS (Ethernet CRC) */ diff --git a/drivers/net/igb/e1000_defines.h b/drivers/net/igb/e1000_defines.h index 5342e23..79168ee 100644 --- a/drivers/net/igb/e1000_defines.h +++ b/drivers/net/igb/e1000_defines.h @@ -104,6 +104,7 @@ #define E1000_RXD_STAT_UDPCS 0x10 /* UDP xsum calculated */ #define E1000_RXD_STAT_TCPCS 0x20 /* TCP xsum calculated */ #define E1000_RXD_STAT_DYNINT 0x800 /* Pkt caused INT via DYNINT */ +#define E1000_RXD_STAT_TS 0x10000 /* Pkt was time stamped */ #define E1000_RXD_ERR_CE 0x01 /* CRC Error */ #define E1000_RXD_ERR_SE 0x02 /* Symbol Error */ #define E1000_RXD_ERR_SEQ 0x04 /* Sequence Error */ diff --git a/drivers/net/igb/e1000_regs.h b/drivers/net/igb/e1000_regs.h index 64d95cd..1fb19ca 100644 --- a/drivers/net/igb/e1000_regs.h +++ b/drivers/net/igb/e1000_regs.h @@ -78,9 +78,37 @@ /* IEEE 1588 TIMESYNCH */ #define E1000_TSYNCTXCTL 0x0B614 +#define E1000_TSYNCTXCTL_VALID (1<<0) +#define E1000_TSYNCTXCTL_ENABLED (1<<4) #define E1000_TSYNCRXCTL 0x0B620 +#define E1000_TSYNCRXCTL_VALID (1<<0) +#define E1000_TSYNCRXCTL_ENABLED (1<<4) +enum { + E1000_TSYNCRXCTL_TYPE_L2_V2 = 0, + E1000_TSYNCRXCTL_TYPE_L4_V1 = (1<<1), + E1000_TSYNCRXCTL_TYPE_L2_L4_V2 = (1<<2), + E1000_TSYNCRXCTL_TYPE_ALL = (1<<3), + E1000_TSYNCRXCTL_TYPE_EVENT_V2 = (1<<3) | (1<<1), +}; #define E1000_TSYNCRXCFG 0x05F50 +enum { + E1000_TSYNCRXCFG_PTP_V1_SYNC_MESSAGE = 0<<0, + E1000_TSYNCRXCFG_PTP_V1_DELAY_REQ_MESSAGE = 1<<0, + E1000_TSYNCRXCFG_PTP_V1_FOLLOWUP_MESSAGE = 2<<0, + E1000_TSYNCRXCFG_PTP_V1_DELAY_RESP_MESSAGE = 3<<0, + E1000_TSYNCRXCFG_PTP_V1_MANAGEMENT_MESSAGE = 4<<0, + E1000_TSYNCRXCFG_PTP_V2_SYNC_MESSAGE = 0<<8, + E1000_TSYNCRXCFG_PTP_V2_DELAY_REQ_MESSAGE = 1<<8, + E1000_TSYNCRXCFG_PTP_V2_PATH_DELAY_REQ_MESSAGE = 2<<8, + E1000_TSYNCRXCFG_PTP_V2_PATH_DELAY_RESP_MESSAGE = 3<<8, + E1000_TSYNCRXCFG_PTP_V2_FOLLOWUP_MESSAGE = 8<<8, + E1000_TSYNCRXCFG_PTP_V2_DELAY_RESP_MESSAGE = 9<<8, + E1000_TSYNCRXCFG_PTP_V2_PATH_DELAY_FOLLOWUP_MESSAGE = 0xA<<8, + E1000_TSYNCRXCFG_PTP_V2_ANNOUNCE_MESSAGE = 0xB<<8, + E1000_TSYNCRXCFG_PTP_V2_SIGNALLING_MESSAGE = 0xC<<8, + E1000_TSYNCRXCFG_PTP_V2_MANAGEMENT_MESSAGE = 0xD<<8, +}; #define E1000_SYSTIML 0x0B600 #define E1000_SYSTIMH 0x0B604 #define E1000_TIMINCA 0x0B608 @@ -103,6 +131,18 @@ #define E1000_ETQF6 0x05CC8 #define E1000_ETQF7 0x05CCC +/* Filtering Registers */ +#define E1000_SAQF(_n) (0x5980 + 4 * (_n)) +#define E1000_DAQF(_n) (0x59A0 + 4 * (_n)) +#define E1000_SPQF(_n) (0x59C0 + 4 * (_n)) +#define E1000_FTQF(_n) (0x59E0 + 4 * (_n)) +#define E1000_SAQF0 E1000_SAQF(0) +#define E1000_DAQF0 E1000_DAQF(0) +#define E1000_SPQF0 E1000_SPQF(0) +#define E1000_FTQF0 E1000_FTQF(0) +#define E1000_SYNQF(_n) (0x055FC + (4 * (_n))) /* SYN Packet Queue Fltr */ +#define E1000_ETQF(_n) (0x05CB0 + (4 * (_n))) /* EType Queue Fltr */ + /* Split and Replication RX Control - RW */ /* * Convenience macros diff --git a/drivers/net/igb/igb.h b/drivers/net/igb/igb.h index 797a9fe..bb8c35c 100644 --- a/drivers/net/igb/igb.h +++ b/drivers/net/igb/igb.h @@ -35,6 +35,8 @@ #include "e1000_82575.h" #include <linux/clocksource.h> +#include <linux/timecompare.h> +#include <linux/net_tstamp.h> struct igb_adapter; @@ -255,6 +257,8 @@ struct igb_adapter { struct net_device_stats net_stats; struct cyclecounter cycles; struct timecounter clock; + struct timecompare compare; + struct hwtstamp_config hwtstamp_config; /* structs defined in e1000_hw.h */ struct e1000_hw hw; diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c index 90090bb..f9d576b 100644 --- a/drivers/net/igb/igb_main.c +++ b/drivers/net/igb/igb_main.c @@ -250,7 +250,8 @@ static char *igb_get_time_str(struct igb_adapter *adapter, delta = timespec_sub(nic, sys); sprintf(buffer, - "NIC %ld.%09lus, SYS %ld.%09lus, NIC-SYS %lds + %09luns", + "HW %llu, NIC %ld.%09lus, SYS %ld.%09lus, NIC-SYS %lds + %09luns", + hw, (long)nic.tv_sec, nic.tv_nsec, (long)sys.tv_sec, sys.tv_nsec, (long)delta.tv_sec, delta.tv_nsec); @@ -1400,6 +1401,18 @@ static int __devinit igb_probe(struct pci_dev *pdev, &adapter->cycles, ktime_to_ns(ktime_get_real())); + /* + * Synchronize our NIC clock against system wall clock. NIC + * time stamp reading requires ~3us per sample, each sample + * was pretty stable even under load => only require 10 + * samples for each offset comparison. + */ + memset(&adapter->compare, 0, sizeof(adapter->compare)); + adapter->compare.source = &adapter->clock; + adapter->compare.target = ktime_get_real; + adapter->compare.num_samples = 10; + timecompare_update(&adapter->compare, 0); + #ifdef DEBUG { char buffer[160]; @@ -2748,6 +2761,7 @@ set_itr_now: #define IGB_TX_FLAGS_VLAN 0x00000002 #define IGB_TX_FLAGS_TSO 0x00000004 #define IGB_TX_FLAGS_IPV4 0x00000008 +#define IGB_TX_FLAGS_TSTAMP 0x00000010 #define IGB_TX_FLAGS_VLAN_MASK 0xffff0000 #define IGB_TX_FLAGS_VLAN_SHIFT 16 @@ -2975,6 +2989,9 @@ static inline void igb_tx_queue_adv(struct igb_adapter *adapter, if (tx_flags & IGB_TX_FLAGS_VLAN) cmd_type_len |= E1000_ADVTXD_DCMD_VLE; + if (tx_flags & IGB_TX_FLAGS_TSTAMP) + cmd_type_len |= E1000_ADVTXD_MAC_TSTAMP; + if (tx_flags & IGB_TX_FLAGS_TSO) { cmd_type_len |= E1000_ADVTXD_DCMD_TSE; @@ -3065,6 +3082,7 @@ static int igb_xmit_frame_ring_adv(struct sk_buff *skb, unsigned int tx_flags = 0; u8 hdr_len = 0; int tso = 0; + union skb_shared_tx *shtx; if (test_bit(__IGB_DOWN, &adapter->state)) { dev_kfree_skb_any(skb); @@ -3085,7 +3103,29 @@ static int igb_xmit_frame_ring_adv(struct sk_buff *skb, /* this is a hard error */ return NETDEV_TX_BUSY; } - skb_orphan(skb); + + /* + * TODO: check that there currently is no other packet with + * time stamping in the queue + * + * When doing time stamping, keep the connection to the socket + * a while longer: it is still needed by skb_hwtstamp_tx(), + * called either in igb_tx_hwtstamp() or by our caller when + * doing software time stamping. + */ + shtx = skb_tx(skb); + if (unlikely(shtx->hardware)) { + shtx->in_progress = 1; + tx_flags |= IGB_TX_FLAGS_TSTAMP; + } else if (likely(!shtx->software)) { + /* + * TODO: can this be solved in dev.c:dev_hard_start_xmit()? + * There are probably unmodified driver which do something + * like this and thus don't work in combination with + * SOF_TIMESTAMPING_TX_SOFTWARE. + */ + skb_orphan(skb); + } if (adapter->vlgrp && vlan_tx_tag_present(skb)) { tx_flags |= IGB_TX_FLAGS_VLAN; @@ -3744,6 +3784,43 @@ static int igb_clean_rx_ring_msix(struct napi_struct *napi, int budget) } /** + * igb_hwtstamp - utility function which checks for TX time stamp + * @adapter: board private structure + * @skb: packet that was just sent + * + * If we were asked to do hardware stamping and such a time stamp is + * available, then it must have been for this skb here because we only + * allow only one such packet into the queue. + */ +static void igb_tx_hwtstamp(struct igb_adapter *adapter, struct sk_buff *skb) +{ + union skb_shared_tx *shtx = skb_tx(skb); + struct e1000_hw *hw = &adapter->hw; + + if (unlikely(shtx->hardware)) { + u32 valid = rd32(E1000_TSYNCTXCTL) & E1000_TSYNCTXCTL_VALID; + if (valid) { + u64 regval = rd32(E1000_TXSTMPL); + u64 ns; + struct skb_shared_hwtstamps shhwtstamps; + + memset(&shhwtstamps, 0, sizeof(shhwtstamps)); + regval |= (u64)rd32(E1000_TXSTMPH) << 32; + ns = timecounter_cyc2time(&adapter->clock, + regval); + timecompare_update(&adapter->compare, ns); + shhwtstamps.hwtstamp = ns_to_ktime(ns); + shhwtstamps.syststamp = + timecompare_transform(&adapter->compare, ns); + skb_tstamp_tx(skb, &shhwtstamps); + } + + /* delayed orphaning: skb_tstamp_tx() needs the socket */ + skb_orphan(skb); + } +} + +/** * igb_clean_tx_irq - Reclaim resources after transmit completes * @adapter: board private structure * returns true if ring is completely cleaned @@ -3781,6 +3858,8 @@ static bool igb_clean_tx_irq(struct igb_ring *tx_ring) skb->len; total_packets += segs; total_bytes += bytecount; + + igb_tx_hwtstamp(adapter, skb); } igb_unmap_and_free_tx_resource(adapter, buffer_info); @@ -3914,6 +3993,7 @@ static bool igb_clean_rx_irq_adv(struct igb_ring *rx_ring, { struct igb_adapter *adapter = rx_ring->adapter; struct net_device *netdev = adapter->netdev; + struct e1000_hw *hw = &adapter->hw; struct pci_dev *pdev = adapter->pdev; union e1000_adv_rx_desc *rx_desc , *next_rxd; struct igb_buffer *buffer_info , *next_buffer; @@ -4006,6 +4086,47 @@ static bool igb_clean_rx_irq_adv(struct igb_ring *rx_ring, goto next_desc; } send_up: + /* + * If this bit is set, then the RX registers contain + * the time stamp. No other packet will be time + * stamped until we read these registers, so read the + * registers to make them available again. Because + * only one packet can be time stamped at a time, we + * know that the register values must belong to this + * one here and therefore we don't need to compare + * any of the additional attributes stored for it. + * + * If nothing went wrong, then it should have a + * skb_shared_tx that we can turn into a + * skb_shared_hwtstamps. + * + * TODO: can time stamping be triggered (thus locking + * the registers) without the packet reaching this point + * here? In that case RX time stamping would get stuck. + * + * TODO: in "time stamp all packets" mode this bit is + * not set. Need a global flag for this mode and then + * always read the registers. Cannot be done without + * a race condition. + */ + if (unlikely(staterr & E1000_RXD_STAT_TS)) { + u64 regval; + u64 ns; + struct skb_shared_hwtstamps *shhwtstamps = + skb_hwtstamps(skb); + + WARN(!(rd32(E1000_TSYNCRXCTL) & E1000_TSYNCRXCTL_VALID), + "igb: no RX time stamp available for time stamped packet"); + regval = rd32(E1000_RXSTMPL); + regval |= (u64)rd32(E1000_RXSTMPH) << 32; + ns = timecounter_cyc2time(&adapter->clock, regval); + timecompare_update(&adapter->compare, ns); + memset(shhwtstamps, 0, sizeof(*shhwtstamps)); + shhwtstamps->hwtstamp = ns_to_ktime(ns); + shhwtstamps->syststamp = + timecompare_transform(&adapter->compare, ns); + } + if (staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK) { dev_kfree_skb_irq(skb); goto next_desc; @@ -4188,13 +4309,33 @@ static int igb_mii_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd) * @ifreq: * @cmd: * - * Currently cannot enable any kind of hardware time stamping, but - * supports SIOCSHWTSTAMP in general. + * Outgoing time stamping can be enabled and disabled. Play nice and + * disable it when requested, although it shouldn't case any overhead + * when no packet needs it. At most one packet in the queue may be + * marked for time stamping, otherwise it would be impossible to tell + * for sure to which packet the hardware time stamp belongs. + * + * Incoming time stamping has to be configured via the hardware + * filters. Not all combinations are supported, in particular event + * type has to be specified. Matching the kind of event packet is + * not supported, with the exception of "all V2 events regardless of + * level 2 or 4". + * **/ static int igb_hwtstamp_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd) { + struct igb_adapter *adapter = netdev_priv(netdev); + struct e1000_hw *hw = &adapter->hw; struct hwtstamp_config config; + u32 tsync_tx_ctl_bit = E1000_TSYNCTXCTL_ENABLED; + u32 tsync_rx_ctl_bit = E1000_TSYNCRXCTL_ENABLED; + u32 tsync_rx_ctl_type = 0; + u32 tsync_rx_cfg = 0; + int is_l4 = 0; + int is_l2 = 0; + short port = 319; /* PTP */ + u32 regval; if (copy_from_user(&config, ifr->ifr_data, sizeof(config))) return -EFAULT; @@ -4203,11 +4344,120 @@ static int igb_hwtstamp_ioctl(struct net_device *netdev, if (config.flags) return -EINVAL; - if (config.tx_type == HWTSTAMP_TX_OFF && - config.rx_filter == HWTSTAMP_FILTER_NONE) - return 0; + switch (config.tx_type) { + case HWTSTAMP_TX_OFF: + tsync_tx_ctl_bit = 0; + break; + case HWTSTAMP_TX_ON: + tsync_tx_ctl_bit = E1000_TSYNCTXCTL_ENABLED; + break; + default: + return -ERANGE; + } + + switch (config.rx_filter) { + case HWTSTAMP_FILTER_NONE: + tsync_rx_ctl_bit = 0; + break; + case HWTSTAMP_FILTER_PTP_V1_L4_EVENT: + case HWTSTAMP_FILTER_PTP_V2_L4_EVENT: + case HWTSTAMP_FILTER_PTP_V2_L2_EVENT: + case HWTSTAMP_FILTER_ALL: + /* + * register TSYNCRXCFG must be set, therefore it is not + * possible to time stamp both Sync and Delay_Req messages + * => fall back to time stamping all packets + */ + tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_ALL; + config.rx_filter = HWTSTAMP_FILTER_ALL; + break; + case HWTSTAMP_FILTER_PTP_V1_L4_SYNC: + tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_L4_V1; + tsync_rx_cfg = E1000_TSYNCRXCFG_PTP_V1_SYNC_MESSAGE; + is_l4 = 1; + break; + case HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ: + tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_L4_V1; + tsync_rx_cfg = E1000_TSYNCRXCFG_PTP_V1_DELAY_REQ_MESSAGE; + is_l4 = 1; + break; + case HWTSTAMP_FILTER_PTP_V2_L2_SYNC: + case HWTSTAMP_FILTER_PTP_V2_L4_SYNC: + tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_L2_L4_V2; + tsync_rx_cfg = E1000_TSYNCRXCFG_PTP_V2_SYNC_MESSAGE; + is_l2 = 1; + is_l4 = 1; + config.rx_filter = HWTSTAMP_FILTER_SOME; + break; + case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ: + case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ: + tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_L2_L4_V2; + tsync_rx_cfg = E1000_TSYNCRXCFG_PTP_V2_DELAY_REQ_MESSAGE; + is_l2 = 1; + is_l4 = 1; + config.rx_filter = HWTSTAMP_FILTER_SOME; + break; + case HWTSTAMP_FILTER_PTP_V2_EVENT: + case HWTSTAMP_FILTER_PTP_V2_SYNC: + case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ: + tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_EVENT_V2; + config.rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT; + is_l2 = 1; + break; + default: + return -ERANGE; + } + + /* enable/disable TX */ + regval = rd32(E1000_TSYNCTXCTL); + regval = (regval & ~E1000_TSYNCTXCTL_ENABLED) | tsync_tx_ctl_bit; + wr32(E1000_TSYNCTXCTL, regval); + + /* enable/disable RX, define which PTP packets are time stamped */ + regval = rd32(E1000_TSYNCRXCTL); + regval = (regval & ~E1000_TSYNCRXCTL_ENABLED) | tsync_rx_ctl_bit; + regval = (regval & ~0xE) | tsync_rx_ctl_type; + wr32(E1000_TSYNCRXCTL, regval); + wr32(E1000_TSYNCRXCFG, tsync_rx_cfg); + + /* + * Ethertype Filter Queue Filter[0][15:0] = 0x88F7 + * (Ethertype to filter on) + * Ethertype Filter Queue Filter[0][26] = 0x1 (Enable filter) + * Ethertype Filter Queue Filter[0][30] = 0x1 (Enable Timestamping) + */ + wr32(E1000_ETQF0, is_l2 ? 0x440088f7 : 0); + + /* L4 Queue Filter[0]: only filter by source and destination port */ + wr32(E1000_SPQF0, htons(port)); + wr32(E1000_IMIREXT(0), is_l4 ? + ((1<<12) | (1<<19) /* bypass size and control flags */) : 0); + wr32(E1000_IMIR(0), is_l4 ? + (htons(port) + | (0<<16) /* immediate interrupt disabled */ + | 0 /* (1<<17) bit cleared: do not bypass + destination port check */) + : 0); + wr32(E1000_FTQF0, is_l4 ? + (0x11 /* UDP */ + | (1<<15) /* VF not compared */ + | (1<<27) /* Enable Timestamping */ + | (7<<28) /* only source port filter enabled, + source/target address and protocol + masked */) + : ((1<<15) | (15<<28) /* all mask bits set = filter not + enabled */)); + + wrfl(); + + adapter->hwtstamp_config = config; + + /* clear TX/RX time stamp registers, just to be sure */ + regval = rd32(E1000_TXSTMPH); + regval = rd32(E1000_RXSTMPH); - return -ERANGE; + return copy_to_user(ifr->ifr_data, &config, sizeof(config)) ? + -EFAULT : 0; } /** -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo 2009-02-12 14:57 [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c Patrick Ohly @ 2009-02-12 15:03 ` Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c Patrick Ohly 2009-02-16 7:16 ` [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo David Miller 1 sibling, 2 replies; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw) To: David Miller Cc: John Stultz, Ronciak, John, linux-kernel@vger.kernel.org, netdev@vger.kernel.org [once more, with signed-off - sorry for the noise] This revision of the patch series transports hardware time stamps from the hardware into user space via additional fields in struct skb_shared_info. It's based on net-next-2.6 as of this morning. This is the solution that emerged from the discussion of the other approaches suggested before (reuse tstamp, extend skbuff, optional structs): it has the advantage of not touching skbuff and also has the simplest implementation. The clocksource and timecompare patches have been reviewed by John Stultz. He doesn't mind merging them via the net subtree. John Ronciak reviewed the igb driver patches. He suggested to merge the patches as they; after all, PTPd already works fine. I just tested again on 32 and 64 bit x86, both with the timestamping example programs as well as with PTPd. Latest patched PTPd is here: http://github.com/pohly/ptpd/tree/master The open TODOs in the igb driver will be fixed. We think that this will be easier with the infrastructure and the driver in a regular kernel tree. David, I hope you consider the patches acceptable now. If not, then as before: please let me know what I can do to make them acceptable. ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c 2009-02-12 15:03 ` [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo Patrick Ohly @ 2009-02-12 15:03 ` Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 02/10] timecompare: generic infrastructure to map between two time bases Patrick Ohly 2009-02-16 7:16 ` [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo David Miller 1 sibling, 1 reply; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw) To: David Miller Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly So far struct clocksource acted as the interface between time/timekeeping.c and hardware. This patch generalizes the concept so that a similar interface can also be used in other contexts. For that it introduces new structures and related functions *without* touching the existing struct clocksource. The reasons for adding these new structures to clocksource.[ch] are * the APIs are clearly related * struct clocksource could be cleaned up to use the new structs * avoids proliferation of files with similar names (timesource.h? timecounter.h?) As outlined in the discussion with John Stultz, this patch adds * struct cyclecounter: stateless API to hardware which counts clock cycles * struct timecounter: stateful utility code built on a cyclecounter which provides a nanosecond counter * only the function to read the nanosecond counter; deltas are used internally and not exposed to users of timecounter The code does no locking of the shared state. It must be called at least as often as the cycle counter wraps around to detect these wrap arounds. Both is the responsibility of the timecounter user. Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> --- include/linux/clocksource.h | 101 +++++++++++++++++++++++++++++++++++++++++++ kernel/time/clocksource.c | 76 ++++++++++++++++++++++++++++++++ 2 files changed, 177 insertions(+), 0 deletions(-) diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index f88d32f..573819e 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -22,8 +22,109 @@ typedef u64 cycle_t; struct clocksource; /** + * struct cyclecounter - hardware abstraction for a free running counter + * Provides completely state-free accessors to the underlying hardware. + * Depending on which hardware it reads, the cycle counter may wrap + * around quickly. Locking rules (if necessary) have to be defined + * by the implementor and user of specific instances of this API. + * + * @read: returns the current cycle value + * @mask: bitmask for two's complement + * subtraction of non 64 bit counters, + * see CLOCKSOURCE_MASK() helper macro + * @mult: cycle to nanosecond multiplier + * @shift: cycle to nanosecond divisor (power of two) + */ +struct cyclecounter { + cycle_t (*read)(const struct cyclecounter *cc); + cycle_t mask; + u32 mult; + u32 shift; +}; + +/** + * struct timecounter - layer above a %struct cyclecounter which counts nanoseconds + * Contains the state needed by timecounter_read() to detect + * cycle counter wrap around. Initialize with + * timecounter_init(). Also used to convert cycle counts into the + * corresponding nanosecond counts with timecounter_cyc2time(). Users + * of this code are responsible for initializing the underlying + * cycle counter hardware, locking issues and reading the time + * more often than the cycle counter wraps around. The nanosecond + * counter will only wrap around after ~585 years. + * + * @cc: the cycle counter used by this instance + * @cycle_last: most recent cycle counter value seen by + * timecounter_read() + * @nsec: continuously increasing count + */ +struct timecounter { + const struct cyclecounter *cc; + cycle_t cycle_last; + u64 nsec; +}; + +/** + * cyclecounter_cyc2ns - converts cycle counter cycles to nanoseconds + * @tc: Pointer to cycle counter. + * @cycles: Cycles + * + * XXX - This could use some mult_lxl_ll() asm optimization. Same code + * as in cyc2ns, but with unsigned result. + */ +static inline u64 cyclecounter_cyc2ns(const struct cyclecounter *cc, + cycle_t cycles) +{ + u64 ret = (u64)cycles; + ret = (ret * cc->mult) >> cc->shift; + return ret; +} + +/** + * timecounter_init - initialize a time counter + * @tc: Pointer to time counter which is to be initialized/reset + * @cc: A cycle counter, ready to be used. + * @start_tstamp: Arbitrary initial time stamp. + * + * After this call the current cycle register (roughly) corresponds to + * the initial time stamp. Every call to timecounter_read() increments + * the time stamp counter by the number of elapsed nanoseconds. + */ +extern void timecounter_init(struct timecounter *tc, + const struct cyclecounter *cc, + u64 start_tstamp); + +/** + * timecounter_read - return nanoseconds elapsed since timecounter_init() + * plus the initial time stamp + * @tc: Pointer to time counter. + * + * In other words, keeps track of time since the same epoch as + * the function which generated the initial time stamp. + */ +extern u64 timecounter_read(struct timecounter *tc); + +/** + * timecounter_cyc2time - convert a cycle counter to same + * time base as values returned by + * timecounter_read() + * @tc: Pointer to time counter. + * @cycle: a value returned by tc->cc->read() + * + * Cycle counts that are converted correctly as long as they + * fall into the interval [-1/2 max cycle count, +1/2 max cycle count], + * with "max cycle count" == cs->mask+1. + * + * This allows conversion of cycle counter values which were generated + * in the past. + */ +extern u64 timecounter_cyc2time(struct timecounter *tc, + cycle_t cycle_tstamp); + +/** * struct clocksource - hardware abstraction for a free running counter * Provides mostly state-free accessors to the underlying hardware. + * This is the structure used for system time. * * @name: ptr to clocksource name * @list: list head for registration diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index ca89e15..c46c931 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -31,6 +31,82 @@ #include <linux/sched.h> /* for spin_unlock_irq() using preempt_count() m68k */ #include <linux/tick.h> +void timecounter_init(struct timecounter *tc, + const struct cyclecounter *cc, + u64 start_tstamp) +{ + tc->cc = cc; + tc->cycle_last = cc->read(cc); + tc->nsec = start_tstamp; +} +EXPORT_SYMBOL(timecounter_init); + +/** + * timecounter_read_delta - get nanoseconds since last call of this function + * @tc: Pointer to time counter + * + * When the underlying cycle counter runs over, this will be handled + * correctly as long as it does not run over more than once between + * calls. + * + * The first call to this function for a new time counter initializes + * the time tracking and returns an undefined result. + */ +static u64 timecounter_read_delta(struct timecounter *tc) +{ + cycle_t cycle_now, cycle_delta; + u64 ns_offset; + + /* read cycle counter: */ + cycle_now = tc->cc->read(tc->cc); + + /* calculate the delta since the last timecounter_read_delta(): */ + cycle_delta = (cycle_now - tc->cycle_last) & tc->cc->mask; + + /* convert to nanoseconds: */ + ns_offset = cyclecounter_cyc2ns(tc->cc, cycle_delta); + + /* update time stamp of timecounter_read_delta() call: */ + tc->cycle_last = cycle_now; + + return ns_offset; +} + +u64 timecounter_read(struct timecounter *tc) +{ + u64 nsec; + + /* increment time by nanoseconds since last call */ + nsec = timecounter_read_delta(tc); + nsec += tc->nsec; + tc->nsec = nsec; + + return nsec; +} +EXPORT_SYMBOL(timecounter_read); + +u64 timecounter_cyc2time(struct timecounter *tc, + cycle_t cycle_tstamp) +{ + u64 cycle_delta = (cycle_tstamp - tc->cycle_last) & tc->cc->mask; + u64 nsec; + + /* + * Instead of always treating cycle_tstamp as more recent + * than tc->cycle_last, detect when it is too far in the + * future and treat it as old time stamp instead. + */ + if (cycle_delta > tc->cc->mask / 2) { + cycle_delta = (tc->cycle_last - cycle_tstamp) & tc->cc->mask; + nsec = tc->nsec - cyclecounter_cyc2ns(tc->cc, cycle_delta); + } else { + nsec = cyclecounter_cyc2ns(tc->cc, cycle_delta) + tc->nsec; + } + + return nsec; +} +EXPORT_SYMBOL(timecounter_cyc2time); + /* XXX - Would like a better way for initializing curr_clocksource */ extern struct clocksource clocksource_jiffies; -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH NET-NEXT 02/10] timecompare: generic infrastructure to map between two time bases 2009-02-12 15:03 ` [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c Patrick Ohly @ 2009-02-12 15:03 ` Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 03/10] net: new user space API for time stamping of incoming and outgoing packets Patrick Ohly 0 siblings, 1 reply; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw) To: David Miller Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly Mapping from a struct timecounter to a time returned by functions like ktime_get_real() is implemented. This is sufficient to use this code in a network device driver which wants to support hardware time stamping and transformation of hardware time stamps to system time. The interface could have been made more versatile by not depending on a time counter, but this wasn't done to avoid writing glue code elsewhere. The method implemented here is the one used and analyzed under the name "assisted PTP" in the LCI PTP paper: http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> --- include/linux/timecompare.h | 125 ++++++++++++++++++++++++++++ kernel/time/Makefile | 2 +- kernel/time/timecompare.c | 191 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 317 insertions(+), 1 deletions(-) create mode 100644 include/linux/timecompare.h create mode 100644 kernel/time/timecompare.c diff --git a/include/linux/timecompare.h b/include/linux/timecompare.h new file mode 100644 index 0000000..546e223 --- /dev/null +++ b/include/linux/timecompare.h @@ -0,0 +1,125 @@ +/* + * Utility code which helps transforming between two different time + * bases, called "source" and "target" time in this code. + * + * Source time has to be provided via the timecounter API while target + * time is accessed via a function callback whose prototype + * intentionally matches ktime_get() and ktime_get_real(). These + * interfaces where chosen like this so that the code serves its + * initial purpose without additional glue code. + * + * This purpose is synchronizing a hardware clock in a NIC with system + * time, in order to implement the Precision Time Protocol (PTP, + * IEEE1588) with more accurate hardware assisted time stamping. In + * that context only synchronization against system time (= + * ktime_get_real()) is currently needed. But this utility code might + * become useful in other situations, which is why it was written as + * general purpose utility code. + * + * The source timecounter is assumed to return monotonically + * increasing time (but this code does its best to compensate if that + * is not the case) whereas target time may jump. + * + * The target time corresponding to a source time is determined by + * reading target time, reading source time, reading target time + * again, then assuming that average target time corresponds to source + * time. In other words, the assumption is that reading the source + * time is slow and involves equal time for sending the request and + * receiving the reply, whereas reading target time is assumed to be + * fast. + * + * Copyright (C) 2009 Intel Corporation. + * Author: Patrick Ohly <patrick.ohly@intel.com> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. + */ +#ifndef _LINUX_TIMECOMPARE_H +#define _LINUX_TIMECOMPARE_H + +#include <linux/clocksource.h> +#include <linux/ktime.h> + +/** + * struct timecompare - stores state and configuration for the two clocks + * + * Initialize to zero, then set source/target/num_samples. + * + * Transformation between source time and target time is done with: + * target_time = source_time + offset + + * (source_time - last_update) * skew / + * TIMECOMPARE_SKEW_RESOLUTION + * + * @source: used to get source time stamps via timecounter_read() + * @target: function returning target time (for example, ktime_get + * for monotonic time, or ktime_get_real for wall clock) + * @num_samples: number of times that source time and target time are to + * be compared when determining their offset + * @offset: (target time - source time) at the time of the last update + * @skew: average (target time - source time) / delta source time * + * TIMECOMPARE_SKEW_RESOLUTION + * @last_update: last source time stamp when time offset was measured + */ +struct timecompare { + struct timecounter *source; + ktime_t (*target)(void); + int num_samples; + + s64 offset; + s64 skew; + u64 last_update; +}; + +/** + * timecompare_transform - transform source time stamp into target time base + * @sync: context for time sync + * @source_tstamp: the result of timecounter_read() or + * timecounter_cyc2time() + */ +extern ktime_t timecompare_transform(struct timecompare *sync, + u64 source_tstamp); + +/** + * timecompare_offset - measure current (target time - source time) offset + * @sync: context for time sync + * @offset: average offset during sample period returned here + * @source_tstamp: average source time during sample period returned here + * + * Returns number of samples used. Might be zero (= no result) in the + * unlikely case that target time was monotonically decreasing for all + * samples (= broken). + */ +extern int timecompare_offset(struct timecompare *sync, + s64 *offset, + u64 *source_tstamp); + +extern void __timecompare_update(struct timecompare *sync, + u64 source_tstamp); + +/** + * timecompare_update - update offset and skew by measuring current offset + * @sync: context for time sync + * @source_tstamp: the result of timecounter_read() or + * timecounter_cyc2time(), pass zero to force update + * + * Updates are only done at most once per second. + */ +static inline void timecompare_update(struct timecompare *sync, + u64 source_tstamp) +{ + if (!source_tstamp || + (s64)(source_tstamp - sync->last_update) >= NSEC_PER_SEC) + __timecompare_update(sync, source_tstamp); +} + +#endif /* _LINUX_TIMECOMPARE_H */ diff --git a/kernel/time/Makefile b/kernel/time/Makefile index 905b0b5..0b0a636 100644 --- a/kernel/time/Makefile +++ b/kernel/time/Makefile @@ -1,4 +1,4 @@ -obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o +obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o timecompare.o obj-$(CONFIG_GENERIC_CLOCKEVENTS_BUILD) += clockevents.o obj-$(CONFIG_GENERIC_CLOCKEVENTS) += tick-common.o diff --git a/kernel/time/timecompare.c b/kernel/time/timecompare.c new file mode 100644 index 0000000..71e7f1a --- /dev/null +++ b/kernel/time/timecompare.c @@ -0,0 +1,191 @@ +/* + * Copyright (C) 2009 Intel Corporation. + * Author: Patrick Ohly <patrick.ohly@intel.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include <linux/timecompare.h> +#include <linux/module.h> +#include <linux/math64.h> + +/* + * fixed point arithmetic scale factor for skew + * + * Usually one would measure skew in ppb (parts per billion, 1e9), but + * using a factor of 2 simplifies the math. + */ +#define TIMECOMPARE_SKEW_RESOLUTION (((s64)1)<<30) + +ktime_t timecompare_transform(struct timecompare *sync, + u64 source_tstamp) +{ + u64 nsec; + + nsec = source_tstamp + sync->offset; + nsec += (s64)(source_tstamp - sync->last_update) * sync->skew / + TIMECOMPARE_SKEW_RESOLUTION; + + return ns_to_ktime(nsec); +} +EXPORT_SYMBOL(timecompare_transform); + +int timecompare_offset(struct timecompare *sync, + s64 *offset, + u64 *source_tstamp) +{ + u64 start_source = 0, end_source = 0; + struct { + s64 offset; + s64 duration_target; + } buffer[10], sample, *samples; + int counter = 0, i; + int used; + int index; + int num_samples = sync->num_samples; + + if (num_samples > sizeof(buffer)/sizeof(buffer[0])) { + samples = kmalloc(sizeof(*samples) * num_samples, GFP_ATOMIC); + if (!samples) { + samples = buffer; + num_samples = sizeof(buffer)/sizeof(buffer[0]); + } + } else { + samples = buffer; + } + + /* run until we have enough valid samples, but do not try forever */ + i = 0; + counter = 0; + while (1) { + u64 ts; + ktime_t start, end; + + start = sync->target(); + ts = timecounter_read(sync->source); + end = sync->target(); + + if (!i) + start_source = ts; + + /* ignore negative durations */ + sample.duration_target = ktime_to_ns(ktime_sub(end, start)); + if (sample.duration_target >= 0) { + /* + * assume symetric delay to and from source: + * average target time corresponds to measured + * source time + */ + sample.offset = + ktime_to_ns(ktime_add(end, start)) / 2 - + ts; + + /* simple insertion sort based on duration */ + index = counter - 1; + while (index >= 0) { + if (samples[index].duration_target < + sample.duration_target) + break; + samples[index + 1] = samples[index]; + index--; + } + samples[index + 1] = sample; + counter++; + } + + i++; + if (counter >= num_samples || i >= 100000) { + end_source = ts; + break; + } + } + + *source_tstamp = (end_source + start_source) / 2; + + /* remove outliers by only using 75% of the samples */ + used = counter * 3 / 4; + if (!used) + used = counter; + if (used) { + /* calculate average */ + s64 off = 0; + for (index = 0; index < used; index++) + off += samples[index].offset; + *offset = div_s64(off, used); + } + + if (samples && samples != buffer) + kfree(samples); + + return used; +} +EXPORT_SYMBOL(timecompare_offset); + +void __timecompare_update(struct timecompare *sync, + u64 source_tstamp) +{ + s64 offset; + u64 average_time; + + if (!timecompare_offset(sync, &offset, &average_time)) + return; + + if (!sync->last_update) { + sync->last_update = average_time; + sync->offset = offset; + sync->skew = 0; + } else { + s64 delta_nsec = average_time - sync->last_update; + + /* avoid division by negative or small deltas */ + if (delta_nsec >= 10000) { + s64 delta_offset_nsec = offset - sync->offset; + s64 skew; /* delta_offset_nsec * + TIMECOMPARE_SKEW_RESOLUTION / + delta_nsec */ + u64 divisor; + + /* div_s64() is limited to 32 bit divisor */ + skew = delta_offset_nsec * TIMECOMPARE_SKEW_RESOLUTION; + divisor = delta_nsec; + while (unlikely(divisor >= ((s64)1) << 32)) { + /* divide both by 2; beware, right shift + of negative value has undefined + behavior and can only be used for + the positive divisor */ + skew = div_s64(skew, 2); + divisor >>= 1; + } + skew = div_s64(skew, divisor); + + /* + * Calculate new overall skew as 4/16 the + * old value and 12/16 the new one. This is + * a rather arbitrary tradeoff between + * only using the latest measurement (0/16 and + * 16/16) and even more weight on past measurements. + */ +#define TIMECOMPARE_NEW_SKEW_PER_16 12 + sync->skew = + div_s64((16 - TIMECOMPARE_NEW_SKEW_PER_16) * + sync->skew + + TIMECOMPARE_NEW_SKEW_PER_16 * skew, + 16); + sync->last_update = average_time; + sync->offset = offset; + } + } +} +EXPORT_SYMBOL(__timecompare_update); -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH NET-NEXT 03/10] net: new user space API for time stamping of incoming and outgoing packets 2009-02-12 15:03 ` [PATCH NET-NEXT 02/10] timecompare: generic infrastructure to map between two time bases Patrick Ohly @ 2009-02-12 15:03 ` Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 04/10] net: infrastructure for hardware time stamping Patrick Ohly 0 siblings, 1 reply; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw) To: David Miller Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly User space can request hardware and/or software time stamping. Reporting of the result(s) via a new control message is enabled separately for each field in the message because some of the fields may require additional computation and thus cause overhead. User space can tell the different kinds of time stamps apart and choose what suits its needs. When a TX timestamp operation is requested, the TX skb will be cloned and the clone will be time stamped (in hardware or software) and added to the socket error queue of the skb, if the skb has a socket associated with it. The actual TX timestamp will reach userspace as a RX timestamp on the cloned packet. If timestamping is requested and no timestamping is done in the device driver (potentially this may use hardware timestamping), it will be done in software after the device's start_hard_xmit routine. Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> --- Documentation/networking/timestamping.txt | 178 +++++++ Documentation/networking/timestamping/.gitignore | 1 + Documentation/networking/timestamping/Makefile | 6 + .../networking/timestamping/timestamping.c | 533 ++++++++++++++++++++ arch/alpha/include/asm/socket.h | 3 + arch/arm/include/asm/socket.h | 3 + arch/avr32/include/asm/socket.h | 3 + arch/blackfin/include/asm/socket.h | 3 + arch/cris/include/asm/socket.h | 3 + arch/h8300/include/asm/socket.h | 3 + arch/ia64/include/asm/socket.h | 3 + arch/m68k/include/asm/socket.h | 3 + arch/mips/include/asm/socket.h | 3 + arch/parisc/include/asm/socket.h | 3 + arch/powerpc/include/asm/socket.h | 3 + arch/s390/include/asm/socket.h | 3 + arch/sh/include/asm/socket.h | 3 + arch/sparc/include/asm/socket.h | 3 + arch/x86/include/asm/socket.h | 3 + arch/xtensa/include/asm/socket.h | 3 + include/asm-frv/socket.h | 3 + include/asm-m32r/socket.h | 3 + include/asm-mn10300/socket.h | 3 + include/linux/errqueue.h | 1 + include/linux/net_tstamp.h | 104 ++++ include/linux/sockios.h | 3 + 26 files changed, 883 insertions(+), 0 deletions(-) create mode 100644 Documentation/networking/timestamping.txt create mode 100644 Documentation/networking/timestamping/.gitignore create mode 100644 Documentation/networking/timestamping/Makefile create mode 100644 Documentation/networking/timestamping/timestamping.c create mode 100644 include/linux/net_tstamp.h diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt new file mode 100644 index 0000000..a681a65 --- /dev/null +++ b/Documentation/networking/timestamping.txt @@ -0,0 +1,178 @@ +The existing interfaces for getting network packages time stamped are: + +* SO_TIMESTAMP + Generate time stamp for each incoming packet using the (not necessarily + monotonous!) system time. Result is returned via recv_msg() in a + control message as timeval (usec resolution). + +* SO_TIMESTAMPNS + Same time stamping mechanism as SO_TIMESTAMP, but returns result as + timespec (nsec resolution). + +* IP_MULTICAST_LOOP + SO_TIMESTAMP[NS] + Only for multicasts: approximate send time stamp by receiving the looped + packet and using its receive time stamp. + +The following interface complements the existing ones: receive time +stamps can be generated and returned for arbitrary packets and much +closer to the point where the packet is really sent. Time stamps can +be generated in software (as before) or in hardware (if the hardware +has such a feature). + +SO_TIMESTAMPING: + +Instructs the socket layer which kind of information is wanted. The +parameter is an integer with some of the following bits set. Setting +other bits is an error and doesn't change the current state. + +SOF_TIMESTAMPING_TX_HARDWARE: try to obtain send time stamp in hardware +SOF_TIMESTAMPING_TX_SOFTWARE: if SOF_TIMESTAMPING_TX_HARDWARE is off or + fails, then do it in software +SOF_TIMESTAMPING_RX_HARDWARE: return the original, unmodified time stamp + as generated by the hardware +SOF_TIMESTAMPING_RX_SOFTWARE: if SOF_TIMESTAMPING_RX_HARDWARE is off or + fails, then do it in software +SOF_TIMESTAMPING_RAW_HARDWARE: return original raw hardware time stamp +SOF_TIMESTAMPING_SYS_HARDWARE: return hardware time stamp transformed to + the system time base +SOF_TIMESTAMPING_SOFTWARE: return system time stamp generated in + software + +SOF_TIMESTAMPING_TX/RX determine how time stamps are generated. +SOF_TIMESTAMPING_RAW/SYS determine how they are reported in the +following control message: + struct scm_timestamping { + struct timespec systime; + struct timespec hwtimetrans; + struct timespec hwtimeraw; + }; + +recvmsg() can be used to get this control message for regular incoming +packets. For send time stamps the outgoing packet is looped back to +the socket's error queue with the send time stamp(s) attached. It can +be received with recvmsg(flags=MSG_ERRQUEUE). The call returns the +original outgoing packet data including all headers preprended down to +and including the link layer, the scm_timestamping control message and +a sock_extended_err control message with ee_errno==ENOMSG and +ee_origin==SO_EE_ORIGIN_TIMESTAMPING. A socket with such a pending +bounced packet is ready for reading as far as select() is concerned. + +All three values correspond to the same event in time, but were +generated in different ways. Each of these values may be empty (= all +zero), in which case no such value was available. If the application +is not interested in some of these values, they can be left blank to +avoid the potential overhead of calculating them. + +systime is the value of the system time at that moment. This +corresponds to the value also returned via SO_TIMESTAMP[NS]. If the +time stamp was generated by hardware, then this field is +empty. Otherwise it is filled in if SOF_TIMESTAMPING_SOFTWARE is +set. + +hwtimeraw is the original hardware time stamp. Filled in if +SOF_TIMESTAMPING_RAW_HARDWARE is set. No assumptions about its +relation to system time should be made. + +hwtimetrans is the hardware time stamp transformed so that it +corresponds as good as possible to system time. This correlation is +not perfect; as a consequence, sorting packets received via different +NICs by their hwtimetrans may differ from the order in which they were +received. hwtimetrans may be non-monotonic even for the same NIC. +Filled in if SOF_TIMESTAMPING_SYS_HARDWARE is set. Requires support +by the network device and will be empty without that support. + + +SIOCSHWTSTAMP: + +Hardware time stamping must also be initialized for each device driver +that is expected to do hardware time stamping. The parameter is: + +struct hwtstamp_config { + int flags; /* no flags defined right now, must be zero */ + int tx_type; /* HWTSTAMP_TX_* */ + int rx_filter; /* HWTSTAMP_FILTER_* */ +}; + +Desired behavior is passed into the kernel and to a specific device by +calling ioctl(SIOCSHWTSTAMP) with a pointer to a struct ifreq whose +ifr_data points to a struct hwtstamp_config. The tx_type and +rx_filter are hints to the driver what it is expected to do. If +the requested fine-grained filtering for incoming packets is not +supported, the driver may time stamp more than just the requested types +of packets. + +A driver which supports hardware time stamping shall update the struct +with the actual, possibly more permissive configuration. If the +requested packets cannot be time stamped, then nothing should be +changed and ERANGE shall be returned (in contrast to EINVAL, which +indicates that SIOCSHWTSTAMP is not supported at all). + +Only a processes with admin rights may change the configuration. User +space is responsible to ensure that multiple processes don't interfere +with each other and that the settings are reset. + +/* possible values for hwtstamp_config->tx_type */ +enum { + /* + * no outgoing packet will need hardware time stamping; + * should a packet arrive which asks for it, no hardware + * time stamping will be done + */ + HWTSTAMP_TX_OFF, + + /* + * enables hardware time stamping for outgoing packets; + * the sender of the packet decides which are to be + * time stamped by setting SOF_TIMESTAMPING_TX_SOFTWARE + * before sending the packet + */ + HWTSTAMP_TX_ON, +}; + +/* possible values for hwtstamp_config->rx_filter */ +enum { + /* time stamp no incoming packet at all */ + HWTSTAMP_FILTER_NONE, + + /* time stamp any incoming packet */ + HWTSTAMP_FILTER_ALL, + + /* return value: time stamp all packets requested plus some others */ + HWTSTAMP_FILTER_SOME, + + /* PTP v1, UDP, any kind of event packet */ + HWTSTAMP_FILTER_PTP_V1_L4_EVENT, + + ... +}; + + +DEVICE IMPLEMENTATION + +A driver which supports hardware time stamping must support the +SIOCSHWTSTAMP ioctl. Time stamps for received packets must be stored +in the skb with skb_hwtstamp_set(). + +Time stamps for outgoing packets are to be generated as follows: +- In hard_start_xmit(), check if skb_hwtstamp_check_tx_hardware() + returns non-zero. If yes, then the driver is expected + to do hardware time stamping. +- If this is possible for the skb and requested, then declare + that the driver is doing the time stamping by calling + skb_hwtstamp_tx_in_progress(). A driver not supporting + hardware time stamping doesn't do that. A driver must never + touch sk_buff::tstamp! It is used to store how time stamping + for an outgoing packets is to be done. +- As soon as the driver has sent the packet and/or obtained a + hardware time stamp for it, it passes the time stamp back by + calling skb_hwtstamp_tx() with the original skb, the raw + hardware time stamp and a handle to the device (necessary + to convert the hardware time stamp to system time). If obtaining + the hardware time stamp somehow fails, then the driver should + not fall back to software time stamping. The rationale is that + this would occur at a later time in the processing pipeline + than other software time stamping and therefore could lead + to unexpected deltas between time stamps. +- If the driver did not call skb_hwtstamp_tx_in_progress(), then + dev_hard_start_xmit() checks whether software time stamping + is wanted as fallback and potentially generates the time stamp. diff --git a/Documentation/networking/timestamping/.gitignore b/Documentation/networking/timestamping/.gitignore new file mode 100644 index 0000000..71e81eb --- /dev/null +++ b/Documentation/networking/timestamping/.gitignore @@ -0,0 +1 @@ +timestamping diff --git a/Documentation/networking/timestamping/Makefile b/Documentation/networking/timestamping/Makefile new file mode 100644 index 0000000..2a1489f --- /dev/null +++ b/Documentation/networking/timestamping/Makefile @@ -0,0 +1,6 @@ +CPPFLAGS = -I../../../include + +timestamping: timestamping.c + +clean: + rm -f timestamping diff --git a/Documentation/networking/timestamping/timestamping.c b/Documentation/networking/timestamping/timestamping.c new file mode 100644 index 0000000..43d1431 --- /dev/null +++ b/Documentation/networking/timestamping/timestamping.c @@ -0,0 +1,533 @@ +/* + * This program demonstrates how the various time stamping features in + * the Linux kernel work. It emulates the behavior of a PTP + * implementation in stand-alone master mode by sending PTPv1 Sync + * multicasts once every second. It looks for similar packets, but + * beyond that doesn't actually implement PTP. + * + * Outgoing packets are time stamped with SO_TIMESTAMPING with or + * without hardware support. + * + * Incoming packets are time stamped with SO_TIMESTAMPING with or + * without hardware support, SIOCGSTAMP[NS] (per-socket time stamp) and + * SO_TIMESTAMP[NS]. + * + * Copyright (C) 2009 Intel Corporation. + * Author: Patrick Ohly <patrick.ohly@intel.com> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include <stdio.h> +#include <stdlib.h> +#include <errno.h> +#include <string.h> + +#include <sys/time.h> +#include <sys/socket.h> +#include <sys/select.h> +#include <sys/ioctl.h> +#include <arpa/inet.h> +#include <net/if.h> + +#include "asm/types.h" +#include "linux/net_tstamp.h" +#include "linux/errqueue.h" + +#ifndef SO_TIMESTAMPING +# define SO_TIMESTAMPING 37 +# define SCM_TIMESTAMPING SO_TIMESTAMPING +#endif + +#ifndef SO_TIMESTAMPNS +# define SO_TIMESTAMPNS 35 +#endif + +#ifndef SIOCGSTAMPNS +# define SIOCGSTAMPNS 0x8907 +#endif + +#ifndef SIOCSHWTSTAMP +# define SIOCSHWTSTAMP 0x89b0 +#endif + +static void usage(const char *error) +{ + if (error) + printf("invalid option: %s\n", error); + printf("timestamping interface option*\n\n" + "Options:\n" + " IP_MULTICAST_LOOP - looping outgoing multicasts\n" + " SO_TIMESTAMP - normal software time stamping, ms resolution\n" + " SO_TIMESTAMPNS - more accurate software time stamping\n" + " SOF_TIMESTAMPING_TX_HARDWARE - hardware time stamping of outgoing packets\n" + " SOF_TIMESTAMPING_TX_SOFTWARE - software fallback for outgoing packets\n" + " SOF_TIMESTAMPING_RX_HARDWARE - hardware time stamping of incoming packets\n" + " SOF_TIMESTAMPING_RX_SOFTWARE - software fallback for incoming packets\n" + " SOF_TIMESTAMPING_SOFTWARE - request reporting of software time stamps\n" + " SOF_TIMESTAMPING_SYS_HARDWARE - request reporting of transformed HW time stamps\n" + " SOF_TIMESTAMPING_RAW_HARDWARE - request reporting of raw HW time stamps\n" + " SIOCGSTAMP - check last socket time stamp\n" + " SIOCGSTAMPNS - more accurate socket time stamp\n"); + exit(1); +} + +static void bail(const char *error) +{ + printf("%s: %s\n", error, strerror(errno)); + exit(1); +} + +static const unsigned char sync[] = { + 0x00, 0x01, 0x00, 0x01, + 0x5f, 0x44, 0x46, 0x4c, + 0x54, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, + 0x01, 0x01, + + /* fake uuid */ + 0x00, 0x01, + 0x02, 0x03, 0x04, 0x05, + + 0x00, 0x01, 0x00, 0x37, + 0x00, 0x00, 0x00, 0x08, + 0x00, 0x00, 0x00, 0x00, + 0x49, 0x05, 0xcd, 0x01, + 0x29, 0xb1, 0x8d, 0xb0, + 0x00, 0x00, 0x00, 0x00, + 0x00, 0x01, + + /* fake uuid */ + 0x00, 0x01, + 0x02, 0x03, 0x04, 0x05, + + 0x00, 0x00, 0x00, 0x37, + 0x00, 0x00, 0x00, 0x04, + 0x44, 0x46, 0x4c, 0x54, + 0x00, 0x00, 0xf0, 0x60, + 0x00, 0x01, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x01, + 0x00, 0x00, 0xf0, 0x60, + 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x04, + 0x44, 0x46, 0x4c, 0x54, + 0x00, 0x01, + + /* fake uuid */ + 0x00, 0x01, + 0x02, 0x03, 0x04, 0x05, + + 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00 +}; + +static void sendpacket(int sock, struct sockaddr *addr, socklen_t addr_len) +{ + struct timeval now; + int res; + + res = sendto(sock, sync, sizeof(sync), 0, + addr, addr_len); + gettimeofday(&now, 0); + if (res < 0) + printf("%s: %s\n", "send", strerror(errno)); + else + printf("%ld.%06ld: sent %d bytes\n", + (long)now.tv_sec, (long)now.tv_usec, + res); +} + +static void printpacket(struct msghdr *msg, int res, + char *data, + int sock, int recvmsg_flags, + int siocgstamp, int siocgstampns) +{ + struct sockaddr_in *from_addr = (struct sockaddr_in *)msg->msg_name; + struct cmsghdr *cmsg; + struct timeval tv; + struct timespec ts; + struct timeval now; + + gettimeofday(&now, 0); + + printf("%ld.%06ld: received %s data, %d bytes from %s, %d bytes control messages\n", + (long)now.tv_sec, (long)now.tv_usec, + (recvmsg_flags & MSG_ERRQUEUE) ? "error" : "regular", + res, + inet_ntoa(from_addr->sin_addr), + msg->msg_controllen); + for (cmsg = CMSG_FIRSTHDR(msg); + cmsg; + cmsg = CMSG_NXTHDR(msg, cmsg)) { + printf(" cmsg len %d: ", cmsg->cmsg_len); + switch (cmsg->cmsg_level) { + case SOL_SOCKET: + printf("SOL_SOCKET "); + switch (cmsg->cmsg_type) { + case SO_TIMESTAMP: { + struct timeval *stamp = + (struct timeval *)CMSG_DATA(cmsg); + printf("SO_TIMESTAMP %ld.%06ld", + (long)stamp->tv_sec, + (long)stamp->tv_usec); + break; + } + case SO_TIMESTAMPNS: { + struct timespec *stamp = + (struct timespec *)CMSG_DATA(cmsg); + printf("SO_TIMESTAMPNS %ld.%09ld", + (long)stamp->tv_sec, + (long)stamp->tv_nsec); + break; + } + case SO_TIMESTAMPING: { + struct timespec *stamp = + (struct timespec *)CMSG_DATA(cmsg); + printf("SO_TIMESTAMPING "); + printf("SW %ld.%09ld ", + (long)stamp->tv_sec, + (long)stamp->tv_nsec); + stamp++; + printf("HW transformed %ld.%09ld ", + (long)stamp->tv_sec, + (long)stamp->tv_nsec); + stamp++; + printf("HW raw %ld.%09ld", + (long)stamp->tv_sec, + (long)stamp->tv_nsec); + break; + } + default: + printf("type %d", cmsg->cmsg_type); + break; + } + break; + case IPPROTO_IP: + printf("IPPROTO_IP "); + switch (cmsg->cmsg_type) { + case IP_RECVERR: { + struct sock_extended_err *err = + (struct sock_extended_err *)CMSG_DATA(cmsg); + printf("IP_RECVERR ee_errno '%s' ee_origin %d => %s", + strerror(err->ee_errno), + err->ee_origin, +#ifdef SO_EE_ORIGIN_TIMESTAMPING + err->ee_origin == SO_EE_ORIGIN_TIMESTAMPING ? + "bounced packet" : "unexpected origin" +#else + "probably SO_EE_ORIGIN_TIMESTAMPING" +#endif + ); + if (res < sizeof(sync)) + printf(" => truncated data?!"); + else if (!memcmp(sync, data + res - sizeof(sync), + sizeof(sync))) + printf(" => GOT OUR DATA BACK (HURRAY!)"); + break; + } + case IP_PKTINFO: { + struct in_pktinfo *pktinfo = + (struct in_pktinfo *)CMSG_DATA(cmsg); + printf("IP_PKTINFO interface index %u", + pktinfo->ipi_ifindex); + break; + } + default: + printf("type %d", cmsg->cmsg_type); + break; + } + break; + default: + printf("level %d type %d", + cmsg->cmsg_level, + cmsg->cmsg_type); + break; + } + printf("\n"); + } + + if (siocgstamp) { + if (ioctl(sock, SIOCGSTAMP, &tv)) + printf(" %s: %s\n", "SIOCGSTAMP", strerror(errno)); + else + printf("SIOCGSTAMP %ld.%06ld\n", + (long)tv.tv_sec, + (long)tv.tv_usec); + } + if (siocgstampns) { + if (ioctl(sock, SIOCGSTAMPNS, &ts)) + printf(" %s: %s\n", "SIOCGSTAMPNS", strerror(errno)); + else + printf("SIOCGSTAMPNS %ld.%09ld\n", + (long)ts.tv_sec, + (long)ts.tv_nsec); + } +} + +static void recvpacket(int sock, int recvmsg_flags, + int siocgstamp, int siocgstampns) +{ + char data[256]; + struct msghdr msg; + struct iovec entry; + struct sockaddr_in from_addr; + struct { + struct cmsghdr cm; + char control[512]; + } control; + int res; + + memset(&msg, 0, sizeof(msg)); + msg.msg_iov = &entry; + msg.msg_iovlen = 1; + entry.iov_base = data; + entry.iov_len = sizeof(data); + msg.msg_name = (caddr_t)&from_addr; + msg.msg_namelen = sizeof(from_addr); + msg.msg_control = &control; + msg.msg_controllen = sizeof(control); + + res = recvmsg(sock, &msg, recvmsg_flags|MSG_DONTWAIT); + if (res < 0) { + printf("%s %s: %s\n", + "recvmsg", + (recvmsg_flags & MSG_ERRQUEUE) ? "error" : "regular", + strerror(errno)); + } else { + printpacket(&msg, res, data, + sock, recvmsg_flags, + siocgstamp, siocgstampns); + } +} + +int main(int argc, char **argv) +{ + int so_timestamping_flags = 0; + int so_timestamp = 0; + int so_timestampns = 0; + int siocgstamp = 0; + int siocgstampns = 0; + int ip_multicast_loop = 0; + char *interface; + int i; + int enabled = 1; + int sock; + struct ifreq device; + struct ifreq hwtstamp; + struct hwtstamp_config hwconfig, hwconfig_requested; + struct sockaddr_in addr; + struct ip_mreq imr; + struct in_addr iaddr; + int val; + socklen_t len; + struct timeval next; + + if (argc < 2) + usage(0); + interface = argv[1]; + + for (i = 2; i < argc; i++) { + if (!strcasecmp(argv[i], "SO_TIMESTAMP")) + so_timestamp = 1; + else if (!strcasecmp(argv[i], "SO_TIMESTAMPNS")) + so_timestampns = 1; + else if (!strcasecmp(argv[i], "SIOCGSTAMP")) + siocgstamp = 1; + else if (!strcasecmp(argv[i], "SIOCGSTAMPNS")) + siocgstampns = 1; + else if (!strcasecmp(argv[i], "IP_MULTICAST_LOOP")) + ip_multicast_loop = 1; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_TX_HARDWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_TX_HARDWARE; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_TX_SOFTWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_TX_SOFTWARE; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RX_HARDWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_RX_HARDWARE; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RX_SOFTWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_RX_SOFTWARE; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_SOFTWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_SOFTWARE; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_SYS_HARDWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_SYS_HARDWARE; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RAW_HARDWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_RAW_HARDWARE; + else + usage(argv[i]); + } + + sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP); + if (socket < 0) + bail("socket"); + + memset(&device, 0, sizeof(device)); + strncpy(device.ifr_name, interface, sizeof(device.ifr_name)); + if (ioctl(sock, SIOCGIFADDR, &device) < 0) + bail("getting interface IP address"); + + memset(&hwtstamp, 0, sizeof(hwtstamp)); + strncpy(hwtstamp.ifr_name, interface, sizeof(hwtstamp.ifr_name)); + hwtstamp.ifr_data = (void *)&hwconfig; + memset(&hwconfig, 0, sizeof(&hwconfig)); + hwconfig.tx_type = + (so_timestamping_flags & SOF_TIMESTAMPING_TX_HARDWARE) ? + HWTSTAMP_TX_ON : HWTSTAMP_TX_OFF; + hwconfig.rx_filter = + (so_timestamping_flags & SOF_TIMESTAMPING_RX_HARDWARE) ? + HWTSTAMP_FILTER_PTP_V1_L4_SYNC : HWTSTAMP_FILTER_NONE; + hwconfig_requested = hwconfig; + if (ioctl(sock, SIOCSHWTSTAMP, &hwtstamp) < 0) { + if ((errno == EINVAL || errno == ENOTSUP) && + hwconfig_requested.tx_type == HWTSTAMP_TX_OFF && + hwconfig_requested.rx_filter == HWTSTAMP_FILTER_NONE) + printf("SIOCSHWTSTAMP: disabling hardware time stamping not possible\n"); + else + bail("SIOCSHWTSTAMP"); + } + printf("SIOCSHWTSTAMP: tx_type %d requested, got %d; rx_filter %d requested, got %d\n", + hwconfig_requested.tx_type, hwconfig.tx_type, + hwconfig_requested.rx_filter, hwconfig.rx_filter); + + /* bind to PTP port */ + addr.sin_family = AF_INET; + addr.sin_addr.s_addr = htonl(INADDR_ANY); + addr.sin_port = htons(319 /* PTP event port */); + if (bind(sock, + (struct sockaddr *)&addr, + sizeof(struct sockaddr_in)) < 0) + bail("bind"); + + /* set multicast group for outgoing packets */ + inet_aton("224.0.1.130", &iaddr); /* alternate PTP domain 1 */ + addr.sin_addr = iaddr; + imr.imr_multiaddr.s_addr = iaddr.s_addr; + imr.imr_interface.s_addr = + ((struct sockaddr_in *)&device.ifr_addr)->sin_addr.s_addr; + if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_IF, + &imr.imr_interface.s_addr, sizeof(struct in_addr)) < 0) + bail("set multicast"); + + /* join multicast group, loop our own packet */ + if (setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, + &imr, sizeof(struct ip_mreq)) < 0) + bail("join multicast group"); + + if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_LOOP, + &ip_multicast_loop, sizeof(enabled)) < 0) { + bail("loop multicast"); + } + + /* set socket options for time stamping */ + if (so_timestamp && + setsockopt(sock, SOL_SOCKET, SO_TIMESTAMP, + &enabled, sizeof(enabled)) < 0) + bail("setsockopt SO_TIMESTAMP"); + + if (so_timestampns && + setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPNS, + &enabled, sizeof(enabled)) < 0) + bail("setsockopt SO_TIMESTAMPNS"); + + if (so_timestamping_flags && + setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPING, + &so_timestamping_flags, + sizeof(so_timestamping_flags)) < 0) + bail("setsockopt SO_TIMESTAMPING"); + + /* request IP_PKTINFO for debugging purposes */ + if (setsockopt(sock, SOL_IP, IP_PKTINFO, + &enabled, sizeof(enabled)) < 0) + printf("%s: %s\n", "setsockopt IP_PKTINFO", strerror(errno)); + + /* verify socket options */ + len = sizeof(val); + if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMP, &val, &len) < 0) + printf("%s: %s\n", "getsockopt SO_TIMESTAMP", strerror(errno)); + else + printf("SO_TIMESTAMP %d\n", val); + + if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMPNS, &val, &len) < 0) + printf("%s: %s\n", "getsockopt SO_TIMESTAMPNS", + strerror(errno)); + else + printf("SO_TIMESTAMPNS %d\n", val); + + if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMPING, &val, &len) < 0) { + printf("%s: %s\n", "getsockopt SO_TIMESTAMPING", + strerror(errno)); + } else { + printf("SO_TIMESTAMPING %d\n", val); + if (val != so_timestamping_flags) + printf(" not the expected value %d\n", + so_timestamping_flags); + } + + /* send packets forever every five seconds */ + gettimeofday(&next, 0); + next.tv_sec = (next.tv_sec + 1) / 5 * 5; + next.tv_usec = 0; + while (1) { + struct timeval now; + struct timeval delta; + long delta_us; + int res; + fd_set readfs, errorfs; + + gettimeofday(&now, 0); + delta_us = (long)(next.tv_sec - now.tv_sec) * 1000000 + + (long)(next.tv_usec - now.tv_usec); + if (delta_us > 0) { + /* continue waiting for timeout or data */ + delta.tv_sec = delta_us / 1000000; + delta.tv_usec = delta_us % 1000000; + + FD_ZERO(&readfs); + FD_ZERO(&errorfs); + FD_SET(sock, &readfs); + FD_SET(sock, &errorfs); + printf("%ld.%06ld: select %ldus\n", + (long)now.tv_sec, (long)now.tv_usec, + delta_us); + res = select(sock + 1, &readfs, 0, &errorfs, &delta); + gettimeofday(&now, 0); + printf("%ld.%06ld: select returned: %d, %s\n", + (long)now.tv_sec, (long)now.tv_usec, + res, + res < 0 ? strerror(errno) : "success"); + if (res > 0) { + if (FD_ISSET(sock, &readfs)) + printf("ready for reading\n"); + if (FD_ISSET(sock, &errorfs)) + printf("has error\n"); + recvpacket(sock, 0, + siocgstamp, + siocgstampns); + recvpacket(sock, MSG_ERRQUEUE, + siocgstamp, + siocgstampns); + } + } else { + /* write one packet */ + sendpacket(sock, + (struct sockaddr *)&addr, + sizeof(addr)); + next.tv_sec += 5; + continue; + } + } + + return 0; +} diff --git a/arch/alpha/include/asm/socket.h b/arch/alpha/include/asm/socket.h index a1057c2..3641ec1 100644 --- a/arch/alpha/include/asm/socket.h +++ b/arch/alpha/include/asm/socket.h @@ -62,6 +62,9 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + /* O_NONBLOCK clashes with the bits used for socket types. Therefore we * have to define SOCK_NONBLOCK to a different value here. */ diff --git a/arch/arm/include/asm/socket.h b/arch/arm/include/asm/socket.h index 6817be9..537de4e 100644 --- a/arch/arm/include/asm/socket.h +++ b/arch/arm/include/asm/socket.h @@ -54,4 +54,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_SOCKET_H */ diff --git a/arch/avr32/include/asm/socket.h b/arch/avr32/include/asm/socket.h index 35863f2..04c8606 100644 --- a/arch/avr32/include/asm/socket.h +++ b/arch/avr32/include/asm/socket.h @@ -54,4 +54,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* __ASM_AVR32_SOCKET_H */ diff --git a/arch/blackfin/include/asm/socket.h b/arch/blackfin/include/asm/socket.h index 2ca702e..fac7fe9 100644 --- a/arch/blackfin/include/asm/socket.h +++ b/arch/blackfin/include/asm/socket.h @@ -53,4 +53,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_SOCKET_H */ diff --git a/arch/cris/include/asm/socket.h b/arch/cris/include/asm/socket.h index 9df0ca8..d5cf740 100644 --- a/arch/cris/include/asm/socket.h +++ b/arch/cris/include/asm/socket.h @@ -56,6 +56,9 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_SOCKET_H */ diff --git a/arch/h8300/include/asm/socket.h b/arch/h8300/include/asm/socket.h index da2520d..602518a 100644 --- a/arch/h8300/include/asm/socket.h +++ b/arch/h8300/include/asm/socket.h @@ -54,4 +54,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_SOCKET_H */ diff --git a/arch/ia64/include/asm/socket.h b/arch/ia64/include/asm/socket.h index d5ef0aa..7454212 100644 --- a/arch/ia64/include/asm/socket.h +++ b/arch/ia64/include/asm/socket.h @@ -63,4 +63,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_IA64_SOCKET_H */ diff --git a/arch/m68k/include/asm/socket.h b/arch/m68k/include/asm/socket.h index dbc64e9..ca87f93 100644 --- a/arch/m68k/include/asm/socket.h +++ b/arch/m68k/include/asm/socket.h @@ -54,4 +54,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_SOCKET_H */ diff --git a/arch/mips/include/asm/socket.h b/arch/mips/include/asm/socket.h index facc2d7..2abca17 100644 --- a/arch/mips/include/asm/socket.h +++ b/arch/mips/include/asm/socket.h @@ -75,6 +75,9 @@ To add: #define SO_REUSEPORT 0x0200 /* Allow local address and port reuse. */ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #ifdef __KERNEL__ /** sock_type - Socket types diff --git a/arch/parisc/include/asm/socket.h b/arch/parisc/include/asm/socket.h index fba402c..885472b 100644 --- a/arch/parisc/include/asm/socket.h +++ b/arch/parisc/include/asm/socket.h @@ -54,6 +54,9 @@ #define SO_MARK 0x401f +#define SO_TIMESTAMPING 0x4020 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + /* O_NONBLOCK clashes with the bits used for socket types. Therefore we * have to define SOCK_NONBLOCK to a different value here. */ diff --git a/arch/powerpc/include/asm/socket.h b/arch/powerpc/include/asm/socket.h index f5a4e16..1e5cfad 100644 --- a/arch/powerpc/include/asm/socket.h +++ b/arch/powerpc/include/asm/socket.h @@ -61,4 +61,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_POWERPC_SOCKET_H */ diff --git a/arch/s390/include/asm/socket.h b/arch/s390/include/asm/socket.h index c786ab6..02330c5 100644 --- a/arch/s390/include/asm/socket.h +++ b/arch/s390/include/asm/socket.h @@ -62,4 +62,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_SOCKET_H */ diff --git a/arch/sh/include/asm/socket.h b/arch/sh/include/asm/socket.h index 6d4bf65..345653b 100644 --- a/arch/sh/include/asm/socket.h +++ b/arch/sh/include/asm/socket.h @@ -54,4 +54,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* __ASM_SH_SOCKET_H */ diff --git a/arch/sparc/include/asm/socket.h b/arch/sparc/include/asm/socket.h index bf50d0c..982a12f 100644 --- a/arch/sparc/include/asm/socket.h +++ b/arch/sparc/include/asm/socket.h @@ -50,6 +50,9 @@ #define SO_MARK 0x0022 +#define SO_TIMESTAMPING 0x0023 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + /* Security levels - as per NRL IPv6 - don't actually do anything */ #define SO_SECURITY_AUTHENTICATION 0x5001 #define SO_SECURITY_ENCRYPTION_TRANSPORT 0x5002 diff --git a/arch/x86/include/asm/socket.h b/arch/x86/include/asm/socket.h index 8ab9cc8..ca8bf2c 100644 --- a/arch/x86/include/asm/socket.h +++ b/arch/x86/include/asm/socket.h @@ -54,4 +54,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_X86_SOCKET_H */ diff --git a/arch/xtensa/include/asm/socket.h b/arch/xtensa/include/asm/socket.h index 6100682..dd1a7a4 100644 --- a/arch/xtensa/include/asm/socket.h +++ b/arch/xtensa/include/asm/socket.h @@ -65,4 +65,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _XTENSA_SOCKET_H */ diff --git a/include/asm-frv/socket.h b/include/asm-frv/socket.h index e51ca67..57c3d40 100644 --- a/include/asm-frv/socket.h +++ b/include/asm-frv/socket.h @@ -54,5 +54,8 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_SOCKET_H */ diff --git a/include/asm-m32r/socket.h b/include/asm-m32r/socket.h index 9a0e200..be7ed58 100644 --- a/include/asm-m32r/socket.h +++ b/include/asm-m32r/socket.h @@ -54,4 +54,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_M32R_SOCKET_H */ diff --git a/include/asm-mn10300/socket.h b/include/asm-mn10300/socket.h index 80af9c4..fb5daf4 100644 --- a/include/asm-mn10300/socket.h +++ b/include/asm-mn10300/socket.h @@ -54,4 +54,7 @@ #define SO_MARK 36 +#define SO_TIMESTAMPING 37 +#define SCM_TIMESTAMPING SO_TIMESTAMPING + #endif /* _ASM_SOCKET_H */ diff --git a/include/linux/errqueue.h b/include/linux/errqueue.h index ceb1454..ec12cc7 100644 --- a/include/linux/errqueue.h +++ b/include/linux/errqueue.h @@ -18,6 +18,7 @@ struct sock_extended_err #define SO_EE_ORIGIN_LOCAL 1 #define SO_EE_ORIGIN_ICMP 2 #define SO_EE_ORIGIN_ICMP6 3 +#define SO_EE_ORIGIN_TIMESTAMPING 4 #define SO_EE_OFFENDER(ee) ((struct sockaddr*)((ee)+1)) diff --git a/include/linux/net_tstamp.h b/include/linux/net_tstamp.h new file mode 100644 index 0000000..a3b8546 --- /dev/null +++ b/include/linux/net_tstamp.h @@ -0,0 +1,104 @@ +/* + * Userspace API for hardware time stamping of network packets + * + * Copyright (C) 2008,2009 Intel Corporation + * Author: Patrick Ohly <patrick.ohly@intel.com> + * + */ + +#ifndef _NET_TIMESTAMPING_H +#define _NET_TIMESTAMPING_H + +#include <linux/socket.h> /* for SO_TIMESTAMPING */ + +/* SO_TIMESTAMPING gets an integer bit field comprised of these values */ +enum { + SOF_TIMESTAMPING_TX_HARDWARE = (1<<0), + SOF_TIMESTAMPING_TX_SOFTWARE = (1<<1), + SOF_TIMESTAMPING_RX_HARDWARE = (1<<2), + SOF_TIMESTAMPING_RX_SOFTWARE = (1<<3), + SOF_TIMESTAMPING_SOFTWARE = (1<<4), + SOF_TIMESTAMPING_SYS_HARDWARE = (1<<5), + SOF_TIMESTAMPING_RAW_HARDWARE = (1<<6), + SOF_TIMESTAMPING_MASK = + (SOF_TIMESTAMPING_RAW_HARDWARE - 1) | + SOF_TIMESTAMPING_RAW_HARDWARE +}; + +/** + * struct hwtstamp_config - %SIOCSHWTSTAMP parameter + * + * @flags: no flags defined right now, must be zero + * @tx_type: one of HWTSTAMP_TX_* + * @rx_type: one of one of HWTSTAMP_FILTER_* + * + * %SIOCSHWTSTAMP expects a &struct ifreq with a ifr_data pointer to + * this structure. dev_ifsioc() in the kernel takes care of the + * translation between 32 bit userspace and 64 bit kernel. The + * structure is intentionally chosen so that it has the same layout on + * 32 and 64 bit systems, don't break this! + */ +struct hwtstamp_config { + int flags; + int tx_type; + int rx_filter; +}; + +/* possible values for hwtstamp_config->tx_type */ +enum { + /* + * No outgoing packet will need hardware time stamping; + * should a packet arrive which asks for it, no hardware + * time stamping will be done. + */ + HWTSTAMP_TX_OFF, + + /* + * Enables hardware time stamping for outgoing packets; + * the sender of the packet decides which are to be + * time stamped by setting %SOF_TIMESTAMPING_TX_SOFTWARE + * before sending the packet. + */ + HWTSTAMP_TX_ON, +}; + +/* possible values for hwtstamp_config->rx_filter */ +enum { + /* time stamp no incoming packet at all */ + HWTSTAMP_FILTER_NONE, + + /* time stamp any incoming packet */ + HWTSTAMP_FILTER_ALL, + + /* return value: time stamp all packets requested plus some others */ + HWTSTAMP_FILTER_SOME, + + /* PTP v1, UDP, any kind of event packet */ + HWTSTAMP_FILTER_PTP_V1_L4_EVENT, + /* PTP v1, UDP, Sync packet */ + HWTSTAMP_FILTER_PTP_V1_L4_SYNC, + /* PTP v1, UDP, Delay_req packet */ + HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ, + /* PTP v2, UDP, any kind of event packet */ + HWTSTAMP_FILTER_PTP_V2_L4_EVENT, + /* PTP v2, UDP, Sync packet */ + HWTSTAMP_FILTER_PTP_V2_L4_SYNC, + /* PTP v2, UDP, Delay_req packet */ + HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ, + + /* 802.AS1, Ethernet, any kind of event packet */ + HWTSTAMP_FILTER_PTP_V2_L2_EVENT, + /* 802.AS1, Ethernet, Sync packet */ + HWTSTAMP_FILTER_PTP_V2_L2_SYNC, + /* 802.AS1, Ethernet, Delay_req packet */ + HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ, + + /* PTP v2/802.AS1, any layer, any kind of event packet */ + HWTSTAMP_FILTER_PTP_V2_EVENT, + /* PTP v2/802.AS1, any layer, Sync packet */ + HWTSTAMP_FILTER_PTP_V2_SYNC, + /* PTP v2/802.AS1, any layer, Delay_req packet */ + HWTSTAMP_FILTER_PTP_V2_DELAY_REQ, +}; + +#endif /* _NET_TIMESTAMPING_H */ diff --git a/include/linux/sockios.h b/include/linux/sockios.h index abef759..241f179 100644 --- a/include/linux/sockios.h +++ b/include/linux/sockios.h @@ -122,6 +122,9 @@ #define SIOCBRADDIF 0x89a2 /* add interface to bridge */ #define SIOCBRDELIF 0x89a3 /* remove interface from bridge */ +/* hardware time stamping: parameters in linux/net_tstamp.h */ +#define SIOCSHWTSTAMP 0x89b0 + /* Device private ioctl calls */ /* -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH NET-NEXT 04/10] net: infrastructure for hardware time stamping 2009-02-12 15:03 ` [PATCH NET-NEXT 03/10] net: new user space API for time stamping of incoming and outgoing packets Patrick Ohly @ 2009-02-12 15:03 ` Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 05/10] net: socket infrastructure for SO_TIMESTAMPING Patrick Ohly 0 siblings, 1 reply; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw) To: David Miller Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly The additional per-packet information (16 bytes for time stamps, 1 byte for flags) is stored for all packets in the skb_shared_info struct. This implementation detail is hidden from users of that information via skb_* accessor functions. A separate struct resp. union is used for the additional information so that it can be stored/copied easily outside of skb_shared_info. Compared to previous implementations (reusing the tstamp field depending on the context, optional additional structures) this is the simplest solution. It does not extend sk_buff itself. TX time stamping is implemented in software if the device driver doesn't support hardware time stamping. The new semantic for hardware/software time stamping around ndo_start_xmit() is based on two assumptions about existing network device drivers which don't support hardware time stamping and know nothing about it: - they leave the new skb_shared_tx unmodified - the keep the connection to the originating socket in skb->sk alive, i.e., don't call skb_orphan() Given that skb_shared_tx is new, the first assumption is safe. The second is only true for some drivers. As a result, software TX time stamping currently works with the bnx2 driver, but not with the unmodified igb driver (the two drivers this patch series was tested with). Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> --- include/linux/skbuff.h | 91 +++++++++++++++++++++++++++++++++++++++++++++++- net/core/dev.c | 32 ++++++++++++++++- net/core/skbuff.c | 41 +++++++++++++++++++++ 3 files changed, 161 insertions(+), 3 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 9247008..f96bc91 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -132,6 +132,57 @@ struct skb_frag_struct { __u32 size; }; +#define HAVE_HW_TIME_STAMP + +/** + * skb_shared_hwtstamps - hardware time stamps + * + * @hwtstamp: hardware time stamp transformed into duration + * since arbitrary point in time + * @syststamp: hwtstamp transformed to system time base + * + * Software time stamps generated by ktime_get_real() are stored in + * skb->tstamp. The relation between the different kinds of time + * stamps is as follows: + * + * syststamp and tstamp can be compared against each other in + * arbitrary combinations. The accuracy of a + * syststamp/tstamp/"syststamp from other device" comparison is + * limited by the accuracy of the transformation into system time + * base. This depends on the device driver and its underlying + * hardware. + * + * hwtstamps can only be compared against other hwtstamps from + * the same device. + * + * This structure is attached to packets as part of the + * &skb_shared_info. Use skb_hwtstamps() to get a pointer. + */ +struct skb_shared_hwtstamps { + ktime_t hwtstamp; + ktime_t syststamp; +}; + +/** + * skb_shared_tx - instructions for time stamping of outgoing packets + * + * @hardware: generate hardware time stamp + * @software: generate software time stamp + * @in_progress: device driver is going to provide + * hardware time stamp + * + * These flags are attached to packets as part of the + * &skb_shared_info. Use skb_tx() to get a pointer. + */ +union skb_shared_tx { + struct { + __u8 hardware:1, + software:1, + in_progress:1; + }; + __u8 flags; +}; + /* This data is invariant across clones and lives at * the end of the header data, ie. at skb->end. */ @@ -143,10 +194,12 @@ struct skb_shared_info { unsigned short gso_segs; unsigned short gso_type; __be32 ip6_frag_id; + union skb_shared_tx tx_flags; #ifdef CONFIG_HAS_DMA unsigned int num_dma_maps; #endif struct sk_buff *frag_list; + struct skb_shared_hwtstamps hwtstamps; skb_frag_t frags[MAX_SKB_FRAGS]; #ifdef CONFIG_HAS_DMA dma_addr_t dma_maps[MAX_SKB_FRAGS + 1]; @@ -465,6 +518,16 @@ static inline unsigned char *skb_end_pointer(const struct sk_buff *skb) /* Internal */ #define skb_shinfo(SKB) ((struct skb_shared_info *)(skb_end_pointer(SKB))) +static inline struct skb_shared_hwtstamps *skb_hwtstamps(struct sk_buff *skb) +{ + return &skb_shinfo(skb)->hwtstamps; +} + +static inline union skb_shared_tx *skb_tx(struct sk_buff *skb) +{ + return &skb_shinfo(skb)->tx_flags; +} + /** * skb_queue_empty - check if a queue is empty * @list: queue head @@ -1730,6 +1793,11 @@ static inline void skb_copy_to_linear_data_offset(struct sk_buff *skb, extern void skb_init(void); +static inline ktime_t skb_get_ktime(const struct sk_buff *skb) +{ + return skb->tstamp; +} + /** * skb_get_timestamp - get timestamp from a skb * @skb: skb to get stamp from @@ -1739,11 +1807,18 @@ extern void skb_init(void); * This function converts the offset back to a struct timeval and stores * it in stamp. */ -static inline void skb_get_timestamp(const struct sk_buff *skb, struct timeval *stamp) +static inline void skb_get_timestamp(const struct sk_buff *skb, + struct timeval *stamp) { *stamp = ktime_to_timeval(skb->tstamp); } +static inline void skb_get_timestampns(const struct sk_buff *skb, + struct timespec *stamp) +{ + *stamp = ktime_to_timespec(skb->tstamp); +} + static inline void __net_timestamp(struct sk_buff *skb) { skb->tstamp = ktime_get_real(); @@ -1759,6 +1834,20 @@ static inline ktime_t net_invalid_timestamp(void) return ktime_set(0, 0); } +/** + * skb_tstamp_tx - queue clone of skb with send time stamps + * @orig_skb: the original outgoing packet + * @hwtstamps: hardware time stamps, may be NULL if not available + * + * If the skb has a socket associated, then this function clones the + * skb (thus sharing the actual data and optional structures), stores + * the optional hardware time stamping information (if non NULL) or + * generates a software time stamp (otherwise), then queues the clone + * to the error queue of the socket. Errors are silently ignored. + */ +extern void skb_tstamp_tx(struct sk_buff *orig_skb, + struct skb_shared_hwtstamps *hwtstamps); + extern __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len); extern __sum16 __skb_checksum_complete(struct sk_buff *skb); diff --git a/net/core/dev.c b/net/core/dev.c index 1e27a67..d20c28e 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1672,10 +1672,21 @@ static int dev_gso_segment(struct sk_buff *skb) return 0; } +static void tstamp_tx(struct sk_buff *skb) +{ + union skb_shared_tx *shtx = + skb_tx(skb); + if (unlikely(shtx->software && + !shtx->in_progress)) { + skb_tstamp_tx(skb, NULL); + } +} + int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, struct netdev_queue *txq) { const struct net_device_ops *ops = dev->netdev_ops; + int rc; prefetch(&dev->netdev_ops->ndo_start_xmit); if (likely(!skb->next)) { @@ -1689,13 +1700,29 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, goto gso; } - return ops->ndo_start_xmit(skb, dev); + rc = ops->ndo_start_xmit(skb, dev); + /* + * TODO: if skb_orphan() was called by + * dev->hard_start_xmit() (for example, the unmodified + * igb driver does that; bnx2 doesn't), then + * skb_tx_software_timestamp() will be unable to send + * back the time stamp. + * + * How can this be prevented? Always create another + * reference to the socket before calling + * dev->hard_start_xmit()? Prevent that skb_orphan() + * does anything in dev->hard_start_xmit() by clearing + * the skb destructor before the call and restoring it + * afterwards, then doing the skb_orphan() ourselves? + */ + if (likely(!rc)) + tstamp_tx(skb); + return rc; } gso: do { struct sk_buff *nskb = skb->next; - int rc; skb->next = nskb->next; nskb->next = NULL; @@ -1705,6 +1732,7 @@ gso: skb->next = nskb; return rc; } + tstamp_tx(skb); if (unlikely(netif_tx_queue_stopped(txq) && skb->next)) return NETDEV_TX_BUSY; } while (skb->next); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 7657cec..7b831a7 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -55,6 +55,7 @@ #include <linux/rtnetlink.h> #include <linux/init.h> #include <linux/scatterlist.h> +#include <linux/errqueue.h> #include <net/protocol.h> #include <net/dst.h> @@ -215,7 +216,9 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, shinfo->gso_segs = 0; shinfo->gso_type = 0; shinfo->ip6_frag_id = 0; + shinfo->tx_flags.flags = 0; shinfo->frag_list = NULL; + memset(&shinfo->hwtstamps, 0, sizeof(shinfo->hwtstamps)); if (fclone) { struct sk_buff *child = skb + 1; @@ -2940,6 +2943,44 @@ int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer) } EXPORT_SYMBOL_GPL(skb_cow_data); +void skb_tstamp_tx(struct sk_buff *orig_skb, + struct skb_shared_hwtstamps *hwtstamps) +{ + struct sock *sk = orig_skb->sk; + struct sock_exterr_skb *serr; + struct sk_buff *skb; + int err; + + if (!sk) + return; + + skb = skb_clone(orig_skb, GFP_ATOMIC); + if (!skb) + return; + + if (hwtstamps) { + *skb_hwtstamps(skb) = + *hwtstamps; + } else { + /* + * no hardware time stamps available, + * so keep the skb_shared_tx and only + * store software time stamp + */ + skb->tstamp = ktime_get_real(); + } + + serr = SKB_EXT_ERR(skb); + memset(serr, 0, sizeof(*serr)); + serr->ee.ee_errno = ENOMSG; + serr->ee.ee_origin = SO_EE_ORIGIN_TIMESTAMPING; + err = sock_queue_err_skb(sk, skb); + if (err) + kfree_skb(skb); +} +EXPORT_SYMBOL_GPL(skb_tstamp_tx); + + /** * skb_partial_csum_set - set up and verify partial csum values for packet * @skb: the skb to set -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH NET-NEXT 05/10] net: socket infrastructure for SO_TIMESTAMPING 2009-02-12 15:03 ` [PATCH NET-NEXT 04/10] net: infrastructure for hardware time stamping Patrick Ohly @ 2009-02-12 15:03 ` Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 06/10] ip: support for TX timestamps on UDP and RAW sockets Patrick Ohly 0 siblings, 1 reply; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw) To: David Miller Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly The overlap with the old SO_TIMESTAMP[NS] options is handled so that time stamping in software (net_enable_timestamp()) is enabled when SO_TIMESTAMP[NS] and/or SO_TIMESTAMPING_RX_SOFTWARE is set. It's disabled if all of these are off. Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> --- include/net/sock.h | 44 +++++++++++++++++++++++++-- net/compat.c | 19 +++++++---- net/core/sock.c | 81 ++++++++++++++++++++++++++++++++++++++++++------- net/socket.c | 84 +++++++++++++++++++++++++++++++++++++++------------ 4 files changed, 186 insertions(+), 42 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index c047def..0a54d9b 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -158,7 +158,7 @@ struct sock_common { * @sk_allocation: allocation mode * @sk_sndbuf: size of send buffer in bytes * @sk_flags: %SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE, - * %SO_OOBINLINE settings + * %SO_OOBINLINE settings, %SO_TIMESTAMPING settings * @sk_no_check: %SO_NO_CHECK setting, wether or not checkup packets * @sk_route_caps: route capabilities (e.g. %NETIF_F_TSO) * @sk_gso_type: GSO type (e.g. %SKB_GSO_TCPV4) @@ -488,6 +488,13 @@ enum sock_flags { SOCK_RCVTSTAMPNS, /* %SO_TIMESTAMPNS setting */ SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */ SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */ + SOCK_TIMESTAMPING_TX_HARDWARE, /* %SOF_TIMESTAMPING_TX_HARDWARE */ + SOCK_TIMESTAMPING_TX_SOFTWARE, /* %SOF_TIMESTAMPING_TX_SOFTWARE */ + SOCK_TIMESTAMPING_RX_HARDWARE, /* %SOF_TIMESTAMPING_RX_HARDWARE */ + SOCK_TIMESTAMPING_RX_SOFTWARE, /* %SOF_TIMESTAMPING_RX_SOFTWARE */ + SOCK_TIMESTAMPING_SOFTWARE, /* %SOF_TIMESTAMPING_SOFTWARE */ + SOCK_TIMESTAMPING_RAW_HARDWARE, /* %SOF_TIMESTAMPING_RAW_HARDWARE */ + SOCK_TIMESTAMPING_SYS_HARDWARE, /* %SOF_TIMESTAMPING_SYS_HARDWARE */ }; static inline void sock_copy_flags(struct sock *nsk, struct sock *osk) @@ -1346,14 +1353,45 @@ static __inline__ void sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb) { ktime_t kt = skb->tstamp; + struct skb_shared_hwtstamps *hwtstamps = skb_hwtstamps(skb); - if (sock_flag(sk, SOCK_RCVTSTAMP)) + /* + * generate control messages if + * - receive time stamping in software requested (SOCK_RCVTSTAMP + * or SOCK_TIMESTAMPING_RX_SOFTWARE) + * - software time stamp available and wanted + * (SOCK_TIMESTAMPING_SOFTWARE) + * - hardware time stamps available and wanted + * (SOCK_TIMESTAMPING_SYS_HARDWARE or + * SOCK_TIMESTAMPING_RAW_HARDWARE) + */ + if (sock_flag(sk, SOCK_RCVTSTAMP) || + sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE) || + (kt.tv64 && sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE)) || + (hwtstamps->hwtstamp.tv64 && + sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE)) || + (hwtstamps->syststamp.tv64 && + sock_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE))) __sock_recv_timestamp(msg, sk, skb); else sk->sk_stamp = kt; } /** + * sock_tx_timestamp - checks whether the outgoing packet is to be time stamped + * @msg: outgoing packet + * @sk: socket sending this packet + * @shtx: filled with instructions for time stamping + * + * Currently only depends on SOCK_TIMESTAMPING* flags. Returns error code if + * parameters are invalid. + */ +extern int sock_tx_timestamp(struct msghdr *msg, + struct sock *sk, + union skb_shared_tx *shtx); + + +/** * sk_eat_skb - Release a skb if it is no longer needed * @sk: socket to eat this skb from * @skb: socket buffer to eat @@ -1421,7 +1459,7 @@ static inline struct sock *skb_steal_sock(struct sk_buff *skb) return NULL; } -extern void sock_enable_timestamp(struct sock *sk); +extern void sock_enable_timestamp(struct sock *sk, int flag); extern int sock_get_timestamp(struct sock *, struct timeval __user *); extern int sock_get_timestampns(struct sock *, struct timespec __user *); diff --git a/net/compat.c b/net/compat.c index a3a2ba0..8d73905 100644 --- a/net/compat.c +++ b/net/compat.c @@ -216,7 +216,7 @@ Efault: int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *data) { struct compat_timeval ctv; - struct compat_timespec cts; + struct compat_timespec cts[3]; struct compat_cmsghdr __user *cm = (struct compat_cmsghdr __user *) kmsg->msg_control; struct compat_cmsghdr cmhdr; int cmlen; @@ -233,12 +233,17 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat data = &ctv; len = sizeof(ctv); } - if (level == SOL_SOCKET && type == SCM_TIMESTAMPNS) { + if (level == SOL_SOCKET && + (type == SCM_TIMESTAMPNS || type == SCM_TIMESTAMPING)) { + int count = type == SCM_TIMESTAMPNS ? 1 : 3; + int i; struct timespec *ts = (struct timespec *)data; - cts.tv_sec = ts->tv_sec; - cts.tv_nsec = ts->tv_nsec; + for (i = 0; i < count; i++) { + cts[i].tv_sec = ts[i].tv_sec; + cts[i].tv_nsec = ts[i].tv_nsec; + } data = &cts; - len = sizeof(cts); + len = sizeof(cts[0]) * count; } cmlen = CMSG_COMPAT_LEN(len); @@ -455,7 +460,7 @@ int compat_sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp) struct timeval tv; if (!sock_flag(sk, SOCK_TIMESTAMP)) - sock_enable_timestamp(sk); + sock_enable_timestamp(sk, SOCK_TIMESTAMP); tv = ktime_to_timeval(sk->sk_stamp); if (tv.tv_sec == -1) return err; @@ -479,7 +484,7 @@ int compat_sock_get_timestampns(struct sock *sk, struct timespec __user *usersta struct timespec ts; if (!sock_flag(sk, SOCK_TIMESTAMP)) - sock_enable_timestamp(sk); + sock_enable_timestamp(sk, SOCK_TIMESTAMP); ts = ktime_to_timespec(sk->sk_stamp); if (ts.tv_sec == -1) return err; diff --git a/net/core/sock.c b/net/core/sock.c index c64996f..d22933a 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -120,6 +120,7 @@ #include <net/net_namespace.h> #include <net/request_sock.h> #include <net/sock.h> +#include <linux/net_tstamp.h> #include <net/xfrm.h> #include <linux/ipsec.h> @@ -255,11 +256,14 @@ static void sock_warn_obsolete_bsdism(const char *name) } } -static void sock_disable_timestamp(struct sock *sk) +static void sock_disable_timestamp(struct sock *sk, int flag) { - if (sock_flag(sk, SOCK_TIMESTAMP)) { - sock_reset_flag(sk, SOCK_TIMESTAMP); - net_disable_timestamp(); + if (sock_flag(sk, flag)) { + sock_reset_flag(sk, flag); + if (!sock_flag(sk, SOCK_TIMESTAMP) && + !sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE)) { + net_disable_timestamp(); + } } } @@ -614,13 +618,38 @@ set_rcvbuf: else sock_set_flag(sk, SOCK_RCVTSTAMPNS); sock_set_flag(sk, SOCK_RCVTSTAMP); - sock_enable_timestamp(sk); + sock_enable_timestamp(sk, SOCK_TIMESTAMP); } else { sock_reset_flag(sk, SOCK_RCVTSTAMP); sock_reset_flag(sk, SOCK_RCVTSTAMPNS); } break; + case SO_TIMESTAMPING: + if (val & ~SOF_TIMESTAMPING_MASK) { + ret = EINVAL; + break; + } + sock_valbool_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE, + val & SOF_TIMESTAMPING_TX_HARDWARE); + sock_valbool_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE, + val & SOF_TIMESTAMPING_TX_SOFTWARE); + sock_valbool_flag(sk, SOCK_TIMESTAMPING_RX_HARDWARE, + val & SOF_TIMESTAMPING_RX_HARDWARE); + if (val & SOF_TIMESTAMPING_RX_SOFTWARE) + sock_enable_timestamp(sk, + SOCK_TIMESTAMPING_RX_SOFTWARE); + else + sock_disable_timestamp(sk, + SOCK_TIMESTAMPING_RX_SOFTWARE); + sock_valbool_flag(sk, SOCK_TIMESTAMPING_SOFTWARE, + val & SOF_TIMESTAMPING_SOFTWARE); + sock_valbool_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE, + val & SOF_TIMESTAMPING_SYS_HARDWARE); + sock_valbool_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE, + val & SOF_TIMESTAMPING_RAW_HARDWARE); + break; + case SO_RCVLOWAT: if (val < 0) val = INT_MAX; @@ -766,6 +795,24 @@ int sock_getsockopt(struct socket *sock, int level, int optname, v.val = sock_flag(sk, SOCK_RCVTSTAMPNS); break; + case SO_TIMESTAMPING: + v.val = 0; + if (sock_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE)) + v.val |= SOF_TIMESTAMPING_TX_HARDWARE; + if (sock_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE)) + v.val |= SOF_TIMESTAMPING_TX_SOFTWARE; + if (sock_flag(sk, SOCK_TIMESTAMPING_RX_HARDWARE)) + v.val |= SOF_TIMESTAMPING_RX_HARDWARE; + if (sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE)) + v.val |= SOF_TIMESTAMPING_RX_SOFTWARE; + if (sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE)) + v.val |= SOF_TIMESTAMPING_SOFTWARE; + if (sock_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE)) + v.val |= SOF_TIMESTAMPING_SYS_HARDWARE; + if (sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE)) + v.val |= SOF_TIMESTAMPING_RAW_HARDWARE; + break; + case SO_RCVTIMEO: lv=sizeof(struct timeval); if (sk->sk_rcvtimeo == MAX_SCHEDULE_TIMEOUT) { @@ -967,7 +1014,8 @@ void sk_free(struct sock *sk) rcu_assign_pointer(sk->sk_filter, NULL); } - sock_disable_timestamp(sk); + sock_disable_timestamp(sk, SOCK_TIMESTAMP); + sock_disable_timestamp(sk, SOCK_TIMESTAMPING_RX_SOFTWARE); if (atomic_read(&sk->sk_omem_alloc)) printk(KERN_DEBUG "%s: optmem leakage (%d bytes) detected.\n", @@ -1785,7 +1833,7 @@ int sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp) { struct timeval tv; if (!sock_flag(sk, SOCK_TIMESTAMP)) - sock_enable_timestamp(sk); + sock_enable_timestamp(sk, SOCK_TIMESTAMP); tv = ktime_to_timeval(sk->sk_stamp); if (tv.tv_sec == -1) return -ENOENT; @@ -1801,7 +1849,7 @@ int sock_get_timestampns(struct sock *sk, struct timespec __user *userstamp) { struct timespec ts; if (!sock_flag(sk, SOCK_TIMESTAMP)) - sock_enable_timestamp(sk); + sock_enable_timestamp(sk, SOCK_TIMESTAMP); ts = ktime_to_timespec(sk->sk_stamp); if (ts.tv_sec == -1) return -ENOENT; @@ -1813,11 +1861,20 @@ int sock_get_timestampns(struct sock *sk, struct timespec __user *userstamp) } EXPORT_SYMBOL(sock_get_timestampns); -void sock_enable_timestamp(struct sock *sk) +void sock_enable_timestamp(struct sock *sk, int flag) { - if (!sock_flag(sk, SOCK_TIMESTAMP)) { - sock_set_flag(sk, SOCK_TIMESTAMP); - net_enable_timestamp(); + if (!sock_flag(sk, flag)) { + sock_set_flag(sk, flag); + /* + * we just set one of the two flags which require net + * time stamping, but time stamping might have been on + * already because of the other one + */ + if (!sock_flag(sk, + flag == SOCK_TIMESTAMP ? + SOCK_TIMESTAMPING_RX_SOFTWARE : + SOCK_TIMESTAMP)) + net_enable_timestamp(); } } diff --git a/net/socket.c b/net/socket.c index 35dd737..47a3dc0 100644 --- a/net/socket.c +++ b/net/socket.c @@ -545,6 +545,18 @@ void sock_release(struct socket *sock) sock->file = NULL; } +int sock_tx_timestamp(struct msghdr *msg, struct sock *sk, + union skb_shared_tx *shtx) +{ + shtx->flags = 0; + if (sock_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE)) + shtx->hardware = 1; + if (sock_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE)) + shtx->software = 1; + return 0; +} +EXPORT_SYMBOL(sock_tx_timestamp); + static inline int __sock_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg, size_t size) { @@ -595,33 +607,65 @@ int kernel_sendmsg(struct socket *sock, struct msghdr *msg, return result; } +static int ktime2ts(ktime_t kt, struct timespec *ts) +{ + if (kt.tv64) { + *ts = ktime_to_timespec(kt); + return 1; + } else { + return 0; + } +} + /* * called from sock_recv_timestamp() if sock_flag(sk, SOCK_RCVTSTAMP) */ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb) { - ktime_t kt = skb->tstamp; - - if (!sock_flag(sk, SOCK_RCVTSTAMPNS)) { - struct timeval tv; - /* Race occurred between timestamp enabling and packet - receiving. Fill in the current time for now. */ - if (kt.tv64 == 0) - kt = ktime_get_real(); - skb->tstamp = kt; - tv = ktime_to_timeval(kt); - put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP, sizeof(tv), &tv); - } else { - struct timespec ts; - /* Race occurred between timestamp enabling and packet - receiving. Fill in the current time for now. */ - if (kt.tv64 == 0) - kt = ktime_get_real(); - skb->tstamp = kt; - ts = ktime_to_timespec(kt); - put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPNS, sizeof(ts), &ts); + int need_software_tstamp = sock_flag(sk, SOCK_RCVTSTAMP); + struct timespec ts[3]; + int empty = 1; + struct skb_shared_hwtstamps *shhwtstamps = + skb_hwtstamps(skb); + + /* Race occurred between timestamp enabling and packet + receiving. Fill in the current time for now. */ + if (need_software_tstamp && skb->tstamp.tv64 == 0) + __net_timestamp(skb); + + if (need_software_tstamp) { + if (!sock_flag(sk, SOCK_RCVTSTAMPNS)) { + struct timeval tv; + skb_get_timestamp(skb, &tv); + put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP, + sizeof(tv), &tv); + } else { + struct timespec ts; + skb_get_timestampns(skb, &ts); + put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPNS, + sizeof(ts), &ts); + } + } + + + memset(ts, 0, sizeof(ts)); + if (skb->tstamp.tv64 && + sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE)) { + skb_get_timestampns(skb, ts + 0); + empty = 0; + } + if (shhwtstamps) { + if (sock_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE) && + ktime2ts(shhwtstamps->syststamp, ts + 1)) + empty = 0; + if (sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE) && + ktime2ts(shhwtstamps->hwtstamp, ts + 2)) + empty = 0; } + if (!empty) + put_cmsg(msg, SOL_SOCKET, + SCM_TIMESTAMPING, sizeof(ts), &ts); } EXPORT_SYMBOL_GPL(__sock_recv_timestamp); -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH NET-NEXT 06/10] ip: support for TX timestamps on UDP and RAW sockets 2009-02-12 15:03 ` [PATCH NET-NEXT 05/10] net: socket infrastructure for SO_TIMESTAMPING Patrick Ohly @ 2009-02-12 15:03 ` Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 07/10] net: pass new SIOCSHWTSTAMP through to device drivers Patrick Ohly 0 siblings, 1 reply; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw) To: David Miller Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly Instructions for time stamping outgoing packets are take from the socket layer and later copied into the new skb. Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> --- Documentation/networking/timestamping.txt | 2 ++ include/net/ip.h | 1 + net/can/raw.c | 3 +++ net/ipv4/icmp.c | 2 ++ net/ipv4/ip_output.c | 6 ++++++ net/ipv4/raw.c | 1 + net/ipv4/udp.c | 4 ++++ 7 files changed, 19 insertions(+), 0 deletions(-) diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt index a681a65..0e58b45 100644 --- a/Documentation/networking/timestamping.txt +++ b/Documentation/networking/timestamping.txt @@ -56,6 +56,8 @@ and including the link layer, the scm_timestamping control message and a sock_extended_err control message with ee_errno==ENOMSG and ee_origin==SO_EE_ORIGIN_TIMESTAMPING. A socket with such a pending bounced packet is ready for reading as far as select() is concerned. +If the outgoing packet has to be fragmented, then only the first +fragment is time stamped and returned to the sending socket. All three values correspond to the same event in time, but were generated in different ways. Each of these values may be empty (= all diff --git a/include/net/ip.h b/include/net/ip.h index 1086813..4ac7577 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -55,6 +55,7 @@ struct ipcm_cookie __be32 addr; int oif; struct ip_options *opt; + union skb_shared_tx shtx; }; #define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb)) diff --git a/net/can/raw.c b/net/can/raw.c index 0703cba..6aa154e 100644 --- a/net/can/raw.c +++ b/net/can/raw.c @@ -648,6 +648,9 @@ static int raw_sendmsg(struct kiocb *iocb, struct socket *sock, err = memcpy_fromiovec(skb_put(skb, size), msg->msg_iov, size); if (err < 0) goto free_skb; + err = sock_tx_timestamp(msg, sk, skb_tx(skb)); + if (err < 0) + goto free_skb; skb->dev = dev; skb->sk = sk; diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 705b33b..382800a 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -375,6 +375,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb) inet->tos = ip_hdr(skb)->tos; daddr = ipc.addr = rt->rt_src; ipc.opt = NULL; + ipc.shtx.flags = 0; if (icmp_param->replyopts.optlen) { ipc.opt = &icmp_param->replyopts; if (ipc.opt->srr) @@ -532,6 +533,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info) inet_sk(sk)->tos = tos; ipc.addr = iph->saddr; ipc.opt = &icmp_param.replyopts; + ipc.shtx.flags = 0; { struct flowi fl = { diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 8ebe86d..3e7e910 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -935,6 +935,10 @@ alloc_new_skb: sk->sk_allocation); if (unlikely(skb == NULL)) err = -ENOBUFS; + else + /* only the initial fragment is + time stamped */ + ipc->shtx.flags = 0; } if (skb == NULL) goto error; @@ -945,6 +949,7 @@ alloc_new_skb: skb->ip_summed = csummode; skb->csum = 0; skb_reserve(skb, hh_len); + *skb_tx(skb) = ipc->shtx; /* * Find where to start putting bytes. @@ -1364,6 +1369,7 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *ar daddr = ipc.addr = rt->rt_src; ipc.opt = NULL; + ipc.shtx.flags = 0; if (replyopts.opt.optlen) { ipc.opt = &replyopts.opt; diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index dff8bc4..f774651 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -493,6 +493,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, ipc.addr = inet->saddr; ipc.opt = NULL; + ipc.shtx.flags = 0; ipc.oif = sk->sk_bound_dev_if; if (msg->msg_controllen) { diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index c47c989..4bd178a 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -596,6 +596,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, return -EOPNOTSUPP; ipc.opt = NULL; + ipc.shtx.flags = 0; if (up->pending) { /* @@ -643,6 +644,9 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, ipc.addr = inet->saddr; ipc.oif = sk->sk_bound_dev_if; + err = sock_tx_timestamp(msg, sk, &ipc.shtx); + if (err) + return err; if (msg->msg_controllen) { err = ip_cmsg_send(sock_net(sk), msg, &ipc); if (err) -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH NET-NEXT 07/10] net: pass new SIOCSHWTSTAMP through to device drivers 2009-02-12 15:03 ` [PATCH NET-NEXT 06/10] ip: support for TX timestamps on UDP and RAW sockets Patrick Ohly @ 2009-02-12 15:03 ` Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 08/10] igb: access to NIC time Patrick Ohly 0 siblings, 1 reply; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw) To: David Miller Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> --- fs/compat_ioctl.c | 6 ++++++ net/core/dev.c | 2 ++ 2 files changed, 8 insertions(+), 0 deletions(-) diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c index c03c10d..ab7585a 100644 --- a/fs/compat_ioctl.c +++ b/fs/compat_ioctl.c @@ -522,6 +522,11 @@ static int dev_ifsioc(unsigned int fd, unsigned int cmd, unsigned long arg) if (err) return -EFAULT; break; + case SIOCSHWTSTAMP: + if (copy_from_user(&ifr, uifr32, sizeof(*uifr32))) + return -EFAULT; + ifr.ifr_data = compat_ptr(uifr32->ifr_ifru.ifru_data); + break; default: if (copy_from_user(&ifr, uifr32, sizeof(*uifr32))) return -EFAULT; @@ -2563,6 +2568,7 @@ HANDLE_IOCTL(SIOCSIFMAP, dev_ifsioc) HANDLE_IOCTL(SIOCGIFADDR, dev_ifsioc) HANDLE_IOCTL(SIOCSIFADDR, dev_ifsioc) HANDLE_IOCTL(SIOCSIFHWBROADCAST, dev_ifsioc) +HANDLE_IOCTL(SIOCSHWTSTAMP, dev_ifsioc) /* ioctls used by appletalk ddp.c */ HANDLE_IOCTL(SIOCATALKDIFADDR, dev_ifsioc) diff --git a/net/core/dev.c b/net/core/dev.c index d20c28e..d393fc9 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4019,6 +4019,7 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, unsigned int cmd) cmd == SIOCSMIIREG || cmd == SIOCBRADDIF || cmd == SIOCBRDELIF || + cmd == SIOCSHWTSTAMP || cmd == SIOCWANDEV) { err = -EOPNOTSUPP; if (ops->ndo_do_ioctl) { @@ -4173,6 +4174,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg) case SIOCBONDCHANGEACTIVE: case SIOCBRADDIF: case SIOCBRDELIF: + case SIOCSHWTSTAMP: if (!capable(CAP_NET_ADMIN)) return -EPERM; /* fall through */ -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH NET-NEXT 08/10] igb: access to NIC time 2009-02-12 15:03 ` [PATCH NET-NEXT 07/10] net: pass new SIOCSHWTSTAMP through to device drivers Patrick Ohly @ 2009-02-12 15:03 ` Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 09/10] igb: stub support for SIOCSHWTSTAMP Patrick Ohly 0 siblings, 1 reply; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw) To: David Miller Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly Adds the register definitions and code to read the time register. Signed-off-by: John Ronciak <john.ronciak@intel.com> Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> --- drivers/net/igb/e1000_regs.h | 28 +++++++++++ drivers/net/igb/igb.h | 4 ++ drivers/net/igb/igb_main.c | 111 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 143 insertions(+), 0 deletions(-) diff --git a/drivers/net/igb/e1000_regs.h b/drivers/net/igb/e1000_regs.h index 5038b73..64d95cd 100644 --- a/drivers/net/igb/e1000_regs.h +++ b/drivers/net/igb/e1000_regs.h @@ -75,6 +75,34 @@ #define E1000_FCRTH 0x02168 /* Flow Control Receive Threshold High - RW */ #define E1000_RDFPCQ(_n) (0x02430 + (0x4 * (_n))) #define E1000_FCRTV 0x02460 /* Flow Control Refresh Timer Value - RW */ + +/* IEEE 1588 TIMESYNCH */ +#define E1000_TSYNCTXCTL 0x0B614 +#define E1000_TSYNCRXCTL 0x0B620 +#define E1000_TSYNCRXCFG 0x05F50 + +#define E1000_SYSTIML 0x0B600 +#define E1000_SYSTIMH 0x0B604 +#define E1000_TIMINCA 0x0B608 + +#define E1000_RXMTRL 0x0B634 +#define E1000_RXSTMPL 0x0B624 +#define E1000_RXSTMPH 0x0B628 +#define E1000_RXSATRL 0x0B62C +#define E1000_RXSATRH 0x0B630 + +#define E1000_TXSTMPL 0x0B618 +#define E1000_TXSTMPH 0x0B61C + +#define E1000_ETQF0 0x05CB0 +#define E1000_ETQF1 0x05CB4 +#define E1000_ETQF2 0x05CB8 +#define E1000_ETQF3 0x05CBC +#define E1000_ETQF4 0x05CC0 +#define E1000_ETQF5 0x05CC4 +#define E1000_ETQF6 0x05CC8 +#define E1000_ETQF7 0x05CCC + /* Split and Replication RX Control - RW */ /* * Convenience macros diff --git a/drivers/net/igb/igb.h b/drivers/net/igb/igb.h index e507449..797a9fe 100644 --- a/drivers/net/igb/igb.h +++ b/drivers/net/igb/igb.h @@ -34,6 +34,8 @@ #include "e1000_mac.h" #include "e1000_82575.h" +#include <linux/clocksource.h> + struct igb_adapter; /* Interrupt defines */ @@ -251,6 +253,8 @@ struct igb_adapter { struct napi_struct napi; struct pci_dev *pdev; struct net_device_stats net_stats; + struct cyclecounter cycles; + struct timecounter clock; /* structs defined in e1000_hw.h */ struct e1000_hw hw; diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c index f8c2919..8b2ba42 100644 --- a/drivers/net/igb/igb_main.c +++ b/drivers/net/igb/igb_main.c @@ -175,6 +175,54 @@ MODULE_DESCRIPTION("Intel(R) Gigabit Ethernet Network Driver"); MODULE_LICENSE("GPL"); MODULE_VERSION(DRV_VERSION); +/** + * Scale the NIC clock cycle by a large factor so that + * relatively small clock corrections can be added or + * substracted at each clock tick. The drawbacks of a + * large factor are a) that the clock register overflows + * more quickly (not such a big deal) and b) that the + * increment per tick has to fit into 24 bits. + * + * Note that + * TIMINCA = IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS * + * IGB_TSYNC_SCALE + * TIMINCA += TIMINCA * adjustment [ppm] / 1e9 + * + * The base scale factor is intentionally a power of two + * so that the division in %struct timecounter can be done with + * a shift. + */ +#define IGB_TSYNC_SHIFT (19) +#define IGB_TSYNC_SCALE (1<<IGB_TSYNC_SHIFT) + +/** + * The duration of one clock cycle of the NIC. + * + * @todo This hard-coded value is part of the specification and might change + * in future hardware revisions. Add revision check. + */ +#define IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS 16 + +#if (IGB_TSYNC_SCALE * IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS) >= (1<<24) +# error IGB_TSYNC_SCALE and/or IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS are too large to fit into TIMINCA +#endif + +/** + * igb_read_clock - read raw cycle counter (to be used by time counter) + */ +static cycle_t igb_read_clock(const struct cyclecounter *tc) +{ + struct igb_adapter *adapter = + container_of(tc, struct igb_adapter, cycles); + struct e1000_hw *hw = &adapter->hw; + u64 stamp; + + stamp = rd32(E1000_SYSTIML); + stamp |= (u64)rd32(E1000_SYSTIMH) << 32ULL; + + return stamp; +} + #ifdef DEBUG /** * igb_get_hw_dev_name - return device name string @@ -185,6 +233,29 @@ char *igb_get_hw_dev_name(struct e1000_hw *hw) struct igb_adapter *adapter = hw->back; return adapter->netdev->name; } + +/** + * igb_get_time_str - format current NIC and system time as string + */ +static char *igb_get_time_str(struct igb_adapter *adapter, + char buffer[160]) +{ + cycle_t hw = adapter->cycles.read(&adapter->cycles); + struct timespec nic = ns_to_timespec(timecounter_read(&adapter->clock)); + struct timespec sys; + struct timespec delta; + getnstimeofday(&sys); + + delta = timespec_sub(nic, sys); + + sprintf(buffer, + "NIC %ld.%09lus, SYS %ld.%09lus, NIC-SYS %lds + %09luns", + (long)nic.tv_sec, nic.tv_nsec, + (long)sys.tv_sec, sys.tv_nsec, + (long)delta.tv_sec, delta.tv_nsec); + + return buffer; +} #endif /** @@ -1298,6 +1369,46 @@ static int __devinit igb_probe(struct pci_dev *pdev, } #endif + /* + * Initialize hardware timer: we keep it running just in case + * that some program needs it later on. + */ + memset(&adapter->cycles, 0, sizeof(adapter->cycles)); + adapter->cycles.read = igb_read_clock; + adapter->cycles.mask = CLOCKSOURCE_MASK(64); + adapter->cycles.mult = 1; + adapter->cycles.shift = IGB_TSYNC_SHIFT; + wr32(E1000_TIMINCA, + (1<<24) | + IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS * IGB_TSYNC_SCALE); +#if 0 + /* + * Avoid rollover while we initialize by resetting the time counter. + */ + wr32(E1000_SYSTIML, 0x00000000); + wr32(E1000_SYSTIMH, 0x00000000); +#else + /* + * Set registers so that rollover occurs soon to test this. + */ + wr32(E1000_SYSTIML, 0x00000000); + wr32(E1000_SYSTIMH, 0xFF800000); +#endif + wrfl(); + timecounter_init(&adapter->clock, + &adapter->cycles, + ktime_to_ns(ktime_get_real())); + +#ifdef DEBUG + { + char buffer[160]; + printk(KERN_DEBUG + "igb: %s: hw %p initialized timer\n", + igb_get_time_str(adapter, buffer), + &adapter->hw); + } +#endif + dev_info(&pdev->dev, "Intel(R) Gigabit Ethernet Network Connection\n"); /* print bus type/speed/width info */ dev_info(&pdev->dev, "%s: (PCIe:%s:%s) %pM\n", -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH NET-NEXT 09/10] igb: stub support for SIOCSHWTSTAMP 2009-02-12 15:03 ` [PATCH NET-NEXT 08/10] igb: access to NIC time Patrick Ohly @ 2009-02-12 15:03 ` Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 10/10] igb: use timecompare to implement hardware time stamping Patrick Ohly 0 siblings, 1 reply; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw) To: David Miller Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly Signed-off-by: John Ronciak <john.ronciak@intel.com> Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> --- drivers/net/igb/igb_main.c | 31 +++++++++++++++++++++++++++++++ 1 files changed, 31 insertions(+), 0 deletions(-) diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c index 8b2ba42..90090bb 100644 --- a/drivers/net/igb/igb_main.c +++ b/drivers/net/igb/igb_main.c @@ -34,6 +34,7 @@ #include <linux/ipv6.h> #include <net/checksum.h> #include <net/ip6_checksum.h> +#include <linux/net_tstamp.h> #include <linux/mii.h> #include <linux/ethtool.h> #include <linux/if_vlan.h> @@ -4182,6 +4183,34 @@ static int igb_mii_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd) } /** + * igb_hwtstamp_ioctl - control hardware time stamping + * @netdev: + * @ifreq: + * @cmd: + * + * Currently cannot enable any kind of hardware time stamping, but + * supports SIOCSHWTSTAMP in general. + **/ +static int igb_hwtstamp_ioctl(struct net_device *netdev, + struct ifreq *ifr, int cmd) +{ + struct hwtstamp_config config; + + if (copy_from_user(&config, ifr->ifr_data, sizeof(config))) + return -EFAULT; + + /* reserved for future extensions */ + if (config.flags) + return -EINVAL; + + if (config.tx_type == HWTSTAMP_TX_OFF && + config.rx_filter == HWTSTAMP_FILTER_NONE) + return 0; + + return -ERANGE; +} + +/** * igb_ioctl - * @netdev: * @ifreq: @@ -4194,6 +4223,8 @@ static int igb_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd) case SIOCGMIIREG: case SIOCSMIIREG: return igb_mii_ioctl(netdev, ifr, cmd); + case SIOCSHWTSTAMP: + return igb_hwtstamp_ioctl(netdev, ifr, cmd); default: return -EOPNOTSUPP; } -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH NET-NEXT 10/10] igb: use timecompare to implement hardware time stamping 2009-02-12 15:03 ` [PATCH NET-NEXT 09/10] igb: stub support for SIOCSHWTSTAMP Patrick Ohly @ 2009-02-12 15:03 ` Patrick Ohly 0 siblings, 0 replies; 25+ messages in thread From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw) To: David Miller Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly Both TX and RX hardware time stamping are implemented. Due to hardware limitations it is not possible to verify reliably which packet was time stamped when multiple were pending for sending; this could be solved by only allowing one packet marked for hardware time stamping into the queue (not implemented yet). RX time stamping relies on the flag in the packet descriptor which marks packets that were time stamped. In "all packet" mode this flag is not set. TODO: also support that mode (even though it'll suffer from race conditions). Signed-off-by: John Ronciak <john.ronciak@intel.com> Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> --- drivers/net/igb/e1000_82575.h | 1 + drivers/net/igb/e1000_defines.h | 1 + drivers/net/igb/e1000_regs.h | 40 ++++++ drivers/net/igb/igb.h | 4 + drivers/net/igb/igb_main.c | 266 +++++++++++++++++++++++++++++++++++++-- 5 files changed, 304 insertions(+), 8 deletions(-) diff --git a/drivers/net/igb/e1000_82575.h b/drivers/net/igb/e1000_82575.h index dd50237..e613d5a 100644 --- a/drivers/net/igb/e1000_82575.h +++ b/drivers/net/igb/e1000_82575.h @@ -116,6 +116,7 @@ union e1000_adv_tx_desc { }; /* Adv Transmit Descriptor Config Masks */ +#define E1000_ADVTXD_MAC_TSTAMP 0x00080000 /* IEEE1588 Timestamp packet */ #define E1000_ADVTXD_DTYP_CTXT 0x00200000 /* Advanced Context Descriptor */ #define E1000_ADVTXD_DTYP_DATA 0x00300000 /* Advanced Data Descriptor */ #define E1000_ADVTXD_DCMD_IFCS 0x02000000 /* Insert FCS (Ethernet CRC) */ diff --git a/drivers/net/igb/e1000_defines.h b/drivers/net/igb/e1000_defines.h index 5342e23..79168ee 100644 --- a/drivers/net/igb/e1000_defines.h +++ b/drivers/net/igb/e1000_defines.h @@ -104,6 +104,7 @@ #define E1000_RXD_STAT_UDPCS 0x10 /* UDP xsum calculated */ #define E1000_RXD_STAT_TCPCS 0x20 /* TCP xsum calculated */ #define E1000_RXD_STAT_DYNINT 0x800 /* Pkt caused INT via DYNINT */ +#define E1000_RXD_STAT_TS 0x10000 /* Pkt was time stamped */ #define E1000_RXD_ERR_CE 0x01 /* CRC Error */ #define E1000_RXD_ERR_SE 0x02 /* Symbol Error */ #define E1000_RXD_ERR_SEQ 0x04 /* Sequence Error */ diff --git a/drivers/net/igb/e1000_regs.h b/drivers/net/igb/e1000_regs.h index 64d95cd..1fb19ca 100644 --- a/drivers/net/igb/e1000_regs.h +++ b/drivers/net/igb/e1000_regs.h @@ -78,9 +78,37 @@ /* IEEE 1588 TIMESYNCH */ #define E1000_TSYNCTXCTL 0x0B614 +#define E1000_TSYNCTXCTL_VALID (1<<0) +#define E1000_TSYNCTXCTL_ENABLED (1<<4) #define E1000_TSYNCRXCTL 0x0B620 +#define E1000_TSYNCRXCTL_VALID (1<<0) +#define E1000_TSYNCRXCTL_ENABLED (1<<4) +enum { + E1000_TSYNCRXCTL_TYPE_L2_V2 = 0, + E1000_TSYNCRXCTL_TYPE_L4_V1 = (1<<1), + E1000_TSYNCRXCTL_TYPE_L2_L4_V2 = (1<<2), + E1000_TSYNCRXCTL_TYPE_ALL = (1<<3), + E1000_TSYNCRXCTL_TYPE_EVENT_V2 = (1<<3) | (1<<1), +}; #define E1000_TSYNCRXCFG 0x05F50 +enum { + E1000_TSYNCRXCFG_PTP_V1_SYNC_MESSAGE = 0<<0, + E1000_TSYNCRXCFG_PTP_V1_DELAY_REQ_MESSAGE = 1<<0, + E1000_TSYNCRXCFG_PTP_V1_FOLLOWUP_MESSAGE = 2<<0, + E1000_TSYNCRXCFG_PTP_V1_DELAY_RESP_MESSAGE = 3<<0, + E1000_TSYNCRXCFG_PTP_V1_MANAGEMENT_MESSAGE = 4<<0, + E1000_TSYNCRXCFG_PTP_V2_SYNC_MESSAGE = 0<<8, + E1000_TSYNCRXCFG_PTP_V2_DELAY_REQ_MESSAGE = 1<<8, + E1000_TSYNCRXCFG_PTP_V2_PATH_DELAY_REQ_MESSAGE = 2<<8, + E1000_TSYNCRXCFG_PTP_V2_PATH_DELAY_RESP_MESSAGE = 3<<8, + E1000_TSYNCRXCFG_PTP_V2_FOLLOWUP_MESSAGE = 8<<8, + E1000_TSYNCRXCFG_PTP_V2_DELAY_RESP_MESSAGE = 9<<8, + E1000_TSYNCRXCFG_PTP_V2_PATH_DELAY_FOLLOWUP_MESSAGE = 0xA<<8, + E1000_TSYNCRXCFG_PTP_V2_ANNOUNCE_MESSAGE = 0xB<<8, + E1000_TSYNCRXCFG_PTP_V2_SIGNALLING_MESSAGE = 0xC<<8, + E1000_TSYNCRXCFG_PTP_V2_MANAGEMENT_MESSAGE = 0xD<<8, +}; #define E1000_SYSTIML 0x0B600 #define E1000_SYSTIMH 0x0B604 #define E1000_TIMINCA 0x0B608 @@ -103,6 +131,18 @@ #define E1000_ETQF6 0x05CC8 #define E1000_ETQF7 0x05CCC +/* Filtering Registers */ +#define E1000_SAQF(_n) (0x5980 + 4 * (_n)) +#define E1000_DAQF(_n) (0x59A0 + 4 * (_n)) +#define E1000_SPQF(_n) (0x59C0 + 4 * (_n)) +#define E1000_FTQF(_n) (0x59E0 + 4 * (_n)) +#define E1000_SAQF0 E1000_SAQF(0) +#define E1000_DAQF0 E1000_DAQF(0) +#define E1000_SPQF0 E1000_SPQF(0) +#define E1000_FTQF0 E1000_FTQF(0) +#define E1000_SYNQF(_n) (0x055FC + (4 * (_n))) /* SYN Packet Queue Fltr */ +#define E1000_ETQF(_n) (0x05CB0 + (4 * (_n))) /* EType Queue Fltr */ + /* Split and Replication RX Control - RW */ /* * Convenience macros diff --git a/drivers/net/igb/igb.h b/drivers/net/igb/igb.h index 797a9fe..bb8c35c 100644 --- a/drivers/net/igb/igb.h +++ b/drivers/net/igb/igb.h @@ -35,6 +35,8 @@ #include "e1000_82575.h" #include <linux/clocksource.h> +#include <linux/timecompare.h> +#include <linux/net_tstamp.h> struct igb_adapter; @@ -255,6 +257,8 @@ struct igb_adapter { struct net_device_stats net_stats; struct cyclecounter cycles; struct timecounter clock; + struct timecompare compare; + struct hwtstamp_config hwtstamp_config; /* structs defined in e1000_hw.h */ struct e1000_hw hw; diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c index 90090bb..f9d576b 100644 --- a/drivers/net/igb/igb_main.c +++ b/drivers/net/igb/igb_main.c @@ -250,7 +250,8 @@ static char *igb_get_time_str(struct igb_adapter *adapter, delta = timespec_sub(nic, sys); sprintf(buffer, - "NIC %ld.%09lus, SYS %ld.%09lus, NIC-SYS %lds + %09luns", + "HW %llu, NIC %ld.%09lus, SYS %ld.%09lus, NIC-SYS %lds + %09luns", + hw, (long)nic.tv_sec, nic.tv_nsec, (long)sys.tv_sec, sys.tv_nsec, (long)delta.tv_sec, delta.tv_nsec); @@ -1400,6 +1401,18 @@ static int __devinit igb_probe(struct pci_dev *pdev, &adapter->cycles, ktime_to_ns(ktime_get_real())); + /* + * Synchronize our NIC clock against system wall clock. NIC + * time stamp reading requires ~3us per sample, each sample + * was pretty stable even under load => only require 10 + * samples for each offset comparison. + */ + memset(&adapter->compare, 0, sizeof(adapter->compare)); + adapter->compare.source = &adapter->clock; + adapter->compare.target = ktime_get_real; + adapter->compare.num_samples = 10; + timecompare_update(&adapter->compare, 0); + #ifdef DEBUG { char buffer[160]; @@ -2748,6 +2761,7 @@ set_itr_now: #define IGB_TX_FLAGS_VLAN 0x00000002 #define IGB_TX_FLAGS_TSO 0x00000004 #define IGB_TX_FLAGS_IPV4 0x00000008 +#define IGB_TX_FLAGS_TSTAMP 0x00000010 #define IGB_TX_FLAGS_VLAN_MASK 0xffff0000 #define IGB_TX_FLAGS_VLAN_SHIFT 16 @@ -2975,6 +2989,9 @@ static inline void igb_tx_queue_adv(struct igb_adapter *adapter, if (tx_flags & IGB_TX_FLAGS_VLAN) cmd_type_len |= E1000_ADVTXD_DCMD_VLE; + if (tx_flags & IGB_TX_FLAGS_TSTAMP) + cmd_type_len |= E1000_ADVTXD_MAC_TSTAMP; + if (tx_flags & IGB_TX_FLAGS_TSO) { cmd_type_len |= E1000_ADVTXD_DCMD_TSE; @@ -3065,6 +3082,7 @@ static int igb_xmit_frame_ring_adv(struct sk_buff *skb, unsigned int tx_flags = 0; u8 hdr_len = 0; int tso = 0; + union skb_shared_tx *shtx; if (test_bit(__IGB_DOWN, &adapter->state)) { dev_kfree_skb_any(skb); @@ -3085,7 +3103,29 @@ static int igb_xmit_frame_ring_adv(struct sk_buff *skb, /* this is a hard error */ return NETDEV_TX_BUSY; } - skb_orphan(skb); + + /* + * TODO: check that there currently is no other packet with + * time stamping in the queue + * + * When doing time stamping, keep the connection to the socket + * a while longer: it is still needed by skb_hwtstamp_tx(), + * called either in igb_tx_hwtstamp() or by our caller when + * doing software time stamping. + */ + shtx = skb_tx(skb); + if (unlikely(shtx->hardware)) { + shtx->in_progress = 1; + tx_flags |= IGB_TX_FLAGS_TSTAMP; + } else if (likely(!shtx->software)) { + /* + * TODO: can this be solved in dev.c:dev_hard_start_xmit()? + * There are probably unmodified driver which do something + * like this and thus don't work in combination with + * SOF_TIMESTAMPING_TX_SOFTWARE. + */ + skb_orphan(skb); + } if (adapter->vlgrp && vlan_tx_tag_present(skb)) { tx_flags |= IGB_TX_FLAGS_VLAN; @@ -3744,6 +3784,43 @@ static int igb_clean_rx_ring_msix(struct napi_struct *napi, int budget) } /** + * igb_hwtstamp - utility function which checks for TX time stamp + * @adapter: board private structure + * @skb: packet that was just sent + * + * If we were asked to do hardware stamping and such a time stamp is + * available, then it must have been for this skb here because we only + * allow only one such packet into the queue. + */ +static void igb_tx_hwtstamp(struct igb_adapter *adapter, struct sk_buff *skb) +{ + union skb_shared_tx *shtx = skb_tx(skb); + struct e1000_hw *hw = &adapter->hw; + + if (unlikely(shtx->hardware)) { + u32 valid = rd32(E1000_TSYNCTXCTL) & E1000_TSYNCTXCTL_VALID; + if (valid) { + u64 regval = rd32(E1000_TXSTMPL); + u64 ns; + struct skb_shared_hwtstamps shhwtstamps; + + memset(&shhwtstamps, 0, sizeof(shhwtstamps)); + regval |= (u64)rd32(E1000_TXSTMPH) << 32; + ns = timecounter_cyc2time(&adapter->clock, + regval); + timecompare_update(&adapter->compare, ns); + shhwtstamps.hwtstamp = ns_to_ktime(ns); + shhwtstamps.syststamp = + timecompare_transform(&adapter->compare, ns); + skb_tstamp_tx(skb, &shhwtstamps); + } + + /* delayed orphaning: skb_tstamp_tx() needs the socket */ + skb_orphan(skb); + } +} + +/** * igb_clean_tx_irq - Reclaim resources after transmit completes * @adapter: board private structure * returns true if ring is completely cleaned @@ -3781,6 +3858,8 @@ static bool igb_clean_tx_irq(struct igb_ring *tx_ring) skb->len; total_packets += segs; total_bytes += bytecount; + + igb_tx_hwtstamp(adapter, skb); } igb_unmap_and_free_tx_resource(adapter, buffer_info); @@ -3914,6 +3993,7 @@ static bool igb_clean_rx_irq_adv(struct igb_ring *rx_ring, { struct igb_adapter *adapter = rx_ring->adapter; struct net_device *netdev = adapter->netdev; + struct e1000_hw *hw = &adapter->hw; struct pci_dev *pdev = adapter->pdev; union e1000_adv_rx_desc *rx_desc , *next_rxd; struct igb_buffer *buffer_info , *next_buffer; @@ -4006,6 +4086,47 @@ static bool igb_clean_rx_irq_adv(struct igb_ring *rx_ring, goto next_desc; } send_up: + /* + * If this bit is set, then the RX registers contain + * the time stamp. No other packet will be time + * stamped until we read these registers, so read the + * registers to make them available again. Because + * only one packet can be time stamped at a time, we + * know that the register values must belong to this + * one here and therefore we don't need to compare + * any of the additional attributes stored for it. + * + * If nothing went wrong, then it should have a + * skb_shared_tx that we can turn into a + * skb_shared_hwtstamps. + * + * TODO: can time stamping be triggered (thus locking + * the registers) without the packet reaching this point + * here? In that case RX time stamping would get stuck. + * + * TODO: in "time stamp all packets" mode this bit is + * not set. Need a global flag for this mode and then + * always read the registers. Cannot be done without + * a race condition. + */ + if (unlikely(staterr & E1000_RXD_STAT_TS)) { + u64 regval; + u64 ns; + struct skb_shared_hwtstamps *shhwtstamps = + skb_hwtstamps(skb); + + WARN(!(rd32(E1000_TSYNCRXCTL) & E1000_TSYNCRXCTL_VALID), + "igb: no RX time stamp available for time stamped packet"); + regval = rd32(E1000_RXSTMPL); + regval |= (u64)rd32(E1000_RXSTMPH) << 32; + ns = timecounter_cyc2time(&adapter->clock, regval); + timecompare_update(&adapter->compare, ns); + memset(shhwtstamps, 0, sizeof(*shhwtstamps)); + shhwtstamps->hwtstamp = ns_to_ktime(ns); + shhwtstamps->syststamp = + timecompare_transform(&adapter->compare, ns); + } + if (staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK) { dev_kfree_skb_irq(skb); goto next_desc; @@ -4188,13 +4309,33 @@ static int igb_mii_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd) * @ifreq: * @cmd: * - * Currently cannot enable any kind of hardware time stamping, but - * supports SIOCSHWTSTAMP in general. + * Outgoing time stamping can be enabled and disabled. Play nice and + * disable it when requested, although it shouldn't case any overhead + * when no packet needs it. At most one packet in the queue may be + * marked for time stamping, otherwise it would be impossible to tell + * for sure to which packet the hardware time stamp belongs. + * + * Incoming time stamping has to be configured via the hardware + * filters. Not all combinations are supported, in particular event + * type has to be specified. Matching the kind of event packet is + * not supported, with the exception of "all V2 events regardless of + * level 2 or 4". + * **/ static int igb_hwtstamp_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd) { + struct igb_adapter *adapter = netdev_priv(netdev); + struct e1000_hw *hw = &adapter->hw; struct hwtstamp_config config; + u32 tsync_tx_ctl_bit = E1000_TSYNCTXCTL_ENABLED; + u32 tsync_rx_ctl_bit = E1000_TSYNCRXCTL_ENABLED; + u32 tsync_rx_ctl_type = 0; + u32 tsync_rx_cfg = 0; + int is_l4 = 0; + int is_l2 = 0; + short port = 319; /* PTP */ + u32 regval; if (copy_from_user(&config, ifr->ifr_data, sizeof(config))) return -EFAULT; @@ -4203,11 +4344,120 @@ static int igb_hwtstamp_ioctl(struct net_device *netdev, if (config.flags) return -EINVAL; - if (config.tx_type == HWTSTAMP_TX_OFF && - config.rx_filter == HWTSTAMP_FILTER_NONE) - return 0; + switch (config.tx_type) { + case HWTSTAMP_TX_OFF: + tsync_tx_ctl_bit = 0; + break; + case HWTSTAMP_TX_ON: + tsync_tx_ctl_bit = E1000_TSYNCTXCTL_ENABLED; + break; + default: + return -ERANGE; + } + + switch (config.rx_filter) { + case HWTSTAMP_FILTER_NONE: + tsync_rx_ctl_bit = 0; + break; + case HWTSTAMP_FILTER_PTP_V1_L4_EVENT: + case HWTSTAMP_FILTER_PTP_V2_L4_EVENT: + case HWTSTAMP_FILTER_PTP_V2_L2_EVENT: + case HWTSTAMP_FILTER_ALL: + /* + * register TSYNCRXCFG must be set, therefore it is not + * possible to time stamp both Sync and Delay_Req messages + * => fall back to time stamping all packets + */ + tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_ALL; + config.rx_filter = HWTSTAMP_FILTER_ALL; + break; + case HWTSTAMP_FILTER_PTP_V1_L4_SYNC: + tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_L4_V1; + tsync_rx_cfg = E1000_TSYNCRXCFG_PTP_V1_SYNC_MESSAGE; + is_l4 = 1; + break; + case HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ: + tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_L4_V1; + tsync_rx_cfg = E1000_TSYNCRXCFG_PTP_V1_DELAY_REQ_MESSAGE; + is_l4 = 1; + break; + case HWTSTAMP_FILTER_PTP_V2_L2_SYNC: + case HWTSTAMP_FILTER_PTP_V2_L4_SYNC: + tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_L2_L4_V2; + tsync_rx_cfg = E1000_TSYNCRXCFG_PTP_V2_SYNC_MESSAGE; + is_l2 = 1; + is_l4 = 1; + config.rx_filter = HWTSTAMP_FILTER_SOME; + break; + case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ: + case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ: + tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_L2_L4_V2; + tsync_rx_cfg = E1000_TSYNCRXCFG_PTP_V2_DELAY_REQ_MESSAGE; + is_l2 = 1; + is_l4 = 1; + config.rx_filter = HWTSTAMP_FILTER_SOME; + break; + case HWTSTAMP_FILTER_PTP_V2_EVENT: + case HWTSTAMP_FILTER_PTP_V2_SYNC: + case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ: + tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_EVENT_V2; + config.rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT; + is_l2 = 1; + break; + default: + return -ERANGE; + } + + /* enable/disable TX */ + regval = rd32(E1000_TSYNCTXCTL); + regval = (regval & ~E1000_TSYNCTXCTL_ENABLED) | tsync_tx_ctl_bit; + wr32(E1000_TSYNCTXCTL, regval); + + /* enable/disable RX, define which PTP packets are time stamped */ + regval = rd32(E1000_TSYNCRXCTL); + regval = (regval & ~E1000_TSYNCRXCTL_ENABLED) | tsync_rx_ctl_bit; + regval = (regval & ~0xE) | tsync_rx_ctl_type; + wr32(E1000_TSYNCRXCTL, regval); + wr32(E1000_TSYNCRXCFG, tsync_rx_cfg); + + /* + * Ethertype Filter Queue Filter[0][15:0] = 0x88F7 + * (Ethertype to filter on) + * Ethertype Filter Queue Filter[0][26] = 0x1 (Enable filter) + * Ethertype Filter Queue Filter[0][30] = 0x1 (Enable Timestamping) + */ + wr32(E1000_ETQF0, is_l2 ? 0x440088f7 : 0); + + /* L4 Queue Filter[0]: only filter by source and destination port */ + wr32(E1000_SPQF0, htons(port)); + wr32(E1000_IMIREXT(0), is_l4 ? + ((1<<12) | (1<<19) /* bypass size and control flags */) : 0); + wr32(E1000_IMIR(0), is_l4 ? + (htons(port) + | (0<<16) /* immediate interrupt disabled */ + | 0 /* (1<<17) bit cleared: do not bypass + destination port check */) + : 0); + wr32(E1000_FTQF0, is_l4 ? + (0x11 /* UDP */ + | (1<<15) /* VF not compared */ + | (1<<27) /* Enable Timestamping */ + | (7<<28) /* only source port filter enabled, + source/target address and protocol + masked */) + : ((1<<15) | (15<<28) /* all mask bits set = filter not + enabled */)); + + wrfl(); + + adapter->hwtstamp_config = config; + + /* clear TX/RX time stamp registers, just to be sure */ + regval = rd32(E1000_TXSTMPH); + regval = rd32(E1000_RXSTMPH); - return -ERANGE; + return copy_to_user(ifr->ifr_data, &config, sizeof(config)) ? + -EFAULT : 0; } /** -- 1.6.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo 2009-02-12 15:03 ` [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c Patrick Ohly @ 2009-02-16 7:16 ` David Miller 2009-02-16 7:44 ` Patrick Ohly 1 sibling, 1 reply; 25+ messages in thread From: David Miller @ 2009-02-16 7:16 UTC (permalink / raw) To: patrick.ohly; +Cc: johnstul, john.ronciak, linux-kernel, netdev From: Patrick Ohly <patrick.ohly@intel.com> Date: Thu, 12 Feb 2009 16:03:01 +0100 > This revision of the patch series transports hardware time stamps from > the hardware into user space via additional fields in struct > skb_shared_info. It's based on net-next-2.6 as of this morning. > > This is the solution that emerged from the discussion of the other > approaches suggested before (reuse tstamp, extend skbuff, optional > structs): it has the advantage of not touching skbuff and also has the > simplest implementation. > > The clocksource and timecompare patches have been reviewed by John > Stultz. He doesn't mind merging them via the net subtree. > > John Ronciak reviewed the igb driver patches. He suggested to merge the > patches as they; after all, PTPd already works fine. I just tested again > on 32 and 64 bit x86, both with the timestamping example programs as > well as with PTPd. Latest patched PTPd is here: > http://github.com/pohly/ptpd/tree/master > > The open TODOs in the igb driver will be fixed. We think that this will > be easier with the infrastructure and the driver in a regular kernel > tree. Ok, I applied everything and pushed it out to net-next-2.6, let's see how this goes :-) That TX clone wrt. skb_orphan() issue will need a happier solution. Can you describe that problem in detail? Maybe someone can come up with a way to avoid that stuff. Thanks. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo 2009-02-16 7:16 ` [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo David Miller @ 2009-02-16 7:44 ` Patrick Ohly 2009-02-21 9:15 ` Patrick Ohly 0 siblings, 1 reply; 25+ messages in thread From: Patrick Ohly @ 2009-02-16 7:44 UTC (permalink / raw) To: David Miller Cc: johnstul@us.ibm.com, Ronciak, John, linux-kernel@vger.kernel.org, netdev@vger.kernel.org On Mon, 2009-02-16 at 09:16 +0200, David Miller wrote: > Ok, I applied everything and pushed it out to net-next-2.6, let's > see how this goes :-) Thanks :-) > That TX clone wrt. skb_orphan() issue will need a happier solution. > > Can you describe that problem in detail? Maybe someone can come > up with a way to avoid that stuff. Bouncing information about a sent packet back to the sender (in skb_tstamp_tx()) requires access to the socket via which the packet was sent (orig_skb->sk). This information must be available after the packet was handed to ops->ndo_start_xmit() in order to implement the TX time stamping software fallback for devices which don't implement hardware time stamping (not enabled and/or not implemented at all). The problem now is that some device drivers call skb_orphan() on the skb in ops->ndo_start_xmit(). I suppose the intention is to free some resources a bit earlier. The unmodified igb driver did this, the bnx2 one didn't. If skb_orphan() is called, then TX software time stamping is not possible. A user space daemon cannot detect this reliably, only make a guess based on the observation that it doesn't receive any TX time stamps. This is not fatal: the traditional time stamping method used by PTPd (multicast loop back) still works. But it would be nicer if that multicast loop back hack wasn't necessary. TX software time stamping might also be more accurate (it generates the time stamp after ops->ndo_start_xmit() returns, not before as in the looped packet code path). I see several ways to solve this: * Always create another reference to skb->sk before ops->ndo_start_xmit() if TX time stamping is requested. Drawback: performance penalty for drivers which support TX time stamping or don't call skb_orphan(). * Turn skb_orphan() into a nop during ops->ndo_start_xmit(), again only if necessary. I think this can done by temporarily overriding the skb destructor. Drawback: looks like a hack to me, not sure whether that is really safe. * Extend the driver API + another reference. Let drivers which implement TX time stamping (and thus know when to avoid skb_orphan()) signal that. dev_hard_start_xmit() then can avoid the "additional sk reference" thing for these drivers. Drawback: requires analyzing and potentially flagging all drivers to have a real effect (as for bnx2). -- Best Regards, Patrick Ohly The content of this message is my personal opinion only and although I am an employee of Intel, the statements I make here in no way represent Intel's position on the issue, nor am I authorized to speak on behalf of Intel on this matter. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo 2009-02-16 7:44 ` Patrick Ohly @ 2009-02-21 9:15 ` Patrick Ohly 0 siblings, 0 replies; 25+ messages in thread From: Patrick Ohly @ 2009-02-21 9:15 UTC (permalink / raw) To: David Miller Cc: johnstul@us.ibm.com, Ronciak, John, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, Kirsher, Jeffrey T, Brandeburg, Jesse, Tantilov, Emil S On Mon, 2009-02-16 at 07:44 +0000, Patrick Ohly wrote: > On Mon, 2009-02-16 at 09:16 +0200, David Miller wrote: > > That TX clone wrt. skb_orphan() issue will need a happier solution. > > > > Can you describe that problem in detail? Maybe someone can come > > up with a way to avoid that stuff. > > Bouncing information about a sent packet back to the sender (in > skb_tstamp_tx()) requires access to the socket via which the packet was > sent (orig_skb->sk). > > This information must be available after the packet was handed to > ops->ndo_start_xmit() in order to implement the TX time stamping > software fallback for devices which don't implement hardware time > stamping (not enabled and/or not implemented at all). There's an even more fundamental problem: the TX software fallback in net_dev_hard_start_xmit() has to access the skb after ndo_start_xmit() has returned successfully. During stress tests LAD colleagues at Intel ran into kernel panics that pointed to net_dev_hard_start_xmit() as the culprit. Removing the current TX software fallback code solved this. My apologies for not realizing earlier that this code is violating skb ownership rules. We propose to remove the faulty code and then solve it in a proper way later on. Hardware time stamping will work fine without it. I will forward the patch separately. > I see several ways to solve this: > * Always create another reference to skb->sk before > ops->ndo_start_xmit() if TX time stamping is requested. > Drawback: performance penalty for drivers which support TX time > stamping or don't call skb_orphan(). [...] > * Extend the driver API + another reference. Let drivers which > implement TX time stamping (and thus know when to avoid > skb_orphan()) signal that. dev_hard_start_xmit() then can avoid > the "additional sk reference" thing for these drivers. Drawback: > requires analyzing and potentially flagging all drivers to have > a real effect (as for bnx2). Something along these lines with an additional reference to the skb might work. -- Best Regards, Patrick Ohly The content of this message is my personal opinion only and although I am an employee of Intel, the statements I make here in no way represent Intel's position on the issue, nor am I authorized to speak on behalf of Intel on this matter. ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2009-02-21 9:15 UTC | newest] Thread overview: 25+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-02-12 14:57 [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 02/10] timecompare: generic infrastructure to map between two time bases Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 03/10] net: new user space API for time stamping of incoming and outgoing packets Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 04/10] net: infrastructure for hardware time stamping Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 05/10] net: socket infrastructure for SO_TIMESTAMPING Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 06/10] ip: support for TX timestamps on UDP and RAW sockets Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 07/10] net: pass new SIOCSHWTSTAMP through to device drivers Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 08/10] igb: access to NIC time Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 09/10] igb: stub support for SIOCSHWTSTAMP Patrick Ohly 2009-02-12 15:00 ` [PATCH NET-NEXT 10/10] igb: use timecompare to implement hardware time stamping Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 02/10] timecompare: generic infrastructure to map between two time bases Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 03/10] net: new user space API for time stamping of incoming and outgoing packets Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 04/10] net: infrastructure for hardware time stamping Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 05/10] net: socket infrastructure for SO_TIMESTAMPING Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 06/10] ip: support for TX timestamps on UDP and RAW sockets Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 07/10] net: pass new SIOCSHWTSTAMP through to device drivers Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 08/10] igb: access to NIC time Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 09/10] igb: stub support for SIOCSHWTSTAMP Patrick Ohly 2009-02-12 15:03 ` [PATCH NET-NEXT 10/10] igb: use timecompare to implement hardware time stamping Patrick Ohly 2009-02-16 7:16 ` [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo David Miller 2009-02-16 7:44 ` Patrick Ohly 2009-02-21 9:15 ` Patrick Ohly
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).