* [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c
2009-02-04 13:01 clock synchronization utility code Patrick Ohly
@ 2009-02-04 13:01 ` Patrick Ohly
2009-02-04 14:03 ` Daniel Walker
0 siblings, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-04 13:01 UTC (permalink / raw)
To: linux-kernel
Cc: netdev, David Miller, John Stultz, Thomas Gleixner, Patrick Ohly
So far struct clocksource acted as the interface between time/timekeeping.c
and hardware. This patch generalizes the concept so that a similar
interface can also be used in other contexts. For that it introduces
new structures and related functions *without* touching the existing
struct clocksource.
The reasons for adding these new structures to clocksource.[ch] are
* the APIs are clearly related
* struct clocksource could be cleaned up to use the new structs
* avoids proliferation of files with similar names (timesource.h?
timecounter.h?)
As outlined in the discussion with John Stultz, this patch adds
* struct cyclecounter: stateless API to hardware which counts clock cycles
* struct timecounter: stateful utility code built on a cyclecounter which
provides a nanosecond counter
* only the function to read the nanosecond counter; deltas are used internally
and not exposed to users of timecounter
The code does no locking of the shared state. It must be called at least
as often as the cycle counter wraps around to detect these wrap arounds.
Both is the responsibility of the timecounter user.
Acked-by: John Stultz <johnstul@us.ibm.com>
---
include/linux/clocksource.h | 101 +++++++++++++++++++++++++++++++++++++++++++
kernel/time/clocksource.c | 76 ++++++++++++++++++++++++++++++++
2 files changed, 177 insertions(+), 0 deletions(-)
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index f88d32f..0729ce2 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -22,8 +22,109 @@ typedef u64 cycle_t;
struct clocksource;
/**
+ * struct cyclecounter - hardware abstraction for a free running counter
+ * Provides completely state-free accessors to the underlying hardware.
+ * Depending on which hardware it reads, the cycle counter may wrap
+ * around quickly. Locking rules (if necessary) have to be defined
+ * by the implementor and user of specific instances of this API.
+ *
+ * @read: returns the current cycle value
+ * @mask: bitmask for two's complement
+ * subtraction of non 64 bit counters,
+ * see CLOCKSOURCE_MASK() helper macro
+ * @mult: cycle to nanosecond multiplier
+ * @shift: cycle to nanosecond divisor (power of two)
+ */
+struct cyclecounter {
+ cycle_t (*read)(const struct cyclecounter *cc);
+ cycle_t mask;
+ u32 mult;
+ u32 shift;
+};
+
+/**
+ * struct timecounter - layer above a %struct cyclecounter which counts nanoseconds
+ * Contains the state needed by timecounter_read() to detect
+ * cycle counter wrap around. Initialize with
+ * timecounter_init(). Also used to convert cycle counts into the
+ * corresponding nanosecond counts with timecounter_cyc2time(). Users
+ * of this code are responsible for initializing the underlying
+ * cycle counter hardware, locking issues and reading the time
+ * more often than the cycle counter wraps around. The nanosecond
+ * counter will only wrap around after ~585 years.
+ *
+ * @cc: the cycle counter used by this instance
+ * @cycle_last: most recent cycle counter value seen by
+ * timecounter_read()
+ * @nsec: continuously increasing count
+ */
+struct timecounter {
+ const struct cyclecounter *cc;
+ cycle_t cycle_last;
+ u64 nsec;
+};
+
+/**
+ * cyclecounter_cyc2ns - converts cycle counter cycles to nanoseconds
+ * @tc: Pointer to cycle counter.
+ * @cycles: Cycles
+ *
+ * XXX - This could use some mult_lxl_ll() asm optimization. Same code
+ * as in cyc2ns, but with unsigned result.
+ */
+static inline u64 cyclecounter_cyc2ns(const struct cyclecounter *cc,
+ cycle_t cycles)
+{
+ u64 ret = (u64)cycles;
+ ret = (ret * cc->mult) >> cc->shift;
+ return ret;
+}
+
+/**
+ * timecounter_init - initialize a time counter
+ * @tc: Pointer to time counter which is to be initialized/reset
+ * @cc: A cycle counter, ready to be used.
+ * @start_tstamp: Arbitrary initial time stamp.
+ *
+ * After this call the current cycle register (roughly) corresponds to
+ * the initial time stamp. Every call to timecounter_read() increments
+ * the time stamp counter by the number of elapsed nanoseconds.
+ */
+extern void timecounter_init(struct timecounter *tc,
+ const struct cyclecounter *cc,
+ u64 start_tstamp);
+
+/**
+ * timecounter_read - return nanoseconds elapsed since timecounter_init()
+ * plus the initial time stamp
+ * @tc: Pointer to time counter.
+ *
+ * In other words, keeps track of time since the same epoch as
+ * the function which generated the initial time stamp.
+ */
+extern u64 timecounter_read(struct timecounter *tc);
+
+/**
+ * timecounter_cyc2time - convert a cycle counter to same
+ * time base as values returned by
+ * timecounter_read()
+ * @tc: Pointer to time counter.
+ * @cycle: a value returned by tc->cc->read()
+ *
+ * Cycle counts that are converted correctly as long as they
+ * fall into the interval [-1/2 max cycle count, +1/2 max cycle count],
+ * with "max cycle count" == cs->mask+1.
+ *
+ * This allows conversion of cycle counter values which were generated
+ * in the past.
+ */
+extern u64 timecounter_cyc2time(struct timecounter *tc,
+ cycle_t cycle_tstamp);
+
+/**
* struct clocksource - hardware abstraction for a free running counter
* Provides mostly state-free accessors to the underlying hardware.
+ * This is the structure used for system time.
*
* @name: ptr to clocksource name
* @list: list head for registration
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index ca89e15..6d4a267 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -31,6 +31,82 @@
#include <linux/sched.h> /* for spin_unlock_irq() using preempt_count() m68k */
#include <linux/tick.h>
+void timecounter_init(struct timecounter *tc,
+ const struct cyclecounter *cc,
+ u64 start_tstamp)
+{
+ tc->cc = cc;
+ tc->cycle_last = cc->read(cc);
+ tc->nsec = start_tstamp;
+}
+EXPORT_SYMBOL(timecounter_init);
+
+/**
+ * clocksource_read_ns - get nanoseconds since last call of this function
+ * @tc: Pointer to time counter
+ *
+ * When the underlying cycle counter runs over, this will be handled
+ * correctly as long as it does not run over more than once between
+ * calls.
+ *
+ * The first call to this function for a new time counter initializes
+ * the time tracking and returns bogus results.
+ */
+static u64 timecounter_read_delta(struct timecounter *tc)
+{
+ cycle_t cycle_now, cycle_delta;
+ u64 ns_offset;
+
+ /* read cycle counter: */
+ cycle_now = tc->cc->read(tc->cc);
+
+ /* calculate the delta since the last timecounter_read_delta(): */
+ cycle_delta = (cycle_now - tc->cycle_last) & tc->cc->mask;
+
+ /* convert to nanoseconds: */
+ ns_offset = cyclecounter_cyc2ns(tc->cc, cycle_delta);
+
+ /* update time stamp of timecounter_read_delta() call: */
+ tc->cycle_last = cycle_now;
+
+ return ns_offset;
+}
+
+u64 timecounter_read(struct timecounter *tc)
+{
+ u64 nsec;
+
+ /* increment time by nanoseconds since last call */
+ nsec = timecounter_read_delta(tc);
+ nsec += tc->nsec;
+ tc->nsec = nsec;
+
+ return nsec;
+}
+EXPORT_SYMBOL(timecounter_read);
+
+u64 timecounter_cyc2time(struct timecounter *tc,
+ cycle_t cycle_tstamp)
+{
+ u64 cycle_delta = (cycle_tstamp - tc->cycle_last) & tc->cc->mask;
+ u64 nsec;
+
+ /*
+ * Instead of always treating cycle_tstamp as more recent
+ * than tc->cycle_last, detect when it is too far in the
+ * future and treat it as old time stamp instead.
+ */
+ if (cycle_delta > tc->cc->mask / 2) {
+ cycle_delta = (tc->cycle_last - cycle_tstamp) & tc->cc->mask;
+ nsec = tc->nsec - cyclecounter_cyc2ns(tc->cc, cycle_delta);
+ } else {
+ nsec = cyclecounter_cyc2ns(tc->cc, cycle_delta) + tc->nsec;
+ }
+
+ return nsec;
+}
+EXPORT_SYMBOL(timecounter_cyc2time);
+
/* XXX - Would like a better way for initializing curr_clocksource */
extern struct clocksource clocksource_jiffies;
--
1.6.0.4
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c
2009-02-04 13:01 ` [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c Patrick Ohly
@ 2009-02-04 14:03 ` Daniel Walker
2009-02-04 14:46 ` Patrick Ohly
0 siblings, 1 reply; 37+ messages in thread
From: Daniel Walker @ 2009-02-04 14:03 UTC (permalink / raw)
To: Patrick Ohly
Cc: linux-kernel, netdev, David Miller, John Stultz, Thomas Gleixner
On Wed, 2009-02-04 at 14:01 +0100, Patrick Ohly wrote:
> /**
> + * struct cyclecounter - hardware abstraction for a free running counter
> + * Provides completely state-free accessors to the underlying hardware.
> + * Depending on which hardware it reads, the cycle counter may wrap
> + * around quickly. Locking rules (if necessary) have to be defined
> + * by the implementor and user of specific instances of this API.
> + *
> + * @read: returns the current cycle value
> + * @mask: bitmask for two's complement
> + * subtraction of non 64 bit counters,
> + * see CLOCKSOURCE_MASK() helper macro
> + * @mult: cycle to nanosecond multiplier
> + * @shift: cycle to nanosecond divisor (power of two)
> + */
> +struct cyclecounter {
> + cycle_t (*read)(const struct cyclecounter *cc);
> + cycle_t mask;
> + u32 mult;
> + u32 shift;
> +};
Where are these defined? I don't see any in created in your code.
> +/**
> + * struct timecounter - layer above a %struct cyclecounter which counts nanoseconds
> + * Contains the state needed by timecounter_read() to detect
> + * cycle counter wrap around. Initialize with
> + * timecounter_init(). Also used to convert cycle counts into the
> + * corresponding nanosecond counts with timecounter_cyc2time(). Users
> + * of this code are responsible for initializing the underlying
> + * cycle counter hardware, locking issues and reading the time
> + * more often than the cycle counter wraps around. The nanosecond
> + * counter will only wrap around after ~585 years.
> + *
> + * @cc: the cycle counter used by this instance
> + * @cycle_last: most recent cycle counter value seen by
> + * timecounter_read()
> + * @nsec: continuously increasing count
> + */
> +struct timecounter {
> + const struct cyclecounter *cc;
> + cycle_t cycle_last;
> + u64 nsec;
> +};
If your mixing generic and non-generic code, it seems a little
presumptuous to assume the code would get called more often than the
counter wraps. If this cyclecounter is what I think it is (a
clocksource) they wrap at varied times.
> +/**
> + * cyclecounter_cyc2ns - converts cycle counter cycles to nanoseconds
> + * @tc: Pointer to cycle counter.
> + * @cycles: Cycles
> + *
> + * XXX - This could use some mult_lxl_ll() asm optimization. Same code
> + * as in cyc2ns, but with unsigned result.
> + */
> +static inline u64 cyclecounter_cyc2ns(const struct cyclecounter *cc,
> + cycle_t cycles)
> +{
> + u64 ret = (u64)cycles;
> + ret = (ret * cc->mult) >> cc->shift;
> + return ret;
> +}
This is just outright duplication.. Why wouldn't you use the function
that already exists for this?
> +/**
> + * clocksource_read_ns - get nanoseconds since last call of this function
> + * @tc: Pointer to time counter
> + *
> + * When the underlying cycle counter runs over, this will be handled
> + * correctly as long as it does not run over more than once between
> + * calls.
> + *
> + * The first call to this function for a new time counter initializes
> + * the time tracking and returns bogus results.
> + */
"bogus results" doesn't sound very pretty..
Daniel
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c
2009-02-04 14:03 ` Daniel Walker
@ 2009-02-04 14:46 ` Patrick Ohly
2009-02-04 15:09 ` Daniel Walker
0 siblings, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-04 14:46 UTC (permalink / raw)
To: Daniel Walker
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
David Miller, John Stultz, Thomas Gleixner
Hi Daniel!
Thanks for looking at this. I think the previous discussion of this
patch under the same subject on LKML already addressed your concerns,
but I'll also reply in detail below.
On Wed, 2009-02-04 at 14:03 +0000, Daniel Walker wrote:
> On Wed, 2009-02-04 at 14:01 +0100, Patrick Ohly wrote:
>
> > /**
> > + * struct cyclecounter - hardware abstraction for a free running counter
> > + * Provides completely state-free accessors to the underlying hardware.
> > + * Depending on which hardware it reads, the cycle counter may wrap
> > + * around quickly. Locking rules (if necessary) have to be defined
> > + * by the implementor and user of specific instances of this API.
> > + *
> > + * @read: returns the current cycle value
> > + * @mask: bitmask for two's complement
> > + * subtraction of non 64 bit counters,
> > + * see CLOCKSOURCE_MASK() helper macro
> > + * @mult: cycle to nanosecond multiplier
> > + * @shift: cycle to nanosecond divisor (power of two)
> > + */
> > +struct cyclecounter {
> > + cycle_t (*read)(const struct cyclecounter *cc);
> > + cycle_t mask;
> > + u32 mult;
> > + u32 shift;
> > +};
>
> Where are these defined? I don't see any in created in your code.
What do you mean with "these"? cycle_t? That type is defined in
clocksource.h. This patch intentionally adds these definitions to the
existing file because of the large conceptual overlap.
In an earlier revision of the patch I had adapted clocksource itself so
that it could be used outside of the time keeping code; John wanted me
to use these smaller structs instead that you now find in the current
patch.
Eventually John wants to refactor clocksource so that it uses them and
just adds additional elements in clocksource. Right now clocksource is a
mixture of different concepts. Breaking out cyclecounter and timecounter
is a first step towards that cleanup.
> > +/**
> > + * struct timecounter - layer above a %struct cyclecounter which counts nanoseconds
> > + * Contains the state needed by timecounter_read() to detect
> > + * cycle counter wrap around. Initialize with
> > + * timecounter_init(). Also used to convert cycle counts into the
> > + * corresponding nanosecond counts with timecounter_cyc2time(). Users
> > + * of this code are responsible for initializing the underlying
> > + * cycle counter hardware, locking issues and reading the time
> > + * more often than the cycle counter wraps around. The nanosecond
> > + * counter will only wrap around after ~585 years.
> > + *
> > + * @cc: the cycle counter used by this instance
> > + * @cycle_last: most recent cycle counter value seen by
> > + * timecounter_read()
> > + * @nsec: continuously increasing count
> > + */
> > +struct timecounter {
> > + const struct cyclecounter *cc;
> > + cycle_t cycle_last;
> > + u64 nsec;
> > +};
>
> If your mixing generic and non-generic code, it seems a little
> presumptuous to assume the code would get called more often than the
> counter wraps. If this cyclecounter is what I think it is (a
> clocksource) they wrap at varied times.
timecounter and cyclecounter will have the same owner, so that owner
knows how often the cycle counter will run over and can use it
accordingly.
The cyclecounter is a pointer and not just a an instance of cyclecounter
so that the owner can pick an instance of a timecounter which has
additional private data attached to it.
The initial user of the new code will have a 64 bit hardware register as
cycle counter. We decided to leave the "counter runs over too often"
problem unsolved until we actually have hardware which exhibits the
problem.
> > +/**
> > + * cyclecounter_cyc2ns - converts cycle counter cycles to nanoseconds
> > + * @tc: Pointer to cycle counter.
> > + * @cycles: Cycles
> > + *
> > + * XXX - This could use some mult_lxl_ll() asm optimization. Same code
> > + * as in cyc2ns, but with unsigned result.
> > + */
> > +static inline u64 cyclecounter_cyc2ns(const struct cyclecounter *cc,
> > + cycle_t cycles)
> > +{
> > + u64 ret = (u64)cycles;
> > + ret = (ret * cc->mult) >> cc->shift;
> > + return ret;
> > +}
>
> This is just outright duplication.. Why wouldn't you use the function
> that already exists for this?
Because it only works with a clocksource. Adding a second function also
allows using a saner return type (unsigned instead of signed). Removing
the code duplication can be done as part of the code refactoring. But
that really is beyond the scope of the patch series and shouldn't be
done in the network tree.
> > +/**
> > + * clocksource_read_ns - get nanoseconds since last call of this function
> > + * @tc: Pointer to time counter
> > + *
> > + * When the underlying cycle counter runs over, this will be handled
> > + * correctly as long as it does not run over more than once between
> > + * calls.
> > + *
> > + * The first call to this function for a new time counter initializes
> > + * the time tracking and returns bogus results.
> > + */
>
> "bogus results" doesn't sound very pretty..
I can call it "undefined" if that sounds better ;-)
When you quoted the comment I noticed that the function name was still
the old one - fixed.
--
Best Regards, Patrick Ohly
The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c
2009-02-04 14:46 ` Patrick Ohly
@ 2009-02-04 15:09 ` Daniel Walker
2009-02-04 15:24 ` Patrick Ohly
2009-02-04 19:25 ` john stultz
0 siblings, 2 replies; 37+ messages in thread
From: Daniel Walker @ 2009-02-04 15:09 UTC (permalink / raw)
To: Patrick Ohly
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
David Miller, John Stultz, Thomas Gleixner
On Wed, 2009-02-04 at 15:46 +0100, Patrick Ohly wrote:
> Hi Daniel!
>
> Thanks for looking at this. I think the previous discussion of this
> patch under the same subject on LKML already addressed your concerns,
> but I'll also reply in detail below.
>
> On Wed, 2009-02-04 at 14:03 +0000, Daniel Walker wrote:
> > On Wed, 2009-02-04 at 14:01 +0100, Patrick Ohly wrote:
> >
> > > /**
> > > + * struct cyclecounter - hardware abstraction for a free running counter
> > > + * Provides completely state-free accessors to the underlying hardware.
> > > + * Depending on which hardware it reads, the cycle counter may wrap
> > > + * around quickly. Locking rules (if necessary) have to be defined
> > > + * by the implementor and user of specific instances of this API.
> > > + *
> > > + * @read: returns the current cycle value
> > > + * @mask: bitmask for two's complement
> > > + * subtraction of non 64 bit counters,
> > > + * see CLOCKSOURCE_MASK() helper macro
> > > + * @mult: cycle to nanosecond multiplier
> > > + * @shift: cycle to nanosecond divisor (power of two)
> > > + */
> > > +struct cyclecounter {
> > > + cycle_t (*read)(const struct cyclecounter *cc);
> > > + cycle_t mask;
> > > + u32 mult;
> > > + u32 shift;
> > > +};
> >
> > Where are these defined? I don't see any in created in your code.
>
> What do you mean with "these"? cycle_t? That type is defined in
> clocksource.h. This patch intentionally adds these definitions to the
> existing file because of the large conceptual overlap.
No, your creating a new structure here that wasn't declared. I was
referring to "struct cyclecounter". I did look up one of your prior
submission (Dec. 15) and reviewed that.
> In an earlier revision of the patch I had adapted clocksource itself so
> that it could be used outside of the time keeping code; John wanted me
> to use these smaller structs instead that you now find in the current
> patch.
Well, I think your original idea was better.. I don't think we need the
duplication of underlying clocksource mechanics.
> Eventually John wants to refactor clocksource so that it uses them and
> just adds additional elements in clocksource. Right now clocksource is a
> mixture of different concepts. Breaking out cyclecounter and timecounter
> is a first step towards that cleanup.
The problem I see is that your putting off the cleanup of struct
clocksource with duplication.. It should go in reverse , you should use
clocksources for your patch set. Which will motivate John to clean up
the clocksource structure.
There's a whole other issue which is that we have many architectures
already declaring struct clocksource for it's most basic usage. So I
think we want clocksource to be the base "cycle counter" structure (in
name and usage). Almost everything else in struct clocksource is
specific to generic timekeeping and could be factored out easily.
Daniel
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c
2009-02-04 15:09 ` Daniel Walker
@ 2009-02-04 15:24 ` Patrick Ohly
2009-02-04 19:25 ` john stultz
1 sibling, 0 replies; 37+ messages in thread
From: Patrick Ohly @ 2009-02-04 15:24 UTC (permalink / raw)
To: Daniel Walker
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
David Miller, John Stultz, Thomas Gleixner
On Wed, 2009-02-04 at 15:09 +0000, Daniel Walker wrote:
> On Wed, 2009-02-04 at 15:46 +0100, Patrick Ohly wrote:
> > On Wed, 2009-02-04 at 14:03 +0000, Daniel Walker wrote:
> > > On Wed, 2009-02-04 at 14:01 +0100, Patrick Ohly wrote:
> > >
> > > > /**
> > > > + * struct cyclecounter - hardware abstraction for a free running counter
> > > > + * Provides completely state-free accessors to the underlying hardware.
> > > > + * Depending on which hardware it reads, the cycle counter may wrap
> > > > + * around quickly. Locking rules (if necessary) have to be defined
> > > > + * by the implementor and user of specific instances of this API.
> > > > + *
> > > > + * @read: returns the current cycle value
> > > > + * @mask: bitmask for two's complement
> > > > + * subtraction of non 64 bit counters,
> > > > + * see CLOCKSOURCE_MASK() helper macro
> > > > + * @mult: cycle to nanosecond multiplier
> > > > + * @shift: cycle to nanosecond divisor (power of two)
> > > > + */
> > > > +struct cyclecounter {
> > > > + cycle_t (*read)(const struct cyclecounter *cc);
> > > > + cycle_t mask;
> > > > + u32 mult;
> > > > + u32 shift;
> > > > +};
> > >
> > > Where are these defined? I don't see any in created in your code.
> >
> > What do you mean with "these"? cycle_t? That type is defined in
> > clocksource.h. This patch intentionally adds these definitions to the
> > existing file because of the large conceptual overlap.
>
> No, your creating a new structure here that wasn't declared. I was
> referring to "struct cyclecounter".
Sorry, I still don't see the problem. I'm declaring and defining struct
cyclecounter here. Why should it also be declared (= "struct
cyclecounter;") elsewhere?
> I did look up one of your prior
> submission (Dec. 15) and reviewed that.
Right, I saw that a bit later.
> > In an earlier revision of the patch I had adapted clocksource itself so
> > that it could be used outside of the time keeping code; John wanted me
> > to use these smaller structs instead that you now find in the current
> > patch.
>
> Well, I think your original idea was better.. I don't think we need the
> duplication of underlying clocksource mechanics.
Please discuss this with John. I'd prefer to not get caught in the
cross-fire of a discussion that I don't know enough about :-/
--
Best Regards, Patrick Ohly
The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c
2009-02-04 15:09 ` Daniel Walker
2009-02-04 15:24 ` Patrick Ohly
@ 2009-02-04 19:25 ` john stultz
2009-02-04 19:40 ` Daniel Walker
1 sibling, 1 reply; 37+ messages in thread
From: john stultz @ 2009-02-04 19:25 UTC (permalink / raw)
To: Daniel Walker
Cc: Patrick Ohly, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, David Miller, Thomas Gleixner
On Wed, 2009-02-04 at 07:09 -0800, Daniel Walker wrote:
> On Wed, 2009-02-04 at 15:46 +0100, Patrick Ohly wrote:
> > In an earlier revision of the patch I had adapted clocksource itself so
> > that it could be used outside of the time keeping code; John wanted me
> > to use these smaller structs instead that you now find in the current
> > patch.
>
> Well, I think your original idea was better.. I don't think we need the
> duplication of underlying clocksource mechanics.
>
> > Eventually John wants to refactor clocksource so that it uses them and
> > just adds additional elements in clocksource. Right now clocksource is a
> > mixture of different concepts. Breaking out cyclecounter and timecounter
> > is a first step towards that cleanup.
>
> The problem I see is that your putting off the cleanup of struct
> clocksource with duplication.. It should go in reverse , you should use
> clocksources for your patch set. Which will motivate John to clean up
> the clocksource structure.
I strongly disagree. Misusing a established structure for unintended use
is just bad. What Patrick wants to use the counters for has very
different semantics then how clocksources are used.
I think having a bit of redundancy in two structures is good motivation
for me to clean up the clocksources to use cyclecounters.
thanks
-john
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c
2009-02-04 19:25 ` john stultz
@ 2009-02-04 19:40 ` Daniel Walker
2009-02-04 20:06 ` john stultz
0 siblings, 1 reply; 37+ messages in thread
From: Daniel Walker @ 2009-02-04 19:40 UTC (permalink / raw)
To: john stultz
Cc: Patrick Ohly, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, David Miller, Thomas Gleixner
On Wed, 2009-02-04 at 11:25 -0800, john stultz wrote:
> On Wed, 2009-02-04 at 07:09 -0800, Daniel Walker wrote:
> > On Wed, 2009-02-04 at 15:46 +0100, Patrick Ohly wrote:
> > > In an earlier revision of the patch I had adapted clocksource itself so
> > > that it could be used outside of the time keeping code; John wanted me
> > > to use these smaller structs instead that you now find in the current
> > > patch.
> >
> > Well, I think your original idea was better.. I don't think we need the
> > duplication of underlying clocksource mechanics.
> >
> > > Eventually John wants to refactor clocksource so that it uses them and
> > > just adds additional elements in clocksource. Right now clocksource is a
> > > mixture of different concepts. Breaking out cyclecounter and timecounter
> > > is a first step towards that cleanup.
> >
> > The problem I see is that your putting off the cleanup of struct
> > clocksource with duplication.. It should go in reverse , you should use
> > clocksources for your patch set. Which will motivate John to clean up
> > the clocksource structure.
>
> I strongly disagree. Misusing a established structure for unintended use
> is just bad. What Patrick wants to use the counters for has very
> different semantics then how clocksources are used.
I don't think it's at all bad to reuse an established system like that,
especially when the purposed one is largely duplication of the other..
The issue is more that struct clocksource has been loaded up with too
many timekeeping specific ornaments .. I would think a good clean up
would be just to remove those from the structure.
> I think having a bit of redundancy in two structures is good motivation
> for me to clean up the clocksources to use cyclecounters.
I'm not sure what kind of clean up you have in mind. Given that
clocksources are used across many architectures, do you really want to
do a mass name change for all of them? Not to mention most people see
clocksources as cycle counters , so you would just add confusion in
addition to code flux.
If your going to end up merging this new cycle counter structure with
clocksources anyway, why wouldn't we do it now instead of doing
temporary duplication ..
Daniel
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c
2009-02-04 19:40 ` Daniel Walker
@ 2009-02-04 20:06 ` john stultz
2009-02-04 21:04 ` Daniel Walker
0 siblings, 1 reply; 37+ messages in thread
From: john stultz @ 2009-02-04 20:06 UTC (permalink / raw)
To: Daniel Walker
Cc: Patrick Ohly, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, David Miller, Thomas Gleixner
On Wed, 2009-02-04 at 11:40 -0800, Daniel Walker wrote:
> On Wed, 2009-02-04 at 11:25 -0800, john stultz wrote:
> > On Wed, 2009-02-04 at 07:09 -0800, Daniel Walker wrote:
> > > On Wed, 2009-02-04 at 15:46 +0100, Patrick Ohly wrote:
> > > > In an earlier revision of the patch I had adapted clocksource itself so
> > > > that it could be used outside of the time keeping code; John wanted me
> > > > to use these smaller structs instead that you now find in the current
> > > > patch.
> > >
> > > Well, I think your original idea was better.. I don't think we need the
> > > duplication of underlying clocksource mechanics.
> > >
> > > > Eventually John wants to refactor clocksource so that it uses them and
> > > > just adds additional elements in clocksource. Right now clocksource is a
> > > > mixture of different concepts. Breaking out cyclecounter and timecounter
> > > > is a first step towards that cleanup.
> > >
> > > The problem I see is that your putting off the cleanup of struct
> > > clocksource with duplication.. It should go in reverse , you should use
> > > clocksources for your patch set. Which will motivate John to clean up
> > > the clocksource structure.
> >
> > I strongly disagree. Misusing a established structure for unintended use
> > is just bad. What Patrick wants to use the counters for has very
> > different semantics then how clocksources are used.
>
> I don't think it's at all bad to reuse an established system like that,
> especially when the purposed one is largely duplication of the other..
The duplication is only at a very low level. He could not reuse the
established clocksource system without really breaking its semantics.
> The issue is more that struct clocksource has been loaded up with too
> many timekeeping specific ornaments .. I would think a good clean up
> would be just to remove those from the structure.
I do agree. And I gave it a quick attempt and its not quite trivial.
There is some timekeeping ornaments that are required in the clocksource
(the two notions of mult, the flags values, etc). Trying to best split
this out I think will take some care.
> > I think having a bit of redundancy in two structures is good motivation
> > for me to clean up the clocksources to use cyclecounters.
>
> I'm not sure what kind of clean up you have in mind. Given that
> clocksources are used across many architectures, do you really want to
> do a mass name change for all of them? Not to mention most people see
> clocksources as cycle counters , so you would just add confusion in
> addition to code flux.
No matter what we do, we will have to touch the arch code in some way.
Right now I suspect the arch code will still define clocksources, but we
can refactor clocksources to use cyclecounters internally.
> If your going to end up merging this new cycle counter structure with
> clocksources anyway, why wouldn't we do it now instead of doing
> temporary duplication ..
Patrick could have created his own NICcounter infrastructure somewhere
far away from the generic time code and no one would gripe (much like
sched_clock in many cases). I think he's doing the right thing by
establishing a base structure that we can reuse in the
clocksource/timekeeping code.
Doing that conversion well so the code is more readable as well as
removing structure duplication will take some time and will touch a lot
of historically delicate time code, so it will need additional time to
sit in testing trees before we push it upward.
I see no reason to block his code in the meantime.
thanks
-john
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c
2009-02-04 20:06 ` john stultz
@ 2009-02-04 21:04 ` Daniel Walker
2009-02-04 21:15 ` john stultz
0 siblings, 1 reply; 37+ messages in thread
From: Daniel Walker @ 2009-02-04 21:04 UTC (permalink / raw)
To: john stultz
Cc: Patrick Ohly, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, David Miller, Thomas Gleixner
On Wed, 2009-02-04 at 12:06 -0800, john stultz wrote:
> The duplication is only at a very low level. He could not reuse the
> established clocksource system without really breaking its semantics.
He gave a link to the first version,
http://kerneltrap.org/mailarchive/linux-netdev/2008/11/19/4164204
What specific semantics is he breaking there?
Daniel
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c
2009-02-04 21:04 ` Daniel Walker
@ 2009-02-04 21:15 ` john stultz
2009-02-05 0:18 ` Daniel Walker
0 siblings, 1 reply; 37+ messages in thread
From: john stultz @ 2009-02-04 21:15 UTC (permalink / raw)
To: Daniel Walker
Cc: Patrick Ohly, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, David Miller, Thomas Gleixner
On Wed, 2009-02-04 at 13:04 -0800, Daniel Walker wrote:
> On Wed, 2009-02-04 at 12:06 -0800, john stultz wrote:
>
> > The duplication is only at a very low level. He could not reuse the
> > established clocksource system without really breaking its semantics.
>
> He gave a link to the first version,
>
> http://kerneltrap.org/mailarchive/linux-netdev/2008/11/19/4164204
>
> What specific semantics is he breaking there?
His re-usage of cycle_last and xtime_nsec for other means then how
they're defined.
In that case his use of xtime_nsec doesn't even store the same unit.
Plus he adds other accessors to the clocksource structure that are not
compatible with the clocksources registered for timekeeping.
He's really doing something different here, and while it does access a
counter, and it does translate that into nanoseconds, its not the same
as whats done in the timekeeping core which the clocksource was designed
around.
So by creating his own infrastructure in a shared manner, splitting out
a chunk of it to be reused in the clocksource/timekeeping core I think
is really a good thing and the right approach.
thanks
-john
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c
2009-02-04 21:15 ` john stultz
@ 2009-02-05 0:18 ` Daniel Walker
2009-02-05 10:21 ` Patrick Ohly
0 siblings, 1 reply; 37+ messages in thread
From: Daniel Walker @ 2009-02-05 0:18 UTC (permalink / raw)
To: john stultz
Cc: Patrick Ohly, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, David Miller, Thomas Gleixner
On Wed, 2009-02-04 at 13:15 -0800, john stultz wrote:
> On Wed, 2009-02-04 at 13:04 -0800, Daniel Walker wrote:
> > On Wed, 2009-02-04 at 12:06 -0800, john stultz wrote:
> >
> > > The duplication is only at a very low level. He could not reuse the
> > > established clocksource system without really breaking its semantics.
> >
> > He gave a link to the first version,
> >
> > http://kerneltrap.org/mailarchive/linux-netdev/2008/11/19/4164204
> >
> > What specific semantics is he breaking there?
>
> His re-usage of cycle_last and xtime_nsec for other means then how
> they're defined.
>
> In that case his use of xtime_nsec doesn't even store the same unit.
Those values to me are timekeeping ornaments .. It's not part of the
clocksource it's stuff you added from timekeeping. He could easily add
his own ornaments to the clocksource instead of re-use , then they could
be merged later .. It's not perfect, but it's at least a start.
> Plus he adds other accessors to the clocksource structure that are not
> compatible with the clocksources registered for timekeeping.
It's akin to vread, I think. I'd prefer to see the read() used instead,
but I can see why there could be a need for a structure getting passed.
> He's really doing something different here, and while it does access a
> counter, and it does translate that into nanoseconds, its not the same
> as whats done in the timekeeping core which the clocksource was designed
> around.
I think it's different from _timekeeping_. However the clocksource isn't
getting used differently. Like you said the lowlevel parts are the same,
ultimately that's all the clocksource should be.
It sounds like that's what you want anyway .. You can merge the lowlevel
parts with different sets of ornaments, that seems fairly acceptable to
me.
Daniel
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c
2009-02-05 0:18 ` Daniel Walker
@ 2009-02-05 10:21 ` Patrick Ohly
0 siblings, 0 replies; 37+ messages in thread
From: Patrick Ohly @ 2009-02-05 10:21 UTC (permalink / raw)
To: Daniel Walker
Cc: john stultz, linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
David Miller, Thomas Gleixner
On Thu, 2009-02-05 at 02:18 +0200, Daniel Walker wrote:
> > Plus he adds other accessors to the clocksource structure that are not
> > compatible with the clocksources registered for timekeeping.
>
> It's akin to vread, I think. I'd prefer to see the read() used
> instead,
> but I can see why there could be a need for a structure getting
> passed.
There is indeed: the igb driver supports multiple different interfaces.
Each interface has its own NIC time, so the callback needs the
additional pointer to find out which NIC time it is expected to read.
Regarding the rest of the discussion: I understand that it is useful to
have it and perhaps get serious about refactoring clocksource, but I
also hope that this will not delay including the patches. They are
needed for HW time stamping now and cannot wait for the clocksource
refactoring.
--
Best Regards, Patrick Ohly
The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo
@ 2009-02-12 14:57 Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo Patrick Ohly
0 siblings, 2 replies; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 14:57 UTC (permalink / raw)
To: David Miller
Cc: John Stultz, Ronciak, John, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org
Hello!
This revision of the patch series transports hardware time stamps from
the hardware into user space via additional fields in struct
skb_shared_info. It's based on net-next-2.6 as of this morning.
This is the solution that emerged from the discussion of the other
approaches suggested before (reuse tstamp, extend skbuff, optional
structs): it has the advantage of not touching skbuff and also has the
simplest implementation.
The clocksource and timecompare patches have been reviewed by John
Stultz. He doesn't mind merging them via the net subtree.
John Ronciak reviewed the igb driver patches. He suggested to merge the
patches as they; after all, PTPd already works fine. I just tested again
on 32 and 64 bit x86, both with the timestamping example programs as
well as with PTPd. Latest patched PTPd is here:
http://github.com/pohly/ptpd/tree/master
The open TODOs in the igb driver will be fixed. We think that this will
be easier with the infrastructure and the driver in a regular kernel
tree.
David, I hope you consider the patches acceptable now. If not, then as
before: please let me know what I can do to make them acceptable.
--
Best Regards, Patrick Ohly
The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c
2009-02-12 14:57 [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo Patrick Ohly
@ 2009-02-12 15:00 ` Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 02/10] timecompare: generic infrastructure to map between two time bases Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo Patrick Ohly
1 sibling, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:00 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly
So far struct clocksource acted as the interface between time/timekeeping.c
and hardware. This patch generalizes the concept so that a similar
interface can also be used in other contexts. For that it introduces
new structures and related functions *without* touching the existing
struct clocksource.
The reasons for adding these new structures to clocksource.[ch] are
* the APIs are clearly related
* struct clocksource could be cleaned up to use the new structs
* avoids proliferation of files with similar names (timesource.h?
timecounter.h?)
As outlined in the discussion with John Stultz, this patch adds
* struct cyclecounter: stateless API to hardware which counts clock cycles
* struct timecounter: stateful utility code built on a cyclecounter which
provides a nanosecond counter
* only the function to read the nanosecond counter; deltas are used internally
and not exposed to users of timecounter
The code does no locking of the shared state. It must be called at least
as often as the cycle counter wraps around to detect these wrap arounds.
Both is the responsibility of the timecounter user.
Acked-by: John Stultz <johnstul@us.ibm.com>
---
include/linux/clocksource.h | 101 +++++++++++++++++++++++++++++++++++++++++++
kernel/time/clocksource.c | 76 ++++++++++++++++++++++++++++++++
2 files changed, 177 insertions(+), 0 deletions(-)
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index f88d32f..573819e 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -22,8 +22,109 @@ typedef u64 cycle_t;
struct clocksource;
/**
+ * struct cyclecounter - hardware abstraction for a free running counter
+ * Provides completely state-free accessors to the underlying hardware.
+ * Depending on which hardware it reads, the cycle counter may wrap
+ * around quickly. Locking rules (if necessary) have to be defined
+ * by the implementor and user of specific instances of this API.
+ *
+ * @read: returns the current cycle value
+ * @mask: bitmask for two's complement
+ * subtraction of non 64 bit counters,
+ * see CLOCKSOURCE_MASK() helper macro
+ * @mult: cycle to nanosecond multiplier
+ * @shift: cycle to nanosecond divisor (power of two)
+ */
+struct cyclecounter {
+ cycle_t (*read)(const struct cyclecounter *cc);
+ cycle_t mask;
+ u32 mult;
+ u32 shift;
+};
+
+/**
+ * struct timecounter - layer above a %struct cyclecounter which counts nanoseconds
+ * Contains the state needed by timecounter_read() to detect
+ * cycle counter wrap around. Initialize with
+ * timecounter_init(). Also used to convert cycle counts into the
+ * corresponding nanosecond counts with timecounter_cyc2time(). Users
+ * of this code are responsible for initializing the underlying
+ * cycle counter hardware, locking issues and reading the time
+ * more often than the cycle counter wraps around. The nanosecond
+ * counter will only wrap around after ~585 years.
+ *
+ * @cc: the cycle counter used by this instance
+ * @cycle_last: most recent cycle counter value seen by
+ * timecounter_read()
+ * @nsec: continuously increasing count
+ */
+struct timecounter {
+ const struct cyclecounter *cc;
+ cycle_t cycle_last;
+ u64 nsec;
+};
+
+/**
+ * cyclecounter_cyc2ns - converts cycle counter cycles to nanoseconds
+ * @tc: Pointer to cycle counter.
+ * @cycles: Cycles
+ *
+ * XXX - This could use some mult_lxl_ll() asm optimization. Same code
+ * as in cyc2ns, but with unsigned result.
+ */
+static inline u64 cyclecounter_cyc2ns(const struct cyclecounter *cc,
+ cycle_t cycles)
+{
+ u64 ret = (u64)cycles;
+ ret = (ret * cc->mult) >> cc->shift;
+ return ret;
+}
+
+/**
+ * timecounter_init - initialize a time counter
+ * @tc: Pointer to time counter which is to be initialized/reset
+ * @cc: A cycle counter, ready to be used.
+ * @start_tstamp: Arbitrary initial time stamp.
+ *
+ * After this call the current cycle register (roughly) corresponds to
+ * the initial time stamp. Every call to timecounter_read() increments
+ * the time stamp counter by the number of elapsed nanoseconds.
+ */
+extern void timecounter_init(struct timecounter *tc,
+ const struct cyclecounter *cc,
+ u64 start_tstamp);
+
+/**
+ * timecounter_read - return nanoseconds elapsed since timecounter_init()
+ * plus the initial time stamp
+ * @tc: Pointer to time counter.
+ *
+ * In other words, keeps track of time since the same epoch as
+ * the function which generated the initial time stamp.
+ */
+extern u64 timecounter_read(struct timecounter *tc);
+
+/**
+ * timecounter_cyc2time - convert a cycle counter to same
+ * time base as values returned by
+ * timecounter_read()
+ * @tc: Pointer to time counter.
+ * @cycle: a value returned by tc->cc->read()
+ *
+ * Cycle counts that are converted correctly as long as they
+ * fall into the interval [-1/2 max cycle count, +1/2 max cycle count],
+ * with "max cycle count" == cs->mask+1.
+ *
+ * This allows conversion of cycle counter values which were generated
+ * in the past.
+ */
+extern u64 timecounter_cyc2time(struct timecounter *tc,
+ cycle_t cycle_tstamp);
+
+/**
* struct clocksource - hardware abstraction for a free running counter
* Provides mostly state-free accessors to the underlying hardware.
+ * This is the structure used for system time.
*
* @name: ptr to clocksource name
* @list: list head for registration
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index ca89e15..c46c931 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -31,6 +31,82 @@
#include <linux/sched.h> /* for spin_unlock_irq() using preempt_count() m68k */
#include <linux/tick.h>
+void timecounter_init(struct timecounter *tc,
+ const struct cyclecounter *cc,
+ u64 start_tstamp)
+{
+ tc->cc = cc;
+ tc->cycle_last = cc->read(cc);
+ tc->nsec = start_tstamp;
+}
+EXPORT_SYMBOL(timecounter_init);
+
+/**
+ * timecounter_read_delta - get nanoseconds since last call of this function
+ * @tc: Pointer to time counter
+ *
+ * When the underlying cycle counter runs over, this will be handled
+ * correctly as long as it does not run over more than once between
+ * calls.
+ *
+ * The first call to this function for a new time counter initializes
+ * the time tracking and returns an undefined result.
+ */
+static u64 timecounter_read_delta(struct timecounter *tc)
+{
+ cycle_t cycle_now, cycle_delta;
+ u64 ns_offset;
+
+ /* read cycle counter: */
+ cycle_now = tc->cc->read(tc->cc);
+
+ /* calculate the delta since the last timecounter_read_delta(): */
+ cycle_delta = (cycle_now - tc->cycle_last) & tc->cc->mask;
+
+ /* convert to nanoseconds: */
+ ns_offset = cyclecounter_cyc2ns(tc->cc, cycle_delta);
+
+ /* update time stamp of timecounter_read_delta() call: */
+ tc->cycle_last = cycle_now;
+
+ return ns_offset;
+}
+
+u64 timecounter_read(struct timecounter *tc)
+{
+ u64 nsec;
+
+ /* increment time by nanoseconds since last call */
+ nsec = timecounter_read_delta(tc);
+ nsec += tc->nsec;
+ tc->nsec = nsec;
+
+ return nsec;
+}
+EXPORT_SYMBOL(timecounter_read);
+
+u64 timecounter_cyc2time(struct timecounter *tc,
+ cycle_t cycle_tstamp)
+{
+ u64 cycle_delta = (cycle_tstamp - tc->cycle_last) & tc->cc->mask;
+ u64 nsec;
+
+ /*
+ * Instead of always treating cycle_tstamp as more recent
+ * than tc->cycle_last, detect when it is too far in the
+ * future and treat it as old time stamp instead.
+ */
+ if (cycle_delta > tc->cc->mask / 2) {
+ cycle_delta = (tc->cycle_last - cycle_tstamp) & tc->cc->mask;
+ nsec = tc->nsec - cyclecounter_cyc2ns(tc->cc, cycle_delta);
+ } else {
+ nsec = cyclecounter_cyc2ns(tc->cc, cycle_delta) + tc->nsec;
+ }
+
+ return nsec;
+}
+EXPORT_SYMBOL(timecounter_cyc2time);
+
/* XXX - Would like a better way for initializing curr_clocksource */
extern struct clocksource clocksource_jiffies;
--
1.6.1.2
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 02/10] timecompare: generic infrastructure to map between two time bases
2009-02-12 15:00 ` [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c Patrick Ohly
@ 2009-02-12 15:00 ` Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 03/10] net: new user space API for time stamping of incoming and outgoing packets Patrick Ohly
0 siblings, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:00 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly
Mapping from a struct timecounter to a time returned by functions like
ktime_get_real() is implemented. This is sufficient to use this code
in a network device driver which wants to support hardware time
stamping and transformation of hardware time stamps to system time.
The interface could have been made more versatile by not depending on
a time counter, but this wasn't done to avoid writing glue code
elsewhere.
The method implemented here is the one used and analyzed under the name
"assisted PTP" in the LCI PTP paper:
http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf
Acked-by: John Stultz <johnstul@us.ibm.com>
---
include/linux/timecompare.h | 125 ++++++++++++++++++++++++++++
kernel/time/Makefile | 2 +-
kernel/time/timecompare.c | 191 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 317 insertions(+), 1 deletions(-)
create mode 100644 include/linux/timecompare.h
create mode 100644 kernel/time/timecompare.c
diff --git a/include/linux/timecompare.h b/include/linux/timecompare.h
new file mode 100644
index 0000000..546e223
--- /dev/null
+++ b/include/linux/timecompare.h
@@ -0,0 +1,125 @@
+/*
+ * Utility code which helps transforming between two different time
+ * bases, called "source" and "target" time in this code.
+ *
+ * Source time has to be provided via the timecounter API while target
+ * time is accessed via a function callback whose prototype
+ * intentionally matches ktime_get() and ktime_get_real(). These
+ * interfaces where chosen like this so that the code serves its
+ * initial purpose without additional glue code.
+ *
+ * This purpose is synchronizing a hardware clock in a NIC with system
+ * time, in order to implement the Precision Time Protocol (PTP,
+ * IEEE1588) with more accurate hardware assisted time stamping. In
+ * that context only synchronization against system time (=
+ * ktime_get_real()) is currently needed. But this utility code might
+ * become useful in other situations, which is why it was written as
+ * general purpose utility code.
+ *
+ * The source timecounter is assumed to return monotonically
+ * increasing time (but this code does its best to compensate if that
+ * is not the case) whereas target time may jump.
+ *
+ * The target time corresponding to a source time is determined by
+ * reading target time, reading source time, reading target time
+ * again, then assuming that average target time corresponds to source
+ * time. In other words, the assumption is that reading the source
+ * time is slow and involves equal time for sending the request and
+ * receiving the reply, whereas reading target time is assumed to be
+ * fast.
+ *
+ * Copyright (C) 2009 Intel Corporation.
+ * Author: Patrick Ohly <patrick.ohly@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+#ifndef _LINUX_TIMECOMPARE_H
+#define _LINUX_TIMECOMPARE_H
+
+#include <linux/clocksource.h>
+#include <linux/ktime.h>
+
+/**
+ * struct timecompare - stores state and configuration for the two clocks
+ *
+ * Initialize to zero, then set source/target/num_samples.
+ *
+ * Transformation between source time and target time is done with:
+ * target_time = source_time + offset +
+ * (source_time - last_update) * skew /
+ * TIMECOMPARE_SKEW_RESOLUTION
+ *
+ * @source: used to get source time stamps via timecounter_read()
+ * @target: function returning target time (for example, ktime_get
+ * for monotonic time, or ktime_get_real for wall clock)
+ * @num_samples: number of times that source time and target time are to
+ * be compared when determining their offset
+ * @offset: (target time - source time) at the time of the last update
+ * @skew: average (target time - source time) / delta source time *
+ * TIMECOMPARE_SKEW_RESOLUTION
+ * @last_update: last source time stamp when time offset was measured
+ */
+struct timecompare {
+ struct timecounter *source;
+ ktime_t (*target)(void);
+ int num_samples;
+
+ s64 offset;
+ s64 skew;
+ u64 last_update;
+};
+
+/**
+ * timecompare_transform - transform source time stamp into target time base
+ * @sync: context for time sync
+ * @source_tstamp: the result of timecounter_read() or
+ * timecounter_cyc2time()
+ */
+extern ktime_t timecompare_transform(struct timecompare *sync,
+ u64 source_tstamp);
+
+/**
+ * timecompare_offset - measure current (target time - source time) offset
+ * @sync: context for time sync
+ * @offset: average offset during sample period returned here
+ * @source_tstamp: average source time during sample period returned here
+ *
+ * Returns number of samples used. Might be zero (= no result) in the
+ * unlikely case that target time was monotonically decreasing for all
+ * samples (= broken).
+ */
+extern int timecompare_offset(struct timecompare *sync,
+ s64 *offset,
+ u64 *source_tstamp);
+
+extern void __timecompare_update(struct timecompare *sync,
+ u64 source_tstamp);
+
+/**
+ * timecompare_update - update offset and skew by measuring current offset
+ * @sync: context for time sync
+ * @source_tstamp: the result of timecounter_read() or
+ * timecounter_cyc2time(), pass zero to force update
+ *
+ * Updates are only done at most once per second.
+ */
+static inline void timecompare_update(struct timecompare *sync,
+ u64 source_tstamp)
+{
+ if (!source_tstamp ||
+ (s64)(source_tstamp - sync->last_update) >= NSEC_PER_SEC)
+ __timecompare_update(sync, source_tstamp);
+}
+
+#endif /* _LINUX_TIMECOMPARE_H */
diff --git a/kernel/time/Makefile b/kernel/time/Makefile
index 905b0b5..0b0a636 100644
--- a/kernel/time/Makefile
+++ b/kernel/time/Makefile
@@ -1,4 +1,4 @@
-obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o
+obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o timecompare.o
obj-$(CONFIG_GENERIC_CLOCKEVENTS_BUILD) += clockevents.o
obj-$(CONFIG_GENERIC_CLOCKEVENTS) += tick-common.o
diff --git a/kernel/time/timecompare.c b/kernel/time/timecompare.c
new file mode 100644
index 0000000..71e7f1a
--- /dev/null
+++ b/kernel/time/timecompare.c
@@ -0,0 +1,191 @@
+/*
+ * Copyright (C) 2009 Intel Corporation.
+ * Author: Patrick Ohly <patrick.ohly@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/timecompare.h>
+#include <linux/module.h>
+#include <linux/math64.h>
+
+/*
+ * fixed point arithmetic scale factor for skew
+ *
+ * Usually one would measure skew in ppb (parts per billion, 1e9), but
+ * using a factor of 2 simplifies the math.
+ */
+#define TIMECOMPARE_SKEW_RESOLUTION (((s64)1)<<30)
+
+ktime_t timecompare_transform(struct timecompare *sync,
+ u64 source_tstamp)
+{
+ u64 nsec;
+
+ nsec = source_tstamp + sync->offset;
+ nsec += (s64)(source_tstamp - sync->last_update) * sync->skew /
+ TIMECOMPARE_SKEW_RESOLUTION;
+
+ return ns_to_ktime(nsec);
+}
+EXPORT_SYMBOL(timecompare_transform);
+
+int timecompare_offset(struct timecompare *sync,
+ s64 *offset,
+ u64 *source_tstamp)
+{
+ u64 start_source = 0, end_source = 0;
+ struct {
+ s64 offset;
+ s64 duration_target;
+ } buffer[10], sample, *samples;
+ int counter = 0, i;
+ int used;
+ int index;
+ int num_samples = sync->num_samples;
+
+ if (num_samples > sizeof(buffer)/sizeof(buffer[0])) {
+ samples = kmalloc(sizeof(*samples) * num_samples, GFP_ATOMIC);
+ if (!samples) {
+ samples = buffer;
+ num_samples = sizeof(buffer)/sizeof(buffer[0]);
+ }
+ } else {
+ samples = buffer;
+ }
+
+ /* run until we have enough valid samples, but do not try forever */
+ i = 0;
+ counter = 0;
+ while (1) {
+ u64 ts;
+ ktime_t start, end;
+
+ start = sync->target();
+ ts = timecounter_read(sync->source);
+ end = sync->target();
+
+ if (!i)
+ start_source = ts;
+
+ /* ignore negative durations */
+ sample.duration_target = ktime_to_ns(ktime_sub(end, start));
+ if (sample.duration_target >= 0) {
+ /*
+ * assume symetric delay to and from source:
+ * average target time corresponds to measured
+ * source time
+ */
+ sample.offset =
+ ktime_to_ns(ktime_add(end, start)) / 2 -
+ ts;
+
+ /* simple insertion sort based on duration */
+ index = counter - 1;
+ while (index >= 0) {
+ if (samples[index].duration_target <
+ sample.duration_target)
+ break;
+ samples[index + 1] = samples[index];
+ index--;
+ }
+ samples[index + 1] = sample;
+ counter++;
+ }
+
+ i++;
+ if (counter >= num_samples || i >= 100000) {
+ end_source = ts;
+ break;
+ }
+ }
+
+ *source_tstamp = (end_source + start_source) / 2;
+
+ /* remove outliers by only using 75% of the samples */
+ used = counter * 3 / 4;
+ if (!used)
+ used = counter;
+ if (used) {
+ /* calculate average */
+ s64 off = 0;
+ for (index = 0; index < used; index++)
+ off += samples[index].offset;
+ *offset = div_s64(off, used);
+ }
+
+ if (samples && samples != buffer)
+ kfree(samples);
+
+ return used;
+}
+EXPORT_SYMBOL(timecompare_offset);
+
+void __timecompare_update(struct timecompare *sync,
+ u64 source_tstamp)
+{
+ s64 offset;
+ u64 average_time;
+
+ if (!timecompare_offset(sync, &offset, &average_time))
+ return;
+
+ if (!sync->last_update) {
+ sync->last_update = average_time;
+ sync->offset = offset;
+ sync->skew = 0;
+ } else {
+ s64 delta_nsec = average_time - sync->last_update;
+
+ /* avoid division by negative or small deltas */
+ if (delta_nsec >= 10000) {
+ s64 delta_offset_nsec = offset - sync->offset;
+ s64 skew; /* delta_offset_nsec *
+ TIMECOMPARE_SKEW_RESOLUTION /
+ delta_nsec */
+ u64 divisor;
+
+ /* div_s64() is limited to 32 bit divisor */
+ skew = delta_offset_nsec * TIMECOMPARE_SKEW_RESOLUTION;
+ divisor = delta_nsec;
+ while (unlikely(divisor >= ((s64)1) << 32)) {
+ /* divide both by 2; beware, right shift
+ of negative value has undefined
+ behavior and can only be used for
+ the positive divisor */
+ skew = div_s64(skew, 2);
+ divisor >>= 1;
+ }
+ skew = div_s64(skew, divisor);
+
+ /*
+ * Calculate new overall skew as 4/16 the
+ * old value and 12/16 the new one. This is
+ * a rather arbitrary tradeoff between
+ * only using the latest measurement (0/16 and
+ * 16/16) and even more weight on past measurements.
+ */
+#define TIMECOMPARE_NEW_SKEW_PER_16 12
+ sync->skew =
+ div_s64((16 - TIMECOMPARE_NEW_SKEW_PER_16) *
+ sync->skew +
+ TIMECOMPARE_NEW_SKEW_PER_16 * skew,
+ 16);
+ sync->last_update = average_time;
+ sync->offset = offset;
+ }
+ }
+}
+EXPORT_SYMBOL(__timecompare_update);
--
1.6.1.2
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 03/10] net: new user space API for time stamping of incoming and outgoing packets
2009-02-12 15:00 ` [PATCH NET-NEXT 02/10] timecompare: generic infrastructure to map between two time bases Patrick Ohly
@ 2009-02-12 15:00 ` Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 04/10] net: infrastructure for hardware time stamping Patrick Ohly
0 siblings, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:00 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly
User space can request hardware and/or software time stamping.
Reporting of the result(s) via a new control message is enabled
separately for each field in the message because some of the
fields may require additional computation and thus cause overhead.
User space can tell the different kinds of time stamps apart
and choose what suits its needs.
When a TX timestamp operation is requested, the TX skb will be cloned
and the clone will be time stamped (in hardware or software) and added
to the socket error queue of the skb, if the skb has a socket
associated with it.
The actual TX timestamp will reach userspace as a RX timestamp on the
cloned packet. If timestamping is requested and no timestamping is
done in the device driver (potentially this may use hardware
timestamping), it will be done in software after the device's
start_hard_xmit routine.
---
Documentation/networking/timestamping.txt | 178 +++++++
Documentation/networking/timestamping/.gitignore | 1 +
Documentation/networking/timestamping/Makefile | 6 +
.../networking/timestamping/timestamping.c | 533 ++++++++++++++++++++
arch/alpha/include/asm/socket.h | 3 +
arch/arm/include/asm/socket.h | 3 +
arch/avr32/include/asm/socket.h | 3 +
arch/blackfin/include/asm/socket.h | 3 +
arch/cris/include/asm/socket.h | 3 +
arch/h8300/include/asm/socket.h | 3 +
arch/ia64/include/asm/socket.h | 3 +
arch/m68k/include/asm/socket.h | 3 +
arch/mips/include/asm/socket.h | 3 +
arch/parisc/include/asm/socket.h | 3 +
arch/powerpc/include/asm/socket.h | 3 +
arch/s390/include/asm/socket.h | 3 +
arch/sh/include/asm/socket.h | 3 +
arch/sparc/include/asm/socket.h | 3 +
arch/x86/include/asm/socket.h | 3 +
arch/xtensa/include/asm/socket.h | 3 +
include/asm-frv/socket.h | 3 +
include/asm-m32r/socket.h | 3 +
include/asm-mn10300/socket.h | 3 +
include/linux/errqueue.h | 1 +
include/linux/net_tstamp.h | 104 ++++
include/linux/sockios.h | 3 +
26 files changed, 883 insertions(+), 0 deletions(-)
create mode 100644 Documentation/networking/timestamping.txt
create mode 100644 Documentation/networking/timestamping/.gitignore
create mode 100644 Documentation/networking/timestamping/Makefile
create mode 100644 Documentation/networking/timestamping/timestamping.c
create mode 100644 include/linux/net_tstamp.h
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
new file mode 100644
index 0000000..a681a65
--- /dev/null
+++ b/Documentation/networking/timestamping.txt
@@ -0,0 +1,178 @@
+The existing interfaces for getting network packages time stamped are:
+
+* SO_TIMESTAMP
+ Generate time stamp for each incoming packet using the (not necessarily
+ monotonous!) system time. Result is returned via recv_msg() in a
+ control message as timeval (usec resolution).
+
+* SO_TIMESTAMPNS
+ Same time stamping mechanism as SO_TIMESTAMP, but returns result as
+ timespec (nsec resolution).
+
+* IP_MULTICAST_LOOP + SO_TIMESTAMP[NS]
+ Only for multicasts: approximate send time stamp by receiving the looped
+ packet and using its receive time stamp.
+
+The following interface complements the existing ones: receive time
+stamps can be generated and returned for arbitrary packets and much
+closer to the point where the packet is really sent. Time stamps can
+be generated in software (as before) or in hardware (if the hardware
+has such a feature).
+
+SO_TIMESTAMPING:
+
+Instructs the socket layer which kind of information is wanted. The
+parameter is an integer with some of the following bits set. Setting
+other bits is an error and doesn't change the current state.
+
+SOF_TIMESTAMPING_TX_HARDWARE: try to obtain send time stamp in hardware
+SOF_TIMESTAMPING_TX_SOFTWARE: if SOF_TIMESTAMPING_TX_HARDWARE is off or
+ fails, then do it in software
+SOF_TIMESTAMPING_RX_HARDWARE: return the original, unmodified time stamp
+ as generated by the hardware
+SOF_TIMESTAMPING_RX_SOFTWARE: if SOF_TIMESTAMPING_RX_HARDWARE is off or
+ fails, then do it in software
+SOF_TIMESTAMPING_RAW_HARDWARE: return original raw hardware time stamp
+SOF_TIMESTAMPING_SYS_HARDWARE: return hardware time stamp transformed to
+ the system time base
+SOF_TIMESTAMPING_SOFTWARE: return system time stamp generated in
+ software
+
+SOF_TIMESTAMPING_TX/RX determine how time stamps are generated.
+SOF_TIMESTAMPING_RAW/SYS determine how they are reported in the
+following control message:
+ struct scm_timestamping {
+ struct timespec systime;
+ struct timespec hwtimetrans;
+ struct timespec hwtimeraw;
+ };
+
+recvmsg() can be used to get this control message for regular incoming
+packets. For send time stamps the outgoing packet is looped back to
+the socket's error queue with the send time stamp(s) attached. It can
+be received with recvmsg(flags=MSG_ERRQUEUE). The call returns the
+original outgoing packet data including all headers preprended down to
+and including the link layer, the scm_timestamping control message and
+a sock_extended_err control message with ee_errno==ENOMSG and
+ee_origin==SO_EE_ORIGIN_TIMESTAMPING. A socket with such a pending
+bounced packet is ready for reading as far as select() is concerned.
+
+All three values correspond to the same event in time, but were
+generated in different ways. Each of these values may be empty (= all
+zero), in which case no such value was available. If the application
+is not interested in some of these values, they can be left blank to
+avoid the potential overhead of calculating them.
+
+systime is the value of the system time at that moment. This
+corresponds to the value also returned via SO_TIMESTAMP[NS]. If the
+time stamp was generated by hardware, then this field is
+empty. Otherwise it is filled in if SOF_TIMESTAMPING_SOFTWARE is
+set.
+
+hwtimeraw is the original hardware time stamp. Filled in if
+SOF_TIMESTAMPING_RAW_HARDWARE is set. No assumptions about its
+relation to system time should be made.
+
+hwtimetrans is the hardware time stamp transformed so that it
+corresponds as good as possible to system time. This correlation is
+not perfect; as a consequence, sorting packets received via different
+NICs by their hwtimetrans may differ from the order in which they were
+received. hwtimetrans may be non-monotonic even for the same NIC.
+Filled in if SOF_TIMESTAMPING_SYS_HARDWARE is set. Requires support
+by the network device and will be empty without that support.
+
+
+SIOCSHWTSTAMP:
+
+Hardware time stamping must also be initialized for each device driver
+that is expected to do hardware time stamping. The parameter is:
+
+struct hwtstamp_config {
+ int flags; /* no flags defined right now, must be zero */
+ int tx_type; /* HWTSTAMP_TX_* */
+ int rx_filter; /* HWTSTAMP_FILTER_* */
+};
+
+Desired behavior is passed into the kernel and to a specific device by
+calling ioctl(SIOCSHWTSTAMP) with a pointer to a struct ifreq whose
+ifr_data points to a struct hwtstamp_config. The tx_type and
+rx_filter are hints to the driver what it is expected to do. If
+the requested fine-grained filtering for incoming packets is not
+supported, the driver may time stamp more than just the requested types
+of packets.
+
+A driver which supports hardware time stamping shall update the struct
+with the actual, possibly more permissive configuration. If the
+requested packets cannot be time stamped, then nothing should be
+changed and ERANGE shall be returned (in contrast to EINVAL, which
+indicates that SIOCSHWTSTAMP is not supported at all).
+
+Only a processes with admin rights may change the configuration. User
+space is responsible to ensure that multiple processes don't interfere
+with each other and that the settings are reset.
+
+/* possible values for hwtstamp_config->tx_type */
+enum {
+ /*
+ * no outgoing packet will need hardware time stamping;
+ * should a packet arrive which asks for it, no hardware
+ * time stamping will be done
+ */
+ HWTSTAMP_TX_OFF,
+
+ /*
+ * enables hardware time stamping for outgoing packets;
+ * the sender of the packet decides which are to be
+ * time stamped by setting SOF_TIMESTAMPING_TX_SOFTWARE
+ * before sending the packet
+ */
+ HWTSTAMP_TX_ON,
+};
+
+/* possible values for hwtstamp_config->rx_filter */
+enum {
+ /* time stamp no incoming packet at all */
+ HWTSTAMP_FILTER_NONE,
+
+ /* time stamp any incoming packet */
+ HWTSTAMP_FILTER_ALL,
+
+ /* return value: time stamp all packets requested plus some others */
+ HWTSTAMP_FILTER_SOME,
+
+ /* PTP v1, UDP, any kind of event packet */
+ HWTSTAMP_FILTER_PTP_V1_L4_EVENT,
+
+ ...
+};
+
+
+DEVICE IMPLEMENTATION
+
+A driver which supports hardware time stamping must support the
+SIOCSHWTSTAMP ioctl. Time stamps for received packets must be stored
+in the skb with skb_hwtstamp_set().
+
+Time stamps for outgoing packets are to be generated as follows:
+- In hard_start_xmit(), check if skb_hwtstamp_check_tx_hardware()
+ returns non-zero. If yes, then the driver is expected
+ to do hardware time stamping.
+- If this is possible for the skb and requested, then declare
+ that the driver is doing the time stamping by calling
+ skb_hwtstamp_tx_in_progress(). A driver not supporting
+ hardware time stamping doesn't do that. A driver must never
+ touch sk_buff::tstamp! It is used to store how time stamping
+ for an outgoing packets is to be done.
+- As soon as the driver has sent the packet and/or obtained a
+ hardware time stamp for it, it passes the time stamp back by
+ calling skb_hwtstamp_tx() with the original skb, the raw
+ hardware time stamp and a handle to the device (necessary
+ to convert the hardware time stamp to system time). If obtaining
+ the hardware time stamp somehow fails, then the driver should
+ not fall back to software time stamping. The rationale is that
+ this would occur at a later time in the processing pipeline
+ than other software time stamping and therefore could lead
+ to unexpected deltas between time stamps.
+- If the driver did not call skb_hwtstamp_tx_in_progress(), then
+ dev_hard_start_xmit() checks whether software time stamping
+ is wanted as fallback and potentially generates the time stamp.
diff --git a/Documentation/networking/timestamping/.gitignore b/Documentation/networking/timestamping/.gitignore
new file mode 100644
index 0000000..71e81eb
--- /dev/null
+++ b/Documentation/networking/timestamping/.gitignore
@@ -0,0 +1 @@
+timestamping
diff --git a/Documentation/networking/timestamping/Makefile b/Documentation/networking/timestamping/Makefile
new file mode 100644
index 0000000..2a1489f
--- /dev/null
+++ b/Documentation/networking/timestamping/Makefile
@@ -0,0 +1,6 @@
+CPPFLAGS = -I../../../include
+
+timestamping: timestamping.c
+
+clean:
+ rm -f timestamping
diff --git a/Documentation/networking/timestamping/timestamping.c b/Documentation/networking/timestamping/timestamping.c
new file mode 100644
index 0000000..43d1431
--- /dev/null
+++ b/Documentation/networking/timestamping/timestamping.c
@@ -0,0 +1,533 @@
+/*
+ * This program demonstrates how the various time stamping features in
+ * the Linux kernel work. It emulates the behavior of a PTP
+ * implementation in stand-alone master mode by sending PTPv1 Sync
+ * multicasts once every second. It looks for similar packets, but
+ * beyond that doesn't actually implement PTP.
+ *
+ * Outgoing packets are time stamped with SO_TIMESTAMPING with or
+ * without hardware support.
+ *
+ * Incoming packets are time stamped with SO_TIMESTAMPING with or
+ * without hardware support, SIOCGSTAMP[NS] (per-socket time stamp) and
+ * SO_TIMESTAMP[NS].
+ *
+ * Copyright (C) 2009 Intel Corporation.
+ * Author: Patrick Ohly <patrick.ohly@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <string.h>
+
+#include <sys/time.h>
+#include <sys/socket.h>
+#include <sys/select.h>
+#include <sys/ioctl.h>
+#include <arpa/inet.h>
+#include <net/if.h>
+
+#include "asm/types.h"
+#include "linux/net_tstamp.h"
+#include "linux/errqueue.h"
+
+#ifndef SO_TIMESTAMPING
+# define SO_TIMESTAMPING 37
+# define SCM_TIMESTAMPING SO_TIMESTAMPING
+#endif
+
+#ifndef SO_TIMESTAMPNS
+# define SO_TIMESTAMPNS 35
+#endif
+
+#ifndef SIOCGSTAMPNS
+# define SIOCGSTAMPNS 0x8907
+#endif
+
+#ifndef SIOCSHWTSTAMP
+# define SIOCSHWTSTAMP 0x89b0
+#endif
+
+static void usage(const char *error)
+{
+ if (error)
+ printf("invalid option: %s\n", error);
+ printf("timestamping interface option*\n\n"
+ "Options:\n"
+ " IP_MULTICAST_LOOP - looping outgoing multicasts\n"
+ " SO_TIMESTAMP - normal software time stamping, ms resolution\n"
+ " SO_TIMESTAMPNS - more accurate software time stamping\n"
+ " SOF_TIMESTAMPING_TX_HARDWARE - hardware time stamping of outgoing packets\n"
+ " SOF_TIMESTAMPING_TX_SOFTWARE - software fallback for outgoing packets\n"
+ " SOF_TIMESTAMPING_RX_HARDWARE - hardware time stamping of incoming packets\n"
+ " SOF_TIMESTAMPING_RX_SOFTWARE - software fallback for incoming packets\n"
+ " SOF_TIMESTAMPING_SOFTWARE - request reporting of software time stamps\n"
+ " SOF_TIMESTAMPING_SYS_HARDWARE - request reporting of transformed HW time stamps\n"
+ " SOF_TIMESTAMPING_RAW_HARDWARE - request reporting of raw HW time stamps\n"
+ " SIOCGSTAMP - check last socket time stamp\n"
+ " SIOCGSTAMPNS - more accurate socket time stamp\n");
+ exit(1);
+}
+
+static void bail(const char *error)
+{
+ printf("%s: %s\n", error, strerror(errno));
+ exit(1);
+}
+
+static const unsigned char sync[] = {
+ 0x00, 0x01, 0x00, 0x01,
+ 0x5f, 0x44, 0x46, 0x4c,
+ 0x54, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00,
+ 0x01, 0x01,
+
+ /* fake uuid */
+ 0x00, 0x01,
+ 0x02, 0x03, 0x04, 0x05,
+
+ 0x00, 0x01, 0x00, 0x37,
+ 0x00, 0x00, 0x00, 0x08,
+ 0x00, 0x00, 0x00, 0x00,
+ 0x49, 0x05, 0xcd, 0x01,
+ 0x29, 0xb1, 0x8d, 0xb0,
+ 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x01,
+
+ /* fake uuid */
+ 0x00, 0x01,
+ 0x02, 0x03, 0x04, 0x05,
+
+ 0x00, 0x00, 0x00, 0x37,
+ 0x00, 0x00, 0x00, 0x04,
+ 0x44, 0x46, 0x4c, 0x54,
+ 0x00, 0x00, 0xf0, 0x60,
+ 0x00, 0x01, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x01,
+ 0x00, 0x00, 0xf0, 0x60,
+ 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x04,
+ 0x44, 0x46, 0x4c, 0x54,
+ 0x00, 0x01,
+
+ /* fake uuid */
+ 0x00, 0x01,
+ 0x02, 0x03, 0x04, 0x05,
+
+ 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00
+};
+
+static void sendpacket(int sock, struct sockaddr *addr, socklen_t addr_len)
+{
+ struct timeval now;
+ int res;
+
+ res = sendto(sock, sync, sizeof(sync), 0,
+ addr, addr_len);
+ gettimeofday(&now, 0);
+ if (res < 0)
+ printf("%s: %s\n", "send", strerror(errno));
+ else
+ printf("%ld.%06ld: sent %d bytes\n",
+ (long)now.tv_sec, (long)now.tv_usec,
+ res);
+}
+
+static void printpacket(struct msghdr *msg, int res,
+ char *data,
+ int sock, int recvmsg_flags,
+ int siocgstamp, int siocgstampns)
+{
+ struct sockaddr_in *from_addr = (struct sockaddr_in *)msg->msg_name;
+ struct cmsghdr *cmsg;
+ struct timeval tv;
+ struct timespec ts;
+ struct timeval now;
+
+ gettimeofday(&now, 0);
+
+ printf("%ld.%06ld: received %s data, %d bytes from %s, %d bytes control messages\n",
+ (long)now.tv_sec, (long)now.tv_usec,
+ (recvmsg_flags & MSG_ERRQUEUE) ? "error" : "regular",
+ res,
+ inet_ntoa(from_addr->sin_addr),
+ msg->msg_controllen);
+ for (cmsg = CMSG_FIRSTHDR(msg);
+ cmsg;
+ cmsg = CMSG_NXTHDR(msg, cmsg)) {
+ printf(" cmsg len %d: ", cmsg->cmsg_len);
+ switch (cmsg->cmsg_level) {
+ case SOL_SOCKET:
+ printf("SOL_SOCKET ");
+ switch (cmsg->cmsg_type) {
+ case SO_TIMESTAMP: {
+ struct timeval *stamp =
+ (struct timeval *)CMSG_DATA(cmsg);
+ printf("SO_TIMESTAMP %ld.%06ld",
+ (long)stamp->tv_sec,
+ (long)stamp->tv_usec);
+ break;
+ }
+ case SO_TIMESTAMPNS: {
+ struct timespec *stamp =
+ (struct timespec *)CMSG_DATA(cmsg);
+ printf("SO_TIMESTAMPNS %ld.%09ld",
+ (long)stamp->tv_sec,
+ (long)stamp->tv_nsec);
+ break;
+ }
+ case SO_TIMESTAMPING: {
+ struct timespec *stamp =
+ (struct timespec *)CMSG_DATA(cmsg);
+ printf("SO_TIMESTAMPING ");
+ printf("SW %ld.%09ld ",
+ (long)stamp->tv_sec,
+ (long)stamp->tv_nsec);
+ stamp++;
+ printf("HW transformed %ld.%09ld ",
+ (long)stamp->tv_sec,
+ (long)stamp->tv_nsec);
+ stamp++;
+ printf("HW raw %ld.%09ld",
+ (long)stamp->tv_sec,
+ (long)stamp->tv_nsec);
+ break;
+ }
+ default:
+ printf("type %d", cmsg->cmsg_type);
+ break;
+ }
+ break;
+ case IPPROTO_IP:
+ printf("IPPROTO_IP ");
+ switch (cmsg->cmsg_type) {
+ case IP_RECVERR: {
+ struct sock_extended_err *err =
+ (struct sock_extended_err *)CMSG_DATA(cmsg);
+ printf("IP_RECVERR ee_errno '%s' ee_origin %d => %s",
+ strerror(err->ee_errno),
+ err->ee_origin,
+#ifdef SO_EE_ORIGIN_TIMESTAMPING
+ err->ee_origin == SO_EE_ORIGIN_TIMESTAMPING ?
+ "bounced packet" : "unexpected origin"
+#else
+ "probably SO_EE_ORIGIN_TIMESTAMPING"
+#endif
+ );
+ if (res < sizeof(sync))
+ printf(" => truncated data?!");
+ else if (!memcmp(sync, data + res - sizeof(sync),
+ sizeof(sync)))
+ printf(" => GOT OUR DATA BACK (HURRAY!)");
+ break;
+ }
+ case IP_PKTINFO: {
+ struct in_pktinfo *pktinfo =
+ (struct in_pktinfo *)CMSG_DATA(cmsg);
+ printf("IP_PKTINFO interface index %u",
+ pktinfo->ipi_ifindex);
+ break;
+ }
+ default:
+ printf("type %d", cmsg->cmsg_type);
+ break;
+ }
+ break;
+ default:
+ printf("level %d type %d",
+ cmsg->cmsg_level,
+ cmsg->cmsg_type);
+ break;
+ }
+ printf("\n");
+ }
+
+ if (siocgstamp) {
+ if (ioctl(sock, SIOCGSTAMP, &tv))
+ printf(" %s: %s\n", "SIOCGSTAMP", strerror(errno));
+ else
+ printf("SIOCGSTAMP %ld.%06ld\n",
+ (long)tv.tv_sec,
+ (long)tv.tv_usec);
+ }
+ if (siocgstampns) {
+ if (ioctl(sock, SIOCGSTAMPNS, &ts))
+ printf(" %s: %s\n", "SIOCGSTAMPNS", strerror(errno));
+ else
+ printf("SIOCGSTAMPNS %ld.%09ld\n",
+ (long)ts.tv_sec,
+ (long)ts.tv_nsec);
+ }
+}
+
+static void recvpacket(int sock, int recvmsg_flags,
+ int siocgstamp, int siocgstampns)
+{
+ char data[256];
+ struct msghdr msg;
+ struct iovec entry;
+ struct sockaddr_in from_addr;
+ struct {
+ struct cmsghdr cm;
+ char control[512];
+ } control;
+ int res;
+
+ memset(&msg, 0, sizeof(msg));
+ msg.msg_iov = &entry;
+ msg.msg_iovlen = 1;
+ entry.iov_base = data;
+ entry.iov_len = sizeof(data);
+ msg.msg_name = (caddr_t)&from_addr;
+ msg.msg_namelen = sizeof(from_addr);
+ msg.msg_control = &control;
+ msg.msg_controllen = sizeof(control);
+
+ res = recvmsg(sock, &msg, recvmsg_flags|MSG_DONTWAIT);
+ if (res < 0) {
+ printf("%s %s: %s\n",
+ "recvmsg",
+ (recvmsg_flags & MSG_ERRQUEUE) ? "error" : "regular",
+ strerror(errno));
+ } else {
+ printpacket(&msg, res, data,
+ sock, recvmsg_flags,
+ siocgstamp, siocgstampns);
+ }
+}
+
+int main(int argc, char **argv)
+{
+ int so_timestamping_flags = 0;
+ int so_timestamp = 0;
+ int so_timestampns = 0;
+ int siocgstamp = 0;
+ int siocgstampns = 0;
+ int ip_multicast_loop = 0;
+ char *interface;
+ int i;
+ int enabled = 1;
+ int sock;
+ struct ifreq device;
+ struct ifreq hwtstamp;
+ struct hwtstamp_config hwconfig, hwconfig_requested;
+ struct sockaddr_in addr;
+ struct ip_mreq imr;
+ struct in_addr iaddr;
+ int val;
+ socklen_t len;
+ struct timeval next;
+
+ if (argc < 2)
+ usage(0);
+ interface = argv[1];
+
+ for (i = 2; i < argc; i++) {
+ if (!strcasecmp(argv[i], "SO_TIMESTAMP"))
+ so_timestamp = 1;
+ else if (!strcasecmp(argv[i], "SO_TIMESTAMPNS"))
+ so_timestampns = 1;
+ else if (!strcasecmp(argv[i], "SIOCGSTAMP"))
+ siocgstamp = 1;
+ else if (!strcasecmp(argv[i], "SIOCGSTAMPNS"))
+ siocgstampns = 1;
+ else if (!strcasecmp(argv[i], "IP_MULTICAST_LOOP"))
+ ip_multicast_loop = 1;
+ else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_TX_HARDWARE"))
+ so_timestamping_flags |= SOF_TIMESTAMPING_TX_HARDWARE;
+ else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_TX_SOFTWARE"))
+ so_timestamping_flags |= SOF_TIMESTAMPING_TX_SOFTWARE;
+ else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RX_HARDWARE"))
+ so_timestamping_flags |= SOF_TIMESTAMPING_RX_HARDWARE;
+ else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RX_SOFTWARE"))
+ so_timestamping_flags |= SOF_TIMESTAMPING_RX_SOFTWARE;
+ else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_SOFTWARE"))
+ so_timestamping_flags |= SOF_TIMESTAMPING_SOFTWARE;
+ else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_SYS_HARDWARE"))
+ so_timestamping_flags |= SOF_TIMESTAMPING_SYS_HARDWARE;
+ else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RAW_HARDWARE"))
+ so_timestamping_flags |= SOF_TIMESTAMPING_RAW_HARDWARE;
+ else
+ usage(argv[i]);
+ }
+
+ sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
+ if (socket < 0)
+ bail("socket");
+
+ memset(&device, 0, sizeof(device));
+ strncpy(device.ifr_name, interface, sizeof(device.ifr_name));
+ if (ioctl(sock, SIOCGIFADDR, &device) < 0)
+ bail("getting interface IP address");
+
+ memset(&hwtstamp, 0, sizeof(hwtstamp));
+ strncpy(hwtstamp.ifr_name, interface, sizeof(hwtstamp.ifr_name));
+ hwtstamp.ifr_data = (void *)&hwconfig;
+ memset(&hwconfig, 0, sizeof(&hwconfig));
+ hwconfig.tx_type =
+ (so_timestamping_flags & SOF_TIMESTAMPING_TX_HARDWARE) ?
+ HWTSTAMP_TX_ON : HWTSTAMP_TX_OFF;
+ hwconfig.rx_filter =
+ (so_timestamping_flags & SOF_TIMESTAMPING_RX_HARDWARE) ?
+ HWTSTAMP_FILTER_PTP_V1_L4_SYNC : HWTSTAMP_FILTER_NONE;
+ hwconfig_requested = hwconfig;
+ if (ioctl(sock, SIOCSHWTSTAMP, &hwtstamp) < 0) {
+ if ((errno == EINVAL || errno == ENOTSUP) &&
+ hwconfig_requested.tx_type == HWTSTAMP_TX_OFF &&
+ hwconfig_requested.rx_filter == HWTSTAMP_FILTER_NONE)
+ printf("SIOCSHWTSTAMP: disabling hardware time stamping not possible\n");
+ else
+ bail("SIOCSHWTSTAMP");
+ }
+ printf("SIOCSHWTSTAMP: tx_type %d requested, got %d; rx_filter %d requested, got %d\n",
+ hwconfig_requested.tx_type, hwconfig.tx_type,
+ hwconfig_requested.rx_filter, hwconfig.rx_filter);
+
+ /* bind to PTP port */
+ addr.sin_family = AF_INET;
+ addr.sin_addr.s_addr = htonl(INADDR_ANY);
+ addr.sin_port = htons(319 /* PTP event port */);
+ if (bind(sock,
+ (struct sockaddr *)&addr,
+ sizeof(struct sockaddr_in)) < 0)
+ bail("bind");
+
+ /* set multicast group for outgoing packets */
+ inet_aton("224.0.1.130", &iaddr); /* alternate PTP domain 1 */
+ addr.sin_addr = iaddr;
+ imr.imr_multiaddr.s_addr = iaddr.s_addr;
+ imr.imr_interface.s_addr =
+ ((struct sockaddr_in *)&device.ifr_addr)->sin_addr.s_addr;
+ if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_IF,
+ &imr.imr_interface.s_addr, sizeof(struct in_addr)) < 0)
+ bail("set multicast");
+
+ /* join multicast group, loop our own packet */
+ if (setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,
+ &imr, sizeof(struct ip_mreq)) < 0)
+ bail("join multicast group");
+
+ if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_LOOP,
+ &ip_multicast_loop, sizeof(enabled)) < 0) {
+ bail("loop multicast");
+ }
+
+ /* set socket options for time stamping */
+ if (so_timestamp &&
+ setsockopt(sock, SOL_SOCKET, SO_TIMESTAMP,
+ &enabled, sizeof(enabled)) < 0)
+ bail("setsockopt SO_TIMESTAMP");
+
+ if (so_timestampns &&
+ setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPNS,
+ &enabled, sizeof(enabled)) < 0)
+ bail("setsockopt SO_TIMESTAMPNS");
+
+ if (so_timestamping_flags &&
+ setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPING,
+ &so_timestamping_flags,
+ sizeof(so_timestamping_flags)) < 0)
+ bail("setsockopt SO_TIMESTAMPING");
+
+ /* request IP_PKTINFO for debugging purposes */
+ if (setsockopt(sock, SOL_IP, IP_PKTINFO,
+ &enabled, sizeof(enabled)) < 0)
+ printf("%s: %s\n", "setsockopt IP_PKTINFO", strerror(errno));
+
+ /* verify socket options */
+ len = sizeof(val);
+ if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMP, &val, &len) < 0)
+ printf("%s: %s\n", "getsockopt SO_TIMESTAMP", strerror(errno));
+ else
+ printf("SO_TIMESTAMP %d\n", val);
+
+ if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMPNS, &val, &len) < 0)
+ printf("%s: %s\n", "getsockopt SO_TIMESTAMPNS",
+ strerror(errno));
+ else
+ printf("SO_TIMESTAMPNS %d\n", val);
+
+ if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMPING, &val, &len) < 0) {
+ printf("%s: %s\n", "getsockopt SO_TIMESTAMPING",
+ strerror(errno));
+ } else {
+ printf("SO_TIMESTAMPING %d\n", val);
+ if (val != so_timestamping_flags)
+ printf(" not the expected value %d\n",
+ so_timestamping_flags);
+ }
+
+ /* send packets forever every five seconds */
+ gettimeofday(&next, 0);
+ next.tv_sec = (next.tv_sec + 1) / 5 * 5;
+ next.tv_usec = 0;
+ while (1) {
+ struct timeval now;
+ struct timeval delta;
+ long delta_us;
+ int res;
+ fd_set readfs, errorfs;
+
+ gettimeofday(&now, 0);
+ delta_us = (long)(next.tv_sec - now.tv_sec) * 1000000 +
+ (long)(next.tv_usec - now.tv_usec);
+ if (delta_us > 0) {
+ /* continue waiting for timeout or data */
+ delta.tv_sec = delta_us / 1000000;
+ delta.tv_usec = delta_us % 1000000;
+
+ FD_ZERO(&readfs);
+ FD_ZERO(&errorfs);
+ FD_SET(sock, &readfs);
+ FD_SET(sock, &errorfs);
+ printf("%ld.%06ld: select %ldus\n",
+ (long)now.tv_sec, (long)now.tv_usec,
+ delta_us);
+ res = select(sock + 1, &readfs, 0, &errorfs, &delta);
+ gettimeofday(&now, 0);
+ printf("%ld.%06ld: select returned: %d, %s\n",
+ (long)now.tv_sec, (long)now.tv_usec,
+ res,
+ res < 0 ? strerror(errno) : "success");
+ if (res > 0) {
+ if (FD_ISSET(sock, &readfs))
+ printf("ready for reading\n");
+ if (FD_ISSET(sock, &errorfs))
+ printf("has error\n");
+ recvpacket(sock, 0,
+ siocgstamp,
+ siocgstampns);
+ recvpacket(sock, MSG_ERRQUEUE,
+ siocgstamp,
+ siocgstampns);
+ }
+ } else {
+ /* write one packet */
+ sendpacket(sock,
+ (struct sockaddr *)&addr,
+ sizeof(addr));
+ next.tv_sec += 5;
+ continue;
+ }
+ }
+
+ return 0;
+}
diff --git a/arch/alpha/include/asm/socket.h b/arch/alpha/include/asm/socket.h
index a1057c2..3641ec1 100644
--- a/arch/alpha/include/asm/socket.h
+++ b/arch/alpha/include/asm/socket.h
@@ -62,6 +62,9 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
/* O_NONBLOCK clashes with the bits used for socket types. Therefore we
* have to define SOCK_NONBLOCK to a different value here.
*/
diff --git a/arch/arm/include/asm/socket.h b/arch/arm/include/asm/socket.h
index 6817be9..537de4e 100644
--- a/arch/arm/include/asm/socket.h
+++ b/arch/arm/include/asm/socket.h
@@ -54,4 +54,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_SOCKET_H */
diff --git a/arch/avr32/include/asm/socket.h b/arch/avr32/include/asm/socket.h
index 35863f2..04c8606 100644
--- a/arch/avr32/include/asm/socket.h
+++ b/arch/avr32/include/asm/socket.h
@@ -54,4 +54,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* __ASM_AVR32_SOCKET_H */
diff --git a/arch/blackfin/include/asm/socket.h b/arch/blackfin/include/asm/socket.h
index 2ca702e..fac7fe9 100644
--- a/arch/blackfin/include/asm/socket.h
+++ b/arch/blackfin/include/asm/socket.h
@@ -53,4 +53,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_SOCKET_H */
diff --git a/arch/cris/include/asm/socket.h b/arch/cris/include/asm/socket.h
index 9df0ca8..d5cf740 100644
--- a/arch/cris/include/asm/socket.h
+++ b/arch/cris/include/asm/socket.h
@@ -56,6 +56,9 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_SOCKET_H */
diff --git a/arch/h8300/include/asm/socket.h b/arch/h8300/include/asm/socket.h
index da2520d..602518a 100644
--- a/arch/h8300/include/asm/socket.h
+++ b/arch/h8300/include/asm/socket.h
@@ -54,4 +54,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_SOCKET_H */
diff --git a/arch/ia64/include/asm/socket.h b/arch/ia64/include/asm/socket.h
index d5ef0aa..7454212 100644
--- a/arch/ia64/include/asm/socket.h
+++ b/arch/ia64/include/asm/socket.h
@@ -63,4 +63,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m68k/include/asm/socket.h b/arch/m68k/include/asm/socket.h
index dbc64e9..ca87f93 100644
--- a/arch/m68k/include/asm/socket.h
+++ b/arch/m68k/include/asm/socket.h
@@ -54,4 +54,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_SOCKET_H */
diff --git a/arch/mips/include/asm/socket.h b/arch/mips/include/asm/socket.h
index facc2d7..2abca17 100644
--- a/arch/mips/include/asm/socket.h
+++ b/arch/mips/include/asm/socket.h
@@ -75,6 +75,9 @@ To add: #define SO_REUSEPORT 0x0200 /* Allow local address and port reuse. */
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#ifdef __KERNEL__
/** sock_type - Socket types
diff --git a/arch/parisc/include/asm/socket.h b/arch/parisc/include/asm/socket.h
index fba402c..885472b 100644
--- a/arch/parisc/include/asm/socket.h
+++ b/arch/parisc/include/asm/socket.h
@@ -54,6 +54,9 @@
#define SO_MARK 0x401f
+#define SO_TIMESTAMPING 0x4020
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
/* O_NONBLOCK clashes with the bits used for socket types. Therefore we
* have to define SOCK_NONBLOCK to a different value here.
*/
diff --git a/arch/powerpc/include/asm/socket.h b/arch/powerpc/include/asm/socket.h
index f5a4e16..1e5cfad 100644
--- a/arch/powerpc/include/asm/socket.h
+++ b/arch/powerpc/include/asm/socket.h
@@ -61,4 +61,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/asm/socket.h b/arch/s390/include/asm/socket.h
index c786ab6..02330c5 100644
--- a/arch/s390/include/asm/socket.h
+++ b/arch/s390/include/asm/socket.h
@@ -62,4 +62,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_SOCKET_H */
diff --git a/arch/sh/include/asm/socket.h b/arch/sh/include/asm/socket.h
index 6d4bf65..345653b 100644
--- a/arch/sh/include/asm/socket.h
+++ b/arch/sh/include/asm/socket.h
@@ -54,4 +54,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* __ASM_SH_SOCKET_H */
diff --git a/arch/sparc/include/asm/socket.h b/arch/sparc/include/asm/socket.h
index bf50d0c..982a12f 100644
--- a/arch/sparc/include/asm/socket.h
+++ b/arch/sparc/include/asm/socket.h
@@ -50,6 +50,9 @@
#define SO_MARK 0x0022
+#define SO_TIMESTAMPING 0x0023
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
/* Security levels - as per NRL IPv6 - don't actually do anything */
#define SO_SECURITY_AUTHENTICATION 0x5001
#define SO_SECURITY_ENCRYPTION_TRANSPORT 0x5002
diff --git a/arch/x86/include/asm/socket.h b/arch/x86/include/asm/socket.h
index 8ab9cc8..ca8bf2c 100644
--- a/arch/x86/include/asm/socket.h
+++ b/arch/x86/include/asm/socket.h
@@ -54,4 +54,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_X86_SOCKET_H */
diff --git a/arch/xtensa/include/asm/socket.h b/arch/xtensa/include/asm/socket.h
index 6100682..dd1a7a4 100644
--- a/arch/xtensa/include/asm/socket.h
+++ b/arch/xtensa/include/asm/socket.h
@@ -65,4 +65,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _XTENSA_SOCKET_H */
diff --git a/include/asm-frv/socket.h b/include/asm-frv/socket.h
index e51ca67..57c3d40 100644
--- a/include/asm-frv/socket.h
+++ b/include/asm-frv/socket.h
@@ -54,5 +54,8 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_SOCKET_H */
diff --git a/include/asm-m32r/socket.h b/include/asm-m32r/socket.h
index 9a0e200..be7ed58 100644
--- a/include/asm-m32r/socket.h
+++ b/include/asm-m32r/socket.h
@@ -54,4 +54,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_M32R_SOCKET_H */
diff --git a/include/asm-mn10300/socket.h b/include/asm-mn10300/socket.h
index 80af9c4..fb5daf4 100644
--- a/include/asm-mn10300/socket.h
+++ b/include/asm-mn10300/socket.h
@@ -54,4 +54,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_SOCKET_H */
diff --git a/include/linux/errqueue.h b/include/linux/errqueue.h
index ceb1454..ec12cc7 100644
--- a/include/linux/errqueue.h
+++ b/include/linux/errqueue.h
@@ -18,6 +18,7 @@ struct sock_extended_err
#define SO_EE_ORIGIN_LOCAL 1
#define SO_EE_ORIGIN_ICMP 2
#define SO_EE_ORIGIN_ICMP6 3
+#define SO_EE_ORIGIN_TIMESTAMPING 4
#define SO_EE_OFFENDER(ee) ((struct sockaddr*)((ee)+1))
diff --git a/include/linux/net_tstamp.h b/include/linux/net_tstamp.h
new file mode 100644
index 0000000..a3b8546
--- /dev/null
+++ b/include/linux/net_tstamp.h
@@ -0,0 +1,104 @@
+/*
+ * Userspace API for hardware time stamping of network packets
+ *
+ * Copyright (C) 2008,2009 Intel Corporation
+ * Author: Patrick Ohly <patrick.ohly@intel.com>
+ *
+ */
+
+#ifndef _NET_TIMESTAMPING_H
+#define _NET_TIMESTAMPING_H
+
+#include <linux/socket.h> /* for SO_TIMESTAMPING */
+
+/* SO_TIMESTAMPING gets an integer bit field comprised of these values */
+enum {
+ SOF_TIMESTAMPING_TX_HARDWARE = (1<<0),
+ SOF_TIMESTAMPING_TX_SOFTWARE = (1<<1),
+ SOF_TIMESTAMPING_RX_HARDWARE = (1<<2),
+ SOF_TIMESTAMPING_RX_SOFTWARE = (1<<3),
+ SOF_TIMESTAMPING_SOFTWARE = (1<<4),
+ SOF_TIMESTAMPING_SYS_HARDWARE = (1<<5),
+ SOF_TIMESTAMPING_RAW_HARDWARE = (1<<6),
+ SOF_TIMESTAMPING_MASK =
+ (SOF_TIMESTAMPING_RAW_HARDWARE - 1) |
+ SOF_TIMESTAMPING_RAW_HARDWARE
+};
+
+/**
+ * struct hwtstamp_config - %SIOCSHWTSTAMP parameter
+ *
+ * @flags: no flags defined right now, must be zero
+ * @tx_type: one of HWTSTAMP_TX_*
+ * @rx_type: one of one of HWTSTAMP_FILTER_*
+ *
+ * %SIOCSHWTSTAMP expects a &struct ifreq with a ifr_data pointer to
+ * this structure. dev_ifsioc() in the kernel takes care of the
+ * translation between 32 bit userspace and 64 bit kernel. The
+ * structure is intentionally chosen so that it has the same layout on
+ * 32 and 64 bit systems, don't break this!
+ */
+struct hwtstamp_config {
+ int flags;
+ int tx_type;
+ int rx_filter;
+};
+
+/* possible values for hwtstamp_config->tx_type */
+enum {
+ /*
+ * No outgoing packet will need hardware time stamping;
+ * should a packet arrive which asks for it, no hardware
+ * time stamping will be done.
+ */
+ HWTSTAMP_TX_OFF,
+
+ /*
+ * Enables hardware time stamping for outgoing packets;
+ * the sender of the packet decides which are to be
+ * time stamped by setting %SOF_TIMESTAMPING_TX_SOFTWARE
+ * before sending the packet.
+ */
+ HWTSTAMP_TX_ON,
+};
+
+/* possible values for hwtstamp_config->rx_filter */
+enum {
+ /* time stamp no incoming packet at all */
+ HWTSTAMP_FILTER_NONE,
+
+ /* time stamp any incoming packet */
+ HWTSTAMP_FILTER_ALL,
+
+ /* return value: time stamp all packets requested plus some others */
+ HWTSTAMP_FILTER_SOME,
+
+ /* PTP v1, UDP, any kind of event packet */
+ HWTSTAMP_FILTER_PTP_V1_L4_EVENT,
+ /* PTP v1, UDP, Sync packet */
+ HWTSTAMP_FILTER_PTP_V1_L4_SYNC,
+ /* PTP v1, UDP, Delay_req packet */
+ HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ,
+ /* PTP v2, UDP, any kind of event packet */
+ HWTSTAMP_FILTER_PTP_V2_L4_EVENT,
+ /* PTP v2, UDP, Sync packet */
+ HWTSTAMP_FILTER_PTP_V2_L4_SYNC,
+ /* PTP v2, UDP, Delay_req packet */
+ HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ,
+
+ /* 802.AS1, Ethernet, any kind of event packet */
+ HWTSTAMP_FILTER_PTP_V2_L2_EVENT,
+ /* 802.AS1, Ethernet, Sync packet */
+ HWTSTAMP_FILTER_PTP_V2_L2_SYNC,
+ /* 802.AS1, Ethernet, Delay_req packet */
+ HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ,
+
+ /* PTP v2/802.AS1, any layer, any kind of event packet */
+ HWTSTAMP_FILTER_PTP_V2_EVENT,
+ /* PTP v2/802.AS1, any layer, Sync packet */
+ HWTSTAMP_FILTER_PTP_V2_SYNC,
+ /* PTP v2/802.AS1, any layer, Delay_req packet */
+ HWTSTAMP_FILTER_PTP_V2_DELAY_REQ,
+};
+
+#endif /* _NET_TIMESTAMPING_H */
diff --git a/include/linux/sockios.h b/include/linux/sockios.h
index abef759..241f179 100644
--- a/include/linux/sockios.h
+++ b/include/linux/sockios.h
@@ -122,6 +122,9 @@
#define SIOCBRADDIF 0x89a2 /* add interface to bridge */
#define SIOCBRDELIF 0x89a3 /* remove interface from bridge */
+/* hardware time stamping: parameters in linux/net_tstamp.h */
+#define SIOCSHWTSTAMP 0x89b0
+
/* Device private ioctl calls */
/*
--
1.6.1.2
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 04/10] net: infrastructure for hardware time stamping
2009-02-12 15:00 ` [PATCH NET-NEXT 03/10] net: new user space API for time stamping of incoming and outgoing packets Patrick Ohly
@ 2009-02-12 15:00 ` Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 05/10] net: socket infrastructure for SO_TIMESTAMPING Patrick Ohly
0 siblings, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:00 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly
The additional per-packet information (16 bytes for time stamps, 1
byte for flags) is stored for all packets in the skb_shared_info
struct. This implementation detail is hidden from users of that
information via skb_* accessor functions. A separate struct resp.
union is used for the additional information so that it can be
stored/copied easily outside of skb_shared_info.
Compared to previous implementations (reusing the tstamp field
depending on the context, optional additional structures) this
is the simplest solution. It does not extend sk_buff itself.
TX time stamping is implemented in software if the device driver
doesn't support hardware time stamping.
The new semantic for hardware/software time stamping around
ndo_start_xmit() is based on two assumptions about existing
network device drivers which don't support hardware time
stamping and know nothing about it:
- they leave the new skb_shared_tx unmodified
- the keep the connection to the originating socket in skb->sk
alive, i.e., don't call skb_orphan()
Given that skb_shared_tx is new, the first assumption is safe.
The second is only true for some drivers. As a result, software
TX time stamping currently works with the bnx2 driver, but not
with the unmodified igb driver (the two drivers this patch series
was tested with).
---
include/linux/skbuff.h | 91 +++++++++++++++++++++++++++++++++++++++++++++++-
net/core/dev.c | 32 ++++++++++++++++-
net/core/skbuff.c | 41 +++++++++++++++++++++
3 files changed, 161 insertions(+), 3 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9247008..f96bc91 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -132,6 +132,57 @@ struct skb_frag_struct {
__u32 size;
};
+#define HAVE_HW_TIME_STAMP
+
+/**
+ * skb_shared_hwtstamps - hardware time stamps
+ *
+ * @hwtstamp: hardware time stamp transformed into duration
+ * since arbitrary point in time
+ * @syststamp: hwtstamp transformed to system time base
+ *
+ * Software time stamps generated by ktime_get_real() are stored in
+ * skb->tstamp. The relation between the different kinds of time
+ * stamps is as follows:
+ *
+ * syststamp and tstamp can be compared against each other in
+ * arbitrary combinations. The accuracy of a
+ * syststamp/tstamp/"syststamp from other device" comparison is
+ * limited by the accuracy of the transformation into system time
+ * base. This depends on the device driver and its underlying
+ * hardware.
+ *
+ * hwtstamps can only be compared against other hwtstamps from
+ * the same device.
+ *
+ * This structure is attached to packets as part of the
+ * &skb_shared_info. Use skb_hwtstamps() to get a pointer.
+ */
+struct skb_shared_hwtstamps {
+ ktime_t hwtstamp;
+ ktime_t syststamp;
+};
+
+/**
+ * skb_shared_tx - instructions for time stamping of outgoing packets
+ *
+ * @hardware: generate hardware time stamp
+ * @software: generate software time stamp
+ * @in_progress: device driver is going to provide
+ * hardware time stamp
+ *
+ * These flags are attached to packets as part of the
+ * &skb_shared_info. Use skb_tx() to get a pointer.
+ */
+union skb_shared_tx {
+ struct {
+ __u8 hardware:1,
+ software:1,
+ in_progress:1;
+ };
+ __u8 flags;
+};
+
/* This data is invariant across clones and lives at
* the end of the header data, ie. at skb->end.
*/
@@ -143,10 +194,12 @@ struct skb_shared_info {
unsigned short gso_segs;
unsigned short gso_type;
__be32 ip6_frag_id;
+ union skb_shared_tx tx_flags;
#ifdef CONFIG_HAS_DMA
unsigned int num_dma_maps;
#endif
struct sk_buff *frag_list;
+ struct skb_shared_hwtstamps hwtstamps;
skb_frag_t frags[MAX_SKB_FRAGS];
#ifdef CONFIG_HAS_DMA
dma_addr_t dma_maps[MAX_SKB_FRAGS + 1];
@@ -465,6 +518,16 @@ static inline unsigned char *skb_end_pointer(const struct sk_buff *skb)
/* Internal */
#define skb_shinfo(SKB) ((struct skb_shared_info *)(skb_end_pointer(SKB)))
+static inline struct skb_shared_hwtstamps *skb_hwtstamps(struct sk_buff *skb)
+{
+ return &skb_shinfo(skb)->hwtstamps;
+}
+
+static inline union skb_shared_tx *skb_tx(struct sk_buff *skb)
+{
+ return &skb_shinfo(skb)->tx_flags;
+}
+
/**
* skb_queue_empty - check if a queue is empty
* @list: queue head
@@ -1730,6 +1793,11 @@ static inline void skb_copy_to_linear_data_offset(struct sk_buff *skb,
extern void skb_init(void);
+static inline ktime_t skb_get_ktime(const struct sk_buff *skb)
+{
+ return skb->tstamp;
+}
+
/**
* skb_get_timestamp - get timestamp from a skb
* @skb: skb to get stamp from
@@ -1739,11 +1807,18 @@ extern void skb_init(void);
* This function converts the offset back to a struct timeval and stores
* it in stamp.
*/
-static inline void skb_get_timestamp(const struct sk_buff *skb, struct timeval *stamp)
+static inline void skb_get_timestamp(const struct sk_buff *skb,
+ struct timeval *stamp)
{
*stamp = ktime_to_timeval(skb->tstamp);
}
+static inline void skb_get_timestampns(const struct sk_buff *skb,
+ struct timespec *stamp)
+{
+ *stamp = ktime_to_timespec(skb->tstamp);
+}
+
static inline void __net_timestamp(struct sk_buff *skb)
{
skb->tstamp = ktime_get_real();
@@ -1759,6 +1834,20 @@ static inline ktime_t net_invalid_timestamp(void)
return ktime_set(0, 0);
}
+/**
+ * skb_tstamp_tx - queue clone of skb with send time stamps
+ * @orig_skb: the original outgoing packet
+ * @hwtstamps: hardware time stamps, may be NULL if not available
+ *
+ * If the skb has a socket associated, then this function clones the
+ * skb (thus sharing the actual data and optional structures), stores
+ * the optional hardware time stamping information (if non NULL) or
+ * generates a software time stamp (otherwise), then queues the clone
+ * to the error queue of the socket. Errors are silently ignored.
+ */
+extern void skb_tstamp_tx(struct sk_buff *orig_skb,
+ struct skb_shared_hwtstamps *hwtstamps);
+
extern __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len);
extern __sum16 __skb_checksum_complete(struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 1e27a67..d20c28e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1672,10 +1672,21 @@ static int dev_gso_segment(struct sk_buff *skb)
return 0;
}
+static void tstamp_tx(struct sk_buff *skb)
+{
+ union skb_shared_tx *shtx =
+ skb_tx(skb);
+ if (unlikely(shtx->software &&
+ !shtx->in_progress)) {
+ skb_tstamp_tx(skb, NULL);
+ }
+}
+
int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
struct netdev_queue *txq)
{
const struct net_device_ops *ops = dev->netdev_ops;
+ int rc;
prefetch(&dev->netdev_ops->ndo_start_xmit);
if (likely(!skb->next)) {
@@ -1689,13 +1700,29 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
goto gso;
}
- return ops->ndo_start_xmit(skb, dev);
+ rc = ops->ndo_start_xmit(skb, dev);
+ /*
+ * TODO: if skb_orphan() was called by
+ * dev->hard_start_xmit() (for example, the unmodified
+ * igb driver does that; bnx2 doesn't), then
+ * skb_tx_software_timestamp() will be unable to send
+ * back the time stamp.
+ *
+ * How can this be prevented? Always create another
+ * reference to the socket before calling
+ * dev->hard_start_xmit()? Prevent that skb_orphan()
+ * does anything in dev->hard_start_xmit() by clearing
+ * the skb destructor before the call and restoring it
+ * afterwards, then doing the skb_orphan() ourselves?
+ */
+ if (likely(!rc))
+ tstamp_tx(skb);
+ return rc;
}
gso:
do {
struct sk_buff *nskb = skb->next;
- int rc;
skb->next = nskb->next;
nskb->next = NULL;
@@ -1705,6 +1732,7 @@ gso:
skb->next = nskb;
return rc;
}
+ tstamp_tx(skb);
if (unlikely(netif_tx_queue_stopped(txq) && skb->next))
return NETDEV_TX_BUSY;
} while (skb->next);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7657cec..7b831a7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -55,6 +55,7 @@
#include <linux/rtnetlink.h>
#include <linux/init.h>
#include <linux/scatterlist.h>
+#include <linux/errqueue.h>
#include <net/protocol.h>
#include <net/dst.h>
@@ -215,7 +216,9 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
shinfo->gso_segs = 0;
shinfo->gso_type = 0;
shinfo->ip6_frag_id = 0;
+ shinfo->tx_flags.flags = 0;
shinfo->frag_list = NULL;
+ memset(&shinfo->hwtstamps, 0, sizeof(shinfo->hwtstamps));
if (fclone) {
struct sk_buff *child = skb + 1;
@@ -2940,6 +2943,44 @@ int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer)
}
EXPORT_SYMBOL_GPL(skb_cow_data);
+void skb_tstamp_tx(struct sk_buff *orig_skb,
+ struct skb_shared_hwtstamps *hwtstamps)
+{
+ struct sock *sk = orig_skb->sk;
+ struct sock_exterr_skb *serr;
+ struct sk_buff *skb;
+ int err;
+
+ if (!sk)
+ return;
+
+ skb = skb_clone(orig_skb, GFP_ATOMIC);
+ if (!skb)
+ return;
+
+ if (hwtstamps) {
+ *skb_hwtstamps(skb) =
+ *hwtstamps;
+ } else {
+ /*
+ * no hardware time stamps available,
+ * so keep the skb_shared_tx and only
+ * store software time stamp
+ */
+ skb->tstamp = ktime_get_real();
+ }
+
+ serr = SKB_EXT_ERR(skb);
+ memset(serr, 0, sizeof(*serr));
+ serr->ee.ee_errno = ENOMSG;
+ serr->ee.ee_origin = SO_EE_ORIGIN_TIMESTAMPING;
+ err = sock_queue_err_skb(sk, skb);
+ if (err)
+ kfree_skb(skb);
+}
+EXPORT_SYMBOL_GPL(skb_tstamp_tx);
+
+
/**
* skb_partial_csum_set - set up and verify partial csum values for packet
* @skb: the skb to set
--
1.6.1.2
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 05/10] net: socket infrastructure for SO_TIMESTAMPING
2009-02-12 15:00 ` [PATCH NET-NEXT 04/10] net: infrastructure for hardware time stamping Patrick Ohly
@ 2009-02-12 15:00 ` Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 06/10] ip: support for TX timestamps on UDP and RAW sockets Patrick Ohly
0 siblings, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:00 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly
The overlap with the old SO_TIMESTAMP[NS] options is handled so
that time stamping in software (net_enable_timestamp()) is
enabled when SO_TIMESTAMP[NS] and/or SO_TIMESTAMPING_RX_SOFTWARE
is set. It's disabled if all of these are off.
---
include/net/sock.h | 44 +++++++++++++++++++++++++--
net/compat.c | 19 +++++++----
net/core/sock.c | 81 ++++++++++++++++++++++++++++++++++++++++++-------
net/socket.c | 84 +++++++++++++++++++++++++++++++++++++++------------
4 files changed, 186 insertions(+), 42 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index c047def..0a54d9b 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -158,7 +158,7 @@ struct sock_common {
* @sk_allocation: allocation mode
* @sk_sndbuf: size of send buffer in bytes
* @sk_flags: %SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE,
- * %SO_OOBINLINE settings
+ * %SO_OOBINLINE settings, %SO_TIMESTAMPING settings
* @sk_no_check: %SO_NO_CHECK setting, wether or not checkup packets
* @sk_route_caps: route capabilities (e.g. %NETIF_F_TSO)
* @sk_gso_type: GSO type (e.g. %SKB_GSO_TCPV4)
@@ -488,6 +488,13 @@ enum sock_flags {
SOCK_RCVTSTAMPNS, /* %SO_TIMESTAMPNS setting */
SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
+ SOCK_TIMESTAMPING_TX_HARDWARE, /* %SOF_TIMESTAMPING_TX_HARDWARE */
+ SOCK_TIMESTAMPING_TX_SOFTWARE, /* %SOF_TIMESTAMPING_TX_SOFTWARE */
+ SOCK_TIMESTAMPING_RX_HARDWARE, /* %SOF_TIMESTAMPING_RX_HARDWARE */
+ SOCK_TIMESTAMPING_RX_SOFTWARE, /* %SOF_TIMESTAMPING_RX_SOFTWARE */
+ SOCK_TIMESTAMPING_SOFTWARE, /* %SOF_TIMESTAMPING_SOFTWARE */
+ SOCK_TIMESTAMPING_RAW_HARDWARE, /* %SOF_TIMESTAMPING_RAW_HARDWARE */
+ SOCK_TIMESTAMPING_SYS_HARDWARE, /* %SOF_TIMESTAMPING_SYS_HARDWARE */
};
static inline void sock_copy_flags(struct sock *nsk, struct sock *osk)
@@ -1346,14 +1353,45 @@ static __inline__ void
sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
{
ktime_t kt = skb->tstamp;
+ struct skb_shared_hwtstamps *hwtstamps = skb_hwtstamps(skb);
- if (sock_flag(sk, SOCK_RCVTSTAMP))
+ /*
+ * generate control messages if
+ * - receive time stamping in software requested (SOCK_RCVTSTAMP
+ * or SOCK_TIMESTAMPING_RX_SOFTWARE)
+ * - software time stamp available and wanted
+ * (SOCK_TIMESTAMPING_SOFTWARE)
+ * - hardware time stamps available and wanted
+ * (SOCK_TIMESTAMPING_SYS_HARDWARE or
+ * SOCK_TIMESTAMPING_RAW_HARDWARE)
+ */
+ if (sock_flag(sk, SOCK_RCVTSTAMP) ||
+ sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE) ||
+ (kt.tv64 && sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE)) ||
+ (hwtstamps->hwtstamp.tv64 &&
+ sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE)) ||
+ (hwtstamps->syststamp.tv64 &&
+ sock_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE)))
__sock_recv_timestamp(msg, sk, skb);
else
sk->sk_stamp = kt;
}
/**
+ * sock_tx_timestamp - checks whether the outgoing packet is to be time stamped
+ * @msg: outgoing packet
+ * @sk: socket sending this packet
+ * @shtx: filled with instructions for time stamping
+ *
+ * Currently only depends on SOCK_TIMESTAMPING* flags. Returns error code if
+ * parameters are invalid.
+ */
+extern int sock_tx_timestamp(struct msghdr *msg,
+ struct sock *sk,
+ union skb_shared_tx *shtx);
+
+
+/**
* sk_eat_skb - Release a skb if it is no longer needed
* @sk: socket to eat this skb from
* @skb: socket buffer to eat
@@ -1421,7 +1459,7 @@ static inline struct sock *skb_steal_sock(struct sk_buff *skb)
return NULL;
}
-extern void sock_enable_timestamp(struct sock *sk);
+extern void sock_enable_timestamp(struct sock *sk, int flag);
extern int sock_get_timestamp(struct sock *, struct timeval __user *);
extern int sock_get_timestampns(struct sock *, struct timespec __user *);
diff --git a/net/compat.c b/net/compat.c
index a3a2ba0..8d73905 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -216,7 +216,7 @@ Efault:
int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *data)
{
struct compat_timeval ctv;
- struct compat_timespec cts;
+ struct compat_timespec cts[3];
struct compat_cmsghdr __user *cm = (struct compat_cmsghdr __user *) kmsg->msg_control;
struct compat_cmsghdr cmhdr;
int cmlen;
@@ -233,12 +233,17 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat
data = &ctv;
len = sizeof(ctv);
}
- if (level == SOL_SOCKET && type == SCM_TIMESTAMPNS) {
+ if (level == SOL_SOCKET &&
+ (type == SCM_TIMESTAMPNS || type == SCM_TIMESTAMPING)) {
+ int count = type == SCM_TIMESTAMPNS ? 1 : 3;
+ int i;
struct timespec *ts = (struct timespec *)data;
- cts.tv_sec = ts->tv_sec;
- cts.tv_nsec = ts->tv_nsec;
+ for (i = 0; i < count; i++) {
+ cts[i].tv_sec = ts[i].tv_sec;
+ cts[i].tv_nsec = ts[i].tv_nsec;
+ }
data = &cts;
- len = sizeof(cts);
+ len = sizeof(cts[0]) * count;
}
cmlen = CMSG_COMPAT_LEN(len);
@@ -455,7 +460,7 @@ int compat_sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp)
struct timeval tv;
if (!sock_flag(sk, SOCK_TIMESTAMP))
- sock_enable_timestamp(sk);
+ sock_enable_timestamp(sk, SOCK_TIMESTAMP);
tv = ktime_to_timeval(sk->sk_stamp);
if (tv.tv_sec == -1)
return err;
@@ -479,7 +484,7 @@ int compat_sock_get_timestampns(struct sock *sk, struct timespec __user *usersta
struct timespec ts;
if (!sock_flag(sk, SOCK_TIMESTAMP))
- sock_enable_timestamp(sk);
+ sock_enable_timestamp(sk, SOCK_TIMESTAMP);
ts = ktime_to_timespec(sk->sk_stamp);
if (ts.tv_sec == -1)
return err;
diff --git a/net/core/sock.c b/net/core/sock.c
index c64996f..d22933a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -120,6 +120,7 @@
#include <net/net_namespace.h>
#include <net/request_sock.h>
#include <net/sock.h>
+#include <linux/net_tstamp.h>
#include <net/xfrm.h>
#include <linux/ipsec.h>
@@ -255,11 +256,14 @@ static void sock_warn_obsolete_bsdism(const char *name)
}
}
-static void sock_disable_timestamp(struct sock *sk)
+static void sock_disable_timestamp(struct sock *sk, int flag)
{
- if (sock_flag(sk, SOCK_TIMESTAMP)) {
- sock_reset_flag(sk, SOCK_TIMESTAMP);
- net_disable_timestamp();
+ if (sock_flag(sk, flag)) {
+ sock_reset_flag(sk, flag);
+ if (!sock_flag(sk, SOCK_TIMESTAMP) &&
+ !sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE)) {
+ net_disable_timestamp();
+ }
}
}
@@ -614,13 +618,38 @@ set_rcvbuf:
else
sock_set_flag(sk, SOCK_RCVTSTAMPNS);
sock_set_flag(sk, SOCK_RCVTSTAMP);
- sock_enable_timestamp(sk);
+ sock_enable_timestamp(sk, SOCK_TIMESTAMP);
} else {
sock_reset_flag(sk, SOCK_RCVTSTAMP);
sock_reset_flag(sk, SOCK_RCVTSTAMPNS);
}
break;
+ case SO_TIMESTAMPING:
+ if (val & ~SOF_TIMESTAMPING_MASK) {
+ ret = EINVAL;
+ break;
+ }
+ sock_valbool_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE,
+ val & SOF_TIMESTAMPING_TX_HARDWARE);
+ sock_valbool_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE,
+ val & SOF_TIMESTAMPING_TX_SOFTWARE);
+ sock_valbool_flag(sk, SOCK_TIMESTAMPING_RX_HARDWARE,
+ val & SOF_TIMESTAMPING_RX_HARDWARE);
+ if (val & SOF_TIMESTAMPING_RX_SOFTWARE)
+ sock_enable_timestamp(sk,
+ SOCK_TIMESTAMPING_RX_SOFTWARE);
+ else
+ sock_disable_timestamp(sk,
+ SOCK_TIMESTAMPING_RX_SOFTWARE);
+ sock_valbool_flag(sk, SOCK_TIMESTAMPING_SOFTWARE,
+ val & SOF_TIMESTAMPING_SOFTWARE);
+ sock_valbool_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE,
+ val & SOF_TIMESTAMPING_SYS_HARDWARE);
+ sock_valbool_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE,
+ val & SOF_TIMESTAMPING_RAW_HARDWARE);
+ break;
+
case SO_RCVLOWAT:
if (val < 0)
val = INT_MAX;
@@ -766,6 +795,24 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
v.val = sock_flag(sk, SOCK_RCVTSTAMPNS);
break;
+ case SO_TIMESTAMPING:
+ v.val = 0;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE))
+ v.val |= SOF_TIMESTAMPING_TX_HARDWARE;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE))
+ v.val |= SOF_TIMESTAMPING_TX_SOFTWARE;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_RX_HARDWARE))
+ v.val |= SOF_TIMESTAMPING_RX_HARDWARE;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE))
+ v.val |= SOF_TIMESTAMPING_RX_SOFTWARE;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE))
+ v.val |= SOF_TIMESTAMPING_SOFTWARE;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE))
+ v.val |= SOF_TIMESTAMPING_SYS_HARDWARE;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE))
+ v.val |= SOF_TIMESTAMPING_RAW_HARDWARE;
+ break;
+
case SO_RCVTIMEO:
lv=sizeof(struct timeval);
if (sk->sk_rcvtimeo == MAX_SCHEDULE_TIMEOUT) {
@@ -967,7 +1014,8 @@ void sk_free(struct sock *sk)
rcu_assign_pointer(sk->sk_filter, NULL);
}
- sock_disable_timestamp(sk);
+ sock_disable_timestamp(sk, SOCK_TIMESTAMP);
+ sock_disable_timestamp(sk, SOCK_TIMESTAMPING_RX_SOFTWARE);
if (atomic_read(&sk->sk_omem_alloc))
printk(KERN_DEBUG "%s: optmem leakage (%d bytes) detected.\n",
@@ -1785,7 +1833,7 @@ int sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp)
{
struct timeval tv;
if (!sock_flag(sk, SOCK_TIMESTAMP))
- sock_enable_timestamp(sk);
+ sock_enable_timestamp(sk, SOCK_TIMESTAMP);
tv = ktime_to_timeval(sk->sk_stamp);
if (tv.tv_sec == -1)
return -ENOENT;
@@ -1801,7 +1849,7 @@ int sock_get_timestampns(struct sock *sk, struct timespec __user *userstamp)
{
struct timespec ts;
if (!sock_flag(sk, SOCK_TIMESTAMP))
- sock_enable_timestamp(sk);
+ sock_enable_timestamp(sk, SOCK_TIMESTAMP);
ts = ktime_to_timespec(sk->sk_stamp);
if (ts.tv_sec == -1)
return -ENOENT;
@@ -1813,11 +1861,20 @@ int sock_get_timestampns(struct sock *sk, struct timespec __user *userstamp)
}
EXPORT_SYMBOL(sock_get_timestampns);
-void sock_enable_timestamp(struct sock *sk)
+void sock_enable_timestamp(struct sock *sk, int flag)
{
- if (!sock_flag(sk, SOCK_TIMESTAMP)) {
- sock_set_flag(sk, SOCK_TIMESTAMP);
- net_enable_timestamp();
+ if (!sock_flag(sk, flag)) {
+ sock_set_flag(sk, flag);
+ /*
+ * we just set one of the two flags which require net
+ * time stamping, but time stamping might have been on
+ * already because of the other one
+ */
+ if (!sock_flag(sk,
+ flag == SOCK_TIMESTAMP ?
+ SOCK_TIMESTAMPING_RX_SOFTWARE :
+ SOCK_TIMESTAMP))
+ net_enable_timestamp();
}
}
diff --git a/net/socket.c b/net/socket.c
index 35dd737..47a3dc0 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -545,6 +545,18 @@ void sock_release(struct socket *sock)
sock->file = NULL;
}
+int sock_tx_timestamp(struct msghdr *msg, struct sock *sk,
+ union skb_shared_tx *shtx)
+{
+ shtx->flags = 0;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE))
+ shtx->hardware = 1;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE))
+ shtx->software = 1;
+ return 0;
+}
+EXPORT_SYMBOL(sock_tx_timestamp);
+
static inline int __sock_sendmsg(struct kiocb *iocb, struct socket *sock,
struct msghdr *msg, size_t size)
{
@@ -595,33 +607,65 @@ int kernel_sendmsg(struct socket *sock, struct msghdr *msg,
return result;
}
+static int ktime2ts(ktime_t kt, struct timespec *ts)
+{
+ if (kt.tv64) {
+ *ts = ktime_to_timespec(kt);
+ return 1;
+ } else {
+ return 0;
+ }
+}
+
/*
* called from sock_recv_timestamp() if sock_flag(sk, SOCK_RCVTSTAMP)
*/
void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
struct sk_buff *skb)
{
- ktime_t kt = skb->tstamp;
-
- if (!sock_flag(sk, SOCK_RCVTSTAMPNS)) {
- struct timeval tv;
- /* Race occurred between timestamp enabling and packet
- receiving. Fill in the current time for now. */
- if (kt.tv64 == 0)
- kt = ktime_get_real();
- skb->tstamp = kt;
- tv = ktime_to_timeval(kt);
- put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP, sizeof(tv), &tv);
- } else {
- struct timespec ts;
- /* Race occurred between timestamp enabling and packet
- receiving. Fill in the current time for now. */
- if (kt.tv64 == 0)
- kt = ktime_get_real();
- skb->tstamp = kt;
- ts = ktime_to_timespec(kt);
- put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPNS, sizeof(ts), &ts);
+ int need_software_tstamp = sock_flag(sk, SOCK_RCVTSTAMP);
+ struct timespec ts[3];
+ int empty = 1;
+ struct skb_shared_hwtstamps *shhwtstamps =
+ skb_hwtstamps(skb);
+
+ /* Race occurred between timestamp enabling and packet
+ receiving. Fill in the current time for now. */
+ if (need_software_tstamp && skb->tstamp.tv64 == 0)
+ __net_timestamp(skb);
+
+ if (need_software_tstamp) {
+ if (!sock_flag(sk, SOCK_RCVTSTAMPNS)) {
+ struct timeval tv;
+ skb_get_timestamp(skb, &tv);
+ put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP,
+ sizeof(tv), &tv);
+ } else {
+ struct timespec ts;
+ skb_get_timestampns(skb, &ts);
+ put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPNS,
+ sizeof(ts), &ts);
+ }
+ }
+
+
+ memset(ts, 0, sizeof(ts));
+ if (skb->tstamp.tv64 &&
+ sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE)) {
+ skb_get_timestampns(skb, ts + 0);
+ empty = 0;
+ }
+ if (shhwtstamps) {
+ if (sock_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE) &&
+ ktime2ts(shhwtstamps->syststamp, ts + 1))
+ empty = 0;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE) &&
+ ktime2ts(shhwtstamps->hwtstamp, ts + 2))
+ empty = 0;
}
+ if (!empty)
+ put_cmsg(msg, SOL_SOCKET,
+ SCM_TIMESTAMPING, sizeof(ts), &ts);
}
EXPORT_SYMBOL_GPL(__sock_recv_timestamp);
--
1.6.1.2
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 06/10] ip: support for TX timestamps on UDP and RAW sockets
2009-02-12 15:00 ` [PATCH NET-NEXT 05/10] net: socket infrastructure for SO_TIMESTAMPING Patrick Ohly
@ 2009-02-12 15:00 ` Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 07/10] net: pass new SIOCSHWTSTAMP through to device drivers Patrick Ohly
0 siblings, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:00 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly
Instructions for time stamping outgoing packets are take from the
socket layer and later copied into the new skb.
---
Documentation/networking/timestamping.txt | 2 ++
include/net/ip.h | 1 +
net/can/raw.c | 3 +++
net/ipv4/icmp.c | 2 ++
net/ipv4/ip_output.c | 6 ++++++
net/ipv4/raw.c | 1 +
net/ipv4/udp.c | 4 ++++
7 files changed, 19 insertions(+), 0 deletions(-)
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index a681a65..0e58b45 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -56,6 +56,8 @@ and including the link layer, the scm_timestamping control message and
a sock_extended_err control message with ee_errno==ENOMSG and
ee_origin==SO_EE_ORIGIN_TIMESTAMPING. A socket with such a pending
bounced packet is ready for reading as far as select() is concerned.
+If the outgoing packet has to be fragmented, then only the first
+fragment is time stamped and returned to the sending socket.
All three values correspond to the same event in time, but were
generated in different ways. Each of these values may be empty (= all
diff --git a/include/net/ip.h b/include/net/ip.h
index 1086813..4ac7577 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -55,6 +55,7 @@ struct ipcm_cookie
__be32 addr;
int oif;
struct ip_options *opt;
+ union skb_shared_tx shtx;
};
#define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb))
diff --git a/net/can/raw.c b/net/can/raw.c
index 0703cba..6aa154e 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -648,6 +648,9 @@ static int raw_sendmsg(struct kiocb *iocb, struct socket *sock,
err = memcpy_fromiovec(skb_put(skb, size), msg->msg_iov, size);
if (err < 0)
goto free_skb;
+ err = sock_tx_timestamp(msg, sk, skb_tx(skb));
+ if (err < 0)
+ goto free_skb;
skb->dev = dev;
skb->sk = sk;
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 705b33b..382800a 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -375,6 +375,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
inet->tos = ip_hdr(skb)->tos;
daddr = ipc.addr = rt->rt_src;
ipc.opt = NULL;
+ ipc.shtx.flags = 0;
if (icmp_param->replyopts.optlen) {
ipc.opt = &icmp_param->replyopts;
if (ipc.opt->srr)
@@ -532,6 +533,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
inet_sk(sk)->tos = tos;
ipc.addr = iph->saddr;
ipc.opt = &icmp_param.replyopts;
+ ipc.shtx.flags = 0;
{
struct flowi fl = {
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 8ebe86d..3e7e910 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -935,6 +935,10 @@ alloc_new_skb:
sk->sk_allocation);
if (unlikely(skb == NULL))
err = -ENOBUFS;
+ else
+ /* only the initial fragment is
+ time stamped */
+ ipc->shtx.flags = 0;
}
if (skb == NULL)
goto error;
@@ -945,6 +949,7 @@ alloc_new_skb:
skb->ip_summed = csummode;
skb->csum = 0;
skb_reserve(skb, hh_len);
+ *skb_tx(skb) = ipc->shtx;
/*
* Find where to start putting bytes.
@@ -1364,6 +1369,7 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *ar
daddr = ipc.addr = rt->rt_src;
ipc.opt = NULL;
+ ipc.shtx.flags = 0;
if (replyopts.opt.optlen) {
ipc.opt = &replyopts.opt;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index dff8bc4..f774651 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -493,6 +493,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
ipc.addr = inet->saddr;
ipc.opt = NULL;
+ ipc.shtx.flags = 0;
ipc.oif = sk->sk_bound_dev_if;
if (msg->msg_controllen) {
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c47c989..4bd178a 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -596,6 +596,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
return -EOPNOTSUPP;
ipc.opt = NULL;
+ ipc.shtx.flags = 0;
if (up->pending) {
/*
@@ -643,6 +644,9 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
ipc.addr = inet->saddr;
ipc.oif = sk->sk_bound_dev_if;
+ err = sock_tx_timestamp(msg, sk, &ipc.shtx);
+ if (err)
+ return err;
if (msg->msg_controllen) {
err = ip_cmsg_send(sock_net(sk), msg, &ipc);
if (err)
--
1.6.1.2
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 07/10] net: pass new SIOCSHWTSTAMP through to device drivers
2009-02-12 15:00 ` [PATCH NET-NEXT 06/10] ip: support for TX timestamps on UDP and RAW sockets Patrick Ohly
@ 2009-02-12 15:00 ` Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 08/10] igb: access to NIC time Patrick Ohly
0 siblings, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:00 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly
---
fs/compat_ioctl.c | 6 ++++++
net/core/dev.c | 2 ++
2 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index c03c10d..ab7585a 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -522,6 +522,11 @@ static int dev_ifsioc(unsigned int fd, unsigned int cmd, unsigned long arg)
if (err)
return -EFAULT;
break;
+ case SIOCSHWTSTAMP:
+ if (copy_from_user(&ifr, uifr32, sizeof(*uifr32)))
+ return -EFAULT;
+ ifr.ifr_data = compat_ptr(uifr32->ifr_ifru.ifru_data);
+ break;
default:
if (copy_from_user(&ifr, uifr32, sizeof(*uifr32)))
return -EFAULT;
@@ -2563,6 +2568,7 @@ HANDLE_IOCTL(SIOCSIFMAP, dev_ifsioc)
HANDLE_IOCTL(SIOCGIFADDR, dev_ifsioc)
HANDLE_IOCTL(SIOCSIFADDR, dev_ifsioc)
HANDLE_IOCTL(SIOCSIFHWBROADCAST, dev_ifsioc)
+HANDLE_IOCTL(SIOCSHWTSTAMP, dev_ifsioc)
/* ioctls used by appletalk ddp.c */
HANDLE_IOCTL(SIOCATALKDIFADDR, dev_ifsioc)
diff --git a/net/core/dev.c b/net/core/dev.c
index d20c28e..d393fc9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4019,6 +4019,7 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, unsigned int cmd)
cmd == SIOCSMIIREG ||
cmd == SIOCBRADDIF ||
cmd == SIOCBRDELIF ||
+ cmd == SIOCSHWTSTAMP ||
cmd == SIOCWANDEV) {
err = -EOPNOTSUPP;
if (ops->ndo_do_ioctl) {
@@ -4173,6 +4174,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
case SIOCBONDCHANGEACTIVE:
case SIOCBRADDIF:
case SIOCBRDELIF:
+ case SIOCSHWTSTAMP:
if (!capable(CAP_NET_ADMIN))
return -EPERM;
/* fall through */
--
1.6.1.2
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 08/10] igb: access to NIC time
2009-02-12 15:00 ` [PATCH NET-NEXT 07/10] net: pass new SIOCSHWTSTAMP through to device drivers Patrick Ohly
@ 2009-02-12 15:00 ` Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 09/10] igb: stub support for SIOCSHWTSTAMP Patrick Ohly
0 siblings, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:00 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly
Adds the register definitions and code to read the time
register.
Signed-off-by: John Ronciak <john.ronciak@intel.com>
---
drivers/net/igb/e1000_regs.h | 28 +++++++++++
drivers/net/igb/igb.h | 4 ++
drivers/net/igb/igb_main.c | 111 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 143 insertions(+), 0 deletions(-)
diff --git a/drivers/net/igb/e1000_regs.h b/drivers/net/igb/e1000_regs.h
index 5038b73..64d95cd 100644
--- a/drivers/net/igb/e1000_regs.h
+++ b/drivers/net/igb/e1000_regs.h
@@ -75,6 +75,34 @@
#define E1000_FCRTH 0x02168 /* Flow Control Receive Threshold High - RW */
#define E1000_RDFPCQ(_n) (0x02430 + (0x4 * (_n)))
#define E1000_FCRTV 0x02460 /* Flow Control Refresh Timer Value - RW */
+
+/* IEEE 1588 TIMESYNCH */
+#define E1000_TSYNCTXCTL 0x0B614
+#define E1000_TSYNCRXCTL 0x0B620
+#define E1000_TSYNCRXCFG 0x05F50
+
+#define E1000_SYSTIML 0x0B600
+#define E1000_SYSTIMH 0x0B604
+#define E1000_TIMINCA 0x0B608
+
+#define E1000_RXMTRL 0x0B634
+#define E1000_RXSTMPL 0x0B624
+#define E1000_RXSTMPH 0x0B628
+#define E1000_RXSATRL 0x0B62C
+#define E1000_RXSATRH 0x0B630
+
+#define E1000_TXSTMPL 0x0B618
+#define E1000_TXSTMPH 0x0B61C
+
+#define E1000_ETQF0 0x05CB0
+#define E1000_ETQF1 0x05CB4
+#define E1000_ETQF2 0x05CB8
+#define E1000_ETQF3 0x05CBC
+#define E1000_ETQF4 0x05CC0
+#define E1000_ETQF5 0x05CC4
+#define E1000_ETQF6 0x05CC8
+#define E1000_ETQF7 0x05CCC
+
/* Split and Replication RX Control - RW */
/*
* Convenience macros
diff --git a/drivers/net/igb/igb.h b/drivers/net/igb/igb.h
index e507449..797a9fe 100644
--- a/drivers/net/igb/igb.h
+++ b/drivers/net/igb/igb.h
@@ -34,6 +34,8 @@
#include "e1000_mac.h"
#include "e1000_82575.h"
+#include <linux/clocksource.h>
+
struct igb_adapter;
/* Interrupt defines */
@@ -251,6 +253,8 @@ struct igb_adapter {
struct napi_struct napi;
struct pci_dev *pdev;
struct net_device_stats net_stats;
+ struct cyclecounter cycles;
+ struct timecounter clock;
/* structs defined in e1000_hw.h */
struct e1000_hw hw;
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index f8c2919..8b2ba42 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -175,6 +175,54 @@ MODULE_DESCRIPTION("Intel(R) Gigabit Ethernet Network Driver");
MODULE_LICENSE("GPL");
MODULE_VERSION(DRV_VERSION);
+/**
+ * Scale the NIC clock cycle by a large factor so that
+ * relatively small clock corrections can be added or
+ * substracted at each clock tick. The drawbacks of a
+ * large factor are a) that the clock register overflows
+ * more quickly (not such a big deal) and b) that the
+ * increment per tick has to fit into 24 bits.
+ *
+ * Note that
+ * TIMINCA = IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS *
+ * IGB_TSYNC_SCALE
+ * TIMINCA += TIMINCA * adjustment [ppm] / 1e9
+ *
+ * The base scale factor is intentionally a power of two
+ * so that the division in %struct timecounter can be done with
+ * a shift.
+ */
+#define IGB_TSYNC_SHIFT (19)
+#define IGB_TSYNC_SCALE (1<<IGB_TSYNC_SHIFT)
+
+/**
+ * The duration of one clock cycle of the NIC.
+ *
+ * @todo This hard-coded value is part of the specification and might change
+ * in future hardware revisions. Add revision check.
+ */
+#define IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS 16
+
+#if (IGB_TSYNC_SCALE * IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS) >= (1<<24)
+# error IGB_TSYNC_SCALE and/or IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS are too large to fit into TIMINCA
+#endif
+
+/**
+ * igb_read_clock - read raw cycle counter (to be used by time counter)
+ */
+static cycle_t igb_read_clock(const struct cyclecounter *tc)
+{
+ struct igb_adapter *adapter =
+ container_of(tc, struct igb_adapter, cycles);
+ struct e1000_hw *hw = &adapter->hw;
+ u64 stamp;
+
+ stamp = rd32(E1000_SYSTIML);
+ stamp |= (u64)rd32(E1000_SYSTIMH) << 32ULL;
+
+ return stamp;
+}
+
#ifdef DEBUG
/**
* igb_get_hw_dev_name - return device name string
@@ -185,6 +233,29 @@ char *igb_get_hw_dev_name(struct e1000_hw *hw)
struct igb_adapter *adapter = hw->back;
return adapter->netdev->name;
}
+
+/**
+ * igb_get_time_str - format current NIC and system time as string
+ */
+static char *igb_get_time_str(struct igb_adapter *adapter,
+ char buffer[160])
+{
+ cycle_t hw = adapter->cycles.read(&adapter->cycles);
+ struct timespec nic = ns_to_timespec(timecounter_read(&adapter->clock));
+ struct timespec sys;
+ struct timespec delta;
+ getnstimeofday(&sys);
+
+ delta = timespec_sub(nic, sys);
+
+ sprintf(buffer,
+ "NIC %ld.%09lus, SYS %ld.%09lus, NIC-SYS %lds + %09luns",
+ (long)nic.tv_sec, nic.tv_nsec,
+ (long)sys.tv_sec, sys.tv_nsec,
+ (long)delta.tv_sec, delta.tv_nsec);
+
+ return buffer;
+}
#endif
/**
@@ -1298,6 +1369,46 @@ static int __devinit igb_probe(struct pci_dev *pdev,
}
#endif
+ /*
+ * Initialize hardware timer: we keep it running just in case
+ * that some program needs it later on.
+ */
+ memset(&adapter->cycles, 0, sizeof(adapter->cycles));
+ adapter->cycles.read = igb_read_clock;
+ adapter->cycles.mask = CLOCKSOURCE_MASK(64);
+ adapter->cycles.mult = 1;
+ adapter->cycles.shift = IGB_TSYNC_SHIFT;
+ wr32(E1000_TIMINCA,
+ (1<<24) |
+ IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS * IGB_TSYNC_SCALE);
+#if 0
+ /*
+ * Avoid rollover while we initialize by resetting the time counter.
+ */
+ wr32(E1000_SYSTIML, 0x00000000);
+ wr32(E1000_SYSTIMH, 0x00000000);
+#else
+ /*
+ * Set registers so that rollover occurs soon to test this.
+ */
+ wr32(E1000_SYSTIML, 0x00000000);
+ wr32(E1000_SYSTIMH, 0xFF800000);
+#endif
+ wrfl();
+ timecounter_init(&adapter->clock,
+ &adapter->cycles,
+ ktime_to_ns(ktime_get_real()));
+
+#ifdef DEBUG
+ {
+ char buffer[160];
+ printk(KERN_DEBUG
+ "igb: %s: hw %p initialized timer\n",
+ igb_get_time_str(adapter, buffer),
+ &adapter->hw);
+ }
+#endif
+
dev_info(&pdev->dev, "Intel(R) Gigabit Ethernet Network Connection\n");
/* print bus type/speed/width info */
dev_info(&pdev->dev, "%s: (PCIe:%s:%s) %pM\n",
--
1.6.1.2
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 09/10] igb: stub support for SIOCSHWTSTAMP
2009-02-12 15:00 ` [PATCH NET-NEXT 08/10] igb: access to NIC time Patrick Ohly
@ 2009-02-12 15:00 ` Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 10/10] igb: use timecompare to implement hardware time stamping Patrick Ohly
0 siblings, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:00 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly
Signed-off-by: John Ronciak <john.ronciak@intel.com>
---
drivers/net/igb/igb_main.c | 31 +++++++++++++++++++++++++++++++
1 files changed, 31 insertions(+), 0 deletions(-)
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 8b2ba42..90090bb 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -34,6 +34,7 @@
#include <linux/ipv6.h>
#include <net/checksum.h>
#include <net/ip6_checksum.h>
+#include <linux/net_tstamp.h>
#include <linux/mii.h>
#include <linux/ethtool.h>
#include <linux/if_vlan.h>
@@ -4182,6 +4183,34 @@ static int igb_mii_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
}
/**
+ * igb_hwtstamp_ioctl - control hardware time stamping
+ * @netdev:
+ * @ifreq:
+ * @cmd:
+ *
+ * Currently cannot enable any kind of hardware time stamping, but
+ * supports SIOCSHWTSTAMP in general.
+ **/
+static int igb_hwtstamp_ioctl(struct net_device *netdev,
+ struct ifreq *ifr, int cmd)
+{
+ struct hwtstamp_config config;
+
+ if (copy_from_user(&config, ifr->ifr_data, sizeof(config)))
+ return -EFAULT;
+
+ /* reserved for future extensions */
+ if (config.flags)
+ return -EINVAL;
+
+ if (config.tx_type == HWTSTAMP_TX_OFF &&
+ config.rx_filter == HWTSTAMP_FILTER_NONE)
+ return 0;
+
+ return -ERANGE;
+}
+
+/**
* igb_ioctl -
* @netdev:
* @ifreq:
@@ -4194,6 +4223,8 @@ static int igb_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
case SIOCGMIIREG:
case SIOCSMIIREG:
return igb_mii_ioctl(netdev, ifr, cmd);
+ case SIOCSHWTSTAMP:
+ return igb_hwtstamp_ioctl(netdev, ifr, cmd);
default:
return -EOPNOTSUPP;
}
--
1.6.1.2
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 10/10] igb: use timecompare to implement hardware time stamping
2009-02-12 15:00 ` [PATCH NET-NEXT 09/10] igb: stub support for SIOCSHWTSTAMP Patrick Ohly
@ 2009-02-12 15:00 ` Patrick Ohly
0 siblings, 0 replies; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:00 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly
Both TX and RX hardware time stamping are implemented. Due to
hardware limitations it is not possible to verify reliably which
packet was time stamped when multiple were pending for sending; this
could be solved by only allowing one packet marked for hardware time
stamping into the queue (not implemented yet).
RX time stamping relies on the flag in the packet descriptor which
marks packets that were time stamped. In "all packet" mode this flag
is not set. TODO: also support that mode (even though it'll suffer
from race conditions).
Signed-off-by: John Ronciak <john.ronciak@intel.com>
---
drivers/net/igb/e1000_82575.h | 1 +
drivers/net/igb/e1000_defines.h | 1 +
drivers/net/igb/e1000_regs.h | 40 ++++++
drivers/net/igb/igb.h | 4 +
drivers/net/igb/igb_main.c | 266 +++++++++++++++++++++++++++++++++++++--
5 files changed, 304 insertions(+), 8 deletions(-)
diff --git a/drivers/net/igb/e1000_82575.h b/drivers/net/igb/e1000_82575.h
index dd50237..e613d5a 100644
--- a/drivers/net/igb/e1000_82575.h
+++ b/drivers/net/igb/e1000_82575.h
@@ -116,6 +116,7 @@ union e1000_adv_tx_desc {
};
/* Adv Transmit Descriptor Config Masks */
+#define E1000_ADVTXD_MAC_TSTAMP 0x00080000 /* IEEE1588 Timestamp packet */
#define E1000_ADVTXD_DTYP_CTXT 0x00200000 /* Advanced Context Descriptor */
#define E1000_ADVTXD_DTYP_DATA 0x00300000 /* Advanced Data Descriptor */
#define E1000_ADVTXD_DCMD_IFCS 0x02000000 /* Insert FCS (Ethernet CRC) */
diff --git a/drivers/net/igb/e1000_defines.h b/drivers/net/igb/e1000_defines.h
index 5342e23..79168ee 100644
--- a/drivers/net/igb/e1000_defines.h
+++ b/drivers/net/igb/e1000_defines.h
@@ -104,6 +104,7 @@
#define E1000_RXD_STAT_UDPCS 0x10 /* UDP xsum calculated */
#define E1000_RXD_STAT_TCPCS 0x20 /* TCP xsum calculated */
#define E1000_RXD_STAT_DYNINT 0x800 /* Pkt caused INT via DYNINT */
+#define E1000_RXD_STAT_TS 0x10000 /* Pkt was time stamped */
#define E1000_RXD_ERR_CE 0x01 /* CRC Error */
#define E1000_RXD_ERR_SE 0x02 /* Symbol Error */
#define E1000_RXD_ERR_SEQ 0x04 /* Sequence Error */
diff --git a/drivers/net/igb/e1000_regs.h b/drivers/net/igb/e1000_regs.h
index 64d95cd..1fb19ca 100644
--- a/drivers/net/igb/e1000_regs.h
+++ b/drivers/net/igb/e1000_regs.h
@@ -78,9 +78,37 @@
/* IEEE 1588 TIMESYNCH */
#define E1000_TSYNCTXCTL 0x0B614
+#define E1000_TSYNCTXCTL_VALID (1<<0)
+#define E1000_TSYNCTXCTL_ENABLED (1<<4)
#define E1000_TSYNCRXCTL 0x0B620
+#define E1000_TSYNCRXCTL_VALID (1<<0)
+#define E1000_TSYNCRXCTL_ENABLED (1<<4)
+enum {
+ E1000_TSYNCRXCTL_TYPE_L2_V2 = 0,
+ E1000_TSYNCRXCTL_TYPE_L4_V1 = (1<<1),
+ E1000_TSYNCRXCTL_TYPE_L2_L4_V2 = (1<<2),
+ E1000_TSYNCRXCTL_TYPE_ALL = (1<<3),
+ E1000_TSYNCRXCTL_TYPE_EVENT_V2 = (1<<3) | (1<<1),
+};
#define E1000_TSYNCRXCFG 0x05F50
+enum {
+ E1000_TSYNCRXCFG_PTP_V1_SYNC_MESSAGE = 0<<0,
+ E1000_TSYNCRXCFG_PTP_V1_DELAY_REQ_MESSAGE = 1<<0,
+ E1000_TSYNCRXCFG_PTP_V1_FOLLOWUP_MESSAGE = 2<<0,
+ E1000_TSYNCRXCFG_PTP_V1_DELAY_RESP_MESSAGE = 3<<0,
+ E1000_TSYNCRXCFG_PTP_V1_MANAGEMENT_MESSAGE = 4<<0,
+ E1000_TSYNCRXCFG_PTP_V2_SYNC_MESSAGE = 0<<8,
+ E1000_TSYNCRXCFG_PTP_V2_DELAY_REQ_MESSAGE = 1<<8,
+ E1000_TSYNCRXCFG_PTP_V2_PATH_DELAY_REQ_MESSAGE = 2<<8,
+ E1000_TSYNCRXCFG_PTP_V2_PATH_DELAY_RESP_MESSAGE = 3<<8,
+ E1000_TSYNCRXCFG_PTP_V2_FOLLOWUP_MESSAGE = 8<<8,
+ E1000_TSYNCRXCFG_PTP_V2_DELAY_RESP_MESSAGE = 9<<8,
+ E1000_TSYNCRXCFG_PTP_V2_PATH_DELAY_FOLLOWUP_MESSAGE = 0xA<<8,
+ E1000_TSYNCRXCFG_PTP_V2_ANNOUNCE_MESSAGE = 0xB<<8,
+ E1000_TSYNCRXCFG_PTP_V2_SIGNALLING_MESSAGE = 0xC<<8,
+ E1000_TSYNCRXCFG_PTP_V2_MANAGEMENT_MESSAGE = 0xD<<8,
+};
#define E1000_SYSTIML 0x0B600
#define E1000_SYSTIMH 0x0B604
#define E1000_TIMINCA 0x0B608
@@ -103,6 +131,18 @@
#define E1000_ETQF6 0x05CC8
#define E1000_ETQF7 0x05CCC
+/* Filtering Registers */
+#define E1000_SAQF(_n) (0x5980 + 4 * (_n))
+#define E1000_DAQF(_n) (0x59A0 + 4 * (_n))
+#define E1000_SPQF(_n) (0x59C0 + 4 * (_n))
+#define E1000_FTQF(_n) (0x59E0 + 4 * (_n))
+#define E1000_SAQF0 E1000_SAQF(0)
+#define E1000_DAQF0 E1000_DAQF(0)
+#define E1000_SPQF0 E1000_SPQF(0)
+#define E1000_FTQF0 E1000_FTQF(0)
+#define E1000_SYNQF(_n) (0x055FC + (4 * (_n))) /* SYN Packet Queue Fltr */
+#define E1000_ETQF(_n) (0x05CB0 + (4 * (_n))) /* EType Queue Fltr */
+
/* Split and Replication RX Control - RW */
/*
* Convenience macros
diff --git a/drivers/net/igb/igb.h b/drivers/net/igb/igb.h
index 797a9fe..bb8c35c 100644
--- a/drivers/net/igb/igb.h
+++ b/drivers/net/igb/igb.h
@@ -35,6 +35,8 @@
#include "e1000_82575.h"
#include <linux/clocksource.h>
+#include <linux/timecompare.h>
+#include <linux/net_tstamp.h>
struct igb_adapter;
@@ -255,6 +257,8 @@ struct igb_adapter {
struct net_device_stats net_stats;
struct cyclecounter cycles;
struct timecounter clock;
+ struct timecompare compare;
+ struct hwtstamp_config hwtstamp_config;
/* structs defined in e1000_hw.h */
struct e1000_hw hw;
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 90090bb..f9d576b 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -250,7 +250,8 @@ static char *igb_get_time_str(struct igb_adapter *adapter,
delta = timespec_sub(nic, sys);
sprintf(buffer,
- "NIC %ld.%09lus, SYS %ld.%09lus, NIC-SYS %lds + %09luns",
+ "HW %llu, NIC %ld.%09lus, SYS %ld.%09lus, NIC-SYS %lds + %09luns",
+ hw,
(long)nic.tv_sec, nic.tv_nsec,
(long)sys.tv_sec, sys.tv_nsec,
(long)delta.tv_sec, delta.tv_nsec);
@@ -1400,6 +1401,18 @@ static int __devinit igb_probe(struct pci_dev *pdev,
&adapter->cycles,
ktime_to_ns(ktime_get_real()));
+ /*
+ * Synchronize our NIC clock against system wall clock. NIC
+ * time stamp reading requires ~3us per sample, each sample
+ * was pretty stable even under load => only require 10
+ * samples for each offset comparison.
+ */
+ memset(&adapter->compare, 0, sizeof(adapter->compare));
+ adapter->compare.source = &adapter->clock;
+ adapter->compare.target = ktime_get_real;
+ adapter->compare.num_samples = 10;
+ timecompare_update(&adapter->compare, 0);
+
#ifdef DEBUG
{
char buffer[160];
@@ -2748,6 +2761,7 @@ set_itr_now:
#define IGB_TX_FLAGS_VLAN 0x00000002
#define IGB_TX_FLAGS_TSO 0x00000004
#define IGB_TX_FLAGS_IPV4 0x00000008
+#define IGB_TX_FLAGS_TSTAMP 0x00000010
#define IGB_TX_FLAGS_VLAN_MASK 0xffff0000
#define IGB_TX_FLAGS_VLAN_SHIFT 16
@@ -2975,6 +2989,9 @@ static inline void igb_tx_queue_adv(struct igb_adapter *adapter,
if (tx_flags & IGB_TX_FLAGS_VLAN)
cmd_type_len |= E1000_ADVTXD_DCMD_VLE;
+ if (tx_flags & IGB_TX_FLAGS_TSTAMP)
+ cmd_type_len |= E1000_ADVTXD_MAC_TSTAMP;
+
if (tx_flags & IGB_TX_FLAGS_TSO) {
cmd_type_len |= E1000_ADVTXD_DCMD_TSE;
@@ -3065,6 +3082,7 @@ static int igb_xmit_frame_ring_adv(struct sk_buff *skb,
unsigned int tx_flags = 0;
u8 hdr_len = 0;
int tso = 0;
+ union skb_shared_tx *shtx;
if (test_bit(__IGB_DOWN, &adapter->state)) {
dev_kfree_skb_any(skb);
@@ -3085,7 +3103,29 @@ static int igb_xmit_frame_ring_adv(struct sk_buff *skb,
/* this is a hard error */
return NETDEV_TX_BUSY;
}
- skb_orphan(skb);
+
+ /*
+ * TODO: check that there currently is no other packet with
+ * time stamping in the queue
+ *
+ * When doing time stamping, keep the connection to the socket
+ * a while longer: it is still needed by skb_hwtstamp_tx(),
+ * called either in igb_tx_hwtstamp() or by our caller when
+ * doing software time stamping.
+ */
+ shtx = skb_tx(skb);
+ if (unlikely(shtx->hardware)) {
+ shtx->in_progress = 1;
+ tx_flags |= IGB_TX_FLAGS_TSTAMP;
+ } else if (likely(!shtx->software)) {
+ /*
+ * TODO: can this be solved in dev.c:dev_hard_start_xmit()?
+ * There are probably unmodified driver which do something
+ * like this and thus don't work in combination with
+ * SOF_TIMESTAMPING_TX_SOFTWARE.
+ */
+ skb_orphan(skb);
+ }
if (adapter->vlgrp && vlan_tx_tag_present(skb)) {
tx_flags |= IGB_TX_FLAGS_VLAN;
@@ -3744,6 +3784,43 @@ static int igb_clean_rx_ring_msix(struct napi_struct *napi, int budget)
}
/**
+ * igb_hwtstamp - utility function which checks for TX time stamp
+ * @adapter: board private structure
+ * @skb: packet that was just sent
+ *
+ * If we were asked to do hardware stamping and such a time stamp is
+ * available, then it must have been for this skb here because we only
+ * allow only one such packet into the queue.
+ */
+static void igb_tx_hwtstamp(struct igb_adapter *adapter, struct sk_buff *skb)
+{
+ union skb_shared_tx *shtx = skb_tx(skb);
+ struct e1000_hw *hw = &adapter->hw;
+
+ if (unlikely(shtx->hardware)) {
+ u32 valid = rd32(E1000_TSYNCTXCTL) & E1000_TSYNCTXCTL_VALID;
+ if (valid) {
+ u64 regval = rd32(E1000_TXSTMPL);
+ u64 ns;
+ struct skb_shared_hwtstamps shhwtstamps;
+
+ memset(&shhwtstamps, 0, sizeof(shhwtstamps));
+ regval |= (u64)rd32(E1000_TXSTMPH) << 32;
+ ns = timecounter_cyc2time(&adapter->clock,
+ regval);
+ timecompare_update(&adapter->compare, ns);
+ shhwtstamps.hwtstamp = ns_to_ktime(ns);
+ shhwtstamps.syststamp =
+ timecompare_transform(&adapter->compare, ns);
+ skb_tstamp_tx(skb, &shhwtstamps);
+ }
+
+ /* delayed orphaning: skb_tstamp_tx() needs the socket */
+ skb_orphan(skb);
+ }
+}
+
+/**
* igb_clean_tx_irq - Reclaim resources after transmit completes
* @adapter: board private structure
* returns true if ring is completely cleaned
@@ -3781,6 +3858,8 @@ static bool igb_clean_tx_irq(struct igb_ring *tx_ring)
skb->len;
total_packets += segs;
total_bytes += bytecount;
+
+ igb_tx_hwtstamp(adapter, skb);
}
igb_unmap_and_free_tx_resource(adapter, buffer_info);
@@ -3914,6 +3993,7 @@ static bool igb_clean_rx_irq_adv(struct igb_ring *rx_ring,
{
struct igb_adapter *adapter = rx_ring->adapter;
struct net_device *netdev = adapter->netdev;
+ struct e1000_hw *hw = &adapter->hw;
struct pci_dev *pdev = adapter->pdev;
union e1000_adv_rx_desc *rx_desc , *next_rxd;
struct igb_buffer *buffer_info , *next_buffer;
@@ -4006,6 +4086,47 @@ static bool igb_clean_rx_irq_adv(struct igb_ring *rx_ring,
goto next_desc;
}
send_up:
+ /*
+ * If this bit is set, then the RX registers contain
+ * the time stamp. No other packet will be time
+ * stamped until we read these registers, so read the
+ * registers to make them available again. Because
+ * only one packet can be time stamped at a time, we
+ * know that the register values must belong to this
+ * one here and therefore we don't need to compare
+ * any of the additional attributes stored for it.
+ *
+ * If nothing went wrong, then it should have a
+ * skb_shared_tx that we can turn into a
+ * skb_shared_hwtstamps.
+ *
+ * TODO: can time stamping be triggered (thus locking
+ * the registers) without the packet reaching this point
+ * here? In that case RX time stamping would get stuck.
+ *
+ * TODO: in "time stamp all packets" mode this bit is
+ * not set. Need a global flag for this mode and then
+ * always read the registers. Cannot be done without
+ * a race condition.
+ */
+ if (unlikely(staterr & E1000_RXD_STAT_TS)) {
+ u64 regval;
+ u64 ns;
+ struct skb_shared_hwtstamps *shhwtstamps =
+ skb_hwtstamps(skb);
+
+ WARN(!(rd32(E1000_TSYNCRXCTL) & E1000_TSYNCRXCTL_VALID),
+ "igb: no RX time stamp available for time stamped packet");
+ regval = rd32(E1000_RXSTMPL);
+ regval |= (u64)rd32(E1000_RXSTMPH) << 32;
+ ns = timecounter_cyc2time(&adapter->clock, regval);
+ timecompare_update(&adapter->compare, ns);
+ memset(shhwtstamps, 0, sizeof(*shhwtstamps));
+ shhwtstamps->hwtstamp = ns_to_ktime(ns);
+ shhwtstamps->syststamp =
+ timecompare_transform(&adapter->compare, ns);
+ }
+
if (staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK) {
dev_kfree_skb_irq(skb);
goto next_desc;
@@ -4188,13 +4309,33 @@ static int igb_mii_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
* @ifreq:
* @cmd:
*
- * Currently cannot enable any kind of hardware time stamping, but
- * supports SIOCSHWTSTAMP in general.
+ * Outgoing time stamping can be enabled and disabled. Play nice and
+ * disable it when requested, although it shouldn't case any overhead
+ * when no packet needs it. At most one packet in the queue may be
+ * marked for time stamping, otherwise it would be impossible to tell
+ * for sure to which packet the hardware time stamp belongs.
+ *
+ * Incoming time stamping has to be configured via the hardware
+ * filters. Not all combinations are supported, in particular event
+ * type has to be specified. Matching the kind of event packet is
+ * not supported, with the exception of "all V2 events regardless of
+ * level 2 or 4".
+ *
**/
static int igb_hwtstamp_ioctl(struct net_device *netdev,
struct ifreq *ifr, int cmd)
{
+ struct igb_adapter *adapter = netdev_priv(netdev);
+ struct e1000_hw *hw = &adapter->hw;
struct hwtstamp_config config;
+ u32 tsync_tx_ctl_bit = E1000_TSYNCTXCTL_ENABLED;
+ u32 tsync_rx_ctl_bit = E1000_TSYNCRXCTL_ENABLED;
+ u32 tsync_rx_ctl_type = 0;
+ u32 tsync_rx_cfg = 0;
+ int is_l4 = 0;
+ int is_l2 = 0;
+ short port = 319; /* PTP */
+ u32 regval;
if (copy_from_user(&config, ifr->ifr_data, sizeof(config)))
return -EFAULT;
@@ -4203,11 +4344,120 @@ static int igb_hwtstamp_ioctl(struct net_device *netdev,
if (config.flags)
return -EINVAL;
- if (config.tx_type == HWTSTAMP_TX_OFF &&
- config.rx_filter == HWTSTAMP_FILTER_NONE)
- return 0;
+ switch (config.tx_type) {
+ case HWTSTAMP_TX_OFF:
+ tsync_tx_ctl_bit = 0;
+ break;
+ case HWTSTAMP_TX_ON:
+ tsync_tx_ctl_bit = E1000_TSYNCTXCTL_ENABLED;
+ break;
+ default:
+ return -ERANGE;
+ }
+
+ switch (config.rx_filter) {
+ case HWTSTAMP_FILTER_NONE:
+ tsync_rx_ctl_bit = 0;
+ break;
+ case HWTSTAMP_FILTER_PTP_V1_L4_EVENT:
+ case HWTSTAMP_FILTER_PTP_V2_L4_EVENT:
+ case HWTSTAMP_FILTER_PTP_V2_L2_EVENT:
+ case HWTSTAMP_FILTER_ALL:
+ /*
+ * register TSYNCRXCFG must be set, therefore it is not
+ * possible to time stamp both Sync and Delay_Req messages
+ * => fall back to time stamping all packets
+ */
+ tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_ALL;
+ config.rx_filter = HWTSTAMP_FILTER_ALL;
+ break;
+ case HWTSTAMP_FILTER_PTP_V1_L4_SYNC:
+ tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_L4_V1;
+ tsync_rx_cfg = E1000_TSYNCRXCFG_PTP_V1_SYNC_MESSAGE;
+ is_l4 = 1;
+ break;
+ case HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ:
+ tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_L4_V1;
+ tsync_rx_cfg = E1000_TSYNCRXCFG_PTP_V1_DELAY_REQ_MESSAGE;
+ is_l4 = 1;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_L2_SYNC:
+ case HWTSTAMP_FILTER_PTP_V2_L4_SYNC:
+ tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_L2_L4_V2;
+ tsync_rx_cfg = E1000_TSYNCRXCFG_PTP_V2_SYNC_MESSAGE;
+ is_l2 = 1;
+ is_l4 = 1;
+ config.rx_filter = HWTSTAMP_FILTER_SOME;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ:
+ case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ:
+ tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_L2_L4_V2;
+ tsync_rx_cfg = E1000_TSYNCRXCFG_PTP_V2_DELAY_REQ_MESSAGE;
+ is_l2 = 1;
+ is_l4 = 1;
+ config.rx_filter = HWTSTAMP_FILTER_SOME;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_EVENT:
+ case HWTSTAMP_FILTER_PTP_V2_SYNC:
+ case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
+ tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_EVENT_V2;
+ config.rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT;
+ is_l2 = 1;
+ break;
+ default:
+ return -ERANGE;
+ }
+
+ /* enable/disable TX */
+ regval = rd32(E1000_TSYNCTXCTL);
+ regval = (regval & ~E1000_TSYNCTXCTL_ENABLED) | tsync_tx_ctl_bit;
+ wr32(E1000_TSYNCTXCTL, regval);
+
+ /* enable/disable RX, define which PTP packets are time stamped */
+ regval = rd32(E1000_TSYNCRXCTL);
+ regval = (regval & ~E1000_TSYNCRXCTL_ENABLED) | tsync_rx_ctl_bit;
+ regval = (regval & ~0xE) | tsync_rx_ctl_type;
+ wr32(E1000_TSYNCRXCTL, regval);
+ wr32(E1000_TSYNCRXCFG, tsync_rx_cfg);
+
+ /*
+ * Ethertype Filter Queue Filter[0][15:0] = 0x88F7
+ * (Ethertype to filter on)
+ * Ethertype Filter Queue Filter[0][26] = 0x1 (Enable filter)
+ * Ethertype Filter Queue Filter[0][30] = 0x1 (Enable Timestamping)
+ */
+ wr32(E1000_ETQF0, is_l2 ? 0x440088f7 : 0);
+
+ /* L4 Queue Filter[0]: only filter by source and destination port */
+ wr32(E1000_SPQF0, htons(port));
+ wr32(E1000_IMIREXT(0), is_l4 ?
+ ((1<<12) | (1<<19) /* bypass size and control flags */) : 0);
+ wr32(E1000_IMIR(0), is_l4 ?
+ (htons(port)
+ | (0<<16) /* immediate interrupt disabled */
+ | 0 /* (1<<17) bit cleared: do not bypass
+ destination port check */)
+ : 0);
+ wr32(E1000_FTQF0, is_l4 ?
+ (0x11 /* UDP */
+ | (1<<15) /* VF not compared */
+ | (1<<27) /* Enable Timestamping */
+ | (7<<28) /* only source port filter enabled,
+ source/target address and protocol
+ masked */)
+ : ((1<<15) | (15<<28) /* all mask bits set = filter not
+ enabled */));
+
+ wrfl();
+
+ adapter->hwtstamp_config = config;
+
+ /* clear TX/RX time stamp registers, just to be sure */
+ regval = rd32(E1000_TXSTMPH);
+ regval = rd32(E1000_RXSTMPH);
- return -ERANGE;
+ return copy_to_user(ifr->ifr_data, &config, sizeof(config)) ?
+ -EFAULT : 0;
}
/**
--
1.6.1.2
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo
2009-02-12 14:57 [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c Patrick Ohly
@ 2009-02-12 15:03 ` Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c Patrick Ohly
2009-02-16 7:16 ` [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo David Miller
1 sibling, 2 replies; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw)
To: David Miller
Cc: John Stultz, Ronciak, John, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org
[once more, with signed-off - sorry for the noise]
This revision of the patch series transports hardware time stamps from
the hardware into user space via additional fields in struct
skb_shared_info. It's based on net-next-2.6 as of this morning.
This is the solution that emerged from the discussion of the other
approaches suggested before (reuse tstamp, extend skbuff, optional
structs): it has the advantage of not touching skbuff and also has the
simplest implementation.
The clocksource and timecompare patches have been reviewed by John
Stultz. He doesn't mind merging them via the net subtree.
John Ronciak reviewed the igb driver patches. He suggested to merge the
patches as they; after all, PTPd already works fine. I just tested again
on 32 and 64 bit x86, both with the timestamping example programs as
well as with PTPd. Latest patched PTPd is here:
http://github.com/pohly/ptpd/tree/master
The open TODOs in the igb driver will be fixed. We think that this will
be easier with the infrastructure and the driver in a regular kernel
tree.
David, I hope you consider the patches acceptable now. If not, then as
before: please let me know what I can do to make them acceptable.
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c
2009-02-12 15:03 ` [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo Patrick Ohly
@ 2009-02-12 15:03 ` Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 02/10] timecompare: generic infrastructure to map between two time bases Patrick Ohly
2009-02-16 7:16 ` [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo David Miller
1 sibling, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly
So far struct clocksource acted as the interface between time/timekeeping.c
and hardware. This patch generalizes the concept so that a similar
interface can also be used in other contexts. For that it introduces
new structures and related functions *without* touching the existing
struct clocksource.
The reasons for adding these new structures to clocksource.[ch] are
* the APIs are clearly related
* struct clocksource could be cleaned up to use the new structs
* avoids proliferation of files with similar names (timesource.h?
timecounter.h?)
As outlined in the discussion with John Stultz, this patch adds
* struct cyclecounter: stateless API to hardware which counts clock cycles
* struct timecounter: stateful utility code built on a cyclecounter which
provides a nanosecond counter
* only the function to read the nanosecond counter; deltas are used internally
and not exposed to users of timecounter
The code does no locking of the shared state. It must be called at least
as often as the cycle counter wraps around to detect these wrap arounds.
Both is the responsibility of the timecounter user.
Acked-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
include/linux/clocksource.h | 101 +++++++++++++++++++++++++++++++++++++++++++
kernel/time/clocksource.c | 76 ++++++++++++++++++++++++++++++++
2 files changed, 177 insertions(+), 0 deletions(-)
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index f88d32f..573819e 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -22,8 +22,109 @@ typedef u64 cycle_t;
struct clocksource;
/**
+ * struct cyclecounter - hardware abstraction for a free running counter
+ * Provides completely state-free accessors to the underlying hardware.
+ * Depending on which hardware it reads, the cycle counter may wrap
+ * around quickly. Locking rules (if necessary) have to be defined
+ * by the implementor and user of specific instances of this API.
+ *
+ * @read: returns the current cycle value
+ * @mask: bitmask for two's complement
+ * subtraction of non 64 bit counters,
+ * see CLOCKSOURCE_MASK() helper macro
+ * @mult: cycle to nanosecond multiplier
+ * @shift: cycle to nanosecond divisor (power of two)
+ */
+struct cyclecounter {
+ cycle_t (*read)(const struct cyclecounter *cc);
+ cycle_t mask;
+ u32 mult;
+ u32 shift;
+};
+
+/**
+ * struct timecounter - layer above a %struct cyclecounter which counts nanoseconds
+ * Contains the state needed by timecounter_read() to detect
+ * cycle counter wrap around. Initialize with
+ * timecounter_init(). Also used to convert cycle counts into the
+ * corresponding nanosecond counts with timecounter_cyc2time(). Users
+ * of this code are responsible for initializing the underlying
+ * cycle counter hardware, locking issues and reading the time
+ * more often than the cycle counter wraps around. The nanosecond
+ * counter will only wrap around after ~585 years.
+ *
+ * @cc: the cycle counter used by this instance
+ * @cycle_last: most recent cycle counter value seen by
+ * timecounter_read()
+ * @nsec: continuously increasing count
+ */
+struct timecounter {
+ const struct cyclecounter *cc;
+ cycle_t cycle_last;
+ u64 nsec;
+};
+
+/**
+ * cyclecounter_cyc2ns - converts cycle counter cycles to nanoseconds
+ * @tc: Pointer to cycle counter.
+ * @cycles: Cycles
+ *
+ * XXX - This could use some mult_lxl_ll() asm optimization. Same code
+ * as in cyc2ns, but with unsigned result.
+ */
+static inline u64 cyclecounter_cyc2ns(const struct cyclecounter *cc,
+ cycle_t cycles)
+{
+ u64 ret = (u64)cycles;
+ ret = (ret * cc->mult) >> cc->shift;
+ return ret;
+}
+
+/**
+ * timecounter_init - initialize a time counter
+ * @tc: Pointer to time counter which is to be initialized/reset
+ * @cc: A cycle counter, ready to be used.
+ * @start_tstamp: Arbitrary initial time stamp.
+ *
+ * After this call the current cycle register (roughly) corresponds to
+ * the initial time stamp. Every call to timecounter_read() increments
+ * the time stamp counter by the number of elapsed nanoseconds.
+ */
+extern void timecounter_init(struct timecounter *tc,
+ const struct cyclecounter *cc,
+ u64 start_tstamp);
+
+/**
+ * timecounter_read - return nanoseconds elapsed since timecounter_init()
+ * plus the initial time stamp
+ * @tc: Pointer to time counter.
+ *
+ * In other words, keeps track of time since the same epoch as
+ * the function which generated the initial time stamp.
+ */
+extern u64 timecounter_read(struct timecounter *tc);
+
+/**
+ * timecounter_cyc2time - convert a cycle counter to same
+ * time base as values returned by
+ * timecounter_read()
+ * @tc: Pointer to time counter.
+ * @cycle: a value returned by tc->cc->read()
+ *
+ * Cycle counts that are converted correctly as long as they
+ * fall into the interval [-1/2 max cycle count, +1/2 max cycle count],
+ * with "max cycle count" == cs->mask+1.
+ *
+ * This allows conversion of cycle counter values which were generated
+ * in the past.
+ */
+extern u64 timecounter_cyc2time(struct timecounter *tc,
+ cycle_t cycle_tstamp);
+
+/**
* struct clocksource - hardware abstraction for a free running counter
* Provides mostly state-free accessors to the underlying hardware.
+ * This is the structure used for system time.
*
* @name: ptr to clocksource name
* @list: list head for registration
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index ca89e15..c46c931 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -31,6 +31,82 @@
#include <linux/sched.h> /* for spin_unlock_irq() using preempt_count() m68k */
#include <linux/tick.h>
+void timecounter_init(struct timecounter *tc,
+ const struct cyclecounter *cc,
+ u64 start_tstamp)
+{
+ tc->cc = cc;
+ tc->cycle_last = cc->read(cc);
+ tc->nsec = start_tstamp;
+}
+EXPORT_SYMBOL(timecounter_init);
+
+/**
+ * timecounter_read_delta - get nanoseconds since last call of this function
+ * @tc: Pointer to time counter
+ *
+ * When the underlying cycle counter runs over, this will be handled
+ * correctly as long as it does not run over more than once between
+ * calls.
+ *
+ * The first call to this function for a new time counter initializes
+ * the time tracking and returns an undefined result.
+ */
+static u64 timecounter_read_delta(struct timecounter *tc)
+{
+ cycle_t cycle_now, cycle_delta;
+ u64 ns_offset;
+
+ /* read cycle counter: */
+ cycle_now = tc->cc->read(tc->cc);
+
+ /* calculate the delta since the last timecounter_read_delta(): */
+ cycle_delta = (cycle_now - tc->cycle_last) & tc->cc->mask;
+
+ /* convert to nanoseconds: */
+ ns_offset = cyclecounter_cyc2ns(tc->cc, cycle_delta);
+
+ /* update time stamp of timecounter_read_delta() call: */
+ tc->cycle_last = cycle_now;
+
+ return ns_offset;
+}
+
+u64 timecounter_read(struct timecounter *tc)
+{
+ u64 nsec;
+
+ /* increment time by nanoseconds since last call */
+ nsec = timecounter_read_delta(tc);
+ nsec += tc->nsec;
+ tc->nsec = nsec;
+
+ return nsec;
+}
+EXPORT_SYMBOL(timecounter_read);
+
+u64 timecounter_cyc2time(struct timecounter *tc,
+ cycle_t cycle_tstamp)
+{
+ u64 cycle_delta = (cycle_tstamp - tc->cycle_last) & tc->cc->mask;
+ u64 nsec;
+
+ /*
+ * Instead of always treating cycle_tstamp as more recent
+ * than tc->cycle_last, detect when it is too far in the
+ * future and treat it as old time stamp instead.
+ */
+ if (cycle_delta > tc->cc->mask / 2) {
+ cycle_delta = (tc->cycle_last - cycle_tstamp) & tc->cc->mask;
+ nsec = tc->nsec - cyclecounter_cyc2ns(tc->cc, cycle_delta);
+ } else {
+ nsec = cyclecounter_cyc2ns(tc->cc, cycle_delta) + tc->nsec;
+ }
+
+ return nsec;
+}
+EXPORT_SYMBOL(timecounter_cyc2time);
+
/* XXX - Would like a better way for initializing curr_clocksource */
extern struct clocksource clocksource_jiffies;
--
1.6.1.2
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 02/10] timecompare: generic infrastructure to map between two time bases
2009-02-12 15:03 ` [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c Patrick Ohly
@ 2009-02-12 15:03 ` Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 03/10] net: new user space API for time stamping of incoming and outgoing packets Patrick Ohly
0 siblings, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly
Mapping from a struct timecounter to a time returned by functions like
ktime_get_real() is implemented. This is sufficient to use this code
in a network device driver which wants to support hardware time
stamping and transformation of hardware time stamps to system time.
The interface could have been made more versatile by not depending on
a time counter, but this wasn't done to avoid writing glue code
elsewhere.
The method implemented here is the one used and analyzed under the name
"assisted PTP" in the LCI PTP paper:
http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf
Acked-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
include/linux/timecompare.h | 125 ++++++++++++++++++++++++++++
kernel/time/Makefile | 2 +-
kernel/time/timecompare.c | 191 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 317 insertions(+), 1 deletions(-)
create mode 100644 include/linux/timecompare.h
create mode 100644 kernel/time/timecompare.c
diff --git a/include/linux/timecompare.h b/include/linux/timecompare.h
new file mode 100644
index 0000000..546e223
--- /dev/null
+++ b/include/linux/timecompare.h
@@ -0,0 +1,125 @@
+/*
+ * Utility code which helps transforming between two different time
+ * bases, called "source" and "target" time in this code.
+ *
+ * Source time has to be provided via the timecounter API while target
+ * time is accessed via a function callback whose prototype
+ * intentionally matches ktime_get() and ktime_get_real(). These
+ * interfaces where chosen like this so that the code serves its
+ * initial purpose without additional glue code.
+ *
+ * This purpose is synchronizing a hardware clock in a NIC with system
+ * time, in order to implement the Precision Time Protocol (PTP,
+ * IEEE1588) with more accurate hardware assisted time stamping. In
+ * that context only synchronization against system time (=
+ * ktime_get_real()) is currently needed. But this utility code might
+ * become useful in other situations, which is why it was written as
+ * general purpose utility code.
+ *
+ * The source timecounter is assumed to return monotonically
+ * increasing time (but this code does its best to compensate if that
+ * is not the case) whereas target time may jump.
+ *
+ * The target time corresponding to a source time is determined by
+ * reading target time, reading source time, reading target time
+ * again, then assuming that average target time corresponds to source
+ * time. In other words, the assumption is that reading the source
+ * time is slow and involves equal time for sending the request and
+ * receiving the reply, whereas reading target time is assumed to be
+ * fast.
+ *
+ * Copyright (C) 2009 Intel Corporation.
+ * Author: Patrick Ohly <patrick.ohly@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+#ifndef _LINUX_TIMECOMPARE_H
+#define _LINUX_TIMECOMPARE_H
+
+#include <linux/clocksource.h>
+#include <linux/ktime.h>
+
+/**
+ * struct timecompare - stores state and configuration for the two clocks
+ *
+ * Initialize to zero, then set source/target/num_samples.
+ *
+ * Transformation between source time and target time is done with:
+ * target_time = source_time + offset +
+ * (source_time - last_update) * skew /
+ * TIMECOMPARE_SKEW_RESOLUTION
+ *
+ * @source: used to get source time stamps via timecounter_read()
+ * @target: function returning target time (for example, ktime_get
+ * for monotonic time, or ktime_get_real for wall clock)
+ * @num_samples: number of times that source time and target time are to
+ * be compared when determining their offset
+ * @offset: (target time - source time) at the time of the last update
+ * @skew: average (target time - source time) / delta source time *
+ * TIMECOMPARE_SKEW_RESOLUTION
+ * @last_update: last source time stamp when time offset was measured
+ */
+struct timecompare {
+ struct timecounter *source;
+ ktime_t (*target)(void);
+ int num_samples;
+
+ s64 offset;
+ s64 skew;
+ u64 last_update;
+};
+
+/**
+ * timecompare_transform - transform source time stamp into target time base
+ * @sync: context for time sync
+ * @source_tstamp: the result of timecounter_read() or
+ * timecounter_cyc2time()
+ */
+extern ktime_t timecompare_transform(struct timecompare *sync,
+ u64 source_tstamp);
+
+/**
+ * timecompare_offset - measure current (target time - source time) offset
+ * @sync: context for time sync
+ * @offset: average offset during sample period returned here
+ * @source_tstamp: average source time during sample period returned here
+ *
+ * Returns number of samples used. Might be zero (= no result) in the
+ * unlikely case that target time was monotonically decreasing for all
+ * samples (= broken).
+ */
+extern int timecompare_offset(struct timecompare *sync,
+ s64 *offset,
+ u64 *source_tstamp);
+
+extern void __timecompare_update(struct timecompare *sync,
+ u64 source_tstamp);
+
+/**
+ * timecompare_update - update offset and skew by measuring current offset
+ * @sync: context for time sync
+ * @source_tstamp: the result of timecounter_read() or
+ * timecounter_cyc2time(), pass zero to force update
+ *
+ * Updates are only done at most once per second.
+ */
+static inline void timecompare_update(struct timecompare *sync,
+ u64 source_tstamp)
+{
+ if (!source_tstamp ||
+ (s64)(source_tstamp - sync->last_update) >= NSEC_PER_SEC)
+ __timecompare_update(sync, source_tstamp);
+}
+
+#endif /* _LINUX_TIMECOMPARE_H */
diff --git a/kernel/time/Makefile b/kernel/time/Makefile
index 905b0b5..0b0a636 100644
--- a/kernel/time/Makefile
+++ b/kernel/time/Makefile
@@ -1,4 +1,4 @@
-obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o
+obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o timecompare.o
obj-$(CONFIG_GENERIC_CLOCKEVENTS_BUILD) += clockevents.o
obj-$(CONFIG_GENERIC_CLOCKEVENTS) += tick-common.o
diff --git a/kernel/time/timecompare.c b/kernel/time/timecompare.c
new file mode 100644
index 0000000..71e7f1a
--- /dev/null
+++ b/kernel/time/timecompare.c
@@ -0,0 +1,191 @@
+/*
+ * Copyright (C) 2009 Intel Corporation.
+ * Author: Patrick Ohly <patrick.ohly@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/timecompare.h>
+#include <linux/module.h>
+#include <linux/math64.h>
+
+/*
+ * fixed point arithmetic scale factor for skew
+ *
+ * Usually one would measure skew in ppb (parts per billion, 1e9), but
+ * using a factor of 2 simplifies the math.
+ */
+#define TIMECOMPARE_SKEW_RESOLUTION (((s64)1)<<30)
+
+ktime_t timecompare_transform(struct timecompare *sync,
+ u64 source_tstamp)
+{
+ u64 nsec;
+
+ nsec = source_tstamp + sync->offset;
+ nsec += (s64)(source_tstamp - sync->last_update) * sync->skew /
+ TIMECOMPARE_SKEW_RESOLUTION;
+
+ return ns_to_ktime(nsec);
+}
+EXPORT_SYMBOL(timecompare_transform);
+
+int timecompare_offset(struct timecompare *sync,
+ s64 *offset,
+ u64 *source_tstamp)
+{
+ u64 start_source = 0, end_source = 0;
+ struct {
+ s64 offset;
+ s64 duration_target;
+ } buffer[10], sample, *samples;
+ int counter = 0, i;
+ int used;
+ int index;
+ int num_samples = sync->num_samples;
+
+ if (num_samples > sizeof(buffer)/sizeof(buffer[0])) {
+ samples = kmalloc(sizeof(*samples) * num_samples, GFP_ATOMIC);
+ if (!samples) {
+ samples = buffer;
+ num_samples = sizeof(buffer)/sizeof(buffer[0]);
+ }
+ } else {
+ samples = buffer;
+ }
+
+ /* run until we have enough valid samples, but do not try forever */
+ i = 0;
+ counter = 0;
+ while (1) {
+ u64 ts;
+ ktime_t start, end;
+
+ start = sync->target();
+ ts = timecounter_read(sync->source);
+ end = sync->target();
+
+ if (!i)
+ start_source = ts;
+
+ /* ignore negative durations */
+ sample.duration_target = ktime_to_ns(ktime_sub(end, start));
+ if (sample.duration_target >= 0) {
+ /*
+ * assume symetric delay to and from source:
+ * average target time corresponds to measured
+ * source time
+ */
+ sample.offset =
+ ktime_to_ns(ktime_add(end, start)) / 2 -
+ ts;
+
+ /* simple insertion sort based on duration */
+ index = counter - 1;
+ while (index >= 0) {
+ if (samples[index].duration_target <
+ sample.duration_target)
+ break;
+ samples[index + 1] = samples[index];
+ index--;
+ }
+ samples[index + 1] = sample;
+ counter++;
+ }
+
+ i++;
+ if (counter >= num_samples || i >= 100000) {
+ end_source = ts;
+ break;
+ }
+ }
+
+ *source_tstamp = (end_source + start_source) / 2;
+
+ /* remove outliers by only using 75% of the samples */
+ used = counter * 3 / 4;
+ if (!used)
+ used = counter;
+ if (used) {
+ /* calculate average */
+ s64 off = 0;
+ for (index = 0; index < used; index++)
+ off += samples[index].offset;
+ *offset = div_s64(off, used);
+ }
+
+ if (samples && samples != buffer)
+ kfree(samples);
+
+ return used;
+}
+EXPORT_SYMBOL(timecompare_offset);
+
+void __timecompare_update(struct timecompare *sync,
+ u64 source_tstamp)
+{
+ s64 offset;
+ u64 average_time;
+
+ if (!timecompare_offset(sync, &offset, &average_time))
+ return;
+
+ if (!sync->last_update) {
+ sync->last_update = average_time;
+ sync->offset = offset;
+ sync->skew = 0;
+ } else {
+ s64 delta_nsec = average_time - sync->last_update;
+
+ /* avoid division by negative or small deltas */
+ if (delta_nsec >= 10000) {
+ s64 delta_offset_nsec = offset - sync->offset;
+ s64 skew; /* delta_offset_nsec *
+ TIMECOMPARE_SKEW_RESOLUTION /
+ delta_nsec */
+ u64 divisor;
+
+ /* div_s64() is limited to 32 bit divisor */
+ skew = delta_offset_nsec * TIMECOMPARE_SKEW_RESOLUTION;
+ divisor = delta_nsec;
+ while (unlikely(divisor >= ((s64)1) << 32)) {
+ /* divide both by 2; beware, right shift
+ of negative value has undefined
+ behavior and can only be used for
+ the positive divisor */
+ skew = div_s64(skew, 2);
+ divisor >>= 1;
+ }
+ skew = div_s64(skew, divisor);
+
+ /*
+ * Calculate new overall skew as 4/16 the
+ * old value and 12/16 the new one. This is
+ * a rather arbitrary tradeoff between
+ * only using the latest measurement (0/16 and
+ * 16/16) and even more weight on past measurements.
+ */
+#define TIMECOMPARE_NEW_SKEW_PER_16 12
+ sync->skew =
+ div_s64((16 - TIMECOMPARE_NEW_SKEW_PER_16) *
+ sync->skew +
+ TIMECOMPARE_NEW_SKEW_PER_16 * skew,
+ 16);
+ sync->last_update = average_time;
+ sync->offset = offset;
+ }
+ }
+}
+EXPORT_SYMBOL(__timecompare_update);
--
1.6.1.2
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 03/10] net: new user space API for time stamping of incoming and outgoing packets
2009-02-12 15:03 ` [PATCH NET-NEXT 02/10] timecompare: generic infrastructure to map between two time bases Patrick Ohly
@ 2009-02-12 15:03 ` Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 04/10] net: infrastructure for hardware time stamping Patrick Ohly
0 siblings, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly
User space can request hardware and/or software time stamping.
Reporting of the result(s) via a new control message is enabled
separately for each field in the message because some of the
fields may require additional computation and thus cause overhead.
User space can tell the different kinds of time stamps apart
and choose what suits its needs.
When a TX timestamp operation is requested, the TX skb will be cloned
and the clone will be time stamped (in hardware or software) and added
to the socket error queue of the skb, if the skb has a socket
associated with it.
The actual TX timestamp will reach userspace as a RX timestamp on the
cloned packet. If timestamping is requested and no timestamping is
done in the device driver (potentially this may use hardware
timestamping), it will be done in software after the device's
start_hard_xmit routine.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
Documentation/networking/timestamping.txt | 178 +++++++
Documentation/networking/timestamping/.gitignore | 1 +
Documentation/networking/timestamping/Makefile | 6 +
.../networking/timestamping/timestamping.c | 533 ++++++++++++++++++++
arch/alpha/include/asm/socket.h | 3 +
arch/arm/include/asm/socket.h | 3 +
arch/avr32/include/asm/socket.h | 3 +
arch/blackfin/include/asm/socket.h | 3 +
arch/cris/include/asm/socket.h | 3 +
arch/h8300/include/asm/socket.h | 3 +
arch/ia64/include/asm/socket.h | 3 +
arch/m68k/include/asm/socket.h | 3 +
arch/mips/include/asm/socket.h | 3 +
arch/parisc/include/asm/socket.h | 3 +
arch/powerpc/include/asm/socket.h | 3 +
arch/s390/include/asm/socket.h | 3 +
arch/sh/include/asm/socket.h | 3 +
arch/sparc/include/asm/socket.h | 3 +
arch/x86/include/asm/socket.h | 3 +
arch/xtensa/include/asm/socket.h | 3 +
include/asm-frv/socket.h | 3 +
include/asm-m32r/socket.h | 3 +
include/asm-mn10300/socket.h | 3 +
include/linux/errqueue.h | 1 +
include/linux/net_tstamp.h | 104 ++++
include/linux/sockios.h | 3 +
26 files changed, 883 insertions(+), 0 deletions(-)
create mode 100644 Documentation/networking/timestamping.txt
create mode 100644 Documentation/networking/timestamping/.gitignore
create mode 100644 Documentation/networking/timestamping/Makefile
create mode 100644 Documentation/networking/timestamping/timestamping.c
create mode 100644 include/linux/net_tstamp.h
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
new file mode 100644
index 0000000..a681a65
--- /dev/null
+++ b/Documentation/networking/timestamping.txt
@@ -0,0 +1,178 @@
+The existing interfaces for getting network packages time stamped are:
+
+* SO_TIMESTAMP
+ Generate time stamp for each incoming packet using the (not necessarily
+ monotonous!) system time. Result is returned via recv_msg() in a
+ control message as timeval (usec resolution).
+
+* SO_TIMESTAMPNS
+ Same time stamping mechanism as SO_TIMESTAMP, but returns result as
+ timespec (nsec resolution).
+
+* IP_MULTICAST_LOOP + SO_TIMESTAMP[NS]
+ Only for multicasts: approximate send time stamp by receiving the looped
+ packet and using its receive time stamp.
+
+The following interface complements the existing ones: receive time
+stamps can be generated and returned for arbitrary packets and much
+closer to the point where the packet is really sent. Time stamps can
+be generated in software (as before) or in hardware (if the hardware
+has such a feature).
+
+SO_TIMESTAMPING:
+
+Instructs the socket layer which kind of information is wanted. The
+parameter is an integer with some of the following bits set. Setting
+other bits is an error and doesn't change the current state.
+
+SOF_TIMESTAMPING_TX_HARDWARE: try to obtain send time stamp in hardware
+SOF_TIMESTAMPING_TX_SOFTWARE: if SOF_TIMESTAMPING_TX_HARDWARE is off or
+ fails, then do it in software
+SOF_TIMESTAMPING_RX_HARDWARE: return the original, unmodified time stamp
+ as generated by the hardware
+SOF_TIMESTAMPING_RX_SOFTWARE: if SOF_TIMESTAMPING_RX_HARDWARE is off or
+ fails, then do it in software
+SOF_TIMESTAMPING_RAW_HARDWARE: return original raw hardware time stamp
+SOF_TIMESTAMPING_SYS_HARDWARE: return hardware time stamp transformed to
+ the system time base
+SOF_TIMESTAMPING_SOFTWARE: return system time stamp generated in
+ software
+
+SOF_TIMESTAMPING_TX/RX determine how time stamps are generated.
+SOF_TIMESTAMPING_RAW/SYS determine how they are reported in the
+following control message:
+ struct scm_timestamping {
+ struct timespec systime;
+ struct timespec hwtimetrans;
+ struct timespec hwtimeraw;
+ };
+
+recvmsg() can be used to get this control message for regular incoming
+packets. For send time stamps the outgoing packet is looped back to
+the socket's error queue with the send time stamp(s) attached. It can
+be received with recvmsg(flags=MSG_ERRQUEUE). The call returns the
+original outgoing packet data including all headers preprended down to
+and including the link layer, the scm_timestamping control message and
+a sock_extended_err control message with ee_errno==ENOMSG and
+ee_origin==SO_EE_ORIGIN_TIMESTAMPING. A socket with such a pending
+bounced packet is ready for reading as far as select() is concerned.
+
+All three values correspond to the same event in time, but were
+generated in different ways. Each of these values may be empty (= all
+zero), in which case no such value was available. If the application
+is not interested in some of these values, they can be left blank to
+avoid the potential overhead of calculating them.
+
+systime is the value of the system time at that moment. This
+corresponds to the value also returned via SO_TIMESTAMP[NS]. If the
+time stamp was generated by hardware, then this field is
+empty. Otherwise it is filled in if SOF_TIMESTAMPING_SOFTWARE is
+set.
+
+hwtimeraw is the original hardware time stamp. Filled in if
+SOF_TIMESTAMPING_RAW_HARDWARE is set. No assumptions about its
+relation to system time should be made.
+
+hwtimetrans is the hardware time stamp transformed so that it
+corresponds as good as possible to system time. This correlation is
+not perfect; as a consequence, sorting packets received via different
+NICs by their hwtimetrans may differ from the order in which they were
+received. hwtimetrans may be non-monotonic even for the same NIC.
+Filled in if SOF_TIMESTAMPING_SYS_HARDWARE is set. Requires support
+by the network device and will be empty without that support.
+
+
+SIOCSHWTSTAMP:
+
+Hardware time stamping must also be initialized for each device driver
+that is expected to do hardware time stamping. The parameter is:
+
+struct hwtstamp_config {
+ int flags; /* no flags defined right now, must be zero */
+ int tx_type; /* HWTSTAMP_TX_* */
+ int rx_filter; /* HWTSTAMP_FILTER_* */
+};
+
+Desired behavior is passed into the kernel and to a specific device by
+calling ioctl(SIOCSHWTSTAMP) with a pointer to a struct ifreq whose
+ifr_data points to a struct hwtstamp_config. The tx_type and
+rx_filter are hints to the driver what it is expected to do. If
+the requested fine-grained filtering for incoming packets is not
+supported, the driver may time stamp more than just the requested types
+of packets.
+
+A driver which supports hardware time stamping shall update the struct
+with the actual, possibly more permissive configuration. If the
+requested packets cannot be time stamped, then nothing should be
+changed and ERANGE shall be returned (in contrast to EINVAL, which
+indicates that SIOCSHWTSTAMP is not supported at all).
+
+Only a processes with admin rights may change the configuration. User
+space is responsible to ensure that multiple processes don't interfere
+with each other and that the settings are reset.
+
+/* possible values for hwtstamp_config->tx_type */
+enum {
+ /*
+ * no outgoing packet will need hardware time stamping;
+ * should a packet arrive which asks for it, no hardware
+ * time stamping will be done
+ */
+ HWTSTAMP_TX_OFF,
+
+ /*
+ * enables hardware time stamping for outgoing packets;
+ * the sender of the packet decides which are to be
+ * time stamped by setting SOF_TIMESTAMPING_TX_SOFTWARE
+ * before sending the packet
+ */
+ HWTSTAMP_TX_ON,
+};
+
+/* possible values for hwtstamp_config->rx_filter */
+enum {
+ /* time stamp no incoming packet at all */
+ HWTSTAMP_FILTER_NONE,
+
+ /* time stamp any incoming packet */
+ HWTSTAMP_FILTER_ALL,
+
+ /* return value: time stamp all packets requested plus some others */
+ HWTSTAMP_FILTER_SOME,
+
+ /* PTP v1, UDP, any kind of event packet */
+ HWTSTAMP_FILTER_PTP_V1_L4_EVENT,
+
+ ...
+};
+
+
+DEVICE IMPLEMENTATION
+
+A driver which supports hardware time stamping must support the
+SIOCSHWTSTAMP ioctl. Time stamps for received packets must be stored
+in the skb with skb_hwtstamp_set().
+
+Time stamps for outgoing packets are to be generated as follows:
+- In hard_start_xmit(), check if skb_hwtstamp_check_tx_hardware()
+ returns non-zero. If yes, then the driver is expected
+ to do hardware time stamping.
+- If this is possible for the skb and requested, then declare
+ that the driver is doing the time stamping by calling
+ skb_hwtstamp_tx_in_progress(). A driver not supporting
+ hardware time stamping doesn't do that. A driver must never
+ touch sk_buff::tstamp! It is used to store how time stamping
+ for an outgoing packets is to be done.
+- As soon as the driver has sent the packet and/or obtained a
+ hardware time stamp for it, it passes the time stamp back by
+ calling skb_hwtstamp_tx() with the original skb, the raw
+ hardware time stamp and a handle to the device (necessary
+ to convert the hardware time stamp to system time). If obtaining
+ the hardware time stamp somehow fails, then the driver should
+ not fall back to software time stamping. The rationale is that
+ this would occur at a later time in the processing pipeline
+ than other software time stamping and therefore could lead
+ to unexpected deltas between time stamps.
+- If the driver did not call skb_hwtstamp_tx_in_progress(), then
+ dev_hard_start_xmit() checks whether software time stamping
+ is wanted as fallback and potentially generates the time stamp.
diff --git a/Documentation/networking/timestamping/.gitignore b/Documentation/networking/timestamping/.gitignore
new file mode 100644
index 0000000..71e81eb
--- /dev/null
+++ b/Documentation/networking/timestamping/.gitignore
@@ -0,0 +1 @@
+timestamping
diff --git a/Documentation/networking/timestamping/Makefile b/Documentation/networking/timestamping/Makefile
new file mode 100644
index 0000000..2a1489f
--- /dev/null
+++ b/Documentation/networking/timestamping/Makefile
@@ -0,0 +1,6 @@
+CPPFLAGS = -I../../../include
+
+timestamping: timestamping.c
+
+clean:
+ rm -f timestamping
diff --git a/Documentation/networking/timestamping/timestamping.c b/Documentation/networking/timestamping/timestamping.c
new file mode 100644
index 0000000..43d1431
--- /dev/null
+++ b/Documentation/networking/timestamping/timestamping.c
@@ -0,0 +1,533 @@
+/*
+ * This program demonstrates how the various time stamping features in
+ * the Linux kernel work. It emulates the behavior of a PTP
+ * implementation in stand-alone master mode by sending PTPv1 Sync
+ * multicasts once every second. It looks for similar packets, but
+ * beyond that doesn't actually implement PTP.
+ *
+ * Outgoing packets are time stamped with SO_TIMESTAMPING with or
+ * without hardware support.
+ *
+ * Incoming packets are time stamped with SO_TIMESTAMPING with or
+ * without hardware support, SIOCGSTAMP[NS] (per-socket time stamp) and
+ * SO_TIMESTAMP[NS].
+ *
+ * Copyright (C) 2009 Intel Corporation.
+ * Author: Patrick Ohly <patrick.ohly@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <string.h>
+
+#include <sys/time.h>
+#include <sys/socket.h>
+#include <sys/select.h>
+#include <sys/ioctl.h>
+#include <arpa/inet.h>
+#include <net/if.h>
+
+#include "asm/types.h"
+#include "linux/net_tstamp.h"
+#include "linux/errqueue.h"
+
+#ifndef SO_TIMESTAMPING
+# define SO_TIMESTAMPING 37
+# define SCM_TIMESTAMPING SO_TIMESTAMPING
+#endif
+
+#ifndef SO_TIMESTAMPNS
+# define SO_TIMESTAMPNS 35
+#endif
+
+#ifndef SIOCGSTAMPNS
+# define SIOCGSTAMPNS 0x8907
+#endif
+
+#ifndef SIOCSHWTSTAMP
+# define SIOCSHWTSTAMP 0x89b0
+#endif
+
+static void usage(const char *error)
+{
+ if (error)
+ printf("invalid option: %s\n", error);
+ printf("timestamping interface option*\n\n"
+ "Options:\n"
+ " IP_MULTICAST_LOOP - looping outgoing multicasts\n"
+ " SO_TIMESTAMP - normal software time stamping, ms resolution\n"
+ " SO_TIMESTAMPNS - more accurate software time stamping\n"
+ " SOF_TIMESTAMPING_TX_HARDWARE - hardware time stamping of outgoing packets\n"
+ " SOF_TIMESTAMPING_TX_SOFTWARE - software fallback for outgoing packets\n"
+ " SOF_TIMESTAMPING_RX_HARDWARE - hardware time stamping of incoming packets\n"
+ " SOF_TIMESTAMPING_RX_SOFTWARE - software fallback for incoming packets\n"
+ " SOF_TIMESTAMPING_SOFTWARE - request reporting of software time stamps\n"
+ " SOF_TIMESTAMPING_SYS_HARDWARE - request reporting of transformed HW time stamps\n"
+ " SOF_TIMESTAMPING_RAW_HARDWARE - request reporting of raw HW time stamps\n"
+ " SIOCGSTAMP - check last socket time stamp\n"
+ " SIOCGSTAMPNS - more accurate socket time stamp\n");
+ exit(1);
+}
+
+static void bail(const char *error)
+{
+ printf("%s: %s\n", error, strerror(errno));
+ exit(1);
+}
+
+static const unsigned char sync[] = {
+ 0x00, 0x01, 0x00, 0x01,
+ 0x5f, 0x44, 0x46, 0x4c,
+ 0x54, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00,
+ 0x01, 0x01,
+
+ /* fake uuid */
+ 0x00, 0x01,
+ 0x02, 0x03, 0x04, 0x05,
+
+ 0x00, 0x01, 0x00, 0x37,
+ 0x00, 0x00, 0x00, 0x08,
+ 0x00, 0x00, 0x00, 0x00,
+ 0x49, 0x05, 0xcd, 0x01,
+ 0x29, 0xb1, 0x8d, 0xb0,
+ 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x01,
+
+ /* fake uuid */
+ 0x00, 0x01,
+ 0x02, 0x03, 0x04, 0x05,
+
+ 0x00, 0x00, 0x00, 0x37,
+ 0x00, 0x00, 0x00, 0x04,
+ 0x44, 0x46, 0x4c, 0x54,
+ 0x00, 0x00, 0xf0, 0x60,
+ 0x00, 0x01, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x01,
+ 0x00, 0x00, 0xf0, 0x60,
+ 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x04,
+ 0x44, 0x46, 0x4c, 0x54,
+ 0x00, 0x01,
+
+ /* fake uuid */
+ 0x00, 0x01,
+ 0x02, 0x03, 0x04, 0x05,
+
+ 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00
+};
+
+static void sendpacket(int sock, struct sockaddr *addr, socklen_t addr_len)
+{
+ struct timeval now;
+ int res;
+
+ res = sendto(sock, sync, sizeof(sync), 0,
+ addr, addr_len);
+ gettimeofday(&now, 0);
+ if (res < 0)
+ printf("%s: %s\n", "send", strerror(errno));
+ else
+ printf("%ld.%06ld: sent %d bytes\n",
+ (long)now.tv_sec, (long)now.tv_usec,
+ res);
+}
+
+static void printpacket(struct msghdr *msg, int res,
+ char *data,
+ int sock, int recvmsg_flags,
+ int siocgstamp, int siocgstampns)
+{
+ struct sockaddr_in *from_addr = (struct sockaddr_in *)msg->msg_name;
+ struct cmsghdr *cmsg;
+ struct timeval tv;
+ struct timespec ts;
+ struct timeval now;
+
+ gettimeofday(&now, 0);
+
+ printf("%ld.%06ld: received %s data, %d bytes from %s, %d bytes control messages\n",
+ (long)now.tv_sec, (long)now.tv_usec,
+ (recvmsg_flags & MSG_ERRQUEUE) ? "error" : "regular",
+ res,
+ inet_ntoa(from_addr->sin_addr),
+ msg->msg_controllen);
+ for (cmsg = CMSG_FIRSTHDR(msg);
+ cmsg;
+ cmsg = CMSG_NXTHDR(msg, cmsg)) {
+ printf(" cmsg len %d: ", cmsg->cmsg_len);
+ switch (cmsg->cmsg_level) {
+ case SOL_SOCKET:
+ printf("SOL_SOCKET ");
+ switch (cmsg->cmsg_type) {
+ case SO_TIMESTAMP: {
+ struct timeval *stamp =
+ (struct timeval *)CMSG_DATA(cmsg);
+ printf("SO_TIMESTAMP %ld.%06ld",
+ (long)stamp->tv_sec,
+ (long)stamp->tv_usec);
+ break;
+ }
+ case SO_TIMESTAMPNS: {
+ struct timespec *stamp =
+ (struct timespec *)CMSG_DATA(cmsg);
+ printf("SO_TIMESTAMPNS %ld.%09ld",
+ (long)stamp->tv_sec,
+ (long)stamp->tv_nsec);
+ break;
+ }
+ case SO_TIMESTAMPING: {
+ struct timespec *stamp =
+ (struct timespec *)CMSG_DATA(cmsg);
+ printf("SO_TIMESTAMPING ");
+ printf("SW %ld.%09ld ",
+ (long)stamp->tv_sec,
+ (long)stamp->tv_nsec);
+ stamp++;
+ printf("HW transformed %ld.%09ld ",
+ (long)stamp->tv_sec,
+ (long)stamp->tv_nsec);
+ stamp++;
+ printf("HW raw %ld.%09ld",
+ (long)stamp->tv_sec,
+ (long)stamp->tv_nsec);
+ break;
+ }
+ default:
+ printf("type %d", cmsg->cmsg_type);
+ break;
+ }
+ break;
+ case IPPROTO_IP:
+ printf("IPPROTO_IP ");
+ switch (cmsg->cmsg_type) {
+ case IP_RECVERR: {
+ struct sock_extended_err *err =
+ (struct sock_extended_err *)CMSG_DATA(cmsg);
+ printf("IP_RECVERR ee_errno '%s' ee_origin %d => %s",
+ strerror(err->ee_errno),
+ err->ee_origin,
+#ifdef SO_EE_ORIGIN_TIMESTAMPING
+ err->ee_origin == SO_EE_ORIGIN_TIMESTAMPING ?
+ "bounced packet" : "unexpected origin"
+#else
+ "probably SO_EE_ORIGIN_TIMESTAMPING"
+#endif
+ );
+ if (res < sizeof(sync))
+ printf(" => truncated data?!");
+ else if (!memcmp(sync, data + res - sizeof(sync),
+ sizeof(sync)))
+ printf(" => GOT OUR DATA BACK (HURRAY!)");
+ break;
+ }
+ case IP_PKTINFO: {
+ struct in_pktinfo *pktinfo =
+ (struct in_pktinfo *)CMSG_DATA(cmsg);
+ printf("IP_PKTINFO interface index %u",
+ pktinfo->ipi_ifindex);
+ break;
+ }
+ default:
+ printf("type %d", cmsg->cmsg_type);
+ break;
+ }
+ break;
+ default:
+ printf("level %d type %d",
+ cmsg->cmsg_level,
+ cmsg->cmsg_type);
+ break;
+ }
+ printf("\n");
+ }
+
+ if (siocgstamp) {
+ if (ioctl(sock, SIOCGSTAMP, &tv))
+ printf(" %s: %s\n", "SIOCGSTAMP", strerror(errno));
+ else
+ printf("SIOCGSTAMP %ld.%06ld\n",
+ (long)tv.tv_sec,
+ (long)tv.tv_usec);
+ }
+ if (siocgstampns) {
+ if (ioctl(sock, SIOCGSTAMPNS, &ts))
+ printf(" %s: %s\n", "SIOCGSTAMPNS", strerror(errno));
+ else
+ printf("SIOCGSTAMPNS %ld.%09ld\n",
+ (long)ts.tv_sec,
+ (long)ts.tv_nsec);
+ }
+}
+
+static void recvpacket(int sock, int recvmsg_flags,
+ int siocgstamp, int siocgstampns)
+{
+ char data[256];
+ struct msghdr msg;
+ struct iovec entry;
+ struct sockaddr_in from_addr;
+ struct {
+ struct cmsghdr cm;
+ char control[512];
+ } control;
+ int res;
+
+ memset(&msg, 0, sizeof(msg));
+ msg.msg_iov = &entry;
+ msg.msg_iovlen = 1;
+ entry.iov_base = data;
+ entry.iov_len = sizeof(data);
+ msg.msg_name = (caddr_t)&from_addr;
+ msg.msg_namelen = sizeof(from_addr);
+ msg.msg_control = &control;
+ msg.msg_controllen = sizeof(control);
+
+ res = recvmsg(sock, &msg, recvmsg_flags|MSG_DONTWAIT);
+ if (res < 0) {
+ printf("%s %s: %s\n",
+ "recvmsg",
+ (recvmsg_flags & MSG_ERRQUEUE) ? "error" : "regular",
+ strerror(errno));
+ } else {
+ printpacket(&msg, res, data,
+ sock, recvmsg_flags,
+ siocgstamp, siocgstampns);
+ }
+}
+
+int main(int argc, char **argv)
+{
+ int so_timestamping_flags = 0;
+ int so_timestamp = 0;
+ int so_timestampns = 0;
+ int siocgstamp = 0;
+ int siocgstampns = 0;
+ int ip_multicast_loop = 0;
+ char *interface;
+ int i;
+ int enabled = 1;
+ int sock;
+ struct ifreq device;
+ struct ifreq hwtstamp;
+ struct hwtstamp_config hwconfig, hwconfig_requested;
+ struct sockaddr_in addr;
+ struct ip_mreq imr;
+ struct in_addr iaddr;
+ int val;
+ socklen_t len;
+ struct timeval next;
+
+ if (argc < 2)
+ usage(0);
+ interface = argv[1];
+
+ for (i = 2; i < argc; i++) {
+ if (!strcasecmp(argv[i], "SO_TIMESTAMP"))
+ so_timestamp = 1;
+ else if (!strcasecmp(argv[i], "SO_TIMESTAMPNS"))
+ so_timestampns = 1;
+ else if (!strcasecmp(argv[i], "SIOCGSTAMP"))
+ siocgstamp = 1;
+ else if (!strcasecmp(argv[i], "SIOCGSTAMPNS"))
+ siocgstampns = 1;
+ else if (!strcasecmp(argv[i], "IP_MULTICAST_LOOP"))
+ ip_multicast_loop = 1;
+ else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_TX_HARDWARE"))
+ so_timestamping_flags |= SOF_TIMESTAMPING_TX_HARDWARE;
+ else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_TX_SOFTWARE"))
+ so_timestamping_flags |= SOF_TIMESTAMPING_TX_SOFTWARE;
+ else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RX_HARDWARE"))
+ so_timestamping_flags |= SOF_TIMESTAMPING_RX_HARDWARE;
+ else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RX_SOFTWARE"))
+ so_timestamping_flags |= SOF_TIMESTAMPING_RX_SOFTWARE;
+ else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_SOFTWARE"))
+ so_timestamping_flags |= SOF_TIMESTAMPING_SOFTWARE;
+ else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_SYS_HARDWARE"))
+ so_timestamping_flags |= SOF_TIMESTAMPING_SYS_HARDWARE;
+ else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RAW_HARDWARE"))
+ so_timestamping_flags |= SOF_TIMESTAMPING_RAW_HARDWARE;
+ else
+ usage(argv[i]);
+ }
+
+ sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
+ if (socket < 0)
+ bail("socket");
+
+ memset(&device, 0, sizeof(device));
+ strncpy(device.ifr_name, interface, sizeof(device.ifr_name));
+ if (ioctl(sock, SIOCGIFADDR, &device) < 0)
+ bail("getting interface IP address");
+
+ memset(&hwtstamp, 0, sizeof(hwtstamp));
+ strncpy(hwtstamp.ifr_name, interface, sizeof(hwtstamp.ifr_name));
+ hwtstamp.ifr_data = (void *)&hwconfig;
+ memset(&hwconfig, 0, sizeof(&hwconfig));
+ hwconfig.tx_type =
+ (so_timestamping_flags & SOF_TIMESTAMPING_TX_HARDWARE) ?
+ HWTSTAMP_TX_ON : HWTSTAMP_TX_OFF;
+ hwconfig.rx_filter =
+ (so_timestamping_flags & SOF_TIMESTAMPING_RX_HARDWARE) ?
+ HWTSTAMP_FILTER_PTP_V1_L4_SYNC : HWTSTAMP_FILTER_NONE;
+ hwconfig_requested = hwconfig;
+ if (ioctl(sock, SIOCSHWTSTAMP, &hwtstamp) < 0) {
+ if ((errno == EINVAL || errno == ENOTSUP) &&
+ hwconfig_requested.tx_type == HWTSTAMP_TX_OFF &&
+ hwconfig_requested.rx_filter == HWTSTAMP_FILTER_NONE)
+ printf("SIOCSHWTSTAMP: disabling hardware time stamping not possible\n");
+ else
+ bail("SIOCSHWTSTAMP");
+ }
+ printf("SIOCSHWTSTAMP: tx_type %d requested, got %d; rx_filter %d requested, got %d\n",
+ hwconfig_requested.tx_type, hwconfig.tx_type,
+ hwconfig_requested.rx_filter, hwconfig.rx_filter);
+
+ /* bind to PTP port */
+ addr.sin_family = AF_INET;
+ addr.sin_addr.s_addr = htonl(INADDR_ANY);
+ addr.sin_port = htons(319 /* PTP event port */);
+ if (bind(sock,
+ (struct sockaddr *)&addr,
+ sizeof(struct sockaddr_in)) < 0)
+ bail("bind");
+
+ /* set multicast group for outgoing packets */
+ inet_aton("224.0.1.130", &iaddr); /* alternate PTP domain 1 */
+ addr.sin_addr = iaddr;
+ imr.imr_multiaddr.s_addr = iaddr.s_addr;
+ imr.imr_interface.s_addr =
+ ((struct sockaddr_in *)&device.ifr_addr)->sin_addr.s_addr;
+ if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_IF,
+ &imr.imr_interface.s_addr, sizeof(struct in_addr)) < 0)
+ bail("set multicast");
+
+ /* join multicast group, loop our own packet */
+ if (setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,
+ &imr, sizeof(struct ip_mreq)) < 0)
+ bail("join multicast group");
+
+ if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_LOOP,
+ &ip_multicast_loop, sizeof(enabled)) < 0) {
+ bail("loop multicast");
+ }
+
+ /* set socket options for time stamping */
+ if (so_timestamp &&
+ setsockopt(sock, SOL_SOCKET, SO_TIMESTAMP,
+ &enabled, sizeof(enabled)) < 0)
+ bail("setsockopt SO_TIMESTAMP");
+
+ if (so_timestampns &&
+ setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPNS,
+ &enabled, sizeof(enabled)) < 0)
+ bail("setsockopt SO_TIMESTAMPNS");
+
+ if (so_timestamping_flags &&
+ setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPING,
+ &so_timestamping_flags,
+ sizeof(so_timestamping_flags)) < 0)
+ bail("setsockopt SO_TIMESTAMPING");
+
+ /* request IP_PKTINFO for debugging purposes */
+ if (setsockopt(sock, SOL_IP, IP_PKTINFO,
+ &enabled, sizeof(enabled)) < 0)
+ printf("%s: %s\n", "setsockopt IP_PKTINFO", strerror(errno));
+
+ /* verify socket options */
+ len = sizeof(val);
+ if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMP, &val, &len) < 0)
+ printf("%s: %s\n", "getsockopt SO_TIMESTAMP", strerror(errno));
+ else
+ printf("SO_TIMESTAMP %d\n", val);
+
+ if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMPNS, &val, &len) < 0)
+ printf("%s: %s\n", "getsockopt SO_TIMESTAMPNS",
+ strerror(errno));
+ else
+ printf("SO_TIMESTAMPNS %d\n", val);
+
+ if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMPING, &val, &len) < 0) {
+ printf("%s: %s\n", "getsockopt SO_TIMESTAMPING",
+ strerror(errno));
+ } else {
+ printf("SO_TIMESTAMPING %d\n", val);
+ if (val != so_timestamping_flags)
+ printf(" not the expected value %d\n",
+ so_timestamping_flags);
+ }
+
+ /* send packets forever every five seconds */
+ gettimeofday(&next, 0);
+ next.tv_sec = (next.tv_sec + 1) / 5 * 5;
+ next.tv_usec = 0;
+ while (1) {
+ struct timeval now;
+ struct timeval delta;
+ long delta_us;
+ int res;
+ fd_set readfs, errorfs;
+
+ gettimeofday(&now, 0);
+ delta_us = (long)(next.tv_sec - now.tv_sec) * 1000000 +
+ (long)(next.tv_usec - now.tv_usec);
+ if (delta_us > 0) {
+ /* continue waiting for timeout or data */
+ delta.tv_sec = delta_us / 1000000;
+ delta.tv_usec = delta_us % 1000000;
+
+ FD_ZERO(&readfs);
+ FD_ZERO(&errorfs);
+ FD_SET(sock, &readfs);
+ FD_SET(sock, &errorfs);
+ printf("%ld.%06ld: select %ldus\n",
+ (long)now.tv_sec, (long)now.tv_usec,
+ delta_us);
+ res = select(sock + 1, &readfs, 0, &errorfs, &delta);
+ gettimeofday(&now, 0);
+ printf("%ld.%06ld: select returned: %d, %s\n",
+ (long)now.tv_sec, (long)now.tv_usec,
+ res,
+ res < 0 ? strerror(errno) : "success");
+ if (res > 0) {
+ if (FD_ISSET(sock, &readfs))
+ printf("ready for reading\n");
+ if (FD_ISSET(sock, &errorfs))
+ printf("has error\n");
+ recvpacket(sock, 0,
+ siocgstamp,
+ siocgstampns);
+ recvpacket(sock, MSG_ERRQUEUE,
+ siocgstamp,
+ siocgstampns);
+ }
+ } else {
+ /* write one packet */
+ sendpacket(sock,
+ (struct sockaddr *)&addr,
+ sizeof(addr));
+ next.tv_sec += 5;
+ continue;
+ }
+ }
+
+ return 0;
+}
diff --git a/arch/alpha/include/asm/socket.h b/arch/alpha/include/asm/socket.h
index a1057c2..3641ec1 100644
--- a/arch/alpha/include/asm/socket.h
+++ b/arch/alpha/include/asm/socket.h
@@ -62,6 +62,9 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
/* O_NONBLOCK clashes with the bits used for socket types. Therefore we
* have to define SOCK_NONBLOCK to a different value here.
*/
diff --git a/arch/arm/include/asm/socket.h b/arch/arm/include/asm/socket.h
index 6817be9..537de4e 100644
--- a/arch/arm/include/asm/socket.h
+++ b/arch/arm/include/asm/socket.h
@@ -54,4 +54,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_SOCKET_H */
diff --git a/arch/avr32/include/asm/socket.h b/arch/avr32/include/asm/socket.h
index 35863f2..04c8606 100644
--- a/arch/avr32/include/asm/socket.h
+++ b/arch/avr32/include/asm/socket.h
@@ -54,4 +54,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* __ASM_AVR32_SOCKET_H */
diff --git a/arch/blackfin/include/asm/socket.h b/arch/blackfin/include/asm/socket.h
index 2ca702e..fac7fe9 100644
--- a/arch/blackfin/include/asm/socket.h
+++ b/arch/blackfin/include/asm/socket.h
@@ -53,4 +53,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_SOCKET_H */
diff --git a/arch/cris/include/asm/socket.h b/arch/cris/include/asm/socket.h
index 9df0ca8..d5cf740 100644
--- a/arch/cris/include/asm/socket.h
+++ b/arch/cris/include/asm/socket.h
@@ -56,6 +56,9 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_SOCKET_H */
diff --git a/arch/h8300/include/asm/socket.h b/arch/h8300/include/asm/socket.h
index da2520d..602518a 100644
--- a/arch/h8300/include/asm/socket.h
+++ b/arch/h8300/include/asm/socket.h
@@ -54,4 +54,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_SOCKET_H */
diff --git a/arch/ia64/include/asm/socket.h b/arch/ia64/include/asm/socket.h
index d5ef0aa..7454212 100644
--- a/arch/ia64/include/asm/socket.h
+++ b/arch/ia64/include/asm/socket.h
@@ -63,4 +63,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m68k/include/asm/socket.h b/arch/m68k/include/asm/socket.h
index dbc64e9..ca87f93 100644
--- a/arch/m68k/include/asm/socket.h
+++ b/arch/m68k/include/asm/socket.h
@@ -54,4 +54,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_SOCKET_H */
diff --git a/arch/mips/include/asm/socket.h b/arch/mips/include/asm/socket.h
index facc2d7..2abca17 100644
--- a/arch/mips/include/asm/socket.h
+++ b/arch/mips/include/asm/socket.h
@@ -75,6 +75,9 @@ To add: #define SO_REUSEPORT 0x0200 /* Allow local address and port reuse. */
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#ifdef __KERNEL__
/** sock_type - Socket types
diff --git a/arch/parisc/include/asm/socket.h b/arch/parisc/include/asm/socket.h
index fba402c..885472b 100644
--- a/arch/parisc/include/asm/socket.h
+++ b/arch/parisc/include/asm/socket.h
@@ -54,6 +54,9 @@
#define SO_MARK 0x401f
+#define SO_TIMESTAMPING 0x4020
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
/* O_NONBLOCK clashes with the bits used for socket types. Therefore we
* have to define SOCK_NONBLOCK to a different value here.
*/
diff --git a/arch/powerpc/include/asm/socket.h b/arch/powerpc/include/asm/socket.h
index f5a4e16..1e5cfad 100644
--- a/arch/powerpc/include/asm/socket.h
+++ b/arch/powerpc/include/asm/socket.h
@@ -61,4 +61,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/asm/socket.h b/arch/s390/include/asm/socket.h
index c786ab6..02330c5 100644
--- a/arch/s390/include/asm/socket.h
+++ b/arch/s390/include/asm/socket.h
@@ -62,4 +62,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_SOCKET_H */
diff --git a/arch/sh/include/asm/socket.h b/arch/sh/include/asm/socket.h
index 6d4bf65..345653b 100644
--- a/arch/sh/include/asm/socket.h
+++ b/arch/sh/include/asm/socket.h
@@ -54,4 +54,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* __ASM_SH_SOCKET_H */
diff --git a/arch/sparc/include/asm/socket.h b/arch/sparc/include/asm/socket.h
index bf50d0c..982a12f 100644
--- a/arch/sparc/include/asm/socket.h
+++ b/arch/sparc/include/asm/socket.h
@@ -50,6 +50,9 @@
#define SO_MARK 0x0022
+#define SO_TIMESTAMPING 0x0023
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
/* Security levels - as per NRL IPv6 - don't actually do anything */
#define SO_SECURITY_AUTHENTICATION 0x5001
#define SO_SECURITY_ENCRYPTION_TRANSPORT 0x5002
diff --git a/arch/x86/include/asm/socket.h b/arch/x86/include/asm/socket.h
index 8ab9cc8..ca8bf2c 100644
--- a/arch/x86/include/asm/socket.h
+++ b/arch/x86/include/asm/socket.h
@@ -54,4 +54,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_X86_SOCKET_H */
diff --git a/arch/xtensa/include/asm/socket.h b/arch/xtensa/include/asm/socket.h
index 6100682..dd1a7a4 100644
--- a/arch/xtensa/include/asm/socket.h
+++ b/arch/xtensa/include/asm/socket.h
@@ -65,4 +65,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _XTENSA_SOCKET_H */
diff --git a/include/asm-frv/socket.h b/include/asm-frv/socket.h
index e51ca67..57c3d40 100644
--- a/include/asm-frv/socket.h
+++ b/include/asm-frv/socket.h
@@ -54,5 +54,8 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_SOCKET_H */
diff --git a/include/asm-m32r/socket.h b/include/asm-m32r/socket.h
index 9a0e200..be7ed58 100644
--- a/include/asm-m32r/socket.h
+++ b/include/asm-m32r/socket.h
@@ -54,4 +54,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_M32R_SOCKET_H */
diff --git a/include/asm-mn10300/socket.h b/include/asm-mn10300/socket.h
index 80af9c4..fb5daf4 100644
--- a/include/asm-mn10300/socket.h
+++ b/include/asm-mn10300/socket.h
@@ -54,4 +54,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_SOCKET_H */
diff --git a/include/linux/errqueue.h b/include/linux/errqueue.h
index ceb1454..ec12cc7 100644
--- a/include/linux/errqueue.h
+++ b/include/linux/errqueue.h
@@ -18,6 +18,7 @@ struct sock_extended_err
#define SO_EE_ORIGIN_LOCAL 1
#define SO_EE_ORIGIN_ICMP 2
#define SO_EE_ORIGIN_ICMP6 3
+#define SO_EE_ORIGIN_TIMESTAMPING 4
#define SO_EE_OFFENDER(ee) ((struct sockaddr*)((ee)+1))
diff --git a/include/linux/net_tstamp.h b/include/linux/net_tstamp.h
new file mode 100644
index 0000000..a3b8546
--- /dev/null
+++ b/include/linux/net_tstamp.h
@@ -0,0 +1,104 @@
+/*
+ * Userspace API for hardware time stamping of network packets
+ *
+ * Copyright (C) 2008,2009 Intel Corporation
+ * Author: Patrick Ohly <patrick.ohly@intel.com>
+ *
+ */
+
+#ifndef _NET_TIMESTAMPING_H
+#define _NET_TIMESTAMPING_H
+
+#include <linux/socket.h> /* for SO_TIMESTAMPING */
+
+/* SO_TIMESTAMPING gets an integer bit field comprised of these values */
+enum {
+ SOF_TIMESTAMPING_TX_HARDWARE = (1<<0),
+ SOF_TIMESTAMPING_TX_SOFTWARE = (1<<1),
+ SOF_TIMESTAMPING_RX_HARDWARE = (1<<2),
+ SOF_TIMESTAMPING_RX_SOFTWARE = (1<<3),
+ SOF_TIMESTAMPING_SOFTWARE = (1<<4),
+ SOF_TIMESTAMPING_SYS_HARDWARE = (1<<5),
+ SOF_TIMESTAMPING_RAW_HARDWARE = (1<<6),
+ SOF_TIMESTAMPING_MASK =
+ (SOF_TIMESTAMPING_RAW_HARDWARE - 1) |
+ SOF_TIMESTAMPING_RAW_HARDWARE
+};
+
+/**
+ * struct hwtstamp_config - %SIOCSHWTSTAMP parameter
+ *
+ * @flags: no flags defined right now, must be zero
+ * @tx_type: one of HWTSTAMP_TX_*
+ * @rx_type: one of one of HWTSTAMP_FILTER_*
+ *
+ * %SIOCSHWTSTAMP expects a &struct ifreq with a ifr_data pointer to
+ * this structure. dev_ifsioc() in the kernel takes care of the
+ * translation between 32 bit userspace and 64 bit kernel. The
+ * structure is intentionally chosen so that it has the same layout on
+ * 32 and 64 bit systems, don't break this!
+ */
+struct hwtstamp_config {
+ int flags;
+ int tx_type;
+ int rx_filter;
+};
+
+/* possible values for hwtstamp_config->tx_type */
+enum {
+ /*
+ * No outgoing packet will need hardware time stamping;
+ * should a packet arrive which asks for it, no hardware
+ * time stamping will be done.
+ */
+ HWTSTAMP_TX_OFF,
+
+ /*
+ * Enables hardware time stamping for outgoing packets;
+ * the sender of the packet decides which are to be
+ * time stamped by setting %SOF_TIMESTAMPING_TX_SOFTWARE
+ * before sending the packet.
+ */
+ HWTSTAMP_TX_ON,
+};
+
+/* possible values for hwtstamp_config->rx_filter */
+enum {
+ /* time stamp no incoming packet at all */
+ HWTSTAMP_FILTER_NONE,
+
+ /* time stamp any incoming packet */
+ HWTSTAMP_FILTER_ALL,
+
+ /* return value: time stamp all packets requested plus some others */
+ HWTSTAMP_FILTER_SOME,
+
+ /* PTP v1, UDP, any kind of event packet */
+ HWTSTAMP_FILTER_PTP_V1_L4_EVENT,
+ /* PTP v1, UDP, Sync packet */
+ HWTSTAMP_FILTER_PTP_V1_L4_SYNC,
+ /* PTP v1, UDP, Delay_req packet */
+ HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ,
+ /* PTP v2, UDP, any kind of event packet */
+ HWTSTAMP_FILTER_PTP_V2_L4_EVENT,
+ /* PTP v2, UDP, Sync packet */
+ HWTSTAMP_FILTER_PTP_V2_L4_SYNC,
+ /* PTP v2, UDP, Delay_req packet */
+ HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ,
+
+ /* 802.AS1, Ethernet, any kind of event packet */
+ HWTSTAMP_FILTER_PTP_V2_L2_EVENT,
+ /* 802.AS1, Ethernet, Sync packet */
+ HWTSTAMP_FILTER_PTP_V2_L2_SYNC,
+ /* 802.AS1, Ethernet, Delay_req packet */
+ HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ,
+
+ /* PTP v2/802.AS1, any layer, any kind of event packet */
+ HWTSTAMP_FILTER_PTP_V2_EVENT,
+ /* PTP v2/802.AS1, any layer, Sync packet */
+ HWTSTAMP_FILTER_PTP_V2_SYNC,
+ /* PTP v2/802.AS1, any layer, Delay_req packet */
+ HWTSTAMP_FILTER_PTP_V2_DELAY_REQ,
+};
+
+#endif /* _NET_TIMESTAMPING_H */
diff --git a/include/linux/sockios.h b/include/linux/sockios.h
index abef759..241f179 100644
--- a/include/linux/sockios.h
+++ b/include/linux/sockios.h
@@ -122,6 +122,9 @@
#define SIOCBRADDIF 0x89a2 /* add interface to bridge */
#define SIOCBRDELIF 0x89a3 /* remove interface from bridge */
+/* hardware time stamping: parameters in linux/net_tstamp.h */
+#define SIOCSHWTSTAMP 0x89b0
+
/* Device private ioctl calls */
/*
--
1.6.1.2
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 04/10] net: infrastructure for hardware time stamping
2009-02-12 15:03 ` [PATCH NET-NEXT 03/10] net: new user space API for time stamping of incoming and outgoing packets Patrick Ohly
@ 2009-02-12 15:03 ` Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 05/10] net: socket infrastructure for SO_TIMESTAMPING Patrick Ohly
0 siblings, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly
The additional per-packet information (16 bytes for time stamps, 1
byte for flags) is stored for all packets in the skb_shared_info
struct. This implementation detail is hidden from users of that
information via skb_* accessor functions. A separate struct resp.
union is used for the additional information so that it can be
stored/copied easily outside of skb_shared_info.
Compared to previous implementations (reusing the tstamp field
depending on the context, optional additional structures) this
is the simplest solution. It does not extend sk_buff itself.
TX time stamping is implemented in software if the device driver
doesn't support hardware time stamping.
The new semantic for hardware/software time stamping around
ndo_start_xmit() is based on two assumptions about existing
network device drivers which don't support hardware time
stamping and know nothing about it:
- they leave the new skb_shared_tx unmodified
- the keep the connection to the originating socket in skb->sk
alive, i.e., don't call skb_orphan()
Given that skb_shared_tx is new, the first assumption is safe.
The second is only true for some drivers. As a result, software
TX time stamping currently works with the bnx2 driver, but not
with the unmodified igb driver (the two drivers this patch series
was tested with).
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
include/linux/skbuff.h | 91 +++++++++++++++++++++++++++++++++++++++++++++++-
net/core/dev.c | 32 ++++++++++++++++-
net/core/skbuff.c | 41 +++++++++++++++++++++
3 files changed, 161 insertions(+), 3 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9247008..f96bc91 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -132,6 +132,57 @@ struct skb_frag_struct {
__u32 size;
};
+#define HAVE_HW_TIME_STAMP
+
+/**
+ * skb_shared_hwtstamps - hardware time stamps
+ *
+ * @hwtstamp: hardware time stamp transformed into duration
+ * since arbitrary point in time
+ * @syststamp: hwtstamp transformed to system time base
+ *
+ * Software time stamps generated by ktime_get_real() are stored in
+ * skb->tstamp. The relation between the different kinds of time
+ * stamps is as follows:
+ *
+ * syststamp and tstamp can be compared against each other in
+ * arbitrary combinations. The accuracy of a
+ * syststamp/tstamp/"syststamp from other device" comparison is
+ * limited by the accuracy of the transformation into system time
+ * base. This depends on the device driver and its underlying
+ * hardware.
+ *
+ * hwtstamps can only be compared against other hwtstamps from
+ * the same device.
+ *
+ * This structure is attached to packets as part of the
+ * &skb_shared_info. Use skb_hwtstamps() to get a pointer.
+ */
+struct skb_shared_hwtstamps {
+ ktime_t hwtstamp;
+ ktime_t syststamp;
+};
+
+/**
+ * skb_shared_tx - instructions for time stamping of outgoing packets
+ *
+ * @hardware: generate hardware time stamp
+ * @software: generate software time stamp
+ * @in_progress: device driver is going to provide
+ * hardware time stamp
+ *
+ * These flags are attached to packets as part of the
+ * &skb_shared_info. Use skb_tx() to get a pointer.
+ */
+union skb_shared_tx {
+ struct {
+ __u8 hardware:1,
+ software:1,
+ in_progress:1;
+ };
+ __u8 flags;
+};
+
/* This data is invariant across clones and lives at
* the end of the header data, ie. at skb->end.
*/
@@ -143,10 +194,12 @@ struct skb_shared_info {
unsigned short gso_segs;
unsigned short gso_type;
__be32 ip6_frag_id;
+ union skb_shared_tx tx_flags;
#ifdef CONFIG_HAS_DMA
unsigned int num_dma_maps;
#endif
struct sk_buff *frag_list;
+ struct skb_shared_hwtstamps hwtstamps;
skb_frag_t frags[MAX_SKB_FRAGS];
#ifdef CONFIG_HAS_DMA
dma_addr_t dma_maps[MAX_SKB_FRAGS + 1];
@@ -465,6 +518,16 @@ static inline unsigned char *skb_end_pointer(const struct sk_buff *skb)
/* Internal */
#define skb_shinfo(SKB) ((struct skb_shared_info *)(skb_end_pointer(SKB)))
+static inline struct skb_shared_hwtstamps *skb_hwtstamps(struct sk_buff *skb)
+{
+ return &skb_shinfo(skb)->hwtstamps;
+}
+
+static inline union skb_shared_tx *skb_tx(struct sk_buff *skb)
+{
+ return &skb_shinfo(skb)->tx_flags;
+}
+
/**
* skb_queue_empty - check if a queue is empty
* @list: queue head
@@ -1730,6 +1793,11 @@ static inline void skb_copy_to_linear_data_offset(struct sk_buff *skb,
extern void skb_init(void);
+static inline ktime_t skb_get_ktime(const struct sk_buff *skb)
+{
+ return skb->tstamp;
+}
+
/**
* skb_get_timestamp - get timestamp from a skb
* @skb: skb to get stamp from
@@ -1739,11 +1807,18 @@ extern void skb_init(void);
* This function converts the offset back to a struct timeval and stores
* it in stamp.
*/
-static inline void skb_get_timestamp(const struct sk_buff *skb, struct timeval *stamp)
+static inline void skb_get_timestamp(const struct sk_buff *skb,
+ struct timeval *stamp)
{
*stamp = ktime_to_timeval(skb->tstamp);
}
+static inline void skb_get_timestampns(const struct sk_buff *skb,
+ struct timespec *stamp)
+{
+ *stamp = ktime_to_timespec(skb->tstamp);
+}
+
static inline void __net_timestamp(struct sk_buff *skb)
{
skb->tstamp = ktime_get_real();
@@ -1759,6 +1834,20 @@ static inline ktime_t net_invalid_timestamp(void)
return ktime_set(0, 0);
}
+/**
+ * skb_tstamp_tx - queue clone of skb with send time stamps
+ * @orig_skb: the original outgoing packet
+ * @hwtstamps: hardware time stamps, may be NULL if not available
+ *
+ * If the skb has a socket associated, then this function clones the
+ * skb (thus sharing the actual data and optional structures), stores
+ * the optional hardware time stamping information (if non NULL) or
+ * generates a software time stamp (otherwise), then queues the clone
+ * to the error queue of the socket. Errors are silently ignored.
+ */
+extern void skb_tstamp_tx(struct sk_buff *orig_skb,
+ struct skb_shared_hwtstamps *hwtstamps);
+
extern __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len);
extern __sum16 __skb_checksum_complete(struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 1e27a67..d20c28e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1672,10 +1672,21 @@ static int dev_gso_segment(struct sk_buff *skb)
return 0;
}
+static void tstamp_tx(struct sk_buff *skb)
+{
+ union skb_shared_tx *shtx =
+ skb_tx(skb);
+ if (unlikely(shtx->software &&
+ !shtx->in_progress)) {
+ skb_tstamp_tx(skb, NULL);
+ }
+}
+
int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
struct netdev_queue *txq)
{
const struct net_device_ops *ops = dev->netdev_ops;
+ int rc;
prefetch(&dev->netdev_ops->ndo_start_xmit);
if (likely(!skb->next)) {
@@ -1689,13 +1700,29 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
goto gso;
}
- return ops->ndo_start_xmit(skb, dev);
+ rc = ops->ndo_start_xmit(skb, dev);
+ /*
+ * TODO: if skb_orphan() was called by
+ * dev->hard_start_xmit() (for example, the unmodified
+ * igb driver does that; bnx2 doesn't), then
+ * skb_tx_software_timestamp() will be unable to send
+ * back the time stamp.
+ *
+ * How can this be prevented? Always create another
+ * reference to the socket before calling
+ * dev->hard_start_xmit()? Prevent that skb_orphan()
+ * does anything in dev->hard_start_xmit() by clearing
+ * the skb destructor before the call and restoring it
+ * afterwards, then doing the skb_orphan() ourselves?
+ */
+ if (likely(!rc))
+ tstamp_tx(skb);
+ return rc;
}
gso:
do {
struct sk_buff *nskb = skb->next;
- int rc;
skb->next = nskb->next;
nskb->next = NULL;
@@ -1705,6 +1732,7 @@ gso:
skb->next = nskb;
return rc;
}
+ tstamp_tx(skb);
if (unlikely(netif_tx_queue_stopped(txq) && skb->next))
return NETDEV_TX_BUSY;
} while (skb->next);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7657cec..7b831a7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -55,6 +55,7 @@
#include <linux/rtnetlink.h>
#include <linux/init.h>
#include <linux/scatterlist.h>
+#include <linux/errqueue.h>
#include <net/protocol.h>
#include <net/dst.h>
@@ -215,7 +216,9 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
shinfo->gso_segs = 0;
shinfo->gso_type = 0;
shinfo->ip6_frag_id = 0;
+ shinfo->tx_flags.flags = 0;
shinfo->frag_list = NULL;
+ memset(&shinfo->hwtstamps, 0, sizeof(shinfo->hwtstamps));
if (fclone) {
struct sk_buff *child = skb + 1;
@@ -2940,6 +2943,44 @@ int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer)
}
EXPORT_SYMBOL_GPL(skb_cow_data);
+void skb_tstamp_tx(struct sk_buff *orig_skb,
+ struct skb_shared_hwtstamps *hwtstamps)
+{
+ struct sock *sk = orig_skb->sk;
+ struct sock_exterr_skb *serr;
+ struct sk_buff *skb;
+ int err;
+
+ if (!sk)
+ return;
+
+ skb = skb_clone(orig_skb, GFP_ATOMIC);
+ if (!skb)
+ return;
+
+ if (hwtstamps) {
+ *skb_hwtstamps(skb) =
+ *hwtstamps;
+ } else {
+ /*
+ * no hardware time stamps available,
+ * so keep the skb_shared_tx and only
+ * store software time stamp
+ */
+ skb->tstamp = ktime_get_real();
+ }
+
+ serr = SKB_EXT_ERR(skb);
+ memset(serr, 0, sizeof(*serr));
+ serr->ee.ee_errno = ENOMSG;
+ serr->ee.ee_origin = SO_EE_ORIGIN_TIMESTAMPING;
+ err = sock_queue_err_skb(sk, skb);
+ if (err)
+ kfree_skb(skb);
+}
+EXPORT_SYMBOL_GPL(skb_tstamp_tx);
+
+
/**
* skb_partial_csum_set - set up and verify partial csum values for packet
* @skb: the skb to set
--
1.6.1.2
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 05/10] net: socket infrastructure for SO_TIMESTAMPING
2009-02-12 15:03 ` [PATCH NET-NEXT 04/10] net: infrastructure for hardware time stamping Patrick Ohly
@ 2009-02-12 15:03 ` Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 06/10] ip: support for TX timestamps on UDP and RAW sockets Patrick Ohly
0 siblings, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly
The overlap with the old SO_TIMESTAMP[NS] options is handled so
that time stamping in software (net_enable_timestamp()) is
enabled when SO_TIMESTAMP[NS] and/or SO_TIMESTAMPING_RX_SOFTWARE
is set. It's disabled if all of these are off.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
include/net/sock.h | 44 +++++++++++++++++++++++++--
net/compat.c | 19 +++++++----
net/core/sock.c | 81 ++++++++++++++++++++++++++++++++++++++++++-------
net/socket.c | 84 +++++++++++++++++++++++++++++++++++++++------------
4 files changed, 186 insertions(+), 42 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index c047def..0a54d9b 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -158,7 +158,7 @@ struct sock_common {
* @sk_allocation: allocation mode
* @sk_sndbuf: size of send buffer in bytes
* @sk_flags: %SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE,
- * %SO_OOBINLINE settings
+ * %SO_OOBINLINE settings, %SO_TIMESTAMPING settings
* @sk_no_check: %SO_NO_CHECK setting, wether or not checkup packets
* @sk_route_caps: route capabilities (e.g. %NETIF_F_TSO)
* @sk_gso_type: GSO type (e.g. %SKB_GSO_TCPV4)
@@ -488,6 +488,13 @@ enum sock_flags {
SOCK_RCVTSTAMPNS, /* %SO_TIMESTAMPNS setting */
SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
+ SOCK_TIMESTAMPING_TX_HARDWARE, /* %SOF_TIMESTAMPING_TX_HARDWARE */
+ SOCK_TIMESTAMPING_TX_SOFTWARE, /* %SOF_TIMESTAMPING_TX_SOFTWARE */
+ SOCK_TIMESTAMPING_RX_HARDWARE, /* %SOF_TIMESTAMPING_RX_HARDWARE */
+ SOCK_TIMESTAMPING_RX_SOFTWARE, /* %SOF_TIMESTAMPING_RX_SOFTWARE */
+ SOCK_TIMESTAMPING_SOFTWARE, /* %SOF_TIMESTAMPING_SOFTWARE */
+ SOCK_TIMESTAMPING_RAW_HARDWARE, /* %SOF_TIMESTAMPING_RAW_HARDWARE */
+ SOCK_TIMESTAMPING_SYS_HARDWARE, /* %SOF_TIMESTAMPING_SYS_HARDWARE */
};
static inline void sock_copy_flags(struct sock *nsk, struct sock *osk)
@@ -1346,14 +1353,45 @@ static __inline__ void
sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
{
ktime_t kt = skb->tstamp;
+ struct skb_shared_hwtstamps *hwtstamps = skb_hwtstamps(skb);
- if (sock_flag(sk, SOCK_RCVTSTAMP))
+ /*
+ * generate control messages if
+ * - receive time stamping in software requested (SOCK_RCVTSTAMP
+ * or SOCK_TIMESTAMPING_RX_SOFTWARE)
+ * - software time stamp available and wanted
+ * (SOCK_TIMESTAMPING_SOFTWARE)
+ * - hardware time stamps available and wanted
+ * (SOCK_TIMESTAMPING_SYS_HARDWARE or
+ * SOCK_TIMESTAMPING_RAW_HARDWARE)
+ */
+ if (sock_flag(sk, SOCK_RCVTSTAMP) ||
+ sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE) ||
+ (kt.tv64 && sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE)) ||
+ (hwtstamps->hwtstamp.tv64 &&
+ sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE)) ||
+ (hwtstamps->syststamp.tv64 &&
+ sock_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE)))
__sock_recv_timestamp(msg, sk, skb);
else
sk->sk_stamp = kt;
}
/**
+ * sock_tx_timestamp - checks whether the outgoing packet is to be time stamped
+ * @msg: outgoing packet
+ * @sk: socket sending this packet
+ * @shtx: filled with instructions for time stamping
+ *
+ * Currently only depends on SOCK_TIMESTAMPING* flags. Returns error code if
+ * parameters are invalid.
+ */
+extern int sock_tx_timestamp(struct msghdr *msg,
+ struct sock *sk,
+ union skb_shared_tx *shtx);
+
+
+/**
* sk_eat_skb - Release a skb if it is no longer needed
* @sk: socket to eat this skb from
* @skb: socket buffer to eat
@@ -1421,7 +1459,7 @@ static inline struct sock *skb_steal_sock(struct sk_buff *skb)
return NULL;
}
-extern void sock_enable_timestamp(struct sock *sk);
+extern void sock_enable_timestamp(struct sock *sk, int flag);
extern int sock_get_timestamp(struct sock *, struct timeval __user *);
extern int sock_get_timestampns(struct sock *, struct timespec __user *);
diff --git a/net/compat.c b/net/compat.c
index a3a2ba0..8d73905 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -216,7 +216,7 @@ Efault:
int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *data)
{
struct compat_timeval ctv;
- struct compat_timespec cts;
+ struct compat_timespec cts[3];
struct compat_cmsghdr __user *cm = (struct compat_cmsghdr __user *) kmsg->msg_control;
struct compat_cmsghdr cmhdr;
int cmlen;
@@ -233,12 +233,17 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat
data = &ctv;
len = sizeof(ctv);
}
- if (level == SOL_SOCKET && type == SCM_TIMESTAMPNS) {
+ if (level == SOL_SOCKET &&
+ (type == SCM_TIMESTAMPNS || type == SCM_TIMESTAMPING)) {
+ int count = type == SCM_TIMESTAMPNS ? 1 : 3;
+ int i;
struct timespec *ts = (struct timespec *)data;
- cts.tv_sec = ts->tv_sec;
- cts.tv_nsec = ts->tv_nsec;
+ for (i = 0; i < count; i++) {
+ cts[i].tv_sec = ts[i].tv_sec;
+ cts[i].tv_nsec = ts[i].tv_nsec;
+ }
data = &cts;
- len = sizeof(cts);
+ len = sizeof(cts[0]) * count;
}
cmlen = CMSG_COMPAT_LEN(len);
@@ -455,7 +460,7 @@ int compat_sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp)
struct timeval tv;
if (!sock_flag(sk, SOCK_TIMESTAMP))
- sock_enable_timestamp(sk);
+ sock_enable_timestamp(sk, SOCK_TIMESTAMP);
tv = ktime_to_timeval(sk->sk_stamp);
if (tv.tv_sec == -1)
return err;
@@ -479,7 +484,7 @@ int compat_sock_get_timestampns(struct sock *sk, struct timespec __user *usersta
struct timespec ts;
if (!sock_flag(sk, SOCK_TIMESTAMP))
- sock_enable_timestamp(sk);
+ sock_enable_timestamp(sk, SOCK_TIMESTAMP);
ts = ktime_to_timespec(sk->sk_stamp);
if (ts.tv_sec == -1)
return err;
diff --git a/net/core/sock.c b/net/core/sock.c
index c64996f..d22933a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -120,6 +120,7 @@
#include <net/net_namespace.h>
#include <net/request_sock.h>
#include <net/sock.h>
+#include <linux/net_tstamp.h>
#include <net/xfrm.h>
#include <linux/ipsec.h>
@@ -255,11 +256,14 @@ static void sock_warn_obsolete_bsdism(const char *name)
}
}
-static void sock_disable_timestamp(struct sock *sk)
+static void sock_disable_timestamp(struct sock *sk, int flag)
{
- if (sock_flag(sk, SOCK_TIMESTAMP)) {
- sock_reset_flag(sk, SOCK_TIMESTAMP);
- net_disable_timestamp();
+ if (sock_flag(sk, flag)) {
+ sock_reset_flag(sk, flag);
+ if (!sock_flag(sk, SOCK_TIMESTAMP) &&
+ !sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE)) {
+ net_disable_timestamp();
+ }
}
}
@@ -614,13 +618,38 @@ set_rcvbuf:
else
sock_set_flag(sk, SOCK_RCVTSTAMPNS);
sock_set_flag(sk, SOCK_RCVTSTAMP);
- sock_enable_timestamp(sk);
+ sock_enable_timestamp(sk, SOCK_TIMESTAMP);
} else {
sock_reset_flag(sk, SOCK_RCVTSTAMP);
sock_reset_flag(sk, SOCK_RCVTSTAMPNS);
}
break;
+ case SO_TIMESTAMPING:
+ if (val & ~SOF_TIMESTAMPING_MASK) {
+ ret = EINVAL;
+ break;
+ }
+ sock_valbool_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE,
+ val & SOF_TIMESTAMPING_TX_HARDWARE);
+ sock_valbool_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE,
+ val & SOF_TIMESTAMPING_TX_SOFTWARE);
+ sock_valbool_flag(sk, SOCK_TIMESTAMPING_RX_HARDWARE,
+ val & SOF_TIMESTAMPING_RX_HARDWARE);
+ if (val & SOF_TIMESTAMPING_RX_SOFTWARE)
+ sock_enable_timestamp(sk,
+ SOCK_TIMESTAMPING_RX_SOFTWARE);
+ else
+ sock_disable_timestamp(sk,
+ SOCK_TIMESTAMPING_RX_SOFTWARE);
+ sock_valbool_flag(sk, SOCK_TIMESTAMPING_SOFTWARE,
+ val & SOF_TIMESTAMPING_SOFTWARE);
+ sock_valbool_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE,
+ val & SOF_TIMESTAMPING_SYS_HARDWARE);
+ sock_valbool_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE,
+ val & SOF_TIMESTAMPING_RAW_HARDWARE);
+ break;
+
case SO_RCVLOWAT:
if (val < 0)
val = INT_MAX;
@@ -766,6 +795,24 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
v.val = sock_flag(sk, SOCK_RCVTSTAMPNS);
break;
+ case SO_TIMESTAMPING:
+ v.val = 0;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE))
+ v.val |= SOF_TIMESTAMPING_TX_HARDWARE;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE))
+ v.val |= SOF_TIMESTAMPING_TX_SOFTWARE;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_RX_HARDWARE))
+ v.val |= SOF_TIMESTAMPING_RX_HARDWARE;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE))
+ v.val |= SOF_TIMESTAMPING_RX_SOFTWARE;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE))
+ v.val |= SOF_TIMESTAMPING_SOFTWARE;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE))
+ v.val |= SOF_TIMESTAMPING_SYS_HARDWARE;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE))
+ v.val |= SOF_TIMESTAMPING_RAW_HARDWARE;
+ break;
+
case SO_RCVTIMEO:
lv=sizeof(struct timeval);
if (sk->sk_rcvtimeo == MAX_SCHEDULE_TIMEOUT) {
@@ -967,7 +1014,8 @@ void sk_free(struct sock *sk)
rcu_assign_pointer(sk->sk_filter, NULL);
}
- sock_disable_timestamp(sk);
+ sock_disable_timestamp(sk, SOCK_TIMESTAMP);
+ sock_disable_timestamp(sk, SOCK_TIMESTAMPING_RX_SOFTWARE);
if (atomic_read(&sk->sk_omem_alloc))
printk(KERN_DEBUG "%s: optmem leakage (%d bytes) detected.\n",
@@ -1785,7 +1833,7 @@ int sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp)
{
struct timeval tv;
if (!sock_flag(sk, SOCK_TIMESTAMP))
- sock_enable_timestamp(sk);
+ sock_enable_timestamp(sk, SOCK_TIMESTAMP);
tv = ktime_to_timeval(sk->sk_stamp);
if (tv.tv_sec == -1)
return -ENOENT;
@@ -1801,7 +1849,7 @@ int sock_get_timestampns(struct sock *sk, struct timespec __user *userstamp)
{
struct timespec ts;
if (!sock_flag(sk, SOCK_TIMESTAMP))
- sock_enable_timestamp(sk);
+ sock_enable_timestamp(sk, SOCK_TIMESTAMP);
ts = ktime_to_timespec(sk->sk_stamp);
if (ts.tv_sec == -1)
return -ENOENT;
@@ -1813,11 +1861,20 @@ int sock_get_timestampns(struct sock *sk, struct timespec __user *userstamp)
}
EXPORT_SYMBOL(sock_get_timestampns);
-void sock_enable_timestamp(struct sock *sk)
+void sock_enable_timestamp(struct sock *sk, int flag)
{
- if (!sock_flag(sk, SOCK_TIMESTAMP)) {
- sock_set_flag(sk, SOCK_TIMESTAMP);
- net_enable_timestamp();
+ if (!sock_flag(sk, flag)) {
+ sock_set_flag(sk, flag);
+ /*
+ * we just set one of the two flags which require net
+ * time stamping, but time stamping might have been on
+ * already because of the other one
+ */
+ if (!sock_flag(sk,
+ flag == SOCK_TIMESTAMP ?
+ SOCK_TIMESTAMPING_RX_SOFTWARE :
+ SOCK_TIMESTAMP))
+ net_enable_timestamp();
}
}
diff --git a/net/socket.c b/net/socket.c
index 35dd737..47a3dc0 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -545,6 +545,18 @@ void sock_release(struct socket *sock)
sock->file = NULL;
}
+int sock_tx_timestamp(struct msghdr *msg, struct sock *sk,
+ union skb_shared_tx *shtx)
+{
+ shtx->flags = 0;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE))
+ shtx->hardware = 1;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE))
+ shtx->software = 1;
+ return 0;
+}
+EXPORT_SYMBOL(sock_tx_timestamp);
+
static inline int __sock_sendmsg(struct kiocb *iocb, struct socket *sock,
struct msghdr *msg, size_t size)
{
@@ -595,33 +607,65 @@ int kernel_sendmsg(struct socket *sock, struct msghdr *msg,
return result;
}
+static int ktime2ts(ktime_t kt, struct timespec *ts)
+{
+ if (kt.tv64) {
+ *ts = ktime_to_timespec(kt);
+ return 1;
+ } else {
+ return 0;
+ }
+}
+
/*
* called from sock_recv_timestamp() if sock_flag(sk, SOCK_RCVTSTAMP)
*/
void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
struct sk_buff *skb)
{
- ktime_t kt = skb->tstamp;
-
- if (!sock_flag(sk, SOCK_RCVTSTAMPNS)) {
- struct timeval tv;
- /* Race occurred between timestamp enabling and packet
- receiving. Fill in the current time for now. */
- if (kt.tv64 == 0)
- kt = ktime_get_real();
- skb->tstamp = kt;
- tv = ktime_to_timeval(kt);
- put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP, sizeof(tv), &tv);
- } else {
- struct timespec ts;
- /* Race occurred between timestamp enabling and packet
- receiving. Fill in the current time for now. */
- if (kt.tv64 == 0)
- kt = ktime_get_real();
- skb->tstamp = kt;
- ts = ktime_to_timespec(kt);
- put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPNS, sizeof(ts), &ts);
+ int need_software_tstamp = sock_flag(sk, SOCK_RCVTSTAMP);
+ struct timespec ts[3];
+ int empty = 1;
+ struct skb_shared_hwtstamps *shhwtstamps =
+ skb_hwtstamps(skb);
+
+ /* Race occurred between timestamp enabling and packet
+ receiving. Fill in the current time for now. */
+ if (need_software_tstamp && skb->tstamp.tv64 == 0)
+ __net_timestamp(skb);
+
+ if (need_software_tstamp) {
+ if (!sock_flag(sk, SOCK_RCVTSTAMPNS)) {
+ struct timeval tv;
+ skb_get_timestamp(skb, &tv);
+ put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP,
+ sizeof(tv), &tv);
+ } else {
+ struct timespec ts;
+ skb_get_timestampns(skb, &ts);
+ put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPNS,
+ sizeof(ts), &ts);
+ }
+ }
+
+
+ memset(ts, 0, sizeof(ts));
+ if (skb->tstamp.tv64 &&
+ sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE)) {
+ skb_get_timestampns(skb, ts + 0);
+ empty = 0;
+ }
+ if (shhwtstamps) {
+ if (sock_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE) &&
+ ktime2ts(shhwtstamps->syststamp, ts + 1))
+ empty = 0;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE) &&
+ ktime2ts(shhwtstamps->hwtstamp, ts + 2))
+ empty = 0;
}
+ if (!empty)
+ put_cmsg(msg, SOL_SOCKET,
+ SCM_TIMESTAMPING, sizeof(ts), &ts);
}
EXPORT_SYMBOL_GPL(__sock_recv_timestamp);
--
1.6.1.2
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 06/10] ip: support for TX timestamps on UDP and RAW sockets
2009-02-12 15:03 ` [PATCH NET-NEXT 05/10] net: socket infrastructure for SO_TIMESTAMPING Patrick Ohly
@ 2009-02-12 15:03 ` Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 07/10] net: pass new SIOCSHWTSTAMP through to device drivers Patrick Ohly
0 siblings, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly
Instructions for time stamping outgoing packets are take from the
socket layer and later copied into the new skb.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
Documentation/networking/timestamping.txt | 2 ++
include/net/ip.h | 1 +
net/can/raw.c | 3 +++
net/ipv4/icmp.c | 2 ++
net/ipv4/ip_output.c | 6 ++++++
net/ipv4/raw.c | 1 +
net/ipv4/udp.c | 4 ++++
7 files changed, 19 insertions(+), 0 deletions(-)
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index a681a65..0e58b45 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -56,6 +56,8 @@ and including the link layer, the scm_timestamping control message and
a sock_extended_err control message with ee_errno==ENOMSG and
ee_origin==SO_EE_ORIGIN_TIMESTAMPING. A socket with such a pending
bounced packet is ready for reading as far as select() is concerned.
+If the outgoing packet has to be fragmented, then only the first
+fragment is time stamped and returned to the sending socket.
All three values correspond to the same event in time, but were
generated in different ways. Each of these values may be empty (= all
diff --git a/include/net/ip.h b/include/net/ip.h
index 1086813..4ac7577 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -55,6 +55,7 @@ struct ipcm_cookie
__be32 addr;
int oif;
struct ip_options *opt;
+ union skb_shared_tx shtx;
};
#define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb))
diff --git a/net/can/raw.c b/net/can/raw.c
index 0703cba..6aa154e 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -648,6 +648,9 @@ static int raw_sendmsg(struct kiocb *iocb, struct socket *sock,
err = memcpy_fromiovec(skb_put(skb, size), msg->msg_iov, size);
if (err < 0)
goto free_skb;
+ err = sock_tx_timestamp(msg, sk, skb_tx(skb));
+ if (err < 0)
+ goto free_skb;
skb->dev = dev;
skb->sk = sk;
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 705b33b..382800a 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -375,6 +375,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
inet->tos = ip_hdr(skb)->tos;
daddr = ipc.addr = rt->rt_src;
ipc.opt = NULL;
+ ipc.shtx.flags = 0;
if (icmp_param->replyopts.optlen) {
ipc.opt = &icmp_param->replyopts;
if (ipc.opt->srr)
@@ -532,6 +533,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
inet_sk(sk)->tos = tos;
ipc.addr = iph->saddr;
ipc.opt = &icmp_param.replyopts;
+ ipc.shtx.flags = 0;
{
struct flowi fl = {
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 8ebe86d..3e7e910 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -935,6 +935,10 @@ alloc_new_skb:
sk->sk_allocation);
if (unlikely(skb == NULL))
err = -ENOBUFS;
+ else
+ /* only the initial fragment is
+ time stamped */
+ ipc->shtx.flags = 0;
}
if (skb == NULL)
goto error;
@@ -945,6 +949,7 @@ alloc_new_skb:
skb->ip_summed = csummode;
skb->csum = 0;
skb_reserve(skb, hh_len);
+ *skb_tx(skb) = ipc->shtx;
/*
* Find where to start putting bytes.
@@ -1364,6 +1369,7 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *ar
daddr = ipc.addr = rt->rt_src;
ipc.opt = NULL;
+ ipc.shtx.flags = 0;
if (replyopts.opt.optlen) {
ipc.opt = &replyopts.opt;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index dff8bc4..f774651 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -493,6 +493,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
ipc.addr = inet->saddr;
ipc.opt = NULL;
+ ipc.shtx.flags = 0;
ipc.oif = sk->sk_bound_dev_if;
if (msg->msg_controllen) {
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c47c989..4bd178a 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -596,6 +596,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
return -EOPNOTSUPP;
ipc.opt = NULL;
+ ipc.shtx.flags = 0;
if (up->pending) {
/*
@@ -643,6 +644,9 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
ipc.addr = inet->saddr;
ipc.oif = sk->sk_bound_dev_if;
+ err = sock_tx_timestamp(msg, sk, &ipc.shtx);
+ if (err)
+ return err;
if (msg->msg_controllen) {
err = ip_cmsg_send(sock_net(sk), msg, &ipc);
if (err)
--
1.6.1.2
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 07/10] net: pass new SIOCSHWTSTAMP through to device drivers
2009-02-12 15:03 ` [PATCH NET-NEXT 06/10] ip: support for TX timestamps on UDP and RAW sockets Patrick Ohly
@ 2009-02-12 15:03 ` Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 08/10] igb: access to NIC time Patrick Ohly
0 siblings, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
fs/compat_ioctl.c | 6 ++++++
net/core/dev.c | 2 ++
2 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index c03c10d..ab7585a 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -522,6 +522,11 @@ static int dev_ifsioc(unsigned int fd, unsigned int cmd, unsigned long arg)
if (err)
return -EFAULT;
break;
+ case SIOCSHWTSTAMP:
+ if (copy_from_user(&ifr, uifr32, sizeof(*uifr32)))
+ return -EFAULT;
+ ifr.ifr_data = compat_ptr(uifr32->ifr_ifru.ifru_data);
+ break;
default:
if (copy_from_user(&ifr, uifr32, sizeof(*uifr32)))
return -EFAULT;
@@ -2563,6 +2568,7 @@ HANDLE_IOCTL(SIOCSIFMAP, dev_ifsioc)
HANDLE_IOCTL(SIOCGIFADDR, dev_ifsioc)
HANDLE_IOCTL(SIOCSIFADDR, dev_ifsioc)
HANDLE_IOCTL(SIOCSIFHWBROADCAST, dev_ifsioc)
+HANDLE_IOCTL(SIOCSHWTSTAMP, dev_ifsioc)
/* ioctls used by appletalk ddp.c */
HANDLE_IOCTL(SIOCATALKDIFADDR, dev_ifsioc)
diff --git a/net/core/dev.c b/net/core/dev.c
index d20c28e..d393fc9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4019,6 +4019,7 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, unsigned int cmd)
cmd == SIOCSMIIREG ||
cmd == SIOCBRADDIF ||
cmd == SIOCBRDELIF ||
+ cmd == SIOCSHWTSTAMP ||
cmd == SIOCWANDEV) {
err = -EOPNOTSUPP;
if (ops->ndo_do_ioctl) {
@@ -4173,6 +4174,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
case SIOCBONDCHANGEACTIVE:
case SIOCBRADDIF:
case SIOCBRDELIF:
+ case SIOCSHWTSTAMP:
if (!capable(CAP_NET_ADMIN))
return -EPERM;
/* fall through */
--
1.6.1.2
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 08/10] igb: access to NIC time
2009-02-12 15:03 ` [PATCH NET-NEXT 07/10] net: pass new SIOCSHWTSTAMP through to device drivers Patrick Ohly
@ 2009-02-12 15:03 ` Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 09/10] igb: stub support for SIOCSHWTSTAMP Patrick Ohly
0 siblings, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly
Adds the register definitions and code to read the time
register.
Signed-off-by: John Ronciak <john.ronciak@intel.com>
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
drivers/net/igb/e1000_regs.h | 28 +++++++++++
drivers/net/igb/igb.h | 4 ++
drivers/net/igb/igb_main.c | 111 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 143 insertions(+), 0 deletions(-)
diff --git a/drivers/net/igb/e1000_regs.h b/drivers/net/igb/e1000_regs.h
index 5038b73..64d95cd 100644
--- a/drivers/net/igb/e1000_regs.h
+++ b/drivers/net/igb/e1000_regs.h
@@ -75,6 +75,34 @@
#define E1000_FCRTH 0x02168 /* Flow Control Receive Threshold High - RW */
#define E1000_RDFPCQ(_n) (0x02430 + (0x4 * (_n)))
#define E1000_FCRTV 0x02460 /* Flow Control Refresh Timer Value - RW */
+
+/* IEEE 1588 TIMESYNCH */
+#define E1000_TSYNCTXCTL 0x0B614
+#define E1000_TSYNCRXCTL 0x0B620
+#define E1000_TSYNCRXCFG 0x05F50
+
+#define E1000_SYSTIML 0x0B600
+#define E1000_SYSTIMH 0x0B604
+#define E1000_TIMINCA 0x0B608
+
+#define E1000_RXMTRL 0x0B634
+#define E1000_RXSTMPL 0x0B624
+#define E1000_RXSTMPH 0x0B628
+#define E1000_RXSATRL 0x0B62C
+#define E1000_RXSATRH 0x0B630
+
+#define E1000_TXSTMPL 0x0B618
+#define E1000_TXSTMPH 0x0B61C
+
+#define E1000_ETQF0 0x05CB0
+#define E1000_ETQF1 0x05CB4
+#define E1000_ETQF2 0x05CB8
+#define E1000_ETQF3 0x05CBC
+#define E1000_ETQF4 0x05CC0
+#define E1000_ETQF5 0x05CC4
+#define E1000_ETQF6 0x05CC8
+#define E1000_ETQF7 0x05CCC
+
/* Split and Replication RX Control - RW */
/*
* Convenience macros
diff --git a/drivers/net/igb/igb.h b/drivers/net/igb/igb.h
index e507449..797a9fe 100644
--- a/drivers/net/igb/igb.h
+++ b/drivers/net/igb/igb.h
@@ -34,6 +34,8 @@
#include "e1000_mac.h"
#include "e1000_82575.h"
+#include <linux/clocksource.h>
+
struct igb_adapter;
/* Interrupt defines */
@@ -251,6 +253,8 @@ struct igb_adapter {
struct napi_struct napi;
struct pci_dev *pdev;
struct net_device_stats net_stats;
+ struct cyclecounter cycles;
+ struct timecounter clock;
/* structs defined in e1000_hw.h */
struct e1000_hw hw;
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index f8c2919..8b2ba42 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -175,6 +175,54 @@ MODULE_DESCRIPTION("Intel(R) Gigabit Ethernet Network Driver");
MODULE_LICENSE("GPL");
MODULE_VERSION(DRV_VERSION);
+/**
+ * Scale the NIC clock cycle by a large factor so that
+ * relatively small clock corrections can be added or
+ * substracted at each clock tick. The drawbacks of a
+ * large factor are a) that the clock register overflows
+ * more quickly (not such a big deal) and b) that the
+ * increment per tick has to fit into 24 bits.
+ *
+ * Note that
+ * TIMINCA = IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS *
+ * IGB_TSYNC_SCALE
+ * TIMINCA += TIMINCA * adjustment [ppm] / 1e9
+ *
+ * The base scale factor is intentionally a power of two
+ * so that the division in %struct timecounter can be done with
+ * a shift.
+ */
+#define IGB_TSYNC_SHIFT (19)
+#define IGB_TSYNC_SCALE (1<<IGB_TSYNC_SHIFT)
+
+/**
+ * The duration of one clock cycle of the NIC.
+ *
+ * @todo This hard-coded value is part of the specification and might change
+ * in future hardware revisions. Add revision check.
+ */
+#define IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS 16
+
+#if (IGB_TSYNC_SCALE * IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS) >= (1<<24)
+# error IGB_TSYNC_SCALE and/or IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS are too large to fit into TIMINCA
+#endif
+
+/**
+ * igb_read_clock - read raw cycle counter (to be used by time counter)
+ */
+static cycle_t igb_read_clock(const struct cyclecounter *tc)
+{
+ struct igb_adapter *adapter =
+ container_of(tc, struct igb_adapter, cycles);
+ struct e1000_hw *hw = &adapter->hw;
+ u64 stamp;
+
+ stamp = rd32(E1000_SYSTIML);
+ stamp |= (u64)rd32(E1000_SYSTIMH) << 32ULL;
+
+ return stamp;
+}
+
#ifdef DEBUG
/**
* igb_get_hw_dev_name - return device name string
@@ -185,6 +233,29 @@ char *igb_get_hw_dev_name(struct e1000_hw *hw)
struct igb_adapter *adapter = hw->back;
return adapter->netdev->name;
}
+
+/**
+ * igb_get_time_str - format current NIC and system time as string
+ */
+static char *igb_get_time_str(struct igb_adapter *adapter,
+ char buffer[160])
+{
+ cycle_t hw = adapter->cycles.read(&adapter->cycles);
+ struct timespec nic = ns_to_timespec(timecounter_read(&adapter->clock));
+ struct timespec sys;
+ struct timespec delta;
+ getnstimeofday(&sys);
+
+ delta = timespec_sub(nic, sys);
+
+ sprintf(buffer,
+ "NIC %ld.%09lus, SYS %ld.%09lus, NIC-SYS %lds + %09luns",
+ (long)nic.tv_sec, nic.tv_nsec,
+ (long)sys.tv_sec, sys.tv_nsec,
+ (long)delta.tv_sec, delta.tv_nsec);
+
+ return buffer;
+}
#endif
/**
@@ -1298,6 +1369,46 @@ static int __devinit igb_probe(struct pci_dev *pdev,
}
#endif
+ /*
+ * Initialize hardware timer: we keep it running just in case
+ * that some program needs it later on.
+ */
+ memset(&adapter->cycles, 0, sizeof(adapter->cycles));
+ adapter->cycles.read = igb_read_clock;
+ adapter->cycles.mask = CLOCKSOURCE_MASK(64);
+ adapter->cycles.mult = 1;
+ adapter->cycles.shift = IGB_TSYNC_SHIFT;
+ wr32(E1000_TIMINCA,
+ (1<<24) |
+ IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS * IGB_TSYNC_SCALE);
+#if 0
+ /*
+ * Avoid rollover while we initialize by resetting the time counter.
+ */
+ wr32(E1000_SYSTIML, 0x00000000);
+ wr32(E1000_SYSTIMH, 0x00000000);
+#else
+ /*
+ * Set registers so that rollover occurs soon to test this.
+ */
+ wr32(E1000_SYSTIML, 0x00000000);
+ wr32(E1000_SYSTIMH, 0xFF800000);
+#endif
+ wrfl();
+ timecounter_init(&adapter->clock,
+ &adapter->cycles,
+ ktime_to_ns(ktime_get_real()));
+
+#ifdef DEBUG
+ {
+ char buffer[160];
+ printk(KERN_DEBUG
+ "igb: %s: hw %p initialized timer\n",
+ igb_get_time_str(adapter, buffer),
+ &adapter->hw);
+ }
+#endif
+
dev_info(&pdev->dev, "Intel(R) Gigabit Ethernet Network Connection\n");
/* print bus type/speed/width info */
dev_info(&pdev->dev, "%s: (PCIe:%s:%s) %pM\n",
--
1.6.1.2
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 09/10] igb: stub support for SIOCSHWTSTAMP
2009-02-12 15:03 ` [PATCH NET-NEXT 08/10] igb: access to NIC time Patrick Ohly
@ 2009-02-12 15:03 ` Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 10/10] igb: use timecompare to implement hardware time stamping Patrick Ohly
0 siblings, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly
Signed-off-by: John Ronciak <john.ronciak@intel.com>
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
drivers/net/igb/igb_main.c | 31 +++++++++++++++++++++++++++++++
1 files changed, 31 insertions(+), 0 deletions(-)
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 8b2ba42..90090bb 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -34,6 +34,7 @@
#include <linux/ipv6.h>
#include <net/checksum.h>
#include <net/ip6_checksum.h>
+#include <linux/net_tstamp.h>
#include <linux/mii.h>
#include <linux/ethtool.h>
#include <linux/if_vlan.h>
@@ -4182,6 +4183,34 @@ static int igb_mii_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
}
/**
+ * igb_hwtstamp_ioctl - control hardware time stamping
+ * @netdev:
+ * @ifreq:
+ * @cmd:
+ *
+ * Currently cannot enable any kind of hardware time stamping, but
+ * supports SIOCSHWTSTAMP in general.
+ **/
+static int igb_hwtstamp_ioctl(struct net_device *netdev,
+ struct ifreq *ifr, int cmd)
+{
+ struct hwtstamp_config config;
+
+ if (copy_from_user(&config, ifr->ifr_data, sizeof(config)))
+ return -EFAULT;
+
+ /* reserved for future extensions */
+ if (config.flags)
+ return -EINVAL;
+
+ if (config.tx_type == HWTSTAMP_TX_OFF &&
+ config.rx_filter == HWTSTAMP_FILTER_NONE)
+ return 0;
+
+ return -ERANGE;
+}
+
+/**
* igb_ioctl -
* @netdev:
* @ifreq:
@@ -4194,6 +4223,8 @@ static int igb_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
case SIOCGMIIREG:
case SIOCSMIIREG:
return igb_mii_ioctl(netdev, ifr, cmd);
+ case SIOCSHWTSTAMP:
+ return igb_hwtstamp_ioctl(netdev, ifr, cmd);
default:
return -EOPNOTSUPP;
}
--
1.6.1.2
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH NET-NEXT 10/10] igb: use timecompare to implement hardware time stamping
2009-02-12 15:03 ` [PATCH NET-NEXT 09/10] igb: stub support for SIOCSHWTSTAMP Patrick Ohly
@ 2009-02-12 15:03 ` Patrick Ohly
0 siblings, 0 replies; 37+ messages in thread
From: Patrick Ohly @ 2009-02-12 15:03 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, John Stultz, John Ronciak, Patrick Ohly
Both TX and RX hardware time stamping are implemented. Due to
hardware limitations it is not possible to verify reliably which
packet was time stamped when multiple were pending for sending; this
could be solved by only allowing one packet marked for hardware time
stamping into the queue (not implemented yet).
RX time stamping relies on the flag in the packet descriptor which
marks packets that were time stamped. In "all packet" mode this flag
is not set. TODO: also support that mode (even though it'll suffer
from race conditions).
Signed-off-by: John Ronciak <john.ronciak@intel.com>
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
drivers/net/igb/e1000_82575.h | 1 +
drivers/net/igb/e1000_defines.h | 1 +
drivers/net/igb/e1000_regs.h | 40 ++++++
drivers/net/igb/igb.h | 4 +
drivers/net/igb/igb_main.c | 266 +++++++++++++++++++++++++++++++++++++--
5 files changed, 304 insertions(+), 8 deletions(-)
diff --git a/drivers/net/igb/e1000_82575.h b/drivers/net/igb/e1000_82575.h
index dd50237..e613d5a 100644
--- a/drivers/net/igb/e1000_82575.h
+++ b/drivers/net/igb/e1000_82575.h
@@ -116,6 +116,7 @@ union e1000_adv_tx_desc {
};
/* Adv Transmit Descriptor Config Masks */
+#define E1000_ADVTXD_MAC_TSTAMP 0x00080000 /* IEEE1588 Timestamp packet */
#define E1000_ADVTXD_DTYP_CTXT 0x00200000 /* Advanced Context Descriptor */
#define E1000_ADVTXD_DTYP_DATA 0x00300000 /* Advanced Data Descriptor */
#define E1000_ADVTXD_DCMD_IFCS 0x02000000 /* Insert FCS (Ethernet CRC) */
diff --git a/drivers/net/igb/e1000_defines.h b/drivers/net/igb/e1000_defines.h
index 5342e23..79168ee 100644
--- a/drivers/net/igb/e1000_defines.h
+++ b/drivers/net/igb/e1000_defines.h
@@ -104,6 +104,7 @@
#define E1000_RXD_STAT_UDPCS 0x10 /* UDP xsum calculated */
#define E1000_RXD_STAT_TCPCS 0x20 /* TCP xsum calculated */
#define E1000_RXD_STAT_DYNINT 0x800 /* Pkt caused INT via DYNINT */
+#define E1000_RXD_STAT_TS 0x10000 /* Pkt was time stamped */
#define E1000_RXD_ERR_CE 0x01 /* CRC Error */
#define E1000_RXD_ERR_SE 0x02 /* Symbol Error */
#define E1000_RXD_ERR_SEQ 0x04 /* Sequence Error */
diff --git a/drivers/net/igb/e1000_regs.h b/drivers/net/igb/e1000_regs.h
index 64d95cd..1fb19ca 100644
--- a/drivers/net/igb/e1000_regs.h
+++ b/drivers/net/igb/e1000_regs.h
@@ -78,9 +78,37 @@
/* IEEE 1588 TIMESYNCH */
#define E1000_TSYNCTXCTL 0x0B614
+#define E1000_TSYNCTXCTL_VALID (1<<0)
+#define E1000_TSYNCTXCTL_ENABLED (1<<4)
#define E1000_TSYNCRXCTL 0x0B620
+#define E1000_TSYNCRXCTL_VALID (1<<0)
+#define E1000_TSYNCRXCTL_ENABLED (1<<4)
+enum {
+ E1000_TSYNCRXCTL_TYPE_L2_V2 = 0,
+ E1000_TSYNCRXCTL_TYPE_L4_V1 = (1<<1),
+ E1000_TSYNCRXCTL_TYPE_L2_L4_V2 = (1<<2),
+ E1000_TSYNCRXCTL_TYPE_ALL = (1<<3),
+ E1000_TSYNCRXCTL_TYPE_EVENT_V2 = (1<<3) | (1<<1),
+};
#define E1000_TSYNCRXCFG 0x05F50
+enum {
+ E1000_TSYNCRXCFG_PTP_V1_SYNC_MESSAGE = 0<<0,
+ E1000_TSYNCRXCFG_PTP_V1_DELAY_REQ_MESSAGE = 1<<0,
+ E1000_TSYNCRXCFG_PTP_V1_FOLLOWUP_MESSAGE = 2<<0,
+ E1000_TSYNCRXCFG_PTP_V1_DELAY_RESP_MESSAGE = 3<<0,
+ E1000_TSYNCRXCFG_PTP_V1_MANAGEMENT_MESSAGE = 4<<0,
+ E1000_TSYNCRXCFG_PTP_V2_SYNC_MESSAGE = 0<<8,
+ E1000_TSYNCRXCFG_PTP_V2_DELAY_REQ_MESSAGE = 1<<8,
+ E1000_TSYNCRXCFG_PTP_V2_PATH_DELAY_REQ_MESSAGE = 2<<8,
+ E1000_TSYNCRXCFG_PTP_V2_PATH_DELAY_RESP_MESSAGE = 3<<8,
+ E1000_TSYNCRXCFG_PTP_V2_FOLLOWUP_MESSAGE = 8<<8,
+ E1000_TSYNCRXCFG_PTP_V2_DELAY_RESP_MESSAGE = 9<<8,
+ E1000_TSYNCRXCFG_PTP_V2_PATH_DELAY_FOLLOWUP_MESSAGE = 0xA<<8,
+ E1000_TSYNCRXCFG_PTP_V2_ANNOUNCE_MESSAGE = 0xB<<8,
+ E1000_TSYNCRXCFG_PTP_V2_SIGNALLING_MESSAGE = 0xC<<8,
+ E1000_TSYNCRXCFG_PTP_V2_MANAGEMENT_MESSAGE = 0xD<<8,
+};
#define E1000_SYSTIML 0x0B600
#define E1000_SYSTIMH 0x0B604
#define E1000_TIMINCA 0x0B608
@@ -103,6 +131,18 @@
#define E1000_ETQF6 0x05CC8
#define E1000_ETQF7 0x05CCC
+/* Filtering Registers */
+#define E1000_SAQF(_n) (0x5980 + 4 * (_n))
+#define E1000_DAQF(_n) (0x59A0 + 4 * (_n))
+#define E1000_SPQF(_n) (0x59C0 + 4 * (_n))
+#define E1000_FTQF(_n) (0x59E0 + 4 * (_n))
+#define E1000_SAQF0 E1000_SAQF(0)
+#define E1000_DAQF0 E1000_DAQF(0)
+#define E1000_SPQF0 E1000_SPQF(0)
+#define E1000_FTQF0 E1000_FTQF(0)
+#define E1000_SYNQF(_n) (0x055FC + (4 * (_n))) /* SYN Packet Queue Fltr */
+#define E1000_ETQF(_n) (0x05CB0 + (4 * (_n))) /* EType Queue Fltr */
+
/* Split and Replication RX Control - RW */
/*
* Convenience macros
diff --git a/drivers/net/igb/igb.h b/drivers/net/igb/igb.h
index 797a9fe..bb8c35c 100644
--- a/drivers/net/igb/igb.h
+++ b/drivers/net/igb/igb.h
@@ -35,6 +35,8 @@
#include "e1000_82575.h"
#include <linux/clocksource.h>
+#include <linux/timecompare.h>
+#include <linux/net_tstamp.h>
struct igb_adapter;
@@ -255,6 +257,8 @@ struct igb_adapter {
struct net_device_stats net_stats;
struct cyclecounter cycles;
struct timecounter clock;
+ struct timecompare compare;
+ struct hwtstamp_config hwtstamp_config;
/* structs defined in e1000_hw.h */
struct e1000_hw hw;
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 90090bb..f9d576b 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -250,7 +250,8 @@ static char *igb_get_time_str(struct igb_adapter *adapter,
delta = timespec_sub(nic, sys);
sprintf(buffer,
- "NIC %ld.%09lus, SYS %ld.%09lus, NIC-SYS %lds + %09luns",
+ "HW %llu, NIC %ld.%09lus, SYS %ld.%09lus, NIC-SYS %lds + %09luns",
+ hw,
(long)nic.tv_sec, nic.tv_nsec,
(long)sys.tv_sec, sys.tv_nsec,
(long)delta.tv_sec, delta.tv_nsec);
@@ -1400,6 +1401,18 @@ static int __devinit igb_probe(struct pci_dev *pdev,
&adapter->cycles,
ktime_to_ns(ktime_get_real()));
+ /*
+ * Synchronize our NIC clock against system wall clock. NIC
+ * time stamp reading requires ~3us per sample, each sample
+ * was pretty stable even under load => only require 10
+ * samples for each offset comparison.
+ */
+ memset(&adapter->compare, 0, sizeof(adapter->compare));
+ adapter->compare.source = &adapter->clock;
+ adapter->compare.target = ktime_get_real;
+ adapter->compare.num_samples = 10;
+ timecompare_update(&adapter->compare, 0);
+
#ifdef DEBUG
{
char buffer[160];
@@ -2748,6 +2761,7 @@ set_itr_now:
#define IGB_TX_FLAGS_VLAN 0x00000002
#define IGB_TX_FLAGS_TSO 0x00000004
#define IGB_TX_FLAGS_IPV4 0x00000008
+#define IGB_TX_FLAGS_TSTAMP 0x00000010
#define IGB_TX_FLAGS_VLAN_MASK 0xffff0000
#define IGB_TX_FLAGS_VLAN_SHIFT 16
@@ -2975,6 +2989,9 @@ static inline void igb_tx_queue_adv(struct igb_adapter *adapter,
if (tx_flags & IGB_TX_FLAGS_VLAN)
cmd_type_len |= E1000_ADVTXD_DCMD_VLE;
+ if (tx_flags & IGB_TX_FLAGS_TSTAMP)
+ cmd_type_len |= E1000_ADVTXD_MAC_TSTAMP;
+
if (tx_flags & IGB_TX_FLAGS_TSO) {
cmd_type_len |= E1000_ADVTXD_DCMD_TSE;
@@ -3065,6 +3082,7 @@ static int igb_xmit_frame_ring_adv(struct sk_buff *skb,
unsigned int tx_flags = 0;
u8 hdr_len = 0;
int tso = 0;
+ union skb_shared_tx *shtx;
if (test_bit(__IGB_DOWN, &adapter->state)) {
dev_kfree_skb_any(skb);
@@ -3085,7 +3103,29 @@ static int igb_xmit_frame_ring_adv(struct sk_buff *skb,
/* this is a hard error */
return NETDEV_TX_BUSY;
}
- skb_orphan(skb);
+
+ /*
+ * TODO: check that there currently is no other packet with
+ * time stamping in the queue
+ *
+ * When doing time stamping, keep the connection to the socket
+ * a while longer: it is still needed by skb_hwtstamp_tx(),
+ * called either in igb_tx_hwtstamp() or by our caller when
+ * doing software time stamping.
+ */
+ shtx = skb_tx(skb);
+ if (unlikely(shtx->hardware)) {
+ shtx->in_progress = 1;
+ tx_flags |= IGB_TX_FLAGS_TSTAMP;
+ } else if (likely(!shtx->software)) {
+ /*
+ * TODO: can this be solved in dev.c:dev_hard_start_xmit()?
+ * There are probably unmodified driver which do something
+ * like this and thus don't work in combination with
+ * SOF_TIMESTAMPING_TX_SOFTWARE.
+ */
+ skb_orphan(skb);
+ }
if (adapter->vlgrp && vlan_tx_tag_present(skb)) {
tx_flags |= IGB_TX_FLAGS_VLAN;
@@ -3744,6 +3784,43 @@ static int igb_clean_rx_ring_msix(struct napi_struct *napi, int budget)
}
/**
+ * igb_hwtstamp - utility function which checks for TX time stamp
+ * @adapter: board private structure
+ * @skb: packet that was just sent
+ *
+ * If we were asked to do hardware stamping and such a time stamp is
+ * available, then it must have been for this skb here because we only
+ * allow only one such packet into the queue.
+ */
+static void igb_tx_hwtstamp(struct igb_adapter *adapter, struct sk_buff *skb)
+{
+ union skb_shared_tx *shtx = skb_tx(skb);
+ struct e1000_hw *hw = &adapter->hw;
+
+ if (unlikely(shtx->hardware)) {
+ u32 valid = rd32(E1000_TSYNCTXCTL) & E1000_TSYNCTXCTL_VALID;
+ if (valid) {
+ u64 regval = rd32(E1000_TXSTMPL);
+ u64 ns;
+ struct skb_shared_hwtstamps shhwtstamps;
+
+ memset(&shhwtstamps, 0, sizeof(shhwtstamps));
+ regval |= (u64)rd32(E1000_TXSTMPH) << 32;
+ ns = timecounter_cyc2time(&adapter->clock,
+ regval);
+ timecompare_update(&adapter->compare, ns);
+ shhwtstamps.hwtstamp = ns_to_ktime(ns);
+ shhwtstamps.syststamp =
+ timecompare_transform(&adapter->compare, ns);
+ skb_tstamp_tx(skb, &shhwtstamps);
+ }
+
+ /* delayed orphaning: skb_tstamp_tx() needs the socket */
+ skb_orphan(skb);
+ }
+}
+
+/**
* igb_clean_tx_irq - Reclaim resources after transmit completes
* @adapter: board private structure
* returns true if ring is completely cleaned
@@ -3781,6 +3858,8 @@ static bool igb_clean_tx_irq(struct igb_ring *tx_ring)
skb->len;
total_packets += segs;
total_bytes += bytecount;
+
+ igb_tx_hwtstamp(adapter, skb);
}
igb_unmap_and_free_tx_resource(adapter, buffer_info);
@@ -3914,6 +3993,7 @@ static bool igb_clean_rx_irq_adv(struct igb_ring *rx_ring,
{
struct igb_adapter *adapter = rx_ring->adapter;
struct net_device *netdev = adapter->netdev;
+ struct e1000_hw *hw = &adapter->hw;
struct pci_dev *pdev = adapter->pdev;
union e1000_adv_rx_desc *rx_desc , *next_rxd;
struct igb_buffer *buffer_info , *next_buffer;
@@ -4006,6 +4086,47 @@ static bool igb_clean_rx_irq_adv(struct igb_ring *rx_ring,
goto next_desc;
}
send_up:
+ /*
+ * If this bit is set, then the RX registers contain
+ * the time stamp. No other packet will be time
+ * stamped until we read these registers, so read the
+ * registers to make them available again. Because
+ * only one packet can be time stamped at a time, we
+ * know that the register values must belong to this
+ * one here and therefore we don't need to compare
+ * any of the additional attributes stored for it.
+ *
+ * If nothing went wrong, then it should have a
+ * skb_shared_tx that we can turn into a
+ * skb_shared_hwtstamps.
+ *
+ * TODO: can time stamping be triggered (thus locking
+ * the registers) without the packet reaching this point
+ * here? In that case RX time stamping would get stuck.
+ *
+ * TODO: in "time stamp all packets" mode this bit is
+ * not set. Need a global flag for this mode and then
+ * always read the registers. Cannot be done without
+ * a race condition.
+ */
+ if (unlikely(staterr & E1000_RXD_STAT_TS)) {
+ u64 regval;
+ u64 ns;
+ struct skb_shared_hwtstamps *shhwtstamps =
+ skb_hwtstamps(skb);
+
+ WARN(!(rd32(E1000_TSYNCRXCTL) & E1000_TSYNCRXCTL_VALID),
+ "igb: no RX time stamp available for time stamped packet");
+ regval = rd32(E1000_RXSTMPL);
+ regval |= (u64)rd32(E1000_RXSTMPH) << 32;
+ ns = timecounter_cyc2time(&adapter->clock, regval);
+ timecompare_update(&adapter->compare, ns);
+ memset(shhwtstamps, 0, sizeof(*shhwtstamps));
+ shhwtstamps->hwtstamp = ns_to_ktime(ns);
+ shhwtstamps->syststamp =
+ timecompare_transform(&adapter->compare, ns);
+ }
+
if (staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK) {
dev_kfree_skb_irq(skb);
goto next_desc;
@@ -4188,13 +4309,33 @@ static int igb_mii_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
* @ifreq:
* @cmd:
*
- * Currently cannot enable any kind of hardware time stamping, but
- * supports SIOCSHWTSTAMP in general.
+ * Outgoing time stamping can be enabled and disabled. Play nice and
+ * disable it when requested, although it shouldn't case any overhead
+ * when no packet needs it. At most one packet in the queue may be
+ * marked for time stamping, otherwise it would be impossible to tell
+ * for sure to which packet the hardware time stamp belongs.
+ *
+ * Incoming time stamping has to be configured via the hardware
+ * filters. Not all combinations are supported, in particular event
+ * type has to be specified. Matching the kind of event packet is
+ * not supported, with the exception of "all V2 events regardless of
+ * level 2 or 4".
+ *
**/
static int igb_hwtstamp_ioctl(struct net_device *netdev,
struct ifreq *ifr, int cmd)
{
+ struct igb_adapter *adapter = netdev_priv(netdev);
+ struct e1000_hw *hw = &adapter->hw;
struct hwtstamp_config config;
+ u32 tsync_tx_ctl_bit = E1000_TSYNCTXCTL_ENABLED;
+ u32 tsync_rx_ctl_bit = E1000_TSYNCRXCTL_ENABLED;
+ u32 tsync_rx_ctl_type = 0;
+ u32 tsync_rx_cfg = 0;
+ int is_l4 = 0;
+ int is_l2 = 0;
+ short port = 319; /* PTP */
+ u32 regval;
if (copy_from_user(&config, ifr->ifr_data, sizeof(config)))
return -EFAULT;
@@ -4203,11 +4344,120 @@ static int igb_hwtstamp_ioctl(struct net_device *netdev,
if (config.flags)
return -EINVAL;
- if (config.tx_type == HWTSTAMP_TX_OFF &&
- config.rx_filter == HWTSTAMP_FILTER_NONE)
- return 0;
+ switch (config.tx_type) {
+ case HWTSTAMP_TX_OFF:
+ tsync_tx_ctl_bit = 0;
+ break;
+ case HWTSTAMP_TX_ON:
+ tsync_tx_ctl_bit = E1000_TSYNCTXCTL_ENABLED;
+ break;
+ default:
+ return -ERANGE;
+ }
+
+ switch (config.rx_filter) {
+ case HWTSTAMP_FILTER_NONE:
+ tsync_rx_ctl_bit = 0;
+ break;
+ case HWTSTAMP_FILTER_PTP_V1_L4_EVENT:
+ case HWTSTAMP_FILTER_PTP_V2_L4_EVENT:
+ case HWTSTAMP_FILTER_PTP_V2_L2_EVENT:
+ case HWTSTAMP_FILTER_ALL:
+ /*
+ * register TSYNCRXCFG must be set, therefore it is not
+ * possible to time stamp both Sync and Delay_Req messages
+ * => fall back to time stamping all packets
+ */
+ tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_ALL;
+ config.rx_filter = HWTSTAMP_FILTER_ALL;
+ break;
+ case HWTSTAMP_FILTER_PTP_V1_L4_SYNC:
+ tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_L4_V1;
+ tsync_rx_cfg = E1000_TSYNCRXCFG_PTP_V1_SYNC_MESSAGE;
+ is_l4 = 1;
+ break;
+ case HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ:
+ tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_L4_V1;
+ tsync_rx_cfg = E1000_TSYNCRXCFG_PTP_V1_DELAY_REQ_MESSAGE;
+ is_l4 = 1;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_L2_SYNC:
+ case HWTSTAMP_FILTER_PTP_V2_L4_SYNC:
+ tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_L2_L4_V2;
+ tsync_rx_cfg = E1000_TSYNCRXCFG_PTP_V2_SYNC_MESSAGE;
+ is_l2 = 1;
+ is_l4 = 1;
+ config.rx_filter = HWTSTAMP_FILTER_SOME;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ:
+ case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ:
+ tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_L2_L4_V2;
+ tsync_rx_cfg = E1000_TSYNCRXCFG_PTP_V2_DELAY_REQ_MESSAGE;
+ is_l2 = 1;
+ is_l4 = 1;
+ config.rx_filter = HWTSTAMP_FILTER_SOME;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_EVENT:
+ case HWTSTAMP_FILTER_PTP_V2_SYNC:
+ case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
+ tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_EVENT_V2;
+ config.rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT;
+ is_l2 = 1;
+ break;
+ default:
+ return -ERANGE;
+ }
+
+ /* enable/disable TX */
+ regval = rd32(E1000_TSYNCTXCTL);
+ regval = (regval & ~E1000_TSYNCTXCTL_ENABLED) | tsync_tx_ctl_bit;
+ wr32(E1000_TSYNCTXCTL, regval);
+
+ /* enable/disable RX, define which PTP packets are time stamped */
+ regval = rd32(E1000_TSYNCRXCTL);
+ regval = (regval & ~E1000_TSYNCRXCTL_ENABLED) | tsync_rx_ctl_bit;
+ regval = (regval & ~0xE) | tsync_rx_ctl_type;
+ wr32(E1000_TSYNCRXCTL, regval);
+ wr32(E1000_TSYNCRXCFG, tsync_rx_cfg);
+
+ /*
+ * Ethertype Filter Queue Filter[0][15:0] = 0x88F7
+ * (Ethertype to filter on)
+ * Ethertype Filter Queue Filter[0][26] = 0x1 (Enable filter)
+ * Ethertype Filter Queue Filter[0][30] = 0x1 (Enable Timestamping)
+ */
+ wr32(E1000_ETQF0, is_l2 ? 0x440088f7 : 0);
+
+ /* L4 Queue Filter[0]: only filter by source and destination port */
+ wr32(E1000_SPQF0, htons(port));
+ wr32(E1000_IMIREXT(0), is_l4 ?
+ ((1<<12) | (1<<19) /* bypass size and control flags */) : 0);
+ wr32(E1000_IMIR(0), is_l4 ?
+ (htons(port)
+ | (0<<16) /* immediate interrupt disabled */
+ | 0 /* (1<<17) bit cleared: do not bypass
+ destination port check */)
+ : 0);
+ wr32(E1000_FTQF0, is_l4 ?
+ (0x11 /* UDP */
+ | (1<<15) /* VF not compared */
+ | (1<<27) /* Enable Timestamping */
+ | (7<<28) /* only source port filter enabled,
+ source/target address and protocol
+ masked */)
+ : ((1<<15) | (15<<28) /* all mask bits set = filter not
+ enabled */));
+
+ wrfl();
+
+ adapter->hwtstamp_config = config;
+
+ /* clear TX/RX time stamp registers, just to be sure */
+ regval = rd32(E1000_TXSTMPH);
+ regval = rd32(E1000_RXSTMPH);
- return -ERANGE;
+ return copy_to_user(ifr->ifr_data, &config, sizeof(config)) ?
+ -EFAULT : 0;
}
/**
--
1.6.1.2
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo
2009-02-12 15:03 ` [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c Patrick Ohly
@ 2009-02-16 7:16 ` David Miller
2009-02-16 7:44 ` Patrick Ohly
1 sibling, 1 reply; 37+ messages in thread
From: David Miller @ 2009-02-16 7:16 UTC (permalink / raw)
To: patrick.ohly; +Cc: johnstul, john.ronciak, linux-kernel, netdev
From: Patrick Ohly <patrick.ohly@intel.com>
Date: Thu, 12 Feb 2009 16:03:01 +0100
> This revision of the patch series transports hardware time stamps from
> the hardware into user space via additional fields in struct
> skb_shared_info. It's based on net-next-2.6 as of this morning.
>
> This is the solution that emerged from the discussion of the other
> approaches suggested before (reuse tstamp, extend skbuff, optional
> structs): it has the advantage of not touching skbuff and also has the
> simplest implementation.
>
> The clocksource and timecompare patches have been reviewed by John
> Stultz. He doesn't mind merging them via the net subtree.
>
> John Ronciak reviewed the igb driver patches. He suggested to merge the
> patches as they; after all, PTPd already works fine. I just tested again
> on 32 and 64 bit x86, both with the timestamping example programs as
> well as with PTPd. Latest patched PTPd is here:
> http://github.com/pohly/ptpd/tree/master
>
> The open TODOs in the igb driver will be fixed. We think that this will
> be easier with the infrastructure and the driver in a regular kernel
> tree.
Ok, I applied everything and pushed it out to net-next-2.6, let's
see how this goes :-)
That TX clone wrt. skb_orphan() issue will need a happier solution.
Can you describe that problem in detail? Maybe someone can come
up with a way to avoid that stuff.
Thanks.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo
2009-02-16 7:16 ` [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo David Miller
@ 2009-02-16 7:44 ` Patrick Ohly
2009-02-21 9:15 ` Patrick Ohly
0 siblings, 1 reply; 37+ messages in thread
From: Patrick Ohly @ 2009-02-16 7:44 UTC (permalink / raw)
To: David Miller
Cc: johnstul@us.ibm.com, Ronciak, John, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org
On Mon, 2009-02-16 at 09:16 +0200, David Miller wrote:
> Ok, I applied everything and pushed it out to net-next-2.6, let's
> see how this goes :-)
Thanks :-)
> That TX clone wrt. skb_orphan() issue will need a happier solution.
>
> Can you describe that problem in detail? Maybe someone can come
> up with a way to avoid that stuff.
Bouncing information about a sent packet back to the sender (in
skb_tstamp_tx()) requires access to the socket via which the packet was
sent (orig_skb->sk).
This information must be available after the packet was handed to
ops->ndo_start_xmit() in order to implement the TX time stamping
software fallback for devices which don't implement hardware time
stamping (not enabled and/or not implemented at all).
The problem now is that some device drivers call skb_orphan() on the skb
in ops->ndo_start_xmit(). I suppose the intention is to free some
resources a bit earlier. The unmodified igb driver did this, the bnx2
one didn't.
If skb_orphan() is called, then TX software time stamping is not
possible. A user space daemon cannot detect this reliably, only make a
guess based on the observation that it doesn't receive any TX time
stamps. This is not fatal: the traditional time stamping method used by
PTPd (multicast loop back) still works.
But it would be nicer if that multicast loop back hack wasn't necessary.
TX software time stamping might also be more accurate (it generates the
time stamp after ops->ndo_start_xmit() returns, not before as in the
looped packet code path).
I see several ways to solve this:
* Always create another reference to skb->sk before
ops->ndo_start_xmit() if TX time stamping is requested.
Drawback: performance penalty for drivers which support TX time
stamping or don't call skb_orphan().
* Turn skb_orphan() into a nop during ops->ndo_start_xmit(), again
only if necessary. I think this can done by temporarily
overriding the skb destructor. Drawback: looks like a hack to
me, not sure whether that is really safe.
* Extend the driver API + another reference. Let drivers which
implement TX time stamping (and thus know when to avoid
skb_orphan()) signal that. dev_hard_start_xmit() then can avoid
the "additional sk reference" thing for these drivers. Drawback:
requires analyzing and potentially flagging all drivers to have
a real effect (as for bnx2).
--
Best Regards, Patrick Ohly
The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo
2009-02-16 7:44 ` Patrick Ohly
@ 2009-02-21 9:15 ` Patrick Ohly
0 siblings, 0 replies; 37+ messages in thread
From: Patrick Ohly @ 2009-02-21 9:15 UTC (permalink / raw)
To: David Miller
Cc: johnstul@us.ibm.com, Ronciak, John, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, Kirsher, Jeffrey T, Brandeburg, Jesse,
Tantilov, Emil S
On Mon, 2009-02-16 at 07:44 +0000, Patrick Ohly wrote:
> On Mon, 2009-02-16 at 09:16 +0200, David Miller wrote:
> > That TX clone wrt. skb_orphan() issue will need a happier solution.
> >
> > Can you describe that problem in detail? Maybe someone can come
> > up with a way to avoid that stuff.
>
> Bouncing information about a sent packet back to the sender (in
> skb_tstamp_tx()) requires access to the socket via which the packet was
> sent (orig_skb->sk).
>
> This information must be available after the packet was handed to
> ops->ndo_start_xmit() in order to implement the TX time stamping
> software fallback for devices which don't implement hardware time
> stamping (not enabled and/or not implemented at all).
There's an even more fundamental problem: the TX software fallback in
net_dev_hard_start_xmit() has to access the skb after ndo_start_xmit()
has returned successfully.
During stress tests LAD colleagues at Intel ran into kernel panics that
pointed to net_dev_hard_start_xmit() as the culprit. Removing the
current TX software fallback code solved this. My apologies for not
realizing earlier that this code is violating skb ownership rules.
We propose to remove the faulty code and then solve it in a proper way
later on. Hardware time stamping will work fine without it. I will
forward the patch separately.
> I see several ways to solve this:
> * Always create another reference to skb->sk before
> ops->ndo_start_xmit() if TX time stamping is requested.
> Drawback: performance penalty for drivers which support TX time
> stamping or don't call skb_orphan().
[...]
> * Extend the driver API + another reference. Let drivers which
> implement TX time stamping (and thus know when to avoid
> skb_orphan()) signal that. dev_hard_start_xmit() then can avoid
> the "additional sk reference" thing for these drivers. Drawback:
> requires analyzing and potentially flagging all drivers to have
> a real effect (as for bnx2).
Something along these lines with an additional reference to the skb
might work.
--
Best Regards, Patrick Ohly
The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.
^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2009-02-21 9:15 UTC | newest]
Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-02-12 14:57 [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 02/10] timecompare: generic infrastructure to map between two time bases Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 03/10] net: new user space API for time stamping of incoming and outgoing packets Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 04/10] net: infrastructure for hardware time stamping Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 05/10] net: socket infrastructure for SO_TIMESTAMPING Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 06/10] ip: support for TX timestamps on UDP and RAW sockets Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 07/10] net: pass new SIOCSHWTSTAMP through to device drivers Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 08/10] igb: access to NIC time Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 09/10] igb: stub support for SIOCSHWTSTAMP Patrick Ohly
2009-02-12 15:00 ` [PATCH NET-NEXT 10/10] igb: use timecompare to implement hardware time stamping Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 02/10] timecompare: generic infrastructure to map between two time bases Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 03/10] net: new user space API for time stamping of incoming and outgoing packets Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 04/10] net: infrastructure for hardware time stamping Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 05/10] net: socket infrastructure for SO_TIMESTAMPING Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 06/10] ip: support for TX timestamps on UDP and RAW sockets Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 07/10] net: pass new SIOCSHWTSTAMP through to device drivers Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 08/10] igb: access to NIC time Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 09/10] igb: stub support for SIOCSHWTSTAMP Patrick Ohly
2009-02-12 15:03 ` [PATCH NET-NEXT 10/10] igb: use timecompare to implement hardware time stamping Patrick Ohly
2009-02-16 7:16 ` [PATCH NET-NEXT 0/10] hardware time stamping with new fields in shinfo David Miller
2009-02-16 7:44 ` Patrick Ohly
2009-02-21 9:15 ` Patrick Ohly
-- strict thread matches above, loose matches on Subject: below --
2009-02-04 13:01 clock synchronization utility code Patrick Ohly
2009-02-04 13:01 ` [PATCH NET-NEXT 01/10] clocksource: allow usage independent of timekeeping.c Patrick Ohly
2009-02-04 14:03 ` Daniel Walker
2009-02-04 14:46 ` Patrick Ohly
2009-02-04 15:09 ` Daniel Walker
2009-02-04 15:24 ` Patrick Ohly
2009-02-04 19:25 ` john stultz
2009-02-04 19:40 ` Daniel Walker
2009-02-04 20:06 ` john stultz
2009-02-04 21:04 ` Daniel Walker
2009-02-04 21:15 ` john stultz
2009-02-05 0:18 ` Daniel Walker
2009-02-05 10:21 ` Patrick Ohly
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).