* [RFC] NET_SCHED: restore HZ value to /proc/net/psched
@ 2008-06-23 18:13 Stephen Hemminger
2008-06-23 19:36 ` Patrick McHardy
0 siblings, 1 reply; 5+ messages in thread
From: Stephen Hemminger @ 2008-06-23 18:13 UTC (permalink / raw)
To: Patrick McHardy, David Miller; +Cc: netdev
Some changes back in 2.6.22 broke a number of
other places where the iproute2 utilities depended on kernel HZ
value. The use of HZ was a bad original ABI design choice; the
question is how to fix the impacts.
During 2.6.22 development, the psched clock was converted over
to use high resolution timers, and one of the changes was to the
return value of /proc/net/psched. This file is used by utilities
to determine conversion between time and jiffies.
It would be better to go back to the original format
output where the fourth field was the kernel HZ.
The motivation for the change was to make the HTB burst size
default smaller because the kernel clock resolution was better.
But the change broke the use of jiffies by dst_metrics.
With this revert all the metric RTT values will work again.
The only use of hz in htb is to set the buffer and ceiling buffer default,
and having the old value would cause a slightly larger value. Note: the comment
in q_htb is incorrect, the value calculated is the default not the
minimum. As an example if HTB rate is 1mbit, then the default buffer
would be come 11500 (with 100 HZ) vs 1500 (with 2.6.25).
This effectively reverts commit 4361cb17f0df5491fe6e2c3ae1defc98e9a64a79
I am not 100% convinced this is the best or only solution.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
--- a/net/sched/sch_api.c 2008-06-23 10:55:44.000000000 -0700
+++ b/net/sched/sch_api.c 2008-06-23 10:56:11.000000000 -0700
@@ -1266,13 +1266,9 @@ EXPORT_SYMBOL(tcf_destroy_chain);
#ifdef CONFIG_PROC_FS
static int psched_show(struct seq_file *seq, void *v)
{
- struct timespec ts;
-
- hrtimer_get_res(CLOCK_MONOTONIC, &ts);
seq_printf(seq, "%08x %08x %08x %08x\n",
(u32)NSEC_PER_USEC, (u32)PSCHED_US2NS(1),
- 1000000,
- (u32)NSEC_PER_SEC/(u32)ktime_to_ns(timespec_to_ktime(ts)));
+ 1000000, HZ);
return 0;
}
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC] NET_SCHED: restore HZ value to /proc/net/psched
2008-06-23 18:13 [RFC] NET_SCHED: restore HZ value to /proc/net/psched Stephen Hemminger
@ 2008-06-23 19:36 ` Patrick McHardy
2008-06-23 20:45 ` Stephen Hemminger
0 siblings, 1 reply; 5+ messages in thread
From: Patrick McHardy @ 2008-06-23 19:36 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, netdev
Stephen Hemminger wrote:
> Some changes back in 2.6.22 broke a number of
> other places where the iproute2 utilities depended on kernel HZ
> value. The use of HZ was a bad original ABI design choice; the
> question is how to fix the impacts.
>
> During 2.6.22 development, the psched clock was converted over
> to use high resolution timers, and one of the changes was to the
> return value of /proc/net/psched. This file is used by utilities
> to determine conversion between time and jiffies.
> It would be better to go back to the original format
> output where the fourth field was the kernel HZ.
>
> The motivation for the change was to make the HTB burst size
> default smaller because the kernel clock resolution was better.
> But the change broke the use of jiffies by dst_metrics.
>
> With this revert all the metric RTT values will work again.
> The only use of hz in htb is to set the buffer and ceiling buffer default,
> and having the old value would cause a slightly larger value. Note: the comment
> in q_htb is incorrect, the value calculated is the default not the
> minimum. As an example if HTB rate is 1mbit, then the default buffer
> would be come 11500 (with 100 HZ) vs 1500 (with 2.6.25).
>
> This effectively reverts commit 4361cb17f0df5491fe6e2c3ae1defc98e9a64a79
> I am not 100% convinced this is the best or only solution.
Me neither. As you've mentioned yourself, the fact that
the interfaces use kernel HZ as unit is a bug, and the use
of /proc/net/psched is another bug (one that isn't present
for *that* long I should add, the iproute git history
unfortunately only contains one huge patch which includes
this change).
IMO qdiscs shouldn't be punished for bugs in other code
and the main problem would still remain. Two alternative
approaches:
- just fix the interface (something we've done multiple
times, even after 2.6 was released).
- add a new proc file to get the value of kernel HZ (the
fix needs a kernel change anyway, this would additionally
require a new iproute version).
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC] NET_SCHED: restore HZ value to /proc/net/psched
2008-06-23 19:36 ` Patrick McHardy
@ 2008-06-23 20:45 ` Stephen Hemminger
2008-06-24 0:15 ` Patrick McHardy
0 siblings, 1 reply; 5+ messages in thread
From: Stephen Hemminger @ 2008-06-23 20:45 UTC (permalink / raw)
To: Patrick McHardy; +Cc: David Miller, netdev
On Mon, 23 Jun 2008 21:36:08 +0200
Patrick McHardy <kaber@trash.net> wrote:
> Stephen Hemminger wrote:
> > Some changes back in 2.6.22 broke a number of
> > other places where the iproute2 utilities depended on kernel HZ
> > value. The use of HZ was a bad original ABI design choice; the
> > question is how to fix the impacts.
> >
> > During 2.6.22 development, the psched clock was converted over
> > to use high resolution timers, and one of the changes was to the
> > return value of /proc/net/psched. This file is used by utilities
> > to determine conversion between time and jiffies.
> > It would be better to go back to the original format
> > output where the fourth field was the kernel HZ.
> >
> > The motivation for the change was to make the HTB burst size
> > default smaller because the kernel clock resolution was better.
> > But the change broke the use of jiffies by dst_metrics.
> >
> > With this revert all the metric RTT values will work again.
> > The only use of hz in htb is to set the buffer and ceiling buffer default,
> > and having the old value would cause a slightly larger value. Note: the comment
> > in q_htb is incorrect, the value calculated is the default not the
> > minimum. As an example if HTB rate is 1mbit, then the default buffer
> > would be come 11500 (with 100 HZ) vs 1500 (with 2.6.25).
> >
> > This effectively reverts commit 4361cb17f0df5491fe6e2c3ae1defc98e9a64a79
> > I am not 100% convinced this is the best or only solution.
>
> Me neither. As you've mentioned yourself, the fact that
> the interfaces use kernel HZ as unit is a bug, and the use
> of /proc/net/psched is another bug (one that isn't present
> for *that* long I should add, the iproute git history
> unfortunately only contains one huge patch which includes
> this change).
>
> IMO qdiscs shouldn't be punished for bugs in other code
> and the main problem would still remain. Two alternative
> approaches:
>
> - just fix the interface (something we've done multiple
> times, even after 2.6 was released).
It is easy if the only thing to worry about is the current kernel
and current iproute2 utilities. But if you want to deal with 3rd
party applications that use the API's already, then the problem
is much harder. I fixed the places where get_hz was used and user hz already.
Do I have to get down to:
static unsigned long kernel_version(void)
{
struct utsname uts;
unsigned a, b, c;
uname(&uts);
sscanf(uts.release, "%u.%u.%u", &a, &b, &c);
return KERNEL_VERSION(a, b, c);
}
static unsigned long kernel_hz(void)
{
unsigned long version = kernel_version();
if (version <= KERNEL_VERSION(2,6,21))
return __get_hz();
else if (version <= KERNEL_VERSION(2,6,27)) {
fprintf(stderr, "Your kernel is screwed up and I can't find correct HZ\n");
exit(1);
}
else
return get_user_hz(); /* interface is now in correct units */
}
Even HTB shouldn't really care what the kernel clock resolution is.
> - add a new proc file to get the value of kernel HZ (the
> fix needs a kernel change anyway, this would additionally
> require a new iproute version).
Don't like this might encourage new bad ABI's.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC] NET_SCHED: restore HZ value to /proc/net/psched
2008-06-23 20:45 ` Stephen Hemminger
@ 2008-06-24 0:15 ` Patrick McHardy
2008-06-28 2:59 ` David Miller
0 siblings, 1 reply; 5+ messages in thread
From: Patrick McHardy @ 2008-06-24 0:15 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, netdev
Stephen Hemminger wrote:
> On Mon, 23 Jun 2008 21:36:08 +0200
> Patrick McHardy <kaber@trash.net> wrote:
>
>> Stephen Hemminger wrote:
>>> Some changes back in 2.6.22 broke a number of
>>> other places where the iproute2 utilities depended on kernel HZ
>>> value. The use of HZ was a bad original ABI design choice; the
>>> question is how to fix the impacts.
>>>
>>> During 2.6.22 development, the psched clock was converted over
>>> to use high resolution timers, and one of the changes was to the
>>> return value of /proc/net/psched. This file is used by utilities
>>> to determine conversion between time and jiffies.
>>> It would be better to go back to the original format
>>> output where the fourth field was the kernel HZ.
>>>
>>> The motivation for the change was to make the HTB burst size
>>> default smaller because the kernel clock resolution was better.
>>> But the change broke the use of jiffies by dst_metrics.
>>>
>>> With this revert all the metric RTT values will work again.
>>> The only use of hz in htb is to set the buffer and ceiling buffer default,
>>> and having the old value would cause a slightly larger value. Note: the comment
>>> in q_htb is incorrect, the value calculated is the default not the
>>> minimum. As an example if HTB rate is 1mbit, then the default buffer
>>> would be come 11500 (with 100 HZ) vs 1500 (with 2.6.25).
>>>
>>> This effectively reverts commit 4361cb17f0df5491fe6e2c3ae1defc98e9a64a79
>>> I am not 100% convinced this is the best or only solution.
>> Me neither. As you've mentioned yourself, the fact that
>> the interfaces use kernel HZ as unit is a bug, and the use
>> of /proc/net/psched is another bug (one that isn't present
>> for *that* long I should add, the iproute git history
>> unfortunately only contains one huge patch which includes
>> this change).
>>
>> IMO qdiscs shouldn't be punished for bugs in other code
>> and the main problem would still remain. Two alternative
>> approaches:
>>
>> - just fix the interface (something we've done multiple
>> times, even after 2.6 was released).
>
> It is easy if the only thing to worry about is the current kernel
> and current iproute2 utilities. But if you want to deal with 3rd
> party applications that use the API's already, then the problem
> is much harder. I fixed the places where get_hz was used and user hz already.
Thats true, and in the case of rtmetrics it might actually matter.
But since the bug is a combination of using kernel units *and*
using /proc/net/psched to convert them, I wonder if its actually
present in more applications. Are you aware of any applications
using these rt metrics and /proc/net/psched? I would expect if
there are any at all, they are simply interpreting the values
as USER_HZ as was valid in 2.4 days. All the affected metrics
belong in the "rarely used" category.
> Do I have to get down to:
>
> static unsigned long kernel_version(void)
> {
> struct utsname uts;
> unsigned a, b, c;
>
>
> uname(&uts);
> sscanf(uts.release, "%u.%u.%u", &a, &b, &c);
> return KERNEL_VERSION(a, b, c);
> }
>
> static unsigned long kernel_hz(void)
> {
> unsigned long version = kernel_version();
>
> if (version <= KERNEL_VERSION(2,6,21))
> return __get_hz();
> else if (version <= KERNEL_VERSION(2,6,27)) {
> fprintf(stderr, "Your kernel is screwed up and I can't find correct HZ\n");
> exit(1);
> }
> else
> return get_user_hz(); /* interface is now in correct units */
> }
>
>
> Even HTB shouldn't really care what the kernel clock resolution is.
Its not only HTB, its also used for rtabs, which are used by
CBQ, HTB, TBF and policing. They really need to know about
the packet scheduler clock resolution and that has at least
historical justification. The interface is using these units on
purpose to allow cheaps computations in the kernel. It might have
been better to convert within the kernel and use ns from the
beginning, but thats unfortunately much harder to change.
>> - add a new proc file to get the value of kernel HZ (the
>> fix needs a kernel change anyway, this would additionally
>> require a new iproute version).
>
> Don't like this might encourage new bad ABI's.
Me neither, it was just an ugly alternative to the API
change, but on second though I'm even more convinced
that the metrics should be fixed.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC] NET_SCHED: restore HZ value to /proc/net/psched
2008-06-24 0:15 ` Patrick McHardy
@ 2008-06-28 2:59 ` David Miller
0 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2008-06-28 2:59 UTC (permalink / raw)
To: kaber; +Cc: shemminger, netdev
From: Patrick McHardy <kaber@trash.net>
Date: Tue, 24 Jun 2008 02:15:53 +0200
> Stephen Hemminger wrote:
> > On Mon, 23 Jun 2008 21:36:08 +0200
> > Patrick McHardy <kaber@trash.net> wrote:
> >
> >> Stephen Hemminger wrote:
> >> - add a new proc file to get the value of kernel HZ (the
> >> fix needs a kernel change anyway, this would additionally
> >> require a new iproute version).
> >
> > Don't like this might encourage new bad ABI's.
>
> Me neither, it was just an ugly alternative to the API
> change, but on second though I'm even more convinced
> that the metrics should be fixed.
I just read over this thread and the my conclusion is the
same as Patrick's, let's fix the metrics.
I'm therefore dropping the RFC patch that started this discussion.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2008-06-28 2:59 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-23 18:13 [RFC] NET_SCHED: restore HZ value to /proc/net/psched Stephen Hemminger
2008-06-23 19:36 ` Patrick McHardy
2008-06-23 20:45 ` Stephen Hemminger
2008-06-24 0:15 ` Patrick McHardy
2008-06-28 2:59 ` David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).