netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH iproute2-next] tc/police: Allow 64 bit burst size
@ 2025-09-07  1:42 Jay Vosburgh
  2025-09-07  1:42 ` [PATCH 1/4 iproute2-next] lib: Update backend of print_size to accept 64 bit size Jay Vosburgh
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Jay Vosburgh @ 2025-09-07  1:42 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, David Ahern


	In summary, this patchset changes the user space handling of the
tc police burst parameter to permit burst sizes that exceed 4 GB when the
specified rate is high enough that the kernel API for burst can accomodate
such.

	Additionally, if the burst exceeds the upper limit of the kernel
API, this is now flagged as an error.  The existing behavior silently
overflows, resulting in arbitrary values passed to the kernel.

	In detail, as presently implemented, the tc police burst option
limits the size of the burst to to 4 GB, i.e., UINT_MAX for a 32 bit
unsigned int.  This is a reasonable limit for the low rates common when
this was developed.  However, the underlying implementation of burst is
computed as "time at the specified rate," and for higher rates, a burst
size exceeding 4 GB is feasible without modification to the kernel.

	The burst size provided on the command line is translated into a
duration, representing how much time is required at the specified rate to
transmit the given burst size.

	This time is calculated in units of "psched ticks," each of which
is 64 nsec[0].  The computed number of psched ticks is sent to the kernel
as a __u32 value.

	Because burst is ultimately calculated as a time duration, the
real upper limit for a burst is UINT_MAX psched ticks, i.e.,

	UINT_MAX * psched tick duration / NSEC_PER_SEC
	(2^32-1) *         64           / 1E9

	which is roughly 274.88 seconds (274.8779...).

	At low rates, e.g., 5 Mbit/sec, UINT_MAX psched ticks does not
correspond to a burst size in excess of 4 GB, so the above is moot, e.g.,

	5Mbit/sec / 8 = 625000 MBytes/sec
	625000 * ~274.88 seconds = ~171800000 max burst size, below UINT_MAX

	Thus, the burst size at 5Mbit/sec is limited by the __u32 size of
the psched tick field in the kernel API, not the 4 GB limit of the tc
police burst user space API.

	However, at higher rates, e.g., 10 Gbit/sec, the burst size is
currently limited by the 4 GB maximum for the burst command line parameter
value, rather than UINT_MAX psched ticks:

	10 Gbit/sec / 8 = 1250000000 MBbytes/sec
	1250000000 * ~274.88 seconds = ~343600000000, more than UINT_MAX

	Here, the maximum duration of a burst the kernel can handle
exceeds 4 GB of burst size.

	While the above maximum may be an excessively large burst value,
at 10 Gbit/sec, a 4 GB burst size corresponds to just under 3.5 seconds in
duration:

	2^32 bytes / 10 Gbit/sec
	2^32 bytes / 1250000000 bytes/sec
	equals ~3.43 sec

	So, at higher rates, burst sizes exceeding 4 GB are both
reasonable and feasible, up to the UINT_MAX limit for psched ticks.
Enabling this requires changes only to the user space processing of the
burst size parameter in tc.

	In principle, the other packet schedulers utilizing psched ticks
for burst sizing, htb and tbf, could be similarly changed to permit larger
burst sizes, but this patch set does not do so.

	Separately, for the burst duration calculation overflow (i.e.,
that the number of psched ticks exceeds UINT_MAX), under the current
implementation, one example of overflow is as follows:

# /sbin/tc filter add dev eth0 protocol ip prio 1 parent ffff: handle 1 fw police rate 1Mbit peakrate 10Gbit burst 34375000 mtu 64Kb conform-exceed reclassify

# /sbin/tc -raw filter get dev eth0 ingress protocol ip pref 1 handle 1 fw
filter ingress protocol ip pref 1 fw chain 0 handle 0x1  police 0x1 rate 1Mbit burst 15261b mtu 64Kb [001d1bf8] peakrate 10Gbit action reclassify overhead 0b 
        ref 1 bind 1 

	Note that the returned burst value is 15261b, which does not match
the supplied value of 34375000.  With this patch set applied, this
situation is flagged as an error.


[0] psched ticks are defined in the kernel in include/net/pkt_sched.h:

#define PSCHED_SHIFT                    6
#define PSCHED_TICKS2NS(x)              ((s64)(x) << PSCHED_SHIFT)
#define PSCHED_NS2TICKS(x)              ((x) >> PSCHED_SHIFT)

#define PSCHED_TICKS_PER_SEC            PSCHED_NS2TICKS(NSEC_PER_SEC)

	where PSCHED_TICKS_PER_SEC is 15625000.

	These values are exported to user space via /proc/net/psched, the
second field being PSCHED_TICKS2NS(1), which at present is 64 (0x40).  tc
uses this value to compute its internal "tick_in_usec" variable containing
the number of psched ticks per usec (15.625) used for the psched tick
computations.

	Lastly, note that PSCHED_SHIFT was previously 10, and changed to 6
in commit a4a710c4a7490 in 2009.  I have not tested backwards
compatibility of these changes with kernels of that era.


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-09-15  7:51 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-07  1:42 [PATCH iproute2-next] tc/police: Allow 64 bit burst size Jay Vosburgh
2025-09-07  1:42 ` [PATCH 1/4 iproute2-next] lib: Update backend of print_size to accept 64 bit size Jay Vosburgh
2025-09-07  1:42 ` [PATCH 2/4 iproute2-next] tc: Add get_size64 and get_size64_and_cell Jay Vosburgh
2025-09-07  1:42 ` [PATCH 3/4 iproute2-next] tc: Expand tc_calc_xmittime, tc_calc_xmitsize to u64 Jay Vosburgh
2025-09-07  1:42 ` [PATCH 4/4 iproute2-next] tc/police: enable use of 64 bit burst parameter Jay Vosburgh
2025-09-10  3:32 ` [PATCH iproute2-next] tc/police: Allow 64 bit burst size Jamal Hadi Salim
2025-09-10  4:04   ` Jay Vosburgh
2025-09-10  4:29     ` Jamal Hadi Salim
2025-09-11 20:50   ` David Ahern
2025-09-12  4:19     ` Jay Vosburgh
2025-09-12 14:31       ` Jamal Hadi Salim
2025-09-12 15:48         ` Victor Nogueira
2025-09-15  7:51     ` Jamal Hadi Salim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).