netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jay Vosburgh <jv@jvosburgh.net>
To: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: netdev@vger.kernel.org,
	Stephen Hemminger <stephen@networkplumber.org>,
	David Ahern <dsahern@gmail.com>
Subject: Re: [PATCH iproute2-next] tc/police: Allow 64 bit burst size
Date: Tue, 09 Sep 2025 21:04:53 -0700	[thread overview]
Message-ID: <2940856.1757477093@famine> (raw)
In-Reply-To: <CAM0EoMmJaC3OAncWnUOkz6mn7BVXudnG1YKUYZomUkbVu8Zb+g@mail.gmail.com>

Jamal Hadi Salim <jhs@mojatatu.com> wrote:

>On Sat, Sep 6, 2025 at 9:50 PM Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>>
>>
>>         In summary, this patchset changes the user space handling of the
>> tc police burst parameter to permit burst sizes that exceed 4 GB when the
>> specified rate is high enough that the kernel API for burst can accomodate
>> such.
>>
>>         Additionally, if the burst exceeds the upper limit of the kernel
>> API, this is now flagged as an error.  The existing behavior silently
>> overflows, resulting in arbitrary values passed to the kernel.
>>
>>         In detail, as presently implemented, the tc police burst option
>> limits the size of the burst to to 4 GB, i.e., UINT_MAX for a 32 bit
>> unsigned int.  This is a reasonable limit for the low rates common when
>> this was developed.  However, the underlying implementation of burst is
>> computed as "time at the specified rate," and for higher rates, a burst
>> size exceeding 4 GB is feasible without modification to the kernel.
>>
>>         The burst size provided on the command line is translated into a
>> duration, representing how much time is required at the specified rate to
>> transmit the given burst size.
>>
>>         This time is calculated in units of "psched ticks," each of which
>> is 64 nsec[0].  The computed number of psched ticks is sent to the kernel
>> as a __u32 value.
>>
>
>Please run tdc tests. David/Stephen - can we please make this a
>requirement for iproute2 tc related changes?

	I was not familiar with those tests (but see them now that I
look in the kernel source).  I did run the tests included in the
iproute2-next git repository.

>Jay, your patches fail at least one test because you changed the unit outputs.
>Either we fix the tdc test or you make your changes backward compatible.

	I'll run the tdc tests and have a look at the failures.

>In the future also cc kernel tc maintainers (I only saw this because
>someone pointed it to me).
>Overall the changes look fine.

	Understood, but this isn't documented in iproute2.  Perhaps the
iproute2 MAINTAINERS should have a tc section to clarify this
expectation?

	-J

>cheers,
>jamal
>
>>         Because burst is ultimately calculated as a time duration, the
>> real upper limit for a burst is UINT_MAX psched ticks, i.e.,
>>
>>         UINT_MAX * psched tick duration / NSEC_PER_SEC
>>         (2^32-1) *         64           / 1E9
>>
>>         which is roughly 274.88 seconds (274.8779...).
>>
>>         At low rates, e.g., 5 Mbit/sec, UINT_MAX psched ticks does not
>> correspond to a burst size in excess of 4 GB, so the above is moot, e.g.,
>>
>>         5Mbit/sec / 8 = 625000 MBytes/sec
>>         625000 * ~274.88 seconds = ~171800000 max burst size, below UINT_MAX
>>
>>         Thus, the burst size at 5Mbit/sec is limited by the __u32 size of
>> the psched tick field in the kernel API, not the 4 GB limit of the tc
>> police burst user space API.
>>
>>         However, at higher rates, e.g., 10 Gbit/sec, the burst size is
>> currently limited by the 4 GB maximum for the burst command line parameter
>> value, rather than UINT_MAX psched ticks:
>>
>>         10 Gbit/sec / 8 = 1250000000 MBbytes/sec
>>         1250000000 * ~274.88 seconds = ~343600000000, more than UINT_MAX
>>
>>         Here, the maximum duration of a burst the kernel can handle
>> exceeds 4 GB of burst size.
>>
>>         While the above maximum may be an excessively large burst value,
>> at 10 Gbit/sec, a 4 GB burst size corresponds to just under 3.5 seconds in
>> duration:
>>
>>         2^32 bytes / 10 Gbit/sec
>>         2^32 bytes / 1250000000 bytes/sec
>>         equals ~3.43 sec
>>
>>         So, at higher rates, burst sizes exceeding 4 GB are both
>> reasonable and feasible, up to the UINT_MAX limit for psched ticks.
>> Enabling this requires changes only to the user space processing of the
>> burst size parameter in tc.
>>
>>         In principle, the other packet schedulers utilizing psched ticks
>> for burst sizing, htb and tbf, could be similarly changed to permit larger
>> burst sizes, but this patch set does not do so.
>>
>>         Separately, for the burst duration calculation overflow (i.e.,
>> that the number of psched ticks exceeds UINT_MAX), under the current
>> implementation, one example of overflow is as follows:
>>
>> # /sbin/tc filter add dev eth0 protocol ip prio 1 parent ffff: handle 1 fw police rate 1Mbit peakrate 10Gbit burst 34375000 mtu 64Kb conform-exceed reclassify
>>
>> # /sbin/tc -raw filter get dev eth0 ingress protocol ip pref 1 handle 1 fw
>> filter ingress protocol ip pref 1 fw chain 0 handle 0x1  police 0x1 rate 1Mbit burst 15261b mtu 64Kb [001d1bf8] peakrate 10Gbit action reclassify overhead 0b
>>         ref 1 bind 1
>>
>>         Note that the returned burst value is 15261b, which does not match
>> the supplied value of 34375000.  With this patch set applied, this
>> situation is flagged as an error.
>>
>>
>> [0] psched ticks are defined in the kernel in include/net/pkt_sched.h:
>>
>> #define PSCHED_SHIFT                    6
>> #define PSCHED_TICKS2NS(x)              ((s64)(x) << PSCHED_SHIFT)
>> #define PSCHED_NS2TICKS(x)              ((x) >> PSCHED_SHIFT)
>>
>> #define PSCHED_TICKS_PER_SEC            PSCHED_NS2TICKS(NSEC_PER_SEC)
>>
>>         where PSCHED_TICKS_PER_SEC is 15625000.
>>
>>         These values are exported to user space via /proc/net/psched, the
>> second field being PSCHED_TICKS2NS(1), which at present is 64 (0x40).  tc
>> uses this value to compute its internal "tick_in_usec" variable containing
>> the number of psched ticks per usec (15.625) used for the psched tick
>> computations.
>>
>>         Lastly, note that PSCHED_SHIFT was previously 10, and changed to 6
>> in commit a4a710c4a7490 in 2009.  I have not tested backwards
>> compatibility of these changes with kernels of that era.

---
	-Jay Vosburgh, jv@jvosburgh.net


  reply	other threads:[~2025-09-10  4:04 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-07  1:42 [PATCH iproute2-next] tc/police: Allow 64 bit burst size Jay Vosburgh
2025-09-07  1:42 ` [PATCH 1/4 iproute2-next] lib: Update backend of print_size to accept 64 bit size Jay Vosburgh
2025-09-07  1:42 ` [PATCH 2/4 iproute2-next] tc: Add get_size64 and get_size64_and_cell Jay Vosburgh
2025-09-07  1:42 ` [PATCH 3/4 iproute2-next] tc: Expand tc_calc_xmittime, tc_calc_xmitsize to u64 Jay Vosburgh
2025-09-07  1:42 ` [PATCH 4/4 iproute2-next] tc/police: enable use of 64 bit burst parameter Jay Vosburgh
2025-09-10  3:32 ` [PATCH iproute2-next] tc/police: Allow 64 bit burst size Jamal Hadi Salim
2025-09-10  4:04   ` Jay Vosburgh [this message]
2025-09-10  4:29     ` Jamal Hadi Salim
2025-09-11 20:50   ` David Ahern
2025-09-12  4:19     ` Jay Vosburgh
2025-09-12 14:31       ` Jamal Hadi Salim
2025-09-12 15:48         ` Victor Nogueira
2025-09-15  7:51     ` Jamal Hadi Salim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2940856.1757477093@famine \
    --to=jv@jvosburgh.net \
    --cc=dsahern@gmail.com \
    --cc=jhs@mojatatu.com \
    --cc=netdev@vger.kernel.org \
    --cc=stephen@networkplumber.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).