All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jay Vosburgh <jv@jvosburgh.net>
To: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: netdev@vger.kernel.org,
	Stephen Hemminger <stephen@networkplumber.org>,
	David Ahern <dsahern@gmail.com>
Subject: Re: [PATCH iproute2-next] tc/police: Allow 64 bit burst size
Date: Tue, 09 Sep 2025 21:04:53 -0700	[thread overview]
Message-ID: <2940856.1757477093@famine> (raw)
In-Reply-To: <CAM0EoMmJaC3OAncWnUOkz6mn7BVXudnG1YKUYZomUkbVu8Zb+g@mail.gmail.com>

Jamal Hadi Salim <jhs@mojatatu.com> wrote:

>On Sat, Sep 6, 2025 at 9:50 PM Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>>
>>
>>         In summary, this patchset changes the user space handling of the
>> tc police burst parameter to permit burst sizes that exceed 4 GB when the
>> specified rate is high enough that the kernel API for burst can accomodate
>> such.
>>
>>         Additionally, if the burst exceeds the upper limit of the kernel
>> API, this is now flagged as an error.  The existing behavior silently
>> overflows, resulting in arbitrary values passed to the kernel.
>>
>>         In detail, as presently implemented, the tc police burst option
>> limits the size of the burst to to 4 GB, i.e., UINT_MAX for a 32 bit
>> unsigned int.  This is a reasonable limit for the low rates common when
>> this was developed.  However, the underlying implementation of burst is
>> computed as "time at the specified rate," and for higher rates, a burst
>> size exceeding 4 GB is feasible without modification to the kernel.
>>
>>         The burst size provided on the command line is translated into a
>> duration, representing how much time is required at the specified rate to
>> transmit the given burst size.
>>
>>         This time is calculated in units of "psched ticks," each of which
>> is 64 nsec[0].  The computed number of psched ticks is sent to the kernel
>> as a __u32 value.
>>
>
>Please run tdc tests. David/Stephen - can we please make this a
>requirement for iproute2 tc related changes?

	I was not familiar with those tests (but see them now that I
look in the kernel source).  I did run the tests included in the
iproute2-next git repository.

>Jay, your patches fail at least one test because you changed the unit outputs.
>Either we fix the tdc test or you make your changes backward compatible.

	I'll run the tdc tests and have a look at the failures.

>In the future also cc kernel tc maintainers (I only saw this because
>someone pointed it to me).
>Overall the changes look fine.

	Understood, but this isn't documented in iproute2.  Perhaps the
iproute2 MAINTAINERS should have a tc section to clarify this
expectation?

	-J

>cheers,
>jamal
>
>>         Because burst is ultimately calculated as a time duration, the
>> real upper limit for a burst is UINT_MAX psched ticks, i.e.,
>>
>>         UINT_MAX * psched tick duration / NSEC_PER_SEC
>>         (2^32-1) *         64           / 1E9
>>
>>         which is roughly 274.88 seconds (274.8779...).
>>
>>         At low rates, e.g., 5 Mbit/sec, UINT_MAX psched ticks does not
>> correspond to a burst size in excess of 4 GB, so the above is moot, e.g.,
>>
>>         5Mbit/sec / 8 = 625000 MBytes/sec
>>         625000 * ~274.88 seconds = ~171800000 max burst size, below UINT_MAX
>>
>>         Thus, the burst size at 5Mbit/sec is limited by the __u32 size of
>> the psched tick field in the kernel API, not the 4 GB limit of the tc
>> police burst user space API.
>>
>>         However, at higher rates, e.g., 10 Gbit/sec, the burst size is
>> currently limited by the 4 GB maximum for the burst command line parameter
>> value, rather than UINT_MAX psched ticks:
>>
>>         10 Gbit/sec / 8 = 1250000000 MBbytes/sec
>>         1250000000 * ~274.88 seconds = ~343600000000, more than UINT_MAX
>>
>>         Here, the maximum duration of a burst the kernel can handle
>> exceeds 4 GB of burst size.
>>
>>         While the above maximum may be an excessively large burst value,
>> at 10 Gbit/sec, a 4 GB burst size corresponds to just under 3.5 seconds in
>> duration:
>>
>>         2^32 bytes / 10 Gbit/sec
>>         2^32 bytes / 1250000000 bytes/sec
>>         equals ~3.43 sec
>>
>>         So, at higher rates, burst sizes exceeding 4 GB are both
>> reasonable and feasible, up to the UINT_MAX limit for psched ticks.
>> Enabling this requires changes only to the user space processing of the
>> burst size parameter in tc.
>>
>>         In principle, the other packet schedulers utilizing psched ticks
>> for burst sizing, htb and tbf, could be similarly changed to permit larger
>> burst sizes, but this patch set does not do so.
>>
>>         Separately, for the burst duration calculation overflow (i.e.,
>> that the number of psched ticks exceeds UINT_MAX), under the current
>> implementation, one example of overflow is as follows:
>>
>> # /sbin/tc filter add dev eth0 protocol ip prio 1 parent ffff: handle 1 fw police rate 1Mbit peakrate 10Gbit burst 34375000 mtu 64Kb conform-exceed reclassify
>>
>> # /sbin/tc -raw filter get dev eth0 ingress protocol ip pref 1 handle 1 fw
>> filter ingress protocol ip pref 1 fw chain 0 handle 0x1  police 0x1 rate 1Mbit burst 15261b mtu 64Kb [001d1bf8] peakrate 10Gbit action reclassify overhead 0b
>>         ref 1 bind 1
>>
>>         Note that the returned burst value is 15261b, which does not match
>> the supplied value of 34375000.  With this patch set applied, this
>> situation is flagged as an error.
>>
>>
>> [0] psched ticks are defined in the kernel in include/net/pkt_sched.h:
>>
>> #define PSCHED_SHIFT                    6
>> #define PSCHED_TICKS2NS(x)              ((s64)(x) << PSCHED_SHIFT)
>> #define PSCHED_NS2TICKS(x)              ((x) >> PSCHED_SHIFT)
>>
>> #define PSCHED_TICKS_PER_SEC            PSCHED_NS2TICKS(NSEC_PER_SEC)
>>
>>         where PSCHED_TICKS_PER_SEC is 15625000.
>>
>>         These values are exported to user space via /proc/net/psched, the
>> second field being PSCHED_TICKS2NS(1), which at present is 64 (0x40).  tc
>> uses this value to compute its internal "tick_in_usec" variable containing
>> the number of psched ticks per usec (15.625) used for the psched tick
>> computations.
>>
>>         Lastly, note that PSCHED_SHIFT was previously 10, and changed to 6
>> in commit a4a710c4a7490 in 2009.  I have not tested backwards
>> compatibility of these changes with kernels of that era.

---
	-Jay Vosburgh, jv@jvosburgh.net


  reply	other threads:[~2025-09-10  4:04 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-07  1:42 [PATCH iproute2-next] tc/police: Allow 64 bit burst size Jay Vosburgh
2025-09-07  1:42 ` [PATCH 1/4 iproute2-next] lib: Update backend of print_size to accept 64 bit size Jay Vosburgh
2025-09-07  1:42 ` [PATCH 2/4 iproute2-next] tc: Add get_size64 and get_size64_and_cell Jay Vosburgh
2025-09-07  1:42 ` [PATCH 3/4 iproute2-next] tc: Expand tc_calc_xmittime, tc_calc_xmitsize to u64 Jay Vosburgh
2025-09-07  1:42 ` [PATCH 4/4 iproute2-next] tc/police: enable use of 64 bit burst parameter Jay Vosburgh
2025-09-10  3:32 ` [PATCH iproute2-next] tc/police: Allow 64 bit burst size Jamal Hadi Salim
2025-09-10  4:04   ` Jay Vosburgh [this message]
2025-09-10  4:29     ` Jamal Hadi Salim
2025-09-11 20:50   ` David Ahern
2025-09-12  4:19     ` Jay Vosburgh
2025-09-12 14:31       ` Jamal Hadi Salim
2025-09-12 15:48         ` Victor Nogueira
2025-09-15  7:51     ` Jamal Hadi Salim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2940856.1757477093@famine \
    --to=jv@jvosburgh.net \
    --cc=dsahern@gmail.com \
    --cc=jhs@mojatatu.com \
    --cc=netdev@vger.kernel.org \
    --cc=stephen@networkplumber.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.