From: Andrei Lalaev <andrey.lalaev@gmail.com>
To: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Miguel Ojeda <ojeda@kernel.org>,
Christian Schrefl <chrisi.schrefl@gmail.com>,
Arnd Bergmann <arnd@arndb.de>,
Russell King <linux@armlinux.org.uk>,
Simona Vetter <simona@ffwll.ch>,
Paolo Bonzini <pbonzini@redhat.com>,
David Airlie <airlied@gmail.com>,
Maxime Ripard <mripard@kernel.org>,
Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
Javier Martinez Canillas <javierm@redhat.com>,
Thomas Zimmermann <tzimmermann@suse.de>,
Linux ARM <linux-arm-kernel@lists.infradead.org>,
dri-devel@lists.freedesktop.org,
rust-for-linux <rust-for-linux@vger.kernel.org>
Subject: Re: [PATCH] drm/panic: Use a decimal fifo to avoid u64 by u64 divide
Date: Thu, 26 Jun 2025 17:16:15 +0200 [thread overview]
Message-ID: <5f01c181-515e-4cff-bf8a-81c620886cd2@gmail.com> (raw)
In-Reply-To: <80d092a3-0210-48d8-97da-f9175260a5fc@redhat.com>
On 26.06.25 16:19, Jocelyn Falempe wrote:
> On 25/06/2025 00:18, Jocelyn Falempe wrote:
>> On 24/06/2025 20:55, Andrei Lalaev wrote:
>>> On 18.04.25 18:48, Jocelyn Falempe wrote:
>>>> On 32bits ARM, u64/u64 is not supported [1], so change the algorithm
>>>> to use a simple fifo with decimal digits as u8 instead.
>>>> This is slower but should compile on all architecture.
>>>>
>>>> Link: https://lore.kernel.org/dri-devel/ CANiq72ke45eOwckMhWHvmwxc03dxr4rnxxKvx+HvWdBLopZfrQ@mail.gmail.com/ [1]
>>>> Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
>>>> ---
>>>> drivers/gpu/drm/drm_panic_qr.rs | 71 ++++++++++++++++++++++-----------
>>>> 1 file changed, 48 insertions(+), 23 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_panic_qr.rs b/drivers/gpu/drm/ drm_panic_qr.rs
>>>> index 6025a705530e..dd55b1cb764d 100644
>>>> --- a/drivers/gpu/drm/drm_panic_qr.rs
>>>> +++ b/drivers/gpu/drm/drm_panic_qr.rs
>>>> @@ -366,8 +366,48 @@ fn iter(&self) -> SegmentIterator<'_> {
>>>> SegmentIterator {
>>>> segment: self,
>>>> offset: 0,
>>>> - carry: 0,
>>>> - carry_len: 0,
>>>> + decfifo: Default::default(),
>>>> + }
>>>> + }
>>>> +}
>>>> +
>>>> +/// Max fifo size is 17 (max push) + 2 (max remaining)
>>>> +const MAX_FIFO_SIZE: usize = 19;
>>>> +
>>>> +/// A simple Decimal digit FIFO
>>>> +#[derive(Default)]
>>>> +struct DecFifo {
>>>> + decimals: [u8; MAX_FIFO_SIZE],
>>>> + len: usize,
>>>> +}
>>>> +
>>>> +impl DecFifo {
>>>> + fn push(&mut self, data: u64, len: usize) {
>>>> + let mut chunk = data;
>>>> + for i in (0..self.len).rev() {
>>>> + self.decimals[i + len] = self.decimals[i];
>>>> + }
>>>> + for i in 0..len {
>>>> + self.decimals[i] = (chunk % 10) as u8;
>>>> + chunk /= 10;
>>>> + }
>>>> + self.len += len;
>>>> + }
>>>> +
>>>> + /// Pop 3 decimal digits from the FIFO
>>>> + fn pop3(&mut self) -> Option<(u16, usize)> {
>>>> + if self.len == 0 {
>>>> + None
>>>> + } else {
>>>> + let poplen = 3.min(self.len);
>>>> + self.len -= poplen;
>>>> + let mut out = 0;
>>>> + let mut exp = 1;
>>>> + for i in 0..poplen {
>>>> + out += self.decimals[self.len + i] as u16 * exp;
>>>> + exp *= 10;
>>>> + }
>>>> + Some((out, NUM_CHARS_BITS[poplen]))
>>>> }
>>>> }
>>>> }
>>>> @@ -375,8 +415,7 @@ fn iter(&self) -> SegmentIterator<'_> {
>>>> struct SegmentIterator<'a> {
>>>> segment: &'a Segment<'a>,
>>>> offset: usize,
>>>> - carry: u64,
>>>> - carry_len: usize,
>>>> + decfifo: DecFifo,
>>>> }
>>>> impl Iterator for SegmentIterator<'_> {
>>>> @@ -394,31 +433,17 @@ fn next(&mut self) -> Option<Self::Item> {
>>>> }
>>>> }
>>>> Segment::Numeric(data) => {
>>>> - if self.carry_len < 3 && self.offset < data.len() {
>>>> - // If there are less than 3 decimal digits in the carry,
>>>> - // take the next 7 bytes of input, and add them to the carry.
>>>> + if self.decfifo.len < 3 && self.offset < data.len() {
>>>> + // If there are less than 3 decimal digits in the fifo,
>>>> + // take the next 7 bytes of input, and push them to the fifo.
>>>> let mut buf = [0u8; 8];
>>>> let len = 7.min(data.len() - self.offset);
>>>> buf[..len].copy_from_slice(&data[self.offset..self.offset + len]);
>>>> let chunk = u64::from_le_bytes(buf);
>>>> - let pow = u64::pow(10, BYTES_TO_DIGITS[len] as u32);
>>>> - self.carry = chunk + self.carry * pow;
>>>> + self.decfifo.push(chunk, BYTES_TO_DIGITS[len]);
>>>> self.offset += len;
>>>> - self.carry_len += BYTES_TO_DIGITS[len];
>>>> - }
>>>> - match self.carry_len {
>>>> - 0 => None,
>>>> - len => {
>>>> - // take the next 3 decimal digits of the carry
>>>> - // and return 10bits of numeric data.
>>>> - let out_len = 3.min(len);
>>>> - self.carry_len -= out_len;
>>>> - let pow = u64::pow(10, self.carry_len as u32);
>>>> - let out = (self.carry / pow) as u16;
>>>> - self.carry %= pow;
>>>> - Some((out, NUM_CHARS_BITS[out_len]))
>>>> - }
>>>> }
>>>> + self.decfifo.pop3()
>>>> }
>>>> }
>>>> }
>>>>
>>>> base-commit: 74757ad1c105c8fc00b4cac0b7918fe3262cdb18
>>>
>>> Hi Jocelyn,
>>>
>>> Apologies for reviving this old thread, but I'm still encountering
>>> the same issue with the latest master (78f4e737a53e).
>>>
>>> When compiling this module for ARM32 (multi_v7_defconfig),
>>> I get the following error:
>>>
>>> ld.lld: error: undefined symbol: __aeabi_uldivmod
>>> >>> referenced by drm_panic_qr.rs:392 (drivers/gpu/drm/ drm_panic_qr.rs:392)
>>> >>> drivers/gpu/drm/drm_panic_qr.o: (<drm_panic_qr::SegmentIterator as core::iter::traits::iterator::Iterator>::next) in archive vmlinux.a
>>> >>> referenced by drm_panic_qr.rs:392 (drivers/gpu/drm/ drm_panic_qr.rs:392)
>>> >>> drivers/gpu/drm/drm_panic_qr.o: (<drm_panic_qr::SegmentIterator as core::iter::traits::iterator::Iterator>::next) in archive vmlinux.a
>>> >>> referenced by drm_panic_qr.rs:392 (drivers/gpu/drm/ drm_panic_qr.rs:392)
>>> >>> drivers/gpu/drm/drm_panic_qr.o: (<drm_panic_qr::SegmentIterator as core::iter::traits::iterator::Iterator>::next) in archive vmlinux.a
>>> >>> referenced 14 more times
>>> >>> did you mean: __aeabi_uidivmod
>>> >>> defined in: vmlinux.a(arch/arm/lib/lib1funcs.o)
>>>
>>> Since no one else has reported this in two months, I’m wondering
>>> if this might be a configuration issue on my end.
>>
>> Ok, that's surprising, the lines 391 and 392 are:
>>
>> self.decimals[i] = (chunk % 10) as u8;
>> chunk /= 10;
>>
>> So the compiler should be smart enough to do that without using a division.
>> I will try to reproduce, and see if I can fix that.
>
> I reproduced the issues, it looks like clang doesn't do the optimization on ARM32:
>
> https://github.com/llvm/llvm-project/issues/37280
>
> So I've made a quick test with the following changes, and it builds:
>
>
> diff --git a/drivers/gpu/drm/drm_panic_qr.rs b/drivers/gpu/drm/drm_panic_qr.rs
> index dd55b1cb764d..57bd3c6465bb 100644
> --- a/drivers/gpu/drm/drm_panic_qr.rs
> +++ b/drivers/gpu/drm/drm_panic_qr.rs
> @@ -381,6 +381,20 @@ struct DecFifo {
> len: usize,
> }
>
> +fn div10(val: u64) -> u64
> +{
> + let val_h = val >> 32;
> + let val_l = val & 0xFFFFFFFF;
> + let b_h: u64 = 0x66666666;
> + let b_l: u64 = 0x66666667;
> +
> + let tmp1 = val_h * b_l + ((val_l * b_l) >> 32);
> + let tmp2 = val_l * b_h + (tmp1 & 0xffffffff);
> + let tmp3 = val_h * b_h + (tmp1 >> 32) + (tmp2 >> 32);
> +
> + tmp3 >> 2
> +}
> +
> impl DecFifo {
> fn push(&mut self, data: u64, len: usize) {
> let mut chunk = data;
> @@ -389,7 +403,7 @@ fn push(&mut self, data: u64, len: usize) {
> }
> for i in 0..len {
> self.decimals[i] = (chunk % 10) as u8;
> - chunk /= 10;
> + chunk = div10(chunk);
> }
> self.len += len;
> }
>
>
> Best regards,
>
Compiles for me too on clang 20.1.6.
Thanks a lot!
--
Best regards,
Andrei Lalaev
prev parent reply other threads:[~2025-06-26 16:08 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-18 16:48 [PATCH] drm/panic: Use a decimal fifo to avoid u64 by u64 divide Jocelyn Falempe
2025-04-18 18:18 ` Miguel Ojeda
2025-04-22 8:03 ` Jocelyn Falempe
2025-04-29 7:33 ` Javier Martinez Canillas
2025-05-02 12:23 ` Jocelyn Falempe
2025-06-24 18:55 ` Andrei Lalaev
2025-06-24 22:18 ` Jocelyn Falempe
2025-06-26 14:19 ` Jocelyn Falempe
2025-06-26 15:16 ` Andrei Lalaev [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5f01c181-515e-4cff-bf8a-81c620886cd2@gmail.com \
--to=andrey.lalaev@gmail.com \
--cc=airlied@gmail.com \
--cc=arnd@arndb.de \
--cc=chrisi.schrefl@gmail.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=javierm@redhat.com \
--cc=jfalempe@redhat.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux@armlinux.org.uk \
--cc=maarten.lankhorst@linux.intel.com \
--cc=mripard@kernel.org \
--cc=ojeda@kernel.org \
--cc=pbonzini@redhat.com \
--cc=rust-for-linux@vger.kernel.org \
--cc=simona@ffwll.ch \
--cc=tzimmermann@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.