[bisected] ext4 corruption on parisc since 6.12

Linux PARISC architecture development

* [bisected] ext4 corruption on parisc since 6.12
@ 2024-12-02  0:26 matoro
  2024-12-02  1:47 ` John David Anglin
  0 siblings, 1 reply; 9+ messages in thread
From: matoro @ 2024-12-02  0:26 UTC (permalink / raw)
  To: Linux Parisc, John David Anglin, John David Anglin, deller,
	Deller, linmag7, Sam James, Linux Kernel Mailing List

Hi Helge, after booting 6.12, another user (CC'd) and I both observed our 
ext4 filesystems being corrupted immediately, in the same manner.

Every file that is read or written will have its access/modify times set to 
2446-05-10 18:38:55.0000, which is the maximum ext4 timestamp.  The 32-bit 
userspace doesn't seem to be able to handle this at all, as every further 
stat() call will error with "Value too large for defined data type".  
Unfortunately, simply rolling back to kernel 6.11 is insufficient to recover, 
as the filesystem corruption is persistent, and the errors come from 
userspace attempting to read the modified files.  I was able to recover with 
a command like:

  find / -newermt 2446-01-01 -o -newerct 2446-01-01 -o -newerat 2446-01-01 | xargs touch -h
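
The recovery step can also be sketched in Python, which avoids problems with 
unusual file names in the find | xargs pipeline.  This is a minimal sketch, 
not the command that was actually run: the function name 
reset_future_timestamps and its threshold/now parameters are hypothetical, 
and treating EOVERFLOW from lstat() as "timestamp is bad" is an assumption 
based on the stat() error described above.

```python
import errno
import os
import time


def reset_future_timestamps(root, threshold, now=None):
    """Reset atime/mtime on entries below root that are newer than threshold.

    threshold and now are seconds since the Unix epoch. Returns the list
    of paths that were reset. The root directory itself is not checked.
    """
    now = time.time() if now is None else now
    fixed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks, like touch -h
                bad = max(st.st_atime, st.st_mtime, st.st_ctime) > threshold
            except OSError as exc:
                # Assumption: a 32-bit userspace may be unable to represent
                # the timestamp ("Value too large for defined data type").
                if exc.errno != errno.EOVERFLOW:
                    raise
                bad = True
            if bad:
                # Setting atime/mtime also makes the kernel refresh ctime.
                os.utime(path, (now, now), follow_symlinks=False)
                fixed.append(path)
    return fixed
```

Calling it with root "/" and a threshold corresponding to 2446-01-01 would 
roughly mirror the find invocation; like find, it walks into every mounted 
filesystem under the root, so a narrower root is safer for a first try.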

Luckily, lindholm was able to bisect and identified the culprit commit:  
b5ff52be891347f8847872c49d7a5c2fa29400a7 ("parisc: Convert to generic 
clockevents").  Some other comments from the discussion:

17:20:37 <awilfox> would be curious if keeping that patch + CONFIG_SMP=n 
fixes it
17:20:44 <awilfox> this doesn't look necessarily correct on MP machines
17:23:56 <awilfox> time_keeper_id is now unused; the old code specifically 
marked the clocksource as unstable on MP machines despite having per_cpu 
before
17:24:11 <awilfox> and now it seems to imply CLOCK_EVT_FEAT_PERCPU is enough 
to work around it
17:24:13 <awilfox> maybe it isn't

Thanks!


* Re: [bisected] ext4 corruption on parisc since 6.12
  2024-12-02  0:26 [bisected] ext4 corruption on parisc since 6.12 matoro
@ 2024-12-02  1:47 ` John David Anglin
  2024-12-02  4:55   ` matoro
  0 siblings, 1 reply; 9+ messages in thread
From: John David Anglin @ 2024-12-02  1:47 UTC (permalink / raw)
  To: matoro, Linux Parisc, deller, Deller, linmag7, Sam James,
	Linux Kernel Mailing List

I haven't seen any file system corruption on rp3440 with several weeks of running with clock events.  I just
started running 6.12.1 today though.

I have the following timer config:

# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

There was some concern about this change on systems where the CPU timers aren't synchronized.  What
systems do you see this on?

Dave

-- 
John David Anglin  dave.anglin@bell.net



* Re: [bisected] ext4 corruption on parisc since 6.12
  2024-12-02  1:47 ` John David Anglin
@ 2024-12-02  4:55   ` matoro
  2024-12-02  6:30     ` Magnus Lindholm
  0 siblings, 1 reply; 9+ messages in thread
From: matoro @ 2024-12-02  4:55 UTC (permalink / raw)
  To: John David Anglin
  Cc: Linux Parisc, deller, Deller, linmag7, Sam James,
	Linux Kernel Mailing List

Hmm, this is my config, also on an rp3440:

#
# Timers subsystem
#
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
# end of Timers subsystem

lindholm can confirm on their hardware/config.  Maybe you can try that and 
see if you can reproduce?  I will try your config as well.
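
To make the difference between the two timer configs explicit, here is a 
minimal sketch that parses both fragments quoted in this thread and lists 
the options that differ.  The helper names parse_kconfig and diff_kconfig 
are hypothetical; an option that is absent from a fragment is reported as 
None rather than guessed.

```python
def parse_kconfig(text):
    """Map option name -> value; '# CONFIG_X is not set' becomes 'n'."""
    suffix = " is not set"
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_kconfig(a, b):
    """Return {option: (value_in_a, value_in_b)} for options that differ."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Timer config reported as working (Dave, rp3440).
WORKING = """
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
"""

# Timer config reported as failing (matoro, rp3440).
FAILING = """
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
"""

differences = diff_kconfig(parse_kconfig(WORKING), parse_kconfig(FAILING))
for name, (working, failing) in differences.items():
    print(f"{name}: working={working} failing={failing}")
```

Only CONFIG_NO_HZ is the same in both fragments; every other timer option 
differs, so the config alone does not single out one setting.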


* Re: [bisected] ext4 corruption on parisc since 6.12
  2024-12-02  4:55   ` matoro
@ 2024-12-02  6:30     ` Magnus Lindholm
  2024-12-02 14:54       ` John David Anglin
  0 siblings, 1 reply; 9+ messages in thread
From: Magnus Lindholm @ 2024-12-02  6:30 UTC (permalink / raw)
  To: matoro
  Cc: John David Anglin, Linux Parisc, deller, Deller, Sam James,
	Linux Kernel Mailing List

On Mon, Dec 2, 2024 at 5:55 AM matoro
<matoro_mailinglist_kernel@matoro.tk> wrote:
>
> Hmm, this is my config, also on an rp3440:
>
> #
> # Timers subsystem
> #
> CONFIG_HZ_PERIODIC=y
> # CONFIG_NO_HZ_IDLE is not set
> # CONFIG_NO_HZ is not set
> # CONFIG_HIGH_RES_TIMERS is not set
> # end of Timers subsystem
>
> lindholm can confirm on their hardware/config.  Maybe you can try that and
> see if you can reproduce?  I will try your config as well.

Hi, I'm on an HP C8000 "parisc64 PA8800 (Mako) 9000/785/C8000". I can confirm
that building a kernel with CONFIG_SMP=n will mitigate this problem.
I haven't messed around with the config in the Timers subsystem, so in my case the
parameters suggested are unset. (My config looks like matoro's.)

Magnus Lindholm


* Re: [bisected] ext4 corruption on parisc since 6.12
  2024-12-02  6:30     ` Magnus Lindholm
@ 2024-12-02 14:54       ` John David Anglin
  2024-12-02 15:31         ` matoro
  0 siblings, 1 reply; 9+ messages in thread
From: John David Anglin @ 2024-12-02 14:54 UTC (permalink / raw)
  To: Magnus Lindholm, matoro
  Cc: Linux Parisc, deller, Deller, Sam James,
	Linux Kernel Mailing List

On 2024-12-02 1:30 a.m., Magnus Lindholm wrote:
> On Mon, Dec 2, 2024 at 5:55 AM matoro
> <matoro_mailinglist_kernel@matoro.tk> wrote:
>> Hmm, this is my config, also on an rp3440:
>>
>> #
>> # Timers subsystem
>> #
>> CONFIG_HZ_PERIODIC=y
>> # CONFIG_NO_HZ_IDLE is not set
>> # CONFIG_NO_HZ is not set
>> # CONFIG_HIGH_RES_TIMERS is not set
>> # end of Timers subsystem
>>
>> lindholm can confirm on their hardware/config.  Maybe you can try that and
>> see if you can reproduce?  I will try your config as well.
> Hi, I'm on a HPC8000 "parisc64 PA8800 (Mako) 9000/785/C8000". I can confirm
> that building a kernel CONFIG_SMP=n will mitigate this problem.
> I haven't messed around with the config in the Timer subsystem so in my case the
> parameters suggested are unset. (my config looks like matoros)
The clockevent driver was tested on both rp3440 and c8000, and some other SMP machines.
Helge knows the details.  I have used it on rp3440 and c8000.

I would try my settings.  The primary reason for switching to the clockevent drivers was to
improve clock resolution.  The best resolution with the old drivers was 1 ms at 1000 HZ.
This caused problems with various package tests.  If config is the issue,
CONFIG_HIGH_RES_TIMERS probably needs to be forced when clockevent drivers are used.

Almost every other system uses the clockevent drivers.  So, there was a risk that parisc would
become unsupported.

I wonder if this could be caused by a dead RTC battery.  Did you check the output of the date command?
Maybe a dead RTC battery interacts badly with clockevent drivers.

I run ntp on all my machines.

Which files have bad dates?  That is, is this really an ext4 file system issue, or is it just that the
system has a bad clock?

Dave

-- 
John David Anglin  dave.anglin@bell.net


* Re: [bisected] ext4 corruption on parisc since 6.12
  2024-12-02 14:54       ` John David Anglin
@ 2024-12-02 15:31         ` matoro
  2024-12-02 16:35           ` Helge Deller
  2024-12-02 19:45           ` John David Anglin
  0 siblings, 2 replies; 9+ messages in thread
From: matoro @ 2024-12-02 15:31 UTC (permalink / raw)
  To: John David Anglin
  Cc: Magnus Lindholm, Linux Parisc, deller, Deller, Sam James,
	Linux Kernel Mailing List

On 2024-12-02 09:54, John David Anglin wrote:
> On 2024-12-02 1:30 a.m., Magnus Lindholm wrote:
>> On Mon, Dec 2, 2024 at 5:55 AM matoro
>> <matoro_mailinglist_kernel@matoro.tk> wrote:
>>> Hmm, this is my config, also on an rp3440:
>>> 
>>> #
>>> # Timers subsystem
>>> #
>>> CONFIG_HZ_PERIODIC=y
>>> # CONFIG_NO_HZ_IDLE is not set
>>> # CONFIG_NO_HZ is not set
>>> # CONFIG_HIGH_RES_TIMERS is not set
>>> # end of Timers subsystem
>>> 
>>> lindholm can confirm on their hardware/config.  Maybe you can try that and
>>> see if you can reproduce?  I will try your config as well.
>> Hi, I'm on a HPC8000 "parisc64 PA8800 (Mako) 9000/785/C8000". I can confirm
>> that building a kernel CONFIG_SMP=n will mitigate this problem.
>> I haven't messed around with the config in the Timer subsystem so in my 
>> case the
>> parameters suggested are unset. (my config looks like matoros)
> The clockevent driver was tested on both rp3440 and c8000, and some other 
> SMP machines.
> Helge knows details.  I have used it on rp3440 and c8000.
> 
> I would try my settings.  The primary reason in switching to the clockevent 
> drivers was to
> improve clock resolution.  The best resolution with the old drivers was 1 ms 
> at 1000 HZ.
> This caused problems with various package tests.  If config is the issue, 
> probably
> CONFIG_HIGH_RES_TIMERS needs to be forced when clockevent drivers are used.
> 
> Almost every other system uses the clockevent drivers.  So, there was a risk 
> that parisc would
> become unsupported.
> 
> I wonder if this could be caused by dead RTC battery.  Did you check output 
> of date command?
> Maybe a dead RTC battery interacts badly with clockevent drivers.
> 
> I run ntp on all my machines.
> 
> What files have bad dates (i.e., is this really a ext4 file system issue) or 
> is it just that system has
> a bad clock?
> 
> Dave

The files that have bad dates seem to be the ones /init on this system 
touches at early boot.  See the output here:  https://paste.matoro.tk/8cq8omg

When booted into the bad kernel, date(1) works and displays the correct 
time.  I'm using chrony for time syncing as well.

After switching to the config specified above, boot hangs before even getting 
to userspace with the following output:

[   12.473410] 0000:e0:01.1: ttyS2 at MMIO 0xfffffffff4050038 (irq = 73, 
base_baud = 115200) is a 16550A
[   12.757386] sym0: <1010-66> rev 0x1 at pci 0000:20:01.0 irq 70
[   12.761419] sym0: PA-RISC Firmware, ID 7, Fast-80, LVD, parity checking
[   12.885367] sym0: SCSI BUS has been reset.
[   12.889389] scsi host0: sym-2.2.3
[   13.053380] sym1: <1010-66> rev 0x1 at pci 0000:20:01.1 irq 71
[   13.055515] sym1: PA-RISC Firmware, ID 7, Fast-80, LVD, parity checking
[   13.165367] sym1: SCSI BUS has been reset.
[   13.169388] scsi host1: sym-2.2.3
[   13.208927] rtc-generic rtc-generic: registered as rtc0
[   13.281367] rtc-generic rtc-generic: setting system clock to 
2024-12-02T07:17:02 UTC (1733123822)
[   13.281367] NET: Registered PF_INET6 protocol family
[   13.281367] Segment Routing with IPv6
[   13.281367] In-situ OAM (IOAM) with IPv6
[   13.281367] registered taskstats version 1
[   13.281367] Unstable clock detected, switching default tracing clock to 
"global"
[   13.281367] If you want to keep using the local clock, then add:
[   13.281367]   "trace_clock=local"
[   13.281367] on the kernel command line

At the end there, the clock seems to stop advancing: several real-time 
seconds elapse between messages that carry the same timestamp.  So I'm 
completely unable to boot with this config at all.
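
The stall described above can be spotted mechanically in a saved console 
log.  The following is a minimal sketch, assuming the usual 
"[ seconds.micros]" printk prefix; the name find_stalls and the min_repeats 
cutoff are hypothetical, and a short run of identical timestamps can also be 
harmless (for example a burst of messages printed together), so this only 
flags candidates.

```python
import re

# Matches the printk timestamp prefix, e.g. "[   13.281367] ".
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def find_stalls(lines, min_repeats=3):
    """Return (timestamp, count) for runs of identical printk timestamps.

    Lines without a timestamp prefix (wrapped continuations) are skipped.
    """
    stalls = []
    prev, count = None, 0
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue
        stamp = match.group(1)
        if stamp == prev:
            count += 1
            continue
        if prev is not None and count >= min_repeats:
            stalls.append((prev, count))
        prev, count = stamp, 1
    if prev is not None and count >= min_repeats:
        stalls.append((prev, count))
    return stalls


# Excerpt from the hanging boot shown above.
HANGING_BOOT = """\
[   13.208927] rtc-generic rtc-generic: registered as rtc0
[   13.281367] rtc-generic rtc-generic: setting system clock to
2024-12-02T07:17:02 UTC (1733123822)
[   13.281367] NET: Registered PF_INET6 protocol family
[   13.281367] Segment Routing with IPv6
[   13.281367] In-situ OAM (IOAM) with IPv6
[   13.281367] registered taskstats version 1
""".splitlines()

print(find_stalls(HANGING_BOOT))
```

On this excerpt the only run reported is the 13.281367 timestamp, starting at 
the line where rtc-generic sets the system clock.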


* Re: [bisected] ext4 corruption on parisc since 6.12
  2024-12-02 15:31         ` matoro
@ 2024-12-02 16:35           ` Helge Deller
  2024-12-02 19:45           ` John David Anglin
  1 sibling, 0 replies; 9+ messages in thread
From: Helge Deller @ 2024-12-02 16:35 UTC (permalink / raw)
  To: matoro, John David Anglin
  Cc: Magnus Lindholm, Linux Parisc, deller, Sam James,
	Linux Kernel Mailing List

Hi Matoro,

On 12/2/24 4:31 PM, matoro wrote:
> On 2024-12-02 09:54, John David Anglin wrote:
>> On 2024-12-02 1:30 a.m., Magnus Lindholm wrote:
>>> On Mon, Dec 2, 2024 at 5:55 AM matoro
>>> <matoro_mailinglist_kernel@matoro.tk> wrote:
>>>> Hmm, this is my config, also on an rp3440:
>>>>
>>>> #
>>>> # Timers subsystem
>>>> #
>>>> CONFIG_HZ_PERIODIC=y
>>>> # CONFIG_NO_HZ_IDLE is not set
>>>> # CONFIG_NO_HZ is not set
>>>> # CONFIG_HIGH_RES_TIMERS is not set
>>>> # end of Timers subsystem
>>>>
>>>> lindholm can confirm on their hardware/config.  Maybe you can try that and
>>>> see if you can reproduce?  I will try your config as well.
>>> Hi, I'm on a HPC8000 "parisc64 PA8800 (Mako) 9000/785/C8000". I can confirm
>>> that building a kernel CONFIG_SMP=n will mitigate this problem.
>>> I haven't messed around with the config in the Timer subsystem so in my case the
>>> parameters suggested are unset. (my config looks like matoros)
>> The clockevent driver was tested on both rp3440 and c8000, and some other SMP machines.
>> Helge knows details.  I have used it on rp3440 and c8000.
>>
>> I would try my settings.  The primary reason in switching to the clockevent drivers was to
>> improve clock resolution.  The best resolution with the old drivers was 1 ms at 1000 HZ.
>> This caused problems with various package tests.  If config is the issue, probably
>> CONFIG_HIGH_RES_TIMERS needs to be forced when clockevent drivers are used.
>>
>> Almost every other system uses the clockevent drivers.  So, there was a risk that parisc would
>> become unsupported.
>>
>> I wonder if this could be caused by dead RTC battery.  Did you check output of date command?
>> Maybe a dead RTC battery interacts badly with clockevent drivers.
>>
>> I run ntp on all my machines.
>>
>> What files have bad dates (i.e., is this really a ext4 file system issue) or is it just that system has
>> a bad clock?
>>
>> Dave
>
> The files that have bad dates seem to be the ones /init on this system touches at early boot.  See the output here:  https://paste.matoro.tk/8cq8omg
>
> When booted into the bad kernel, date(1) works and displays the correct time.  I'm using chrony for time syncing as well.
>
> After switching to the config specified above, boot hangs before even getting to userspace with the following output:
>
> [   12.473410] 0000:e0:01.1: ttyS2 at MMIO 0xfffffffff4050038 (irq = 73, base_baud = 115200) is a 16550A
> [   12.757386] sym0: <1010-66> rev 0x1 at pci 0000:20:01.0 irq 70
> [   12.761419] sym0: PA-RISC Firmware, ID 7, Fast-80, LVD, parity checking
> [   12.885367] sym0: SCSI BUS has been reset.
> [   12.889389] scsi host0: sym-2.2.3
> [   13.053380] sym1: <1010-66> rev 0x1 at pci 0000:20:01.1 irq 71
> [   13.055515] sym1: PA-RISC Firmware, ID 7, Fast-80, LVD, parity checking
> [   13.165367] sym1: SCSI BUS has been reset.
> [   13.169388] scsi host1: sym-2.2.3
> [   13.208927] rtc-generic rtc-generic: registered as rtc0
> [   13.281367] rtc-generic rtc-generic: setting system clock to 2024-12-02T07:17:02 UTC (1733123822)
> [   13.281367] NET: Registered PF_INET6 protocol family
> [   13.281367] Segment Routing with IPv6
> [   13.281367] In-situ OAM (IOAM) with IPv6
> [   13.281367] registered taskstats version 1

this message...:

> [   13.281367] Unstable clock detected, switching default tracing clock to "global"
> [   13.281367] If you want to keep using the local clock, then add:
> [   13.281367]   "trace_clock=local"
> [   13.281367] on the kernel command line

is very misleading and has nothing to do with the parisc system clock from my commit.
You will find this message in all older kernels you had booted as well.
It refers to the tracing clock, which isn't used here.

> At the end there the clock seems to stop progressing forward, as
> there are several real-time seconds that elapse in between messages
> with the same timestamp.  So I'm completely unable to boot with this
> config at all.

I think the problems you see are manifold, and probably not triggered by the
kernel clock patch alone.
I've seen various issues on Debian in the past, for as long as we hadn't built glibc
with 64-bit time64_t enabled.
One such issue is what you see now: your system fails to bring up userspace
because the file date stamps are far in the future (even if the dates
are wrong, it should be able to boot).

So, the kernel patch might trigger some time inconsistencies, but
a glibc without time64_t enabled is probably the main reason why your boot fails.

Both Dave and I tested this commit:
  commit b5ff52be891347f8847872c49d7a5c2fa29400a7
  parisc: Convert to generic clockevents

Nevertheless, maybe on your system one of the SMP CPUs returned an invalid
time (e.g. outside of the 32-bit range) and thus your glibc seems to have gone crazy.
That's the only idea I have right now that might explain your situation.
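
To make the 32-bit range concrete, here is a minimal sketch of the
arithmetic.  It assumes the ext4 on-disk encoding with extra epoch bits
(a signed 32-bit seconds field extended by 2 bits, 34 bits in total); the
constant and function names are hypothetical.  The result is 22:38:55 UTC,
so the 18:38:55 reported earlier would be consistent with a UTC-4 local
time zone, which is a guess.

```python
from datetime import datetime, timedelta, timezone

S32_MIN = -(2 ** 31)
S32_MAX = 2 ** 31 - 1

# Assumption: ext4 with extra timestamp bits stores seconds as a signed
# 32-bit value plus 2 epoch bits, so the largest value is 2^34 - 1 + S32_MIN.
EXT4_MAX_SECONDS = (1 << 34) - 1 + S32_MIN

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc(seconds):
    """Convert seconds since the Unix epoch to an aware UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


def fits_time32(seconds):
    """True if the value is representable in a signed 32-bit time_t."""
    return S32_MIN <= seconds <= S32_MAX


print(EXT4_MAX_SECONDS, to_utc(EXT4_MAX_SECONDS).isoformat())
print("fits 32-bit time_t:", fits_time32(EXT4_MAX_SECONDS))
print("32-bit time_t ends:", to_utc(S32_MAX).isoformat())
```

A userspace limited to a 32-bit time_t cannot represent the ext4 maximum,
which matches the "Value too large for defined data type" errors from stat().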

I think we need more testing.

Helge


* Re: [bisected] ext4 corruption on parisc since 6.12
  2024-12-02 15:31         ` matoro
  2024-12-02 16:35           ` Helge Deller
@ 2024-12-02 19:45           ` John David Anglin
  2024-12-03 16:38             ` John David Anglin
  1 sibling, 1 reply; 9+ messages in thread
From: John David Anglin @ 2024-12-02 19:45 UTC (permalink / raw)
  To: matoro
  Cc: Magnus Lindholm, Linux Parisc, deller, Deller, Sam James,
	Linux Kernel Mailing List

On 2024-12-02 10:31 a.m., matoro wrote:
> On 2024-12-02 09:54, John David Anglin wrote:
>> On 2024-12-02 1:30 a.m., Magnus Lindholm wrote:
>>> On Mon, Dec 2, 2024 at 5:55 AM matoro
>>> <matoro_mailinglist_kernel@matoro.tk> wrote:
>>>> Hmm, this is my config, also on an rp3440:
>>>>
>>>> #
>>>> # Timers subsystem
>>>> #
>>>> CONFIG_HZ_PERIODIC=y
>>>> # CONFIG_NO_HZ_IDLE is not set
>>>> # CONFIG_NO_HZ is not set
>>>> # CONFIG_HIGH_RES_TIMERS is not set
>>>> # end of Timers subsystem
>>>>
>>>> lindholm can confirm on their hardware/config.  Maybe you can try that and
>>>> see if you can reproduce?  I will try your config as well.
>>> Hi, I'm on a HPC8000 "parisc64 PA8800 (Mako) 9000/785/C8000". I can confirm
>>> that building a kernel CONFIG_SMP=n will mitigate this problem.
>>> I haven't messed around with the config in the Timer subsystem so in my case the
>>> parameters suggested are unset. (my config looks like matoros)
>> The clockevent driver was tested on both rp3440 and c8000, and some other SMP machines.
>> Helge knows details.  I have used it on rp3440 and c8000.
>>
>> I would try my settings.  The primary reason in switching to the clockevent drivers was to
>> improve clock resolution.  The best resolution with the old drivers was 1 ms at 1000 HZ.
>> This caused problems with various package tests.  If config is the issue, probably
>> CONFIG_HIGH_RES_TIMERS needs to be forced when clockevent drivers are used.
>>
>> Almost every other system uses the clockevent drivers.  So, there was a risk that parisc would
>> become unsupported.
>>
>> I wonder if this could be caused by dead RTC battery.  Did you check output of date command?
>> Maybe a dead RTC battery interacts badly with clockevent drivers.
>>
>> I run ntp on all my machines.
>>
>> What files have bad dates (i.e., is this really a ext4 file system issue) or is it just that system has
>> a bad clock?
>>
>> Dave
>
> The files that have bad dates seem to be the ones /init on this system touches at early boot.  See the output here: 
> https://paste.matoro.tk/8cq8omg
>
> When booted into the bad kernel, date(1) works and displays the correct time.  I'm using chrony for time syncing as well.
>
> After switching to the config specified above, boot hangs before even getting to userspace with the following output:
>
> [   12.473410] 0000:e0:01.1: ttyS2 at MMIO 0xfffffffff4050038 (irq = 73, base_baud = 115200) is a 16550A
> [   12.757386] sym0: <1010-66> rev 0x1 at pci 0000:20:01.0 irq 70
> [   12.761419] sym0: PA-RISC Firmware, ID 7, Fast-80, LVD, parity checking
> [   12.885367] sym0: SCSI BUS has been reset.
> [   12.889389] scsi host0: sym-2.2.3
> [   13.053380] sym1: <1010-66> rev 0x1 at pci 0000:20:01.1 irq 71
> [   13.055515] sym1: PA-RISC Firmware, ID 7, Fast-80, LVD, parity checking
> [   13.165367] sym1: SCSI BUS has been reset.
> [   13.169388] scsi host1: sym-2.2.3
> [   13.208927] rtc-generic rtc-generic: registered as rtc0
> [   13.281367] rtc-generic rtc-generic: setting system clock to 2024-12-02T07:17:02 UTC (1733123822)
> [   13.281367] NET: Registered PF_INET6 protocol family
> [   13.281367] Segment Routing with IPv6
> [   13.281367] In-situ OAM (IOAM) with IPv6
> [   13.281367] registered taskstats version 1
> [   13.281367] Unstable clock detected, switching default tracing clock to "global"
> [   13.281367] If you want to keep using the local clock, then add:
> [   13.281367]   "trace_clock=local"
> [   13.281367] on the kernel command line
>
> At the end there the clock seems to stop progressing forward, as there are several real-time seconds that elapse in between messages with the 
> same timestamp.  So I'm completely unable to boot with this config at all.
I don't see the "Unstable clock detected" message.

I also have in config:
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_GENERIC_SCHED_CLOCK=y

The clock seems to get stuck here:
[   13.281367] rtc-generic rtc-generic: setting system clock to 2024-12-02T07:17:02 UTC (1733123822)

On mx3210, the clock continues to increment:
[    1.995462] rtc-generic rtc-generic: registered as rtc0
[    2.003158] rtc-generic rtc-generic: setting system clock to 2024-12-01T15:23:25 UTC (1733066605)
[    2.003719] IR JVC protocol handler initialized
[    2.004109] IR MCE Keyboard/mouse protocol handler initialized

Dave

-- 
John David Anglin  dave.anglin@bell.net



* Re: [bisected] ext4 corruption on parisc since 6.12
  2024-12-02 19:45           ` John David Anglin
@ 2024-12-03 16:38             ` John David Anglin
  0 siblings, 0 replies; 9+ messages in thread
From: John David Anglin @ 2024-12-03 16:38 UTC (permalink / raw)
  To: matoro
  Cc: Magnus Lindholm, Linux Parisc, deller, Deller, Sam James,
	Linux Kernel Mailing List

On 2024-12-02 2:45 p.m., John David Anglin wrote:
> I also have in config:
> CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
Here is a comment from <https://www.kernel.org/doc/Documentation/timers/timekeeping.txt>:

On SMP systems, it is crucial for performance that sched_clock() can be called
independently on each CPU without any synchronization performance hits.
Some hardware (such as the x86 TSC) will cause the sched_clock() function to
drift between the CPUs on the system. The kernel can work around this by
enabling the CONFIG_HAVE_UNSTABLE_SCHED_CLOCK option. This is another aspect
that makes sched_clock() different from the ordinary clock source.

Dave

-- 
John David Anglin  dave.anglin@bell.net

