linux-next.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Regression due to 7ff9554 "printk: convert byte-buffer to variable-length record buffer"
@ 2012-05-09 18:31 Stephen Warren
  2012-05-10 19:54 ` Stephen Warren
  0 siblings, 1 reply; 9+ messages in thread
From: Stephen Warren @ 2012-05-09 18:31 UTC (permalink / raw)
  To: Kay Sievers
  Cc: William Douglas, Greg Kroah-Hartman, linux-kernel@vger.kernel.org,
	linux-next

For me, next-20120508 prints nothing when booted, and I think also
hangs. To solve this, I reverted:

7ff9554 printk: convert byte-buffer to variable-length record buffer

In order to build, I also had to revert:

c4e00da driver-core: extend dev_printk() to pass structured data

Note: I'm running on an ARM system using a serial console, with
earlyprintk enabled.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression due to 7ff9554 "printk: convert byte-buffer to variable-length record buffer"
  2012-05-09 18:31 Regression due to 7ff9554 "printk: convert byte-buffer to variable-length record buffer" Stephen Warren
@ 2012-05-10 19:54 ` Stephen Warren
  2012-05-10 20:06   ` Kay Sievers
  0 siblings, 1 reply; 9+ messages in thread
From: Stephen Warren @ 2012-05-10 19:54 UTC (permalink / raw)
  To: Kay Sievers, Greg Kroah-Hartman
  Cc: William Douglas, linux-kernel@vger.kernel.org, linux-next

On 05/09/2012 12:31 PM, Stephen Warren wrote:
> For me, next-20120508 prints nothing when booted, and I think also
> hangs. To solve this, I reverted:
> 
> 7ff9554 printk: convert byte-buffer to variable-length record buffer
> 
> In order to build, I also had to revert:
> 
> c4e00da driver-core: extend dev_printk() to pass structured data
> 
> Note: I'm running on an ARM system using a serial console, with
> earlyprintk enabled.

This issue still occurs in next-20120510.

I've tracked it down to the assignment of msg->ts_nsec near the end of
log_store(). If I comment this out, everything works. The problem is the
assignment, not the call to local_clock():

fails:
	msg->ts_nsec = local_clock();
fails:
	msg->ts_nsec = 0;//local_clock();
works:
	//msg->ts_nsec = local_clock();

I guess that msg isn't pointing at what it's supposed to. I'll keep
trying to track this down, but if you could think about why this might
happen (or even revert the changes until this is isolated), it'd be great.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression due to 7ff9554 "printk: convert byte-buffer to variable-length record buffer"
  2012-05-10 19:54 ` Stephen Warren
@ 2012-05-10 20:06   ` Kay Sievers
  2012-05-10 20:09     ` Stephen Warren
  0 siblings, 1 reply; 9+ messages in thread
From: Kay Sievers @ 2012-05-10 20:06 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Greg Kroah-Hartman, William Douglas, linux-kernel@vger.kernel.org,
	linux-next

On Thu, May 10, 2012 at 9:54 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> On 05/09/2012 12:31 PM, Stephen Warren wrote:
>> For me, next-20120508 prints nothing when booted, and I think also
>> hangs. To solve this, I reverted:
>>
>> 7ff9554 printk: convert byte-buffer to variable-length record buffer
>>
>> In order to build, I also had to revert:
>>
>> c4e00da driver-core: extend dev_printk() to pass structured data
>>
>> Note: I'm running on an ARM system using a serial console, with
>> earlyprintk enabled.
>
> This issue still occurs in next-20120510.
>
> I've tracked it down to the assignment of msg->ts_nsec near the end of
> log_store(). If I comment this out, everything works. The problem is the
> assignment, not the call to local_clock():
>
> fails:
>        msg->ts_nsec = local_clock();
> fails:
>        msg->ts_nsec = 0;//local_clock();
> works:
>        //msg->ts_nsec = local_clock();

Weird.

What happens if you change it to:
  cpu_clock(logbuf_cpu);
?

If it works, the timestamps look ok?

Kay

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression due to 7ff9554 "printk: convert byte-buffer to variable-length record buffer"
  2012-05-10 20:06   ` Kay Sievers
@ 2012-05-10 20:09     ` Stephen Warren
  2012-05-10 20:13       ` Stephen Warren
  0 siblings, 1 reply; 9+ messages in thread
From: Stephen Warren @ 2012-05-10 20:09 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Greg Kroah-Hartman, William Douglas, linux-kernel@vger.kernel.org,
	linux-next

On 05/10/2012 02:06 PM, Kay Sievers wrote:
> On Thu, May 10, 2012 at 9:54 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>> On 05/09/2012 12:31 PM, Stephen Warren wrote:
>>> For me, next-20120508 prints nothing when booted, and I think also
>>> hangs. To solve this, I reverted:
>>>
>>> 7ff9554 printk: convert byte-buffer to variable-length record buffer
>>>
>>> In order to build, I also had to revert:
>>>
>>> c4e00da driver-core: extend dev_printk() to pass structured data
>>>
>>> Note: I'm running on an ARM system using a serial console, with
>>> earlyprintk enabled.
>>
>> This issue still occurs in next-20120510.
>>
>> I've tracked it down to the assignment of msg->ts_nsec near the end of
>> log_store(). If I comment this out, everything works. The problem is the
>> assignment, not the call to local_clock():
>>
>> fails:
>>        msg->ts_nsec = local_clock();
>> fails:
>>        msg->ts_nsec = 0;//local_clock();
>> works:
>>        //msg->ts_nsec = local_clock();
> 
> Weird.
> 
> What happens if you change it to:
>   cpu_clock(logbuf_cpu);
> ?
> 
> If it works, the timestamps look ok?

I doubt that would work - after all, assigning 0 fails, but not
performing the assignment at all works. But, I'll go try it...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression due to 7ff9554 "printk: convert byte-buffer to variable-length record buffer"
  2012-05-10 20:09     ` Stephen Warren
@ 2012-05-10 20:13       ` Stephen Warren
  2012-05-10 20:15         ` Kay Sievers
  0 siblings, 1 reply; 9+ messages in thread
From: Stephen Warren @ 2012-05-10 20:13 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Greg Kroah-Hartman, William Douglas, linux-kernel@vger.kernel.org,
	linux-next

On 05/10/2012 02:09 PM, Stephen Warren wrote:
> On 05/10/2012 02:06 PM, Kay Sievers wrote:
>> On Thu, May 10, 2012 at 9:54 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>> On 05/09/2012 12:31 PM, Stephen Warren wrote:
>>>> For me, next-20120508 prints nothing when booted, and I think also
>>>> hangs. To solve this, I reverted:
>>>>
>>>> 7ff9554 printk: convert byte-buffer to variable-length record buffer
>>>>
>>>> In order to build, I also had to revert:
>>>>
>>>> c4e00da driver-core: extend dev_printk() to pass structured data
>>>>
>>>> Note: I'm running on an ARM system using a serial console, with
>>>> earlyprintk enabled.
>>>
>>> This issue still occurs in next-20120510.
>>>
>>> I've tracked it down to the assignment of msg->ts_nsec near the end of
>>> log_store(). If I comment this out, everything works. The problem is the
>>> assignment, not the call to local_clock():
>>>
>>> fails:
>>>        msg->ts_nsec = local_clock();
>>> fails:
>>>        msg->ts_nsec = 0;//local_clock();
>>> works:
>>>        //msg->ts_nsec = local_clock();
>>
>> Weird.
>>
>> What happens if you change it to:
>>   cpu_clock(logbuf_cpu);
>> ?
>>
>> If it works, the timestamps look ok?
> 
> I doubt that would work - after all, assigning 0 fails, but not
> performing the assignment at all works. But, I'll go try it...

Calling cpu_clock() instead of local_clock() fails in the same way.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression due to 7ff9554 "printk: convert byte-buffer to variable-length record buffer"
  2012-05-10 20:13       ` Stephen Warren
@ 2012-05-10 20:15         ` Kay Sievers
  2012-05-10 20:20           ` Stephen Warren
  0 siblings, 1 reply; 9+ messages in thread
From: Kay Sievers @ 2012-05-10 20:15 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Greg Kroah-Hartman, William Douglas, linux-kernel@vger.kernel.org,
	linux-next

On Thu, May 10, 2012 at 10:13 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> On 05/10/2012 02:09 PM, Stephen Warren wrote:
>> On 05/10/2012 02:06 PM, Kay Sievers wrote:
>>> On Thu, May 10, 2012 at 9:54 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>>> On 05/09/2012 12:31 PM, Stephen Warren wrote:
>>>>> For me, next-20120508 prints nothing when booted, and I think also
>>>>> hangs. To solve this, I reverted:
>>>>>
>>>>> 7ff9554 printk: convert byte-buffer to variable-length record buffer
>>>>>
>>>>> In order to build, I also had to revert:
>>>>>
>>>>> c4e00da driver-core: extend dev_printk() to pass structured data
>>>>>
>>>>> Note: I'm running on an ARM system using a serial console, with
>>>>> earlyprintk enabled.
>>>>
>>>> This issue still occurs in next-20120510.
>>>>
>>>> I've tracked it down to the assignment of msg->ts_nsec near the end of
>>>> log_store(). If I comment this out, everything works. The problem is the
>>>> assignment, not the call to local_clock():
>>>>
>>>> fails:
>>>>        msg->ts_nsec = local_clock();
>>>> fails:
>>>>        msg->ts_nsec = 0;//local_clock();
>>>> works:
>>>>        //msg->ts_nsec = local_clock();
>>>
>>> Weird.
>>>
>>> What happens if you change it to:
>>>   cpu_clock(logbuf_cpu);
>>> ?
>>>
>>> If it works, the timestamps look ok?
>>
>> I doubt that would work - after all, assigning 0 fails, but not
>> performing the assignment at all works. But, I'll go try it...
>
> Calling cpu_clock() instead of local_clock() fails in the same way.

Ok, didn't really see the assign to 0 you tried, sorry. :)

And 'dmesg' works when you run the box with the line commented out?

And 'cat /dev/kmsg'?

Kay

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression due to 7ff9554 "printk: convert byte-buffer to variable-length record buffer"
  2012-05-10 20:15         ` Kay Sievers
@ 2012-05-10 20:20           ` Stephen Warren
  2012-05-10 20:26             ` Kay Sievers
  0 siblings, 1 reply; 9+ messages in thread
From: Stephen Warren @ 2012-05-10 20:20 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Greg Kroah-Hartman, William Douglas, linux-kernel@vger.kernel.org,
	linux-next

On 05/10/2012 02:15 PM, Kay Sievers wrote:
> On Thu, May 10, 2012 at 10:13 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>> On 05/10/2012 02:09 PM, Stephen Warren wrote:
>>> On 05/10/2012 02:06 PM, Kay Sievers wrote:
>>>> On Thu, May 10, 2012 at 9:54 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>>>> On 05/09/2012 12:31 PM, Stephen Warren wrote:
>>>>>> For me, next-20120508 prints nothing when booted, and I think also
>>>>>> hangs. To solve this, I reverted:
>>>>>>
>>>>>> 7ff9554 printk: convert byte-buffer to variable-length record buffer
>>>>>>
>>>>>> In order to build, I also had to revert:
>>>>>>
>>>>>> c4e00da driver-core: extend dev_printk() to pass structured data
>>>>>>
>>>>>> Note: I'm running on an ARM system using a serial console, with
>>>>>> earlyprintk enabled.
>>>>>
>>>>> This issue still occurs in next-20120510.
>>>>>
>>>>> I've tracked it down to the assignment of msg->ts_nsec near the end of
>>>>> log_store(). If I comment this out, everything works. The problem is the
>>>>> assignment, not the call to local_clock():
>>>>>
>>>>> fails:
>>>>>        msg->ts_nsec = local_clock();
>>>>> fails:
>>>>>        msg->ts_nsec = 0;//local_clock();
>>>>> works:
>>>>>        //msg->ts_nsec = local_clock();
>>>>
>>>> Weird.
>>>>
>>>> What happens if you change it to:
>>>>   cpu_clock(logbuf_cpu);
>>>> ?
>>>>
>>>> If it works, the timestamps look ok?
>>>
>>> I doubt that would work - after all, assigning 0 fails, but not
>>> performing the assignment at all works. But, I'll go try it...
>>
>> Calling cpu_clock() instead of local_clock() fails in the same way.
> 
> Ok, didn't really see the assign to 0 you tried, sorry. :)
> 
> And 'dmesg' works when you run the box with the line commented out?
> 
> And 'cat /dev/kmsg'?

Yes, both work and produce reasonable output. "cat /dev/kmsg" does hang
at the end of the log until I CTRL-C it - is that expected?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression due to 7ff9554 "printk: convert byte-buffer to variable-length record buffer"
  2012-05-10 20:20           ` Stephen Warren
@ 2012-05-10 20:26             ` Kay Sievers
  2012-05-10 21:58               ` Stephen Warren
  0 siblings, 1 reply; 9+ messages in thread
From: Kay Sievers @ 2012-05-10 20:26 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Greg Kroah-Hartman, William Douglas, linux-kernel@vger.kernel.org,
	linux-next

On Thu, May 10, 2012 at 10:20 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> On 05/10/2012 02:15 PM, Kay Sievers wrote:
>> On Thu, May 10, 2012 at 10:13 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>> On 05/10/2012 02:09 PM, Stephen Warren wrote:
>>>> On 05/10/2012 02:06 PM, Kay Sievers wrote:
>>>>> On Thu, May 10, 2012 at 9:54 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>>>>> On 05/09/2012 12:31 PM, Stephen Warren wrote:
>>>>>>> For me, next-20120508 prints nothing when booted, and I think also
>>>>>>> hangs. To solve this, I reverted:
>>>>>>>
>>>>>>> 7ff9554 printk: convert byte-buffer to variable-length record buffer
>>>>>>>
>>>>>>> In order to build, I also had to revert:
>>>>>>>
>>>>>>> c4e00da driver-core: extend dev_printk() to pass structured data
>>>>>>>
>>>>>>> Note: I'm running on an ARM system using a serial console, with
>>>>>>> earlyprintk enabled.
>>>>>>
>>>>>> This issue still occurs in next-20120510.
>>>>>>
>>>>>> I've tracked it down to the assignment of msg->ts_nsec near the end of
>>>>>> log_store(). If I comment this out, everything works. The problem is the
>>>>>> assignment, not the call to local_clock():
>>>>>>
>>>>>> fails:
>>>>>>        msg->ts_nsec = local_clock();
>>>>>> fails:
>>>>>>        msg->ts_nsec = 0;//local_clock();
>>>>>> works:
>>>>>>        //msg->ts_nsec = local_clock();
>>>>>
>>>>> Weird.
>>>>>
>>>>> What happens if you change it to:
>>>>>   cpu_clock(logbuf_cpu);
>>>>> ?
>>>>>
>>>>> If it works, the timestamps look ok?
>>>>
>>>> I doubt that would work - after all, assigning 0 fails, but not
>>>> performing the assignment at all works. But, I'll go try it...
>>>
>>> Calling cpu_clock() instead of local_clock() fails in the same way.
>>
>> Ok, didn't really see the assign to 0 you tried, sorry. :)
>>
>> And 'dmesg' works when you run the box with the line commented out?
>>
>> And 'cat /dev/kmsg'?
>
> Yes, both work and produce reasonable output.

Really weird. I have zero ideas at what this could be. The messages in
/dev/kmsg have seqnums, you see the 0 as the first?

5,0,0;Linux version 3.4.0-rc6+ ...
6,1,0;Command line: root=/dev/sda1 ...

> "cat /dev/kmsg" does hang
> at the end of the log until I CTRL-C it - is that expected?

Yeah, 'cat' is a blocking read(), tools would use poll().

Kay

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression due to 7ff9554 "printk: convert byte-buffer to variable-length record buffer"
  2012-05-10 20:26             ` Kay Sievers
@ 2012-05-10 21:58               ` Stephen Warren
  0 siblings, 0 replies; 9+ messages in thread
From: Stephen Warren @ 2012-05-10 21:58 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Greg Kroah-Hartman, William Douglas, linux-kernel@vger.kernel.org,
	linux-next

On 05/10/2012 02:26 PM, Kay Sievers wrote:
> On Thu, May 10, 2012 at 10:20 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>> On 05/10/2012 02:15 PM, Kay Sievers wrote:
>>> On Thu, May 10, 2012 at 10:13 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>>> On 05/10/2012 02:09 PM, Stephen Warren wrote:
>>>>> On 05/10/2012 02:06 PM, Kay Sievers wrote:
>>>>>> On Thu, May 10, 2012 at 9:54 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>>>>>> On 05/09/2012 12:31 PM, Stephen Warren wrote:
>>>>>>>> For me, next-20120508 prints nothing when booted, and I think also
>>>>>>>> hangs. To solve this, I reverted:
>>>>>>>>
>>>>>>>> 7ff9554 printk: convert byte-buffer to variable-length record buffer
>>>>>>>>
>>>>>>>> In order to build, I also had to revert:
>>>>>>>>
>>>>>>>> c4e00da driver-core: extend dev_printk() to pass structured data
>>>>>>>>
>>>>>>>> Note: I'm running on an ARM system using a serial console, with
>>>>>>>> earlyprintk enabled.
>>>>>>>
>>>>>>> This issue still occurs in next-20120510.
>>>>>>>
>>>>>>> I've tracked it down to the assignment of msg->ts_nsec near the end of
>>>>>>> log_store(). If I comment this out, everything works. The problem is the
>>>>>>> assignment, not the call to local_clock():
>>>>>>>
>>>>>>> fails:
>>>>>>>        msg->ts_nsec = local_clock();
>>>>>>> fails:
>>>>>>>        msg->ts_nsec = 0;//local_clock();
>>>>>>> works:
>>>>>>>        //msg->ts_nsec = local_clock();
>>>>>>
>>>>>> Weird.
>>>>>>
>>>>>> What happens if you change it to:
>>>>>>   cpu_clock(logbuf_cpu);
>>>>>> ?
>>>>>>
>>>>>> If it works, the timestamps look ok?
>>>>>
>>>>> I doubt that would work - after all, assigning 0 fails, but not
>>>>> performing the assignment at all works. But, I'll go try it...
>>>>
>>>> Calling cpu_clock() instead of local_clock() fails in the same way.
>>>
>>> Ok, didn't really see the assign to 0 you tried, sorry. :)
>>>
>>> And 'dmesg' works when you run the box with the line commented out?
>>>
>>> And 'cat /dev/kmsg'?
>>
>> Yes, both work and produce reasonable output.
> 
> Really weird. I have zero ideas at what this could be. The messages in
> /dev/kmsg have seqnums, you see the 0 as the first?
> 
> 5,0,0;Linux version 3.4.0-rc6+ ...
> 6,1,0;Command line: root=/dev/sda1 ...
> 
>> "cat /dev/kmsg" does hang
>> at the end of the log until I CTRL-C it - is that expected?
> 
> Yeah, 'cat' is a blocking read(), tools would use poll().

Found the bug. __log_buf isn't correctly aligned, so the very first
message gets the struct log written to an unaligned address. Subsequent
entries would be correctly aligned in log_store(), except the kernel has
already crashed by then.

The ts_nsec field is a 64-bit write, which triggers the alignment
requirement. I'm not quite sure why the previous 16-bit writes for the
various len fields didn't experience an issue, since __log_buf ended up
at an odd address for me. Perhaps ARM doesn't have an alignment
requirement for 16-bit writes, but does for 32-bit and greater.

I'll post a patch in a minute.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2012-05-10 21:58 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-05-09 18:31 Regression due to 7ff9554 "printk: convert byte-buffer to variable-length record buffer" Stephen Warren
2012-05-10 19:54 ` Stephen Warren
2012-05-10 20:06   ` Kay Sievers
2012-05-10 20:09     ` Stephen Warren
2012-05-10 20:13       ` Stephen Warren
2012-05-10 20:15         ` Kay Sievers
2012-05-10 20:20           ` Stephen Warren
2012-05-10 20:26             ` Kay Sievers
2012-05-10 21:58               ` Stephen Warren

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).