* Clobbered file after jffs2 mount
@ 2007-09-21 17:23 Jon Ringle
2007-09-21 23:38 ` David Woodhouse
0 siblings, 1 reply; 8+ messages in thread
From: Jon Ringle @ 2007-09-21 17:23 UTC (permalink / raw)
To: linux-mtd
Hello,
This falls under the category of bizarre behavior and I'm hoping someone
on this list can help.
I have a jffs2 filesystem on my target (IXP455 processor) running Linux
2.6.16.29. Sometimes the first time that Linux mounts the jffs2
filesystem after reflashing the jffs2 image, sshd would strangely
misbehave and not allow incoming connections. After much
troubleshooting, it was discovered that the md5sum of the sshd file was
different than the expected md5sum. However, if the system is rebooted,
then sshd works correctly and the md5sum of sshd on the target magically
corrected itself to the expected value.
I patched jffs2reader.c with the set of 4 patches found in this thread
to help me extract the sshd that is found in the physical medium:
http://lists.infradead.org/pipermail/linux-mtd/2005-October/014066.html
The correct md5sum for sshd is: 9e7e08056641d4029d239dc3c6028de9
[root@isc1 ~]# md5sum /usr/sbin/sshd
8c56d103db48c6ec69bba8189f3e64bf /usr/sbin/sshd
[root@isc1 ~]# cp /dev/mtd1 /tmp/jffs2.dmp
[root@isc1 ~]# jffs2reader -b -f /usr/sbin/sshd /tmp/jffs2.dmp > /tmp/sshd
[root@isc1 ~]# md5sum /tmp/sshd
9e7e08056641d4029d239dc3c6028de9 /tmp/sshd
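The check above compares the file as seen through the mounted filesystem with the copy extracted from the raw flash dump. A minimal sketch of the same comparison in Python, assuming only the standard hashlib module; the function names md5_of_file and matches_expected are illustrative and not part of any tool used in this thread:

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """True if the file's MD5 equals the expected digest (case-insensitive)."""
    return md5_of_file(path) == expected_hex.lower()
```

Calling matches_expected("/usr/sbin/sshd", "9e7e08056641d4029d239dc3c6028de9") on the target would report whether the mounted copy currently matches the known-good digest.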
I then did a hexdump of both files and found that there are two 32 bit
values that got clobbered:
[root@isc1 ~]# hexdump -C /usr/sbin/sshd > /tmp/sshd.bad.hex
[root@isc1 ~]# hexdump -C /tmp/sshd > /tmp/sshd.good.hex
(The following diff has been hand edited slightly just to avoid line
wrap by email)
[root@isc1 ~]# diff -u /tmp/sshd.bad.hex /tmp/sshd.good.hex
--- /tmp/sshd.bad.hex 2007-09-21 09:42:55 +0000
+++ /tmp/sshd.good.hex 2007-09-21 09:43:06 +0000
@@ -9817,7 +9817,7 @@
00026630 e3a030cc ebffe64b e59f04b0 ebffee5b |..0....K.......[|
00026640 e5993048 e3130001 01a00009 0bffff77 |..0H...........w|
00026650 e3a01000 e2890024 ebfffef8 e28d1024 |.......$.......$|
-00026660 00000023 e1a00005 ebfffef4 e58d001c |...#............|
+00026660 e58d0020 e1a00005 ebfffef4 e58d001c |... ............|
00026670 e5993014 e3530000 159d1020 059d2020 |..0..S..... .. |
00026680 059d301c 158d0018 e3a0b000 158d1014 |..0.............|
00026690 058d2018 058d3014 e3a00040 ebfff933 |.. ...0....@...3|
@@ -9945,7 +9945,7 @@
00026e30 e2466004 e1a01007 81a02004 859f00d0 |.F`....... .....|
00026e40 8a000011 ebff7108 e1a02004 e1a01007 |......q... .....|
00026e50 e1a00005 ebff727b e1a00008 ebff719e |......r{......q.|
-00026e60 00000073 e1a02fc3 e0833ea2 e1a041c3 |...s../...>...A.|
+00026e60 e2803007 e1a02fc3 e0833ea2 e1a041c3 |..0.../...>...A.|
00026e70 e2443040 e3530d1f e1a01007 e1a00008 |.D0@.S..........|
00026e80 9a000003 e59f008c e1a02004 e59f1088 |.......... .....|
00026e90 ebffe21b ebff70f4 e1a01007 e1a02004 |......p....... .|
It would seem that the underlying image of sshd on the physical medium
is correct, but the copy of sshd presented through Linux's VFS and
JFFS2 is damaged.
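The hexdump-and-diff step can also be done directly on the two binaries. A rough sketch, assuming the words are compared on 4-byte alignment and interpreted as big-endian (which matches how the dump above reads, but is an assumption about the target); diff_words is a hypothetical helper name:

```python
def diff_words(good, bad, byteorder="big"):
    """Yield (offset, good_word, bad_word) for each differing 32-bit word.

    `good` and `bad` are bytes objects; only the common, 4-byte-aligned
    prefix is compared. The byte order is an assumption about the target.
    """
    limit = min(len(good), len(bad)) // 4 * 4
    for offset in range(0, limit, 4):
        g = good[offset:offset + 4]
        b = bad[offset:offset + 4]
        if g != b:
            yield (offset,
                   int.from_bytes(g, byteorder),
                   int.from_bytes(b, byteorder))
```

Fed the contents of /tmp/sshd and /usr/sbin/sshd, this would be expected to report the two offsets shown in the diff, 0x26660 and 0x26e60.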
Does anyone have any suggestions?
Thanks,
Jon
* Re: Clobbered file after jffs2 mount
2007-09-21 17:23 Clobbered file after jffs2 mount Jon Ringle
@ 2007-09-21 23:38 ` David Woodhouse
2007-09-22 1:35 ` Jon Ringle
0 siblings, 1 reply; 8+ messages in thread
From: David Woodhouse @ 2007-09-21 23:38 UTC (permalink / raw)
To: Jon Ringle; +Cc: linux-mtd
On Fri, 2007-09-21 at 13:23 -0400, Jon Ringle wrote:
> Hello,
>
> This falls under the category of bizarre behavior and I'm hoping someone
> on this list can help.
Yes, that is indeed bizarre.
> I have a jffs2 filesystem on my target (IXP455 processor) running Linux
> 2.6.16.29. Sometimes the first time that Linux mounts the jffs2
> filesystem after reflashing the jffs2 image, sshd would strangely
> misbehave and not allow incoming connections. After much
> troubleshooting, it was discovered that the md5sum of the sshd file was
> different than the expected md5sum. However, if the system is rebooted,
> then sshd works correctly and the md5sum of sshd on the target magically
> corrected itself to the expected value.
My first inclination is to suspect that somehow the pages of sshd got
scribbled on in RAM. Is the corruption repeatable?
Can you instrument jffs2_read_inode_range() to print a hex dump of the
"offending" bytes as soon as they've been read?
Can you boot directly into a shell (init=/bin/sh) and then md5sum the
offending file?
--
dwmw2
* RE: Clobbered file after jffs2 mount
2007-09-21 23:38 ` David Woodhouse
@ 2007-09-22 1:35 ` Jon Ringle
2007-09-22 1:51 ` David Woodhouse
0 siblings, 1 reply; 8+ messages in thread
From: Jon Ringle @ 2007-09-22 1:35 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-mtd
On Friday, September 21, 2007 19:39, David Woodhouse wrote:
> On Fri, 2007-09-21 at 13:23 -0400, Jon Ringle wrote:
> > I have a jffs2 filesystem on my target (IXP455 processor) running Linux
> > 2.6.16.29. Sometimes the first time that Linux mounts the jffs2
> > filesystem after reflashing the jffs2 image, sshd would strangely
> > misbehave and not allow incoming connections. After much
> > troubleshooting, it was discovered that the md5sum of the sshd file was
> > different than the expected md5sum. However, if the system is rebooted,
> > then sshd works correctly and the md5sum of sshd on the target magically
> > corrected itself to the expected value.
>
> My first inclination is to suspect that somehow the pages of sshd got
> scribbled on in RAM. Is the corruption repeatable?
This is also what I suspect. The corruption is repeatable on the first
boot after a reflash of the jffs2 filesystem about 90% of the time. One
thing that is different on the first boot is that ssh keys get generated
on that first boot.
>
> Can you instrument jffs2_read_inode_range() to print a hex dump of the
> "offending" bytes as soon as they've been read?
Thanks David. This information will help further my investigation now
that you've pointed me towards how I can find the sshd image in RAM
pages. I will attach my JTAG debugger when I'm next in the office and
use GDB to extract a dump of the RAM sshd image using a conditional
breakpoint with sshd's inode number in jffs2_read_inode_range(). I will
also try to locate the clobbered bytes so I can put a data breakpoint on
it to see if I can get a backtrace pointing to what is causing the
corruption.
>
> Can you boot directly into a shell (init=/bin/sh) and then md5sum the
> offending file?
I tried init=/bin/sh on a first boot after reflash and the md5sum of
sshd was correct. I then rebooted again normally and the ssh keys were
generated. When I logged in, the md5sum of sshd was wrong. The
corruption that I observe is always the same incorrect md5sum.
Jon
* RE: Clobbered file after jffs2 mount
2007-09-22 1:35 ` Jon Ringle
@ 2007-09-22 1:51 ` David Woodhouse
2007-09-22 14:43 ` Brian T
2007-09-25 15:57 ` Jon Ringle
0 siblings, 2 replies; 8+ messages in thread
From: David Woodhouse @ 2007-09-22 1:51 UTC (permalink / raw)
To: Jon Ringle; +Cc: linux-mtd
On Fri, 2007-09-21 at 21:35 -0400, Jon Ringle wrote:
> I tried init=/bin/sh on a first boot after reflash and the md5sum of
> sshd was correct. I then rebooted again normally and the ssh keys were
> generated. When I logged in, the md5sum of sshd was wrong. The
> corruption that I observe is always the same incorrect md5sum.
But there's no corruption on the _flash_ -- if you boot with
init=/bin/sh again after the keys are generated, you again get the
_correct_ md5sum? I'm fairly certain of that, since the failure mode
you'll get if you manage to scribble on the flash is that the data CRC
will fail and you'll get zeroes where the offending nodes go missing.
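The failure mode described here can be modelled very roughly as follows. This is a simplified sketch, not JFFS2's actual node format or CRC variant: zlib.crc32 stands in for the filesystem's checksum, and the (offset, data, stored_crc) tuples are a hypothetical stand-in for data nodes.

```python
import zlib


def assemble(size, nodes):
    """Rebuild a file image of `size` bytes from (offset, data, stored_crc).

    A node whose data does not match its stored CRC is skipped, so the
    range it would have covered stays zero-filled. Each node is assumed
    to fit entirely within `size`.
    """
    image = bytearray(size)  # starts out as all zero bytes
    for offset, data, stored_crc in nodes:
        if zlib.crc32(data) != stored_crc:
            continue  # damaged node: dropped, leaving a hole of zeroes
        image[offset:offset + len(data)] = data
    return bytes(image)
```

Under this model, damage on the flash shows up as a run of zeroes covering a whole node, not as a single 32-bit word replaced by a small non-zero value such as 0x00000023.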
It's going to be something scribbling on the RAM pages after the file is
read from the file system. Be thankful it looks fairly repeatable. Can
you put a hardware watchpoint on the offending page in the page cache,
after it's read? And can you disable _everything_ in the system which
uses DMA, one at a time?
--
dwmw2
* Re: Clobbered file after jffs2 mount
2007-09-22 1:51 ` David Woodhouse
@ 2007-09-22 14:43 ` Brian T
2007-09-23 22:20 ` Jon Ringle
2007-09-25 15:57 ` Jon Ringle
1 sibling, 1 reply; 8+ messages in thread
From: Brian T @ 2007-09-22 14:43 UTC (permalink / raw)
To: David Woodhouse, Jon Ringle; +Cc: linux-mtd
Been reading this thread, and I was wondering what kind of hardware this is running on? I
remember running into something like this a few years ago on my company's own embedded
hardware, and the cause turned out to be a problem with an internal Multitech modem's
firmware on the same bus which was interfering with reading the jffs2 file system.
I would see many ( but not all ) of the sym links on the file system pointing to garbage
links like syslogd -> /m/m/m/m/m/m/m/s/s/s/s/e/e/e/ and also other programs on the system
would not run properly. After a reboot they would be fine for days to weeks.
Thought I would offer that up.
-Brian
> On Fri, 2007-09-21 at 21:35 -0400, Jon Ringle wrote:
>> I tried init=/bin/sh on a first boot after reflash and the md5sum of
>> sshd was correct. I then rebooted again normally and the ssh keys were
>> generated. When I logged in, the md5sum of sshd was wrong. The
>> corruption that I observe is always the same incorrect md5sum.
>
> But there's no corruption on the _flash_ -- if you boot with
> init=/bin/sh again after the keys are generated, you again get the
> _correct_ md5sum? I'm fairly certain of that, since the failure mode
> you'll get if you manage to scribble on the flash is that the data CRC
> will fail and you'll get zeroes where the offending nodes go missing.
>
> It's going to be something scribbling on the RAM pages after the file is
> read from the file system. Be thankful it looks fairly repeatable. Can
> you put a hardware watchpoint on the offending page in the page cache,
> after it's read? And can you disable _everything_ in the system which
> uses DMA, one at a time?
>
* RE: Clobbered file after jffs2 mount
2007-09-22 14:43 ` Brian T
@ 2007-09-23 22:20 ` Jon Ringle
2007-09-24 14:26 ` Brian T
0 siblings, 1 reply; 8+ messages in thread
From: Jon Ringle @ 2007-09-23 22:20 UTC (permalink / raw)
To: Brian T, David Woodhouse; +Cc: linux-mtd
On Saturday, September 22, 2007 10:43, Brian T wrote:
> Been reading this thread, and I was wondering what kind of hardware this
> is running on? I remember running into something like this a few years
> ago on my company's own embedded hardware, and the cause turned out to
> be a problem with an internal Multitech modem's firmware on the same bus
> which was interfering with reading the jffs2 file system.
>
> I would see many ( but not all ) of the sym links on the file system
> pointing to garbage links like syslogd -> /m/m/m/m/m/m/m/s/s/s/s/e/e/e/
> and also other programs on the system would not run properly. After a
> reboot they would be fine for days to weeks.
>
> Thought I would offer that up.
Brian, I find that quite interesting. This is on our company's own
hardware. We have an IXP455 in PCI option mode using an Intel
StrataFlash P30 that is on CS0 of the IXP's expansion bus. The PCI host
does perform some read-only access of the P30 flash via a PCI bus ->
Expansion bus -> P30 flash data path to read info such as the serial
number, MAC address and IP address of the IXP that is stored on the
flash. We know that the PCI host sometimes does this while the IXP is
in the middle of a write operation to flash, so the read operation by
the PCI host returns "garbage". This is ok, because the data that the
PCI host is reading is checksummed and all that the PCI host needs to
do is retry the read later.
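The read-verify-retry pattern described above can be sketched as below. The checksum actually used by the PCI host is not stated in this thread, so zlib.crc32 is an assumption, and read_record is a hypothetical callable returning a (payload, stored_crc) pair.

```python
import time
import zlib


def read_with_retry(read_record, attempts=5, delay=0.0):
    """Call read_record() until the payload matches its stored checksum.

    read_record is a caller-supplied function returning (payload, stored_crc).
    Raises OSError if no attempt yields a payload that verifies.
    """
    for _ in range(attempts):
        payload, stored_crc = read_record()
        if zlib.crc32(payload) == stored_crc:
            return payload
        time.sleep(delay)  # back off before trying the read again
    raise OSError("no valid record after %d attempts" % attempts)
```

The design relies on a garbage read being detectable: a read that collides with a flash write fails the checksum and is simply repeated.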
I've been poring over the P30 datasheet flowcharts to see if there is
some race condition where a spurious flash read operation by the PCI
host could interfere with the way that the IXP reads the contents of
flash. So far, I haven't been able to find something like that.
How did you determine that the modem was causing your problem? It sure
sounds like similar symptoms.
I created a script that scans all the files in the jffs2 image and
found that when this problem doesn't appear in sshd, it appears in
other files instead, with the same replacement of some 32 bit value by
either 0x00000023 or 0x00000073. I've now seen this show up in
/etc/sshd_config and also /lib/libcrypto.so.0.9.7
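The script itself is not shown in this thread. A minimal sketch of such a scan, assuming a known-good copy of the tree (for example, one extracted from the flash dump) is available at reference_root; scan_tree and both parameter names are hypothetical, and big-endian word order is an assumption:

```python
import os


def scan_tree(mounted_root, reference_root, byteorder="big"):
    """Yield (relative_path, offset, good_word, bad_word) for every
    aligned 32-bit word that differs between the two trees.

    Only regular files present in both trees are compared; symlinks
    are skipped.
    """
    for dirpath, _dirnames, filenames in os.walk(reference_root):
        for name in filenames:
            ref_path = os.path.join(dirpath, name)
            rel = os.path.relpath(ref_path, reference_root)
            live_path = os.path.join(mounted_root, rel)
            if os.path.islink(ref_path) or not os.path.isfile(live_path):
                continue
            with open(ref_path, "rb") as f:
                good = f.read()
            with open(live_path, "rb") as f:
                bad = f.read()
            # Compare the common prefix one aligned 4-byte word at a time.
            for offset in range(0, min(len(good), len(bad)) - 3, 4):
                g = good[offset:offset + 4]
                b = bad[offset:offset + 4]
                if g != b:
                    yield (rel, offset,
                           int.from_bytes(g, byteorder),
                           int.from_bytes(b, byteorder))
```

Filtering the results for bad_word values of 0x00000023 or 0x00000073 would pick out the signature seen in sshd.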
When I get to the office tomorrow, I'll hook up the JTAG and try some of
the things that David suggested.
Jon
* Re: Clobbered file after jffs2 mount
2007-09-23 22:20 ` Jon Ringle
@ 2007-09-24 14:26 ` Brian T
0 siblings, 0 replies; 8+ messages in thread
From: Brian T @ 2007-09-24 14:26 UTC (permalink / raw)
To: Jon Ringle, David Woodhouse; +Cc: linux-mtd
Hi Jon,
>How did you determine that the modem was causing your problem? It sure
>sounds like similar symptoms.
I talked to our hardware engineer this morning who reminded me of what the problem was. It was with
a Conexant modem ( not Multitech like I had stated before ). These are basically his words, though
I don't follow it fully ( which is why I am in software, and not hardware ;) )
"Basically, on our hardware, the Conexant modem was connected using its parallel port to save a
UART on the main board. But while in this mode, its address lines could become "output". The
Intel flash we use is on 0xa0-0xa2, and the Conexant modem could sometimes "jam" 0xa1-0xa2 and cause
us to see a temporary corruption on the file system. "
A new firmware release for the modem should have fixed the problem, but by that time
our hardware engineer had switched the modem to be connected through its serial interface.
Not sure if this makes sense. If you need me to revise I can ask him to clarify the fuzzy parts.
Not sure if this helps you identify the problem you are seeing, but it might accidentally =)
-Brian
* RE: Clobbered file after jffs2 mount
2007-09-22 1:51 ` David Woodhouse
2007-09-22 14:43 ` Brian T
@ 2007-09-25 15:57 ` Jon Ringle
1 sibling, 0 replies; 8+ messages in thread
From: Jon Ringle @ 2007-09-25 15:57 UTC (permalink / raw)
To: David Woodhouse, Brian T; +Cc: linux-mtd
On Friday, September 21, 2007 21:51, David Woodhouse wrote:
> It's going to be something scribbling on the RAM pages after the file is
> read from the file system. Be thankful it looks fairly repeatable. Can
> you put a hardware watchpoint on the offending page in the page cache,
> after it's read? And can you disable _everything_ in the system which
> uses DMA, one at a time?
I was able to disable everything else that had external hardware access
to flash and the IXP's memory and the problem still appeared. I have now
been able to determine that the fault lies with some memory scribbling
done by one of our own kernel modules.
There is no need to further bother this mailing list with this issue.
Thanks David and Brian for your input on this problem :)
Jon