* Re: Oops with 4GB memory setting in 2.4.0 stable
@ 2001-01-16 13:33 Petr Vandrovec
  2001-01-16 20:17 ` Urban Widmark
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Vandrovec @ 2001-01-16 13:33 UTC
  To: Urban Widmark; +Cc: linux-kernel, rmager

On 16 Jan 01 at 9:40, Urban Widmark wrote:
> On Tue, 16 Jan 2001, Rainer Mager wrote:
> 
> > Hi all,
> >
> >   I have a 100% reproducible bug in all of the 2.4.0 kernels including the
> > latest stable one. The issue is that if I compile the kernel to support 4GB
> > RAM (I have 1 GB) and then try to access a samba mount I get an oops. This
> 
> I'll have a look tonight or so. It works for you on non-bigmem?
> 
> > ALWAYS happens. Usually after this the system is frozen (although the magic
> > SYSREQ still works). If the system isn't frozen then any commands that
> > access the disk will freeze. Fortunately GPM worked and I was able to paste
> > the oops to a file via telnet.
> 
> smb_rename suggests mv, but the process is ls ... er? What commands were
> you running on smbfs when it crashed?
> 
> Could this be a symbol mismatch? Keith Owens suggested a less manual way
> to get module symbol output. Do you get the same results using that?

smb_get_dircache looks suspicious to me, as it can try to map an
unlimited number of pages with kmap. And kmaps are not an unlimited
resource... You have 512 kmaps, but one SMBFS cache page can refer to
about 504 pages... So two cached smbfs directories can consume all
your kmaps, and the kernel then dies in an endless loop in
mm/highmem.c:map_new_virtual().
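
The arithmetic behind this can be sketched in plain C. This is a
userspace illustration only: the two constants are the figures quoted
above, not values read from kernel headers, and the helper names
kmaps_needed and kmap_pool_exhausted are made up for the example.

```c
#include <stddef.h>

/* Figures quoted in the message; assumed here, not taken from headers. */
#define KMAP_SLOTS           512  /* kmap slots available */
#define PAGES_PER_CACHE_PAGE 504  /* approx. pages one smbfs cache page refers to */

/* Worst-case number of kmap slots held when `dirs` cached directories
 * each keep a full cache page's worth of pages mapped. */
size_t kmaps_needed(size_t dirs)
{
    return dirs * PAGES_PER_CACHE_PAGE;
}

/* Returns 1 when the demand exceeds the pool, i.e. a further kmap()
 * would have to wait for a slot that nobody is going to release. */
int kmap_pool_exhausted(size_t dirs)
{
    return kmaps_needed(dirs) > KMAP_SLOTS;
}
```

With these figures one cached directory fits in the pool (504 <= 512),
while two do not (1008 > 512).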

Also, smb_add_to_cache looks suspicious:

cachep->idx++;
if (cachep->idx > NINDEX) goto out_full;

Can't idx grow without any limit?
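
A minimal userspace sketch of the concern, assuming the counter is not
reset on the out_full path (that is my reading of the snippet, not
something checked against the smbfs source). The struct, the NINDEX
value and the function names are hypothetical stand-ins.

```c
#define NINDEX 4  /* illustrative value only; the real one lives in smbfs */

struct cache_head {
    int idx;  /* number of index slots handed out so far */
};

/* Pattern from the snippet: increment first, test afterwards.
 * Every call on a full cache still bumps idx. */
int add_unchecked(struct cache_head *c)
{
    c->idx++;
    if (c->idx > NINDEX)
        return -1;  /* "out_full" */
    return 0;
}

/* Bounded variant: test first, so idx never exceeds NINDEX.
 * It accepts exactly the same calls as add_unchecked(). */
int add_bounded(struct cache_head *c)
{
    if (c->idx >= NINDEX)
        return -1;  /* full, counter left untouched */
    c->idx++;
    return 0;
}

/* Helper for experimenting: final idx after `calls` attempts. */
int idx_after_calls(int (*add)(struct cache_head *), int calls)
{
    struct cache_head c = { 0 };
    int i;

    for (i = 0; i < calls; i++)
        (void)add(&c);
    return c.idx;
}
```

After ten attempts the unchecked variant leaves idx at 10, while the
bounded one stops at NINDEX.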

get_block:
  cachep->pages++;
  ...
  if (page) {
    block = kmap(page);
    ...
  }
  
Shouldn't you increment cachep->pages only if grab_cache_page
succeeded? Otherwise smb_find_in_cache can find a NULL index->block,
which then oopses...

smb_find_in_cache should check for index->block == NULL anyway, as
smb_get_dircache can return a couple of NULL index->block entries when
the system has decided to throw out one of the cache pages connected to
the directory.
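
Both suggestions (count a page only once it was really obtained, and
skip NULL blocks on lookup) can be sketched in userspace C. The
structures below are simplified, hypothetical stand-ins for the smbfs
cache, and the `mapped` argument stands in for the result of
grab_cache_page() followed by kmap(); none of this is the actual
kernel code.

```c
#include <stddef.h>

#define MAX_BLOCKS 8  /* illustrative limit, not the smbfs value */

struct cache_index {
    void *block;  /* mapped page, or NULL if it is gone */
};

struct dir_cache {
    int pages;  /* number of index slots in use */
    struct cache_index index[MAX_BLOCKS];
};

/* Record a block only when the page was really obtained; on failure
 * the page count stays unchanged, so no NULL slot is ever counted. */
int add_block(struct dir_cache *c, void *mapped)
{
    if (c->pages >= MAX_BLOCKS)
        return -1;
    if (mapped == NULL)
        return -1;
    c->index[c->pages].block = mapped;
    c->pages++;
    return 0;
}

/* Defensive lookup side: a slot may have lost its page later on,
 * so skip NULL blocks instead of dereferencing them. */
int count_usable_blocks(const struct dir_cache *c)
{
    int i, n = 0;

    for (i = 0; i < c->pages; i++) {
        if (c->index[i].block == NULL)
            continue;
        n++;
    }
    return n;
}
```

The second function only counts, but the same NULL test would guard
any code that walks index[] and reads through block.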

But I personally use neither smbfs nor PAE, so what can I say...
                                            Best regards,
                                                    Petr Vandrovec
                                                    vandrove@vc.cvut.cz

BTW: For ncpfs PAE testing I was using a patch which needed kmap() for
all memory above 32MB... It was very educational...
                                                    
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

* Re: Oops with 4GB memory setting in 2.4.0 stable
@ 2001-01-16 22:29 Petr Vandrovec
  2001-01-16 22:38 ` Urban Widmark
  2001-01-16 22:42 ` Rainer Mager
  0 siblings, 2 replies; 15+ messages in thread
From: Petr Vandrovec @ 2001-01-16 22:29 UTC
  To: Urban Widmark; +Cc: linux-kernel, rmager

On 16 Jan 01 at 21:17, Urban Widmark wrote:
> The smbfs dircache needs to find/kmap all of its cache pages since the
> entries in it are variable length and the way it is called. It would be
> nice to change that.
> 
> I haven't looked at all your detailed comments yet. They may not matter if
> the many kmaps are a problem.

I think that too many kmaps could explain the reported 'silent hang'...
(if my memory serves me well, there was some report about a silent PAE
hang during the last 7 days, yes?). Not checking ->block for NULL looks
like a bug which can be triggered without kmap too.
 
> how can it know that the dentry is the right one? I thought that dentries
> could be removed/reused by someone at will (d_count will be 0 because of
> the dput in ncp_fill_cache, no?). Why isn't it possible for someone to
> write a new dentry where the old one was.
> 
> fs/ncpfs/dir.c:ncp_d_validate() calls
>   valid = d_validate(dentry, dentry->d_parent, dentry->d_name.hash, len);
> 
> all values are taken from the dentry pointer on the cache page (including
> len). d_validate verifies that d_hash() points to a list and it searches
> the list for dentry. How do you know that it is the same dentry that was
> put in the cache and not someone elses dentry?

Before calling d_validate it checks whether dentry->d_parent == parent
(the readdir-ed directory). And if the dentry is in the directory we
read, it is in the dentry d_hash, and even d_fsdata matches its position
in the directory, I bet that it is a valid dentry...

If there is a new dentry which is at the fpos position, and it is a
child of the readdir-ed directory, we should return it anyway, no? There
must not be two ncpfs dentries with the same d_parent and d_fsdata if
d_fsdata != 0, as each dentry can be in only one directory.

This looked like a reasonable limitation to me ;-)
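
The three conditions described above (same parent, found in the hash,
d_fsdata equal to the directory position) can be written down as a
small userspace sketch. struct fake_dentry and cached_dentry_usable
are hypothetical names, and the `hashed` flag merely stands in for
what d_validate() would report; this is not the ncpfs code.

```c
/* Simplified stand-in for the few dentry fields the argument uses. */
struct fake_dentry {
    const struct fake_dentry *parent;  /* plays the role of d_parent */
    int hashed;                        /* 1 if d_validate() would find it */
    unsigned long fsdata;              /* plays the role of d_fsdata */
};

/* Accept a cached dentry only if it is a child of the directory being
 * read, is still in the hash, and sits at the expected position. */
int cached_dentry_usable(const struct fake_dentry *d,
                         const struct fake_dentry *dir,
                         unsigned long fpos)
{
    if (d->parent != dir)
        return 0;
    if (!d->hashed)
        return 0;
    return d->fsdata == fpos;
}
```

Under the stated limitation (no two dentries with the same parent and
the same non-zero d_fsdata), a dentry passing all three tests is the
one that belongs at that position, whether or not it is the same
object that was cached originally.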
                                            Best regards,
                                                Petr Vandrovec
                                                vandrove@vc.cvut.cz
                                                
* FS corruption on 2.4.0-ac8
@ 2001-01-15 22:47 Jure Pecar
  2001-01-15 23:31 ` Oops with 4GB memory setting in 2.4.0 stable Rainer Mager
  0 siblings, 1 reply; 15+ messages in thread
From: Jure Pecar @ 2001-01-15 22:47 UTC
  To: linux-kernel

Hi all,

I was running 2.4.0test10pre5 happily for months and wanted to see how
things stand in the 'latest stuff'. Here's what I found:

I compiled 2.4.0-ac8 with nearly the same .config as test10pre5 (with
the latest gcc on rh7). Then I booted it and used X for some normal
browsing and mp3s. Performance was poor, responsiveness also; even the
mouse stopped responding for a couple of seconds at a time, a lot of
disk thrashing & so on. I decided to boot test10 back, and there was a
nasty surprise: fsck found the filesystem with errors, and LOTS of them
... I had to hold down 'y' for almost 5 minutes ... :)

Then I examined the logs for the cause of this ... and here's what
2.4.0-ac8 left in the logs:

Jan 14 16:26:47 open kernel: ee_blocks: Freeing blocks not in datazone -
block = 979727457, count = 1
Jan 14 16:26:47 open kernel: EXT2-fs error (device md(9,1)):
ext2_free_blocks: Freeing blocks not in datazone - block = 1769096736,
count = 1
Jan 14 16:26:47 open kernel: EXT2-fs error (device md(9,1)):
ext2_free_blocks: Freeing blocks not in datazone - block = 842080300,
count = 1
Jan 14 16:26:47 open kernel: EXT2-fs error (device md(9,1)):
ext2_free_blocks: Freeing blocks not in datazone - block = 1851869728,
count = 1
Jan 14 16:26:47 open kernel: EXT2-fs error (device md(9,1)):
ext2_free_blocks: Freeing blocks not in datazone - block = 808464928,
count = 1
...
and so on for about 150 such lines in 3 seconds.

There is something unusual about my setup: I run raid1 /boot and
raid5 root with one disk disconnected (it's simply too loud...), so the
array is in degraded mode all the time. Other hardware is more or less
standard: p200 classic, 430vx board, adaptec2940u, 64mb ram.

Is this a known problem? If it's not, please advise me on how to provide
more useful information.


-- 

Jure Pecar

Thread overview: 15+ messages
2001-01-16 13:33 Oops with 4GB memory setting in 2.4.0 stable Petr Vandrovec
2001-01-16 20:17 ` Urban Widmark
  -- strict thread matches above, loose matches on Subject: below --
2001-01-16 22:29 Petr Vandrovec
2001-01-16 22:38 ` Urban Widmark
2001-01-16 22:42 ` Rainer Mager
2001-01-15 22:47 FS corruption on 2.4.0-ac8 Jure Pecar
2001-01-15 23:31 ` Oops with 4GB memory setting in 2.4.0 stable Rainer Mager
2001-01-15 21:47   ` Marcelo Tosatti
2001-01-15 23:45     ` Rainer Mager
2001-01-15 22:09       ` Marcelo Tosatti
2001-01-16  0:21         ` Rainer Mager
2001-01-15 22:37           ` Marcelo Tosatti
2001-01-16  2:03         ` Keith Owens
2001-01-16  8:40   ` Urban Widmark
2001-01-17 23:59     ` Rainer Mager
2001-01-18  0:30       ` Urban Widmark
