From mboxrd@z Thu Jan 1 00:00:00 1970 From: Hans Reiser Subject: Re: ReiserFS v3 choking when free space falls below 10%? Date: Tue, 11 Jul 2006 22:42:24 -0700 Message-ID: <44B48BC0.2000608@namesys.com> References: <44A55271.7050501@namesys.com> <44A55591.6060500@suse.com> <1152059846.30139.11.camel@ipso.snappymail.ca> <44AB25EE.8030702@namesys.com> <20060706125856.fdac1d16.pegasus@nerv.eu.org> <1152200638.30139.45.camel@ipso.snappymail.ca> <44AD5020.6050803@suse.com> <1152210439.30139.59.camel@ipso.snappymail.ca> <44AD58D1.9040806@suse.com> <1152257397.13211.43.camel@ipso.snappymail.ca> <20060707174935.GB13606@atrey.karlin.mff.cuni.cz> <1152306272.13211.62.camel@ipso.snappymail.ca> <44AED01C.60601@namesys.com> <44AFFD50.8070905@suse.com> <44B44856.5070409@suse.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: list-help: list-unsubscribe: list-post: Errors-To: flx@namesys.com In-Reply-To: <44B44856.5070409@suse.com> List-Id: Content-Type: text/plain; charset="us-ascii" To: Jeffrey Mahoney Cc: Mike Benoit , Jan Kara , =?UTF-8?B?SnVyZSBQZcSNYXI=?= , reiserfs-list@namesys.com You make this way too complicated because you are trying to be way too perfect. If you scan 3 bitmap blocks and find nothing, stop trying to size match. Hans Jeffrey Mahoney wrote: > Jeff Mahoney wrote: > > >Hans Reiser wrote: > > >>>Guys, if you run the kernel under a debugger, and get it to where you > >>>see the excessive CPU usage, and then start stepping through the bitmap > >>>code, I am sure it will be very obvious what the error is. Can anyone > >>>do that for us? Jeff? > > >Apologies to everyone CC'd who've already seen this message. It was > >bounced from the namesys servers and I wanted to preserve the CC list. > > >*** > > >Mike sent me a copy of the metadata and I am now able to reproduce > >locally. My profiling looks like this: > > >samples % image name app name symbol name > >148596 17.8573 reiserfs.ko reiserfs reiserfs_in_journal > >58194 6.9934 reiserfs.ko reiserfs search_by_key > >38937 4.6792 vmlinux vmlinux memmove > >38783 4.6607 reiserfs.ko reiserfs scan_bitmap_block > >38466 4.6226 jbd jbd (no symbols) > >23249 2.7939 vmlinux vmlinux __find_get_block > >18196 2.1867 vmlinux vmlinux tty_write > >17734 2.1312 vmlinux vmlinux do_ioctl > >17293 2.0782 loop loop (no symbols) > >15400 1.8507 vmlinux vmlinux cond_resched_lock > >14836 1.7829 vmlinux vmlinux copy_user_generic_c > >14143 1.6996 reiserfs.ko reiserfs do_journal_end > >13638 1.6389 vmlinux vmlinux find_next_zero_bit > >13236 1.5906 vmlinux vmlinux default_llseek > >12925 1.5532 vmlinux vmlinux bit_waitqueue > >8921 1.0721 vmlinux vmlinux __delay > > > >Hans - > > >My speculation about the bitmaps being fragmented was right on. I wrote > >a quick little script to parse the output of debugreiserfs -m and report > >on the frequency of different window sizes. Windows of 1-31 blocks are > >extremely common, accounting for 99.8% of all free windows. The problem > >is that in my testing, where I made the allocator report the size of > >allocation requests, the most common request was for a window of 32 > blocks. > > >What's happening is that we keep finding windows that are too small, > >which results in a lot of wasted effort. The cycle goes like this: > > >if (unfm && is_block_in_journal(s, bmap_n, *beg, beg)) > > continue; > >/* first zero bit found; we check next bits */ > >for (end = *beg + 1;; end++) { > > if (end >= *beg + max || end >= boundary > > || reiserfs_test_le_bit(end, bi->bh->b_data)) { > > next = end; > > break; > > } > > /* finding the other end of zero bit window requires > > * looking into journal structures (in > > * case of searching for free blocks for unformatted nodes) */ > > if (unfm && is_block_in_journal(s, bmap_n, end, &next)) > > break; > >} > > >If the window is too small, we end up looping up to the top and try to > >find another one. Since the overwhelming majority of the windows are too > >small, we go through just about all the bitmaps without backing off the > >window size. > > > To be clear, eventually the allocations are honored, but only after > *all* of the bitmaps are searched. On the third pass, we drop the window > to a single block and restart the scan, eventually building a 32-block > set that is probably quite fragmented. This occurs on every write, hence > the huge performance hit. > > It appears as though ext3 doesn't have this problem because they don't > batch writes the way reiserfs does. They'll start a search at a decent > hint the same way we do, but the window is always one block. > > So, we're stuck between a rock and a hard place. We can have the better > allocation performance at lower usage and sacrifice performance later or > we can have stable allocation performance at an overall reduction in > performance. > > I have an idea that may get around both problems, but I'm not sure how > well it would be received. We currently do some very basic caching of > bitmap metadata such as the first zero bit and how many free blocks > there are. What if we constructed an extent map of the free windows in > each bitmaps when we cache the metadata and adjust the map when we > > There's a third option, but I'm not sure how well it would be received > > Right now, the allocator keeps track of things like how full a bitmap is > and where the first zero bit is. It would also be possible to cache a > list of windows in each bitmap to accelerate performance. This would > have to be a shrinkable cache, since the pathlogical case could mean > occupying > > -Jeff > > -- > Jeff Mahoney > SUSE Labs