On 4/2/07, Dan Williams wrote:
> On 3/30/07, Raz Ben-Jehuda(caro) wrote:
> > Please see below.
> >
> > On 8/28/06, Neil Brown wrote:
> > > On Sunday August 13, raziebe@gmail.com wrote:
> > > > well ... me again
> > > >
> > > > Following your advice....
> > > >
> > > > I added a deadline for every WRITE stripe head when it is created.
> > > > In raid5_activate_delayed I check whether the deadline has expired,
> > > > and if it has not, the stripe head is left delayed rather than being
> > > > moved to preread-active mode.
> > > >
> > > > This small fix (and a few others elsewhere in the code) reduced the
> > > > amount of reads to zero with dd, but with no improvement in
> > > > throughput. With random access to the raid (buffers aligned to the
> > > > stripe width and of stripe-width size), however, there is an
> > > > improvement of at least 20%.
> > > >
> > > > The problem is that the user must know what he is doing; otherwise,
> > > > if the deadline is too long (say 100 ms), performance is reduced.
> > >
> > > So if I understand you correctly, you are delaying write requests to
> > > partial stripes slightly (your 'deadline') and this is sometimes
> > > giving you a 20% improvement ?
> > >
> > > I'm not surprised that you could get some improvement.  20% is quite
> > > surprising.  It would be worth following through with this to make
> > > that improvement generally available.
> > >
> > > As you say, picking a time in milliseconds is very error prone.  We
> > > really need to come up with something more natural.
> > > I had hoped that the 'unplug' infrastructure would provide the right
> > > thing, but apparently not.  Maybe unplug is just being called too
> > > often.
> > >
> > > I'll see if I can duplicate this myself and find out what is really
> > > going on.
> > >
> > > Thanks for the report.
> > >
> > > NeilBrown
> >
> > Neil, hello. I am sorry for this interval; I was abruptly assigned to
> > a different project.
> >
> > 1. I took another look at the raid5 delay patch I wrote a while ago.
> > I ported it to 2.6.17 and tested it. It appears to work, and when used
> > correctly it eliminates the read penalty.
> >
> > 2. Benchmarks.
> > Configuration: I am testing a 3-disk raid5 with a 1 MB chunk size.
> > IOs are synchronous and non-buffered (O_DIRECT), 2 MB in size, and
> > always aligned to the beginning of a stripe. The kernel is 2.6.17.
> > stripe_delay was set to 10 ms.
> >
> > Attached is the simple_write code.
> >
> > command: simple_write /dev/md1 2048 0 1000
> > simple_write issues raw sequential writes (O_DIRECT) of 2048 kilobytes,
> > 1000 times, starting from offset zero.
> >
> > Benchmark before the patch:
> >
> > Device:    tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> > sda    1848.00      8384.00     50992.00       8384      50992
> > sdb    1995.00     12424.00     51008.00      12424      51008
> > sdc    1698.00      8160.00     51000.00       8160      51000
> > sdd       0.00         0.00         0.00          0          0
> > md0       0.00         0.00         0.00          0          0
> > md1     450.00         0.00    102400.00          0     102400
> >
> > Benchmark after the patch:
> >
> > Device:    tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> > sda     389.11         0.00    128530.69          0     129816
> > sdb     381.19         0.00    129354.46          0     130648
> > sdc     383.17         0.00    128530.69          0     129816
> > sdd       0.00         0.00         0.00          0          0
> > md0       0.00         0.00         0.00          0          0
> > md1    1140.59         0.00    259548.51          0     262144
> >
> > As one can see, no additional reads were done. One can actually
> > calculate the raid's utilization: (n-1)/n * (single-disk throughput
> > with 1 MB writes). Here each of the three disks runs at its 1 MB-write
> > speed while md1 delivers about twice the single-disk rate, i.e. 2/3 of
> > the raw bandwidth goes to data.
> >
> > 3. The patch code.
> > The kernel tested above was 2.6.17. The patch is against 2.6.20.2
> > because I noticed big code differences between 17 and 20.x. This patch
> > was not tested on 2.6.20.2, but it is essentially the same. I have not
> > tested (yet) degraded mode or any other non-common paths.
>
> This is along the same lines of what I am working on, new cache
> policies for raid5/6, so I want to give it a try as well.
> Unfortunately gmail has mangled your patch. Can you resend as an
> attachment?
>
> patch: **** malformed patch at line 10:
> (&((conf)->stripe_hashtbl[((sect) >> STRIPE_SHIFT) & HASH_MASK]))
>
> Thanks,
> Dan

Dan, hello. Attached are the patches. Also, I have added another test
unit: random_writev (a stripped-down sketch of it is appended at the end
of this mail). It is not much code, but it does the work. It tests
writing a vector; it shows the same results as writing with a single
buffer. What are the new cache policies?

Please note! I have not indented the patch, nor followed the
instructions in the SubmittingPatches document. If Neil approves this
patch or parts of it, I will do so.
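The core of the change is small. Stripped of the tunable plumbing, it
amounts to roughly the sketch below against the 2.6.20 raid5 code. This
is an illustration of the idea, not the patch itself: the deadline field
on the stripe head (set to jiffies plus the configured deadline when a
write stripe is first delayed) and the stripe_deadline tunable are the
additions, and the names are only how I describe it here.

/* drivers/md/raid5.c (fragment, illustrative only) */
static void raid5_activate_delayed(raid5_conf_t *conf)
{
	if (atomic_read(&conf->preread_active_stripes) < IO_THRESHOLD) {
		struct stripe_head *sh, *tmp;

		list_for_each_entry_safe(sh, tmp, &conf->delayed_list, lru) {
			/*
			 * The change: leave a delayed write stripe alone until its
			 * deadline expires, so a sequential writer gets a chance to
			 * fill the whole stripe and the read-modify-write is avoided.
			 * sh->deadline is the field the patch adds.
			 */
			if (!time_after(jiffies, sh->deadline))
				continue;

			/* Expired: activate it exactly as the existing code does. */
			list_del_init(&sh->lru);
			clear_bit(STRIPE_DELAYED, &sh->state);
			if (!test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
				atomic_inc(&conf->preread_active_stripes);
			list_add_tail(&sh->lru, &conf->handle_list);
		}
	}
}

The rest of the patch is essentially just the tunable itself and setting
sh->deadline when a write stripe head is delayed.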
Benchmark 3: testing an 8-disk raid5.

Tyan NUMA dual-CPU (AMD) machine with 8 SATA Maxtor disks; the
controller is a Promise in JBOD mode.

raid conf:
md1 : active raid5 sda2[0] sdh1[7] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb2[1]
      3404964864 blocks level 5, 1024k chunk, algorithm 2 [8/8] [UUUUUUUU]

In order to achieve zero reads I had to tune the deadline to 20 ms (so
long?). stripe_cache_size is 256, which is exactly what is needed to get
a full stripe hit with this configuration.

command: random_writev /dev/md1 7168 0 3000 10000

iostat snapshot:

avg-cpu:  %user   %nice    %sys %iowait   %idle
           0.00    0.00   21.00   29.00   50.00

Device:    tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda       0.00         0.00         0.00          0          0
md0       0.00         0.00         0.00          0          0
sda     234.34         0.00     50400.00          0      49896
sdb     235.35         0.00     50658.59          0      50152
sdc     242.42         0.00     51014.14          0      50504
sdd     246.46         0.00     50755.56          0      50248
sde     248.48         0.00     51272.73          0      50760
sdf     245.45         0.00     50755.56          0      50248
sdg     244.44         0.00     50755.56          0      50248
sdh     245.45         0.00     50755.56          0      50248
md1    1407.07         0.00    347741.41          0     344264

Try setting stripe_cache_size to 255 and you will notice the delay. Try
lowering stripe_deadline and you will notice how the number of reads
grows.

Cheers
--
Raz
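For completeness, the essence of random_writev is just this:
stripe-aligned, full-stripe, O_DIRECT vectored writes at random offsets.
Below is a minimal sketch of that idea, not the attached tool; the
argument list and the fixed iovec count are simplified, and error
handling is kept to a minimum.

/* sketch of the random_writev idea, not the attached source */
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#define IOVCNT 4  /* buffers per writev(); illustrative */

int main(int argc, char **argv)
{
	if (argc != 4) {
		fprintf(stderr, "usage: %s <device> <stripe_kb> <count>\n", argv[0]);
		return 1;
	}
	size_t stripe = (size_t)atol(argv[2]) * 1024;  /* full stripe width, e.g. 7168 KB */
	long count = atol(argv[3]);
	size_t seg = stripe / IOVCNT;                  /* each iovec covers part of the stripe */

	int fd = open(argv[1], O_WRONLY | O_DIRECT);
	if (fd < 0) { perror("open"); return 1; }

	off_t dev_size = lseek(fd, 0, SEEK_END);
	long nstripes = dev_size / stripe;
	if (nstripes <= 0) { fprintf(stderr, "device smaller than one stripe\n"); return 1; }

	struct iovec iov[IOVCNT];
	for (int i = 0; i < IOVCNT; i++) {
		/* O_DIRECT needs aligned buffers; 4096 covers 512/4096-byte sectors */
		if (posix_memalign(&iov[i].iov_base, 4096, seg)) {
			perror("posix_memalign"); return 1;
		}
		memset(iov[i].iov_base, 'R', seg);
		iov[i].iov_len = seg;
	}

	for (long n = 0; n < count; n++) {
		/* random but always stripe-aligned, so raid5 never sees a partial stripe */
		off_t off = (off_t)(random() % nstripes) * stripe;
		if (lseek(fd, off, SEEK_SET) < 0 || writev(fd, iov, IOVCNT) < 0) {
			perror("writev"); return 1;
		}
	}
	close(fd);
	return 0;
}

Build it with plain gcc and point it at the md device. Because every
write starts on a stripe boundary and covers a whole stripe, any reads
you see in iostat come from raid5 itself, not from the test.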