linux-ext4.vger.kernel.org archive mirror
* Re: What represent 646345728 bytes
@ 2010-02-01 17:06 paul.chavent
  2010-02-01 20:07 ` Eric Sandeen
  0 siblings, 1 reply; 8+ messages in thread
From: paul.chavent @ 2010-02-01 17:06 UTC (permalink / raw)
  To: linux-ext4

[-- Attachment #1: Type: text/plain, Size: 871 bytes --]

Thank you Eric for your reply.

My problem with preallocation is that I don't know the final size of my tar archive, so I would prefer not to make any assumption about it.

Yes, I write a stream of 640x480x1 pnm images (307215 bytes each) to a single tar file.
So let's say that each write is 307712 bytes (a multiple of 512 bytes, as tar requires).

Here is a test bench program that reproduces the behaviour of the real app.

The program logs one write overhead at 645579776 bytes and another at 645887488 bytes, so not exactly the same thing as in the real app.

Please tell me whether my test bench is correct (metric, compilation options, etc.).

The system on which I run the test bench has no other workload and no other disk access.

Please find the attached files:
- the test bench source
- the dumpe2fs log

Tonight, I will try with "-O ^uninit_bg" at mkfs time.

Thanks.

Paul.



[-- Attachment #2: dumpe2fs.txt --]
[-- Type: text/plain, Size: 1738 bytes --]

dumpe2fs 1.41.9 (22-Aug-2009)
Filesystem volume name:   DATA
Last mounted on:          /var/data
Filesystem UUID:          7b19bf38-cf32-11de-a163-0060c2140392
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         not clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              3670016
Block count:              14653288
Reserved block count:     732664
Free blocks:              11828118
Free inodes:              3669685
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1020
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Thu Nov 12 02:24:18 2009
Last mount time:          Thu Nov 12 02:26:17 2009
Last write time:          Thu Nov 12 06:05:46 2009
Mount count:              1
Maximum mount count:      30
Last checked:             Thu Nov 12 02:24:18 2009
Check interval:           15552000 (6 months)
Next check after:         Tue May 11 02:24:18 2010
Lifetime writes:          11 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Default directory hash:   half_md4
Directory Hash Seed:      7b19c00a-cf32-11de-a163-0060c2140392

[-- Attachment #3: main.c --]
[-- Type: application/octet-stream, Size: 3991 bytes --]

/* gcc -Wall -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -o main main.c -lrt */

/* open */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

/* write,close,pathconf */
#include <unistd.h>

/* posix_memalign */
#include <stdlib.h>

/* perror */
#include <stdio.h>

/* signal */
#include <signal.h>

/* clock_* */
#include <time.h>

/* ioperm, inb, outb (x86 only) */
#include <sys/io.h>

#define PP_DATA 0x378

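/* sig_atomic_t is the only type guaranteed safe to write from a signal handler */
static volatile sig_atomic_t flag = 1;

void sig_handler(int sig_num)
{
  (void)sig_num;
  flag = 0;
}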

int main(int argc, char **argv)
{
  /*
   * I stream 640x480x1 pnm images (307215 bytes each) to a tar file.
   * The write buffer is multiple of 512.
   * So 307712 bytes.
   */
  const int buffer_size = 307712;
  /*
   * Alignment for direct I/O
   */
  int buffer_alignment;
  /*
   * The buffer is allocated dynamically so that it can be aligned
   */
  void *buffer;

  int fd;

  /*
   * Monitoring variable
   */
  unsigned long long sample = 0;
  struct timespec start_time;
  struct timespec stop_time;
  unsigned long long diff_cur;
  unsigned long long diff_min = 0;
  unsigned long long diff_max = 0;
  unsigned long long diff_moy = 0;
  struct timespec ts;
  unsigned long long period_ns = 100000000;

  /* handle ctrl-c */
  struct sigaction sigact;
  sigact.sa_handler = sig_handler;
  sigemptyset(&sigact.sa_mask); /* sa_mask must not be left uninitialized */
  sigact.sa_flags = SA_RESETHAND;
  sigaction(SIGINT, &sigact, NULL);

  /* open */
  fd = open("test.log", O_WRONLY | O_CREAT | O_TRUNC | O_SYNC | O_DIRECT, 0644);
  if(fd < 0)
    {
      perror("open");
      return EXIT_FAILURE;
    }
 
  /* compute alignment constraints for direct io */
  buffer_alignment = pathconf("test.log", _PC_REC_XFER_ALIGN);
  if(buffer_alignment < 0)
    {
      perror("pathconf");
      return EXIT_FAILURE;
    }

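  /* alloc aligned buffer; posix_memalign returns an error number and does not set errno */
  {
    int err = posix_memalign(&buffer, buffer_alignment, buffer_size);
    if(err != 0)
      {
        fprintf(stderr, "posix_memalign: error %d\n", err);
        return EXIT_FAILURE;
      }
  }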
     
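  /* for pp monitoring (needs root; outb would fault without port permission) */
  if(ioperm(PP_DATA, 1, 1) < 0)
    {
      perror("ioperm");
      return EXIT_FAILURE;
    }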

  clock_gettime(CLOCK_MONOTONIC, &ts);

  while(flag)
    {
      int nb_write;

      clock_gettime(CLOCK_MONOTONIC, &start_time);

      outb((inb(PP_DATA) | (0x0001)), PP_DATA);

      nb_write = write(fd, buffer, buffer_size);
 
      outb((inb(PP_DATA) & ~(0x0001)), PP_DATA);
 
      clock_gettime(CLOCK_MONOTONIC, &stop_time);

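      /* error handling: errno is only meaningful when write returns -1 */
      if(nb_write < 0)
        {
          perror("write");
          return EXIT_FAILURE;
        }
      if(nb_write != buffer_size)
        {
          fprintf(stderr, "short write: %d of %d bytes\n", nb_write, buffer_size);
          return EXIT_FAILURE;
        }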

      /* compute stats */
      if(stop_time.tv_nsec < start_time.tv_nsec)
        {
          stop_time.tv_sec--;
          stop_time.tv_nsec+=1000000000;
        } 
    
      diff_cur = (stop_time.tv_sec - start_time.tv_sec) * 1000000000ULL + (stop_time.tv_nsec - start_time.tv_nsec);

      if(sample == 0)
        {
          diff_min = diff_cur;
          diff_max = diff_cur;
          diff_moy = diff_cur;
        }
      else
        {
          if(diff_cur < diff_min)
            {
              diff_min = diff_cur;
            }
          if(diff_max < diff_cur)
            {
              diff_max = diff_cur;
            }
          /* running mean: "sample" previous values, so this is value number sample + 1 */
          if(diff_cur < diff_moy)
            {
              diff_moy = diff_moy - (diff_moy - diff_cur) / (sample + 1);
            }
          else
            {
              diff_moy = diff_moy + (diff_cur - diff_moy) / (sample + 1);
            }
        }
      sample++;

      /* print suspect write */
      if(20000000 < diff_cur)
        {
          struct stat buf;
          fstat(fd, &buf);
          fprintf(stderr, "%llu %llu\n", (unsigned long long)buf.st_size, diff_cur);
        }

      /* sleep */
      ts.tv_nsec += period_ns;
      while(ts.tv_nsec >= 1000000000)
        {
          ts.tv_nsec -= 1000000000;
          ts.tv_sec++;
        }

      clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL);
    }

  close(fd);

  fprintf(stderr, "diff min : %llu\n", diff_min);
  fprintf(stderr, "diff avg : %llu\n", diff_moy);
  fprintf(stderr, "diff max : %llu\n", diff_max);
  fprintf(stderr, "%llu iterations\n", sample);

  return EXIT_SUCCESS;
}

* What represent 646345728 bytes
@ 2010-02-01 14:08 paul.chavent
  2010-02-01 15:06 ` Eric Sandeen
  2010-02-01 17:20 ` Aneesh Kumar K. V
  0 siblings, 2 replies; 8+ messages in thread
From: paul.chavent @ 2010-02-01 14:08 UTC (permalink / raw)
  To: linux-ext4

Hi

I am writing an application that writes a stream of fixed-size pictures to a disk.

My app runs on a self-integrated GNU/Linux system (based on a 2.6.31.6-rt19 kernel).

My media is formatted with

# mke2fs -t ext4 -L DATA -O large_file,^has_journal,extent -v /dev/sda3
[...]

And it is mounted with 

# mount -t ext4 /dev/sda3 /var/data/
EXT4-fs (sda3): no journal
EXT4-fs (sda3): delayed allocation enabled
EXT4-fs: file extents enabled
EXT4-fs: mballoc enabled
EXT4-fs (sda3): mounted filesystem without journal

My app opens the file with "O_WRONLY | O_CREAT | O_TRUNC | O_SYNC | O_DIRECT" flags.

Each write takes ~4.2ms for 304K (which is very good, since it matches the write bandwidth of my hard drive). There is a write every 100ms.

But exactly every 646345728 bytes, a write takes ~46ms.

I had the same problem with ext2, but every ~620M (the amount wasn't as constant).

I also tried "posix_fallocate" (with e.g. 2G), and then the first write overhead comes at that limit. I would like to avoid preallocating.

I suppose it is some kind of block allocation issue, but I would like to have your opinion:
 - what exactly does this number of bytes represent?
 - can I do something to get a "constant" write time from the user space point of view?
 - is it a "problem" only for me?

Thank you for reading.

Paul.




