* [Fwd: Re: Implementing a file counter (like "ls | wc")]
From: Luciano Moreira - igLnx @ 2004-04-07 16:50 UTC (permalink / raw)
To: Holger Kiehl; +Cc: linux-c-programming
Is there another way to detect a directory without stat()?
Luciano
Holger Kiehl wrote:
>On Wed, 7 Apr 2004, Luciano Moreira - igLnx wrote:
>
>
>
>>-------------- THE CODE HAS THIS STRUCTURE:
>>while ((pFile = readdir(pDir)) != NULL) {
>>    sprintf(szBuf, "%s/%s", pPath, pFile->d_name);
>>    stat(szBuf, &statFile);
>>    /* NOTE: we don't search recursively; we count only the files in
>>       the current directory, excluding other directories. */
>>    if (S_ISDIR(statFile.st_mode))
>>        continue;
>>
>>    /* Filtering */
>>    if (nNeedFilter) {
>>        /* I don't have the filtering code at hand right now,
>>           but I can send it later if necessary. */
>>    }
>>}
>>-------------- CODE FINISHES HERE
>>
>>
>>
>Don't use sprintf(); it's very expensive. Before the while loop, set a pointer
>just past the path and the '/', and then strcpy(ptr, pFile->d_name).
>
>stat() is _very_ expensive! It can mean physical I/O, and it fills up a
>structure with things you just don't need. If you really do need to filter out
>directories from your result, do the stat() after the name has passed the filter.
>
>
>
>>Is there another mechanism to filter without using strcmp() / memcmp()?
>>What if we don't know the length of the extension (.c, .cpp, .teste,
>>.longextension, and so on)?
>>
>>
>>
>Compare them yourself with a pointer, byte for byte. But the speed gain
>will not be as large as the one you get by leaving out the stat() call.
>
>Holger
>
>
>
>
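
A minimal sketch of the advice quoted above, not code from any of the posters: the name is tested first with a hand-written, byte-for-byte suffix comparison (no strcmp()/memcmp(), any extension length), and stat() is called only for names that pass. The path prefix is built once and each name is copied in after it. The names has_suffix() and count_files() and the 4096-byte buffer are assumptions made for illustration.

```c
#include <dirent.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>

/* Return 1 if name ends with suffix, comparing byte for byte from the end.
   Works for a suffix of any length (".c", ".cpp", ".longextension"). */
static int has_suffix(const char *name, const char *suffix)
{
    const char *n = name + strlen(name);
    const char *s = suffix + strlen(suffix);

    while (s > suffix) {
        if (n == name || *--n != *--s)
            return 0;
    }
    return 1;
}

/* Count non-directory entries in dir whose names end with suffix.
   Returns -1 if the directory cannot be opened or the path is too long. */
static long count_files(const char *dir, const char *suffix)
{
    char buf[4096];                 /* hypothetical fixed path limit */
    struct dirent *ent;
    struct stat st;
    size_t dlen = strlen(dir);
    long count = 0;
    char *tail;
    DIR *d;

    if (dlen + 2 > sizeof buf)
        return -1;
    d = opendir(dir);
    if (d == NULL)
        return -1;

    memcpy(buf, dir, dlen);         /* build "dir/" once, outside the loop */
    buf[dlen] = '/';
    tail = buf + dlen + 1;          /* each name is copied to this spot */

    while ((ent = readdir(d)) != NULL) {
        size_t nlen = strlen(ent->d_name);

        if (!has_suffix(ent->d_name, suffix))
            continue;               /* cheap test first, no system call */
        if (nlen >= sizeof buf - (dlen + 1))
            continue;               /* full path would not fit */
        memcpy(tail, ent->d_name, nlen + 1);
        if (stat(buf, &st) == 0 && !S_ISDIR(st.st_mode))
            count++;
    }
    closedir(d);
    return count;
}
```

For example, count_files("/usr/src", ".c") would stat() only the entries whose names end in ".c".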
* Re: [Fwd: Re: Implementing a file counter (like "ls | wc")]
From: John T. Williams @ 2004-04-07 16:54 UTC (permalink / raw)
To: Luciano Moreira - igLnx; +Cc: linux-c-programming
You should read the GNU source code for ls; they actually filter out
directories in quite an interesting manner.
Instead of using stat(), which is quite expensive, they try to open each
file as a directory, which is expensive, but not as expensive as stat().
If a file opens as a directory, then it is one, and they treat it as such
(in your case, closing it and ignoring it); if it fails to open as a
directory, they then use stat() to get information about it to print.
A simple counter could look something like the following (my code paraphrases
parts of the GNU code, but is not a copy). Also, treat it as pseudocode
that just looks a lot like C, as it is untested (not even compiled) and
is only trying to make the point.
___________________________________________________
int count = 0;
struct dirent *dir_entry;   /* readdir() returns a pointer */
DIR *directory, *test;      /* both must be pointers */
char buff[512];
directory = opendir("/etc");
/* check that it opened */
...
while ((dir_entry = readdir(directory)) != NULL) {
    strncpy(buff, "/etc/", 6);          /* 5 characters plus the '\0' */
    strcat(buff, dir_entry->d_name);    /* fits: d_name is at most 255 bytes */
    if ((test = opendir(buff)) != NULL) {
        /* gets here if it's a directory */
        closedir(test);
    } else {
        /* gets here if it's not a directory */
        /* code to test if it's a regular file */
        count++;
    }
}
closedir(directory);
* Re: [Fwd: Re: Implementing a file counter (like "ls | wc")]
From: Holger Kiehl @ 2004-04-07 22:29 UTC (permalink / raw)
To: John T. Williams; +Cc: Luciano Moreira - igLnx, linux-c-programming
On Wed, 7 Apr 2004, John T. Williams wrote:
> You should read the GNU source code for ls; they actually filter out
> directories in quite an interesting manner.
>
> Instead of using stat(), which is quite expensive, they try to open each
> file as a directory, which is expensive, but not as expensive as stat().
> If a file opens as a directory, then it is one, and they treat it as such
> (in your case, closing it and ignoring it); if it fails to open as a
> directory, they then use stat() to get information about it to print.
>
Indeed this is a good idea, but even then I would only go through this
code path after the filtering. Any system call is always more expensive
than the filtering.
> A simple counter could look something like the following (my code paraphrases
> parts of the GNU code, but is not a copy). Also, treat it as pseudocode
> that just looks a lot like C, as it is untested (not even compiled) and
> is only trying to make the point.
>
Another minor improvement would be, as stated earlier, to put the strncpy()
outside the while loop:
> ___________________________________________________
> int count = 0;
> struct dirent *dir_entry;
> DIR *directory, *test;
> char buff[512];
char *ptr;
>
>
> directory = opendir("/etc");
> /* check that it opened */
> ...
>
strcpy(buff, "/etc/");
ptr = buff + 5;   /* points just past "/etc/" */
> while ((dir_entry = readdir(directory)) != NULL) {
   if (filter fits)   /* pseudocode: test the name only, no system call */
   {
      strcpy(ptr, dir_entry->d_name);
>     if ((test = opendir(buff)) != NULL) {
>         /* gets here if it's a directory */
>         closedir(test);
>     } else {
>         /* gets here if it's not a directory */
>         /* code to test if it's a regular file */
>         count++;
>     }
   }
> }
>
> closedir(directory);
>
If you have a directory with many files this might make a minor difference.
Also note that we now have a strcpy() and not a strcat() in the loop.
A strcat() will always have to find the end of the string first before
it can do its job.
But most important always try to avoid system calls!
Holger
* Re: [Fwd: Re: Implementing a file counter (like "ls | wc")]
From: A. Murat Eren @ 2004-04-08 0:06 UTC (permalink / raw)
To: Holger Kiehl; +Cc: linux-c-programming
Hi,
I've read the whole thread on this subject and I'd like to ask
some questions.
> But most important always try to avoid system calls!
I guess everybody accepts that system calls slow a process down when they
are invoked from the critical parts of the code (I hope I understand this
properly). I'm trying to write a basic shell for myself (just for fun, nothing
serious). I set my user's shell in the passwd file to my basic shell's
compiled binary, and when I log in from the console I land at a prompt
that I wrote myself.
I'm not using the well-known programs such as 'ls', 'cd', etc. I have my own
environment variables and my own basic programs, which are invoked from my
prompt with fork and execvp. If I do not want to use syscalls such as
readdir and opendir, what must I do?
With readdir, I can read everything from every type of file system supported
by the kernel *transparently*. For example, when I'm working on a disk
partition formatted with the ext3 file system, I don't have to read the
superblocks and inode addresses from disk myself, because the VFS takes care
of all the low-level operations via readdir (and the kernel's modules for that
file system) for me, doesn't it? If I want to do all of that myself (sure,
this is not necessary, but I just want to try and learn), which is the right
way to go? I thought that a simple and sure way would be to look into glibc
for the implementations of functions like readdir, but I'm nearly sure that
it would be impossible to understand (and this is another problem: I usually
can't understand the source code :( It would be great to hear your advice
about how I can improve my understanding).
Thanks in advance for your time,
Regards.
--
- -- -- -- -- -- -- -- -- -- -- -- -- -- -
A. Murat Eren
meren@comu.edu.tr, evreniz@core.gen.tr
http://zion.comu.edu.tr/~evreniz/
- -- -- -- -- -- -- -- -- -- -- -- -- -- -
--
free software is a matter of liberty,
not price. to understand the concept,
you should think of "free speech",
not "free beer".
* Re: [Fwd: Re: Implementing a file counter (like "ls | wc")]
From: John T. Williams @ 2004-04-08 1:01 UTC (permalink / raw)
To: jtwilliams; +Cc: linux-c-programming
I just want to say after much testing, I've gone and proved myself
wrong.
For anyone interested in playing with the concept, I've included the code.
[-- Attachment #2: ls.tgz --]
[-- Type: application/x-gzip, Size: 908 bytes --]
* Re: [Fwd: Re: Implementing a file counter (like "ls | wc")]
From: Glynn Clements @ 2004-04-08 4:39 UTC (permalink / raw)
To: A. Murat Eren; +Cc: Holger Kiehl, linux-c-programming
A. Murat Eren wrote:
> I've read the whole thread on this subject and I'd like to ask
> some questions.
>
> > But most important always try to avoid system calls!
>
> I guess everybody accepts that system calls slow a process down when they
> are invoked from the critical parts of the code (I hope I understand this
> properly). I'm trying to write a basic shell for myself (just for fun, nothing
> serious). I set my user's shell in the passwd file to my basic shell's
> compiled binary, and when I log in from the console I land at a prompt
> that I wrote myself.
>
> I'm not using the well-known programs such as 'ls', 'cd' etc.
"cd" isn't a program; it's a built-in shell command. It has to be; a
process can't change the current directory of another process, so a
"cd" program just wouldn't work.
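
A small sketch of why an external "cd" cannot work, assuming a POSIX system with fork(): the child changes its own working directory, exits, and the parent's directory is found unchanged. The function name child_chdir_leaves_parent_alone() is made up for this illustration.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the parent's working directory is unchanged after a child
   process calls chdir(target), 0 if it changed, -1 on any error. */
static int child_chdir_leaves_parent_alone(const char *target)
{
    char before[4096], after[4096];
    int status;
    pid_t pid;

    if (getcwd(before, sizeof before) == NULL)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: this is all an external "cd" program could do. */
        _exit(chdir(target) == 0 ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;                  /* the child's chdir() itself failed */

    if (getcwd(after, sizeof after) == NULL)
        return -1;
    return strcmp(before, after) == 0;
}
```

A shell therefore has to call chdir() in its own process when it sees "cd", rather than fork and exec something.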
> I have my own
> environment variables and my own basic programs, which are invoked from my
> prompt with fork and execvp. If I do not want to use syscalls such as
> readdir and opendir, what must I do?
In most of the situations where you use syscalls, you can't avoid
using them; the point is not to use them excessively.
E.g. if you're reading a file (or any other descriptor), calling
read() a few times and reading a large amount of data each time is
more efficient than calling read() lots of times and reading a small
amount of data each time.
Or, for an "ls" replacement, only call stat() (or lstat()) if you
actually need the additional data (e.g. for "ls -l", "ls -F" etc)
rather than always calling it "just in case".
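
To make the read() point concrete, here is a minimal sketch that reads a whole file with a caller-chosen buffer size and reports how many read() calls that took. The name read_all() and its interface are assumptions for illustration; retrying on EINTR is left out to keep it short.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Read the file at path in pieces of at most chunk bytes.
   Returns the total number of bytes read, or -1 on error.
   *calls receives the number of read() system calls issued,
   including the final one that reports end of file. */
static long read_all(const char *path, size_t chunk, long *calls)
{
    long total = 0;
    ssize_t n;
    char *buf;
    int fd;

    *calls = 0;
    if (chunk == 0)
        return -1;
    buf = malloc(chunk);
    if (buf == NULL)
        return -1;
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    for (;;) {
        n = read(fd, buf, chunk);
        (*calls)++;
        if (n <= 0)
            break;                  /* 0 = end of file, -1 = error */
        total += n;
    }

    close(fd);
    free(buf);
    return n < 0 ? -1 : total;
}
```

With a regular file, a chunk size of 1 costs roughly one system call per byte, while a chunk size of 64 KiB usually needs only a handful of calls for the same data.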
> With readdir, I can read everything from every type of file system supported
> by the kernel *transparently*. For example, when I'm working on a disk
> partition formatted with the ext3 file system, I don't have to read the
> superblocks and inode addresses from disk myself, because the VFS takes care
> of all the low-level operations via readdir (and the kernel's modules for that
> file system) for me, doesn't it? If I want to do all of that myself (sure,
> this is not necessary, but I just want to try and learn), which is the right
> way to go? I thought that a simple and sure way would be to look into glibc
> for the implementations of functions like readdir, but I'm nearly sure that
> it would be impossible to understand (and this is another problem: I usually
> can't understand the source code :( It would be great to hear your advice
> about how I can improve my understanding).
There is seldom any point in bypassing libc. For a syscall, the libc
function will usually consist of the syscall itself, plus setting
errno if the syscall returns an error status, so there wouldn't be
anything to gain from making the syscall directly.
BTW, on Linux, glibc's readdir() function isn't implemented using the
readdir syscall but using the getdents syscall instead. The latter
reads multiple directory entries per call, so is more efficient.
You can't realistically bypass the syscalls themselves; not only would
trying to read files by accessing the device directly require
sufficient privilege, it would also be error-prone, as a user-space
process can't hold kernel locks and can't prevent itself being
preempted at an inconvenient point.
--
Glynn Clements <glynn.clements@virgin.net>
* Re: [Fwd: Re: Implementing a file counter (like "ls | wc")]
From: A. Murat Eren @ 2004-04-08 8:05 UTC (permalink / raw)
To: linux-c-programming
Hi,
> "cd" isn't a program; it's a built-in shell command. It has to be; a
> process can't change the current directory of another process, so a
> "cd" program just wouldn't work.
Sorry about that; yes, cd is a built-in function of the shell. By the way, I'm
not using it to change any program's working directory. By calling it, I fill
in my shell's current-working-directory environment variable, so that I can
use commands such as my clone of ls without any extra parameter. So, for
example, when I invoke my listing program to see what is in my current working
directory, it already knows that I'm in /usr/src from the CWD environment
variable.
> In most of the situations where you use syscalls, you can't avoid
> using them; the point is not to use them excessively.
>
> E.g. if you're reading a file (or any other descriptor), calling
> read() a few times and reading a large amount of data each time is
> more efficient than calling read() lots of times and reading a small
> amount of data each time.
>
> Or, for an "ls" replacement, only call stat() (or lstat()) if you
> actually need the additional data (e.g. for "ls -l", "ls -F" etc)
> rather than always calling it "just in case".
Thank you, I've got the point completely.
> There is seldom any point in bypassing libc. For a syscall, the libc
> function will usually consist of the syscall itself, plus setting
> errno if the syscall returns an error status, so there wouldn't be
> anything to gain from making the syscall directly.
Thank you very much for your advice and time :)
Regards.
--
- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
A. Murat Eren
meren@comu.edu.tr, evreniz@core.gen.tr
http://zion.comu.edu.tr/~evreniz/
0x88FD9FC7,
910A FCB3 2AAB 4CA5 E4D9 EFFA 6555 A33A 88FD 9FC7
- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
--
free software is a matter of liberty,
not price. to understand the concept,
you should think of "free speech",
not "free beer".