* read() builtin doesn't read integer value /proc files (but bash's does) @ 2010-09-01 8:10 Steve Schnepp 2010-09-02 15:02 ` Steve Schnepp 2010-09-02 19:09 ` read() builtin doesn't read integer value /proc files (but bash's does) Jilles Tjoelker 0 siblings, 2 replies; 12+ messages in thread From: Steve Schnepp @ 2010-09-01 8:10 UTC (permalink / raw) To: dash; +Cc: 595063 Hi, I opened bug 595063 on the debian BTS [1] and I was suggested to resend the email upstream. So I copied the body of the bug below : dash's read() builtin seems to read the underlying file 1 char at a time. This doesn't work with some files under /proc, since procfs isn't fully POSIX compliant. With bash it works : $ bash -c 'read MAX < /proc/sys/kernel/pid_max; echo $MAX' 32768 With dash it only reads the first character : $ dash -c 'read MAX < /proc/sys/kernel/pid_max; echo $MAX' 3 If we use the cat(1) external program it works : $ dash -c 'MAX=$(cat /proc/sys/kernel/pid_max); echo $MAX' 32768 After a little digging, it only appears on files that contains just an integer value. When asked to read with a non-null offset (*ppos != 0), __do_proc_dointvec() just returns 0 (meaning an EOF) as shown on [2]. I'm aware that the issue isn't strictly a dash one, since it has the right to read one character at a time. But since fixing procfs to be conforming to POSIX isn't a realistic option, would it be possible to have a workaround that doesn't involve an external tool like cat(1) ? [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=595063 [2] http://lxr.linux.no/#linux+v2.6.32/kernel/sysctl.c#L2371 -- Steve Schnepp http://blog.pwkf.org/ ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: read() builtin doesn't read integer value /proc files (but bash's does) 2010-09-01 8:10 read() builtin doesn't read integer value /proc files (but bash's does) Steve Schnepp @ 2010-09-02 15:02 ` Steve Schnepp 2010-09-03 21:25 ` Jilles Tjoelker 2010-09-02 19:09 ` read() builtin doesn't read integer value /proc files (but bash's does) Jilles Tjoelker 1 sibling, 1 reply; 12+ messages in thread From: Steve Schnepp @ 2010-09-02 15:02 UTC (permalink / raw) To: dash; +Cc: 595063 [-- Attachment #1: Type: text/plain, Size: 424 bytes --] 2010/9/1 Steve Schnepp <steve.schnepp@gmail.com>: > conforming to POSIX isn't a realistic option, would it be possible to > have a workaround that doesn't involve an external tool like cat(1) ? Hi, I just hacked & attached a little patch away to be able to solve this case. Feel free to reply with your comments. NB: I just targeted dash-0.5.5.1, but it might apply to any version. -- Steve Schnepp http://blog.pwkf.org/ [-- Attachment #2: chunked_bltin_read.diff --] [-- Type: application/octet-stream, Size: 1476 bytes --] --- dash-0.5.5.1/src/miscbltin.c.orig 2009-01-14 00:37:13.000000000 +0100 +++ dash-0.5.5.1/src/miscbltin.c 2010-09-02 16:54:29.330830859 +0200 @@ -58,6 +58,34 @@ #undef rflag +#define CHUNK_READ_SIZE 32 + +/* + * Reads from fd 0, with a CHUNK_READ_SIZE, + * but emitting one char at a time + */ +static int _read_bufferred(char* c) { + static char buffer[CHUNK_READ_SIZE]; + static int buffer_offset = 0; + static int buffer_len = 0; + + if (buffer_len == 0) { + // No caracter left, resetting buffer & read some more + buffer_offset = 0; + buffer_len = read(0, buffer, CHUNK_READ_SIZE); + + if (buffer_len == 0) { + // Nothing to read anymore + return 0; + } + } + + // Still some character left + *c = buffer[buffer_offset++]; + buffer_len--; + + return 1; +} /* @@ -65,6 +93,7 @@ * following character. * * This uses unbuffered input, which may be avoidable in some cases. + * XXX - Uses _read_bufferred() that chunks read(), but emits one char at a time */ int @@ -104,7 +133,7 @@ backslash = 0; STARTSTACKSTR(p); for (;;) { - if (read(0, &c, 1) != 1) { + if (_read_bufferred(&c) != 1) { status = 1; break; } ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: read() builtin doesn't read integer value /proc files (but bash's does) 2010-09-02 15:02 ` Steve Schnepp @ 2010-09-03 21:25 ` Jilles Tjoelker 2010-09-04 18:20 ` Steve Schnepp 0 siblings, 1 reply; 12+ messages in thread From: Jilles Tjoelker @ 2010-09-03 21:25 UTC (permalink / raw) To: Steve Schnepp; +Cc: dash, 595063 On Thu, Sep 02, 2010 at 05:02:55PM +0200, Steve Schnepp wrote: > 2010/9/1 Steve Schnepp <steve.schnepp@gmail.com>: > > conforming to POSIX isn't a realistic option, would it be possible to > > have a workaround that doesn't involve an external tool like cat(1) ? > Hi, I just hacked & attached a little patch away to be able to solve > this case. > Feel free to reply with your comments. > NB: I just targeted dash-0.5.5.1, but it might apply to any version. This patch assumes that the file descriptor is discarded afterwards (its position does not matter). Therefore the very common construct while read x; do ... done stops working. A possible fix is to check first if the input supports seeking. If it does, use the buffering and at the end of the line seek backwards for the number of bytes remaining in the buffer. If it does not, read one byte at a time. -- Jilles Tjoelker ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: read() builtin doesn't read integer value /proc files (but bash's does) 2010-09-03 21:25 ` Jilles Tjoelker @ 2010-09-04 18:20 ` Steve Schnepp 2010-09-04 19:35 ` Jilles Tjoelker 0 siblings, 1 reply; 12+ messages in thread From: Steve Schnepp @ 2010-09-04 18:20 UTC (permalink / raw) To: Jilles Tjoelker; +Cc: dash, 595063 [-- Attachment #1: Type: text/plain, Size: 1020 bytes --] 2010/9/3 Jilles Tjoelker <jilles@stack.nl>: > This patch assumes that the file descriptor is discarded afterwards (its > position does not matter). Therefore the very common construct > while read x; do > ... > done > stops working. Ohh.. thanks for that, I didn't see it. Actually "while read x" continues to work. But "reopening the file" doesn't as in : read a b < datafile echo ${a} ${b} read a b < datafile echo ${a} ${b} I attached an updated patch that corrects this pb by discarding the buffer when opening a new file. I also put everything in new files (bufreadcmd.c & .h), in order to ease its understanding. -- Steve Schnepp http://blog.pwkf.org/ > > A possible fix is to check first if the input supports seeking. If it > does, use the buffering and at the end of the line seek backwards for > the number of bytes remaining in the buffer. If it does not, read one > byte at a time. > > -- > Jilles Tjoelker > -- Steve Schnepp http://blog.pwkf.org/ [-- Attachment #2: dash_readbuf.diff --] [-- Type: text/x-patch, Size: 6637 bytes --] Common subdirectories: dash-0.5.4/src/bltin and dash-0.5.4-patched/src/bltin diff -puN dash-0.5.4/src/bufreadcmd.c dash-0.5.4-patched/src/bufreadcmd.c --- dash-0.5.4/src/bufreadcmd.c 1970-01-01 01:00:00.000000000 +0100 +++ dash-0.5.4-patched/src/bufreadcmd.c 2010-09-04 12:31:46.000000000 +0200 @@ -0,0 +1,59 @@ +/* + * Offers a buffered read builtin + */ + +#include <unistd.h> +#include <stdlib.h> + +#include "bufreadcmd.h" + +#ifdef BUF_READ_BUILTIN_DISABLED +int dup2_wrapper(int old, int new) { + return dup2(old, new); +} +int read_stdin_bufferred(char *c) { + return read(0, buffer, 1); +} +#else // BUF_READ_BUILTIN_DISABLED + +/* + * Reads from fd 0, with a CHUNK_READ_SIZE, + * but emitting one char at a time + */ +#define CHUNK_READ_SIZE 32 +static char buffer[CHUNK_READ_SIZE]; +static int buffer_offset = 0; +static int buffer_len = 0; + +int read_stdin_bufferred(char* c) { + if (buffer_len == 0) { + // No caracter left, resetting buffer & read some more + buffer_offset = 0; + buffer_len = read(0, buffer, CHUNK_READ_SIZE); + + if (buffer_len == 0) { + // Nothing to read anymore + return 0; + } + } + + // Still some character left + *c = buffer[buffer_offset++]; + buffer_len--; + + return 1; +} + +static void _flush_readcmd(int fd) { + if (fd == 0) { + // Flush the buffer, discarding its content + buffer_len = 0; + } +} + +/* Intercept dup2() calls */ +int dup2_wrapper(int old, int new) { + _flush_readcmd(new); + return dup2(old, new); +} +#endif // BUF_READ_BUILTIN_DISABLED diff -puN dash-0.5.4/src/bufreadcmd.h dash-0.5.4-patched/src/bufreadcmd.h --- dash-0.5.4/src/bufreadcmd.h 1970-01-01 01:00:00.000000000 +0100 +++ dash-0.5.4-patched/src/bufreadcmd.h 2010-09-04 12:23:53.000000000 +0200 @@ -0,0 +1,3 @@ +/* Used for flushing the readcmd read() buffer */ +int dup2_wrapper(int to, int from); +int read_stdin_bufferred(char *c); Common subdirectories: dash-0.5.4/src/.deps and dash-0.5.4-patched/src/.deps diff -puN dash-0.5.4/src/eval.c dash-0.5.4-patched/src/eval.c --- dash-0.5.4/src/eval.c 2007-07-13 10:26:42.000000000 +0200 +++ dash-0.5.4-patched/src/eval.c 2010-09-04 12:24:55.000000000 +0200 @@ -64,6 +64,8 @@ #include "myhistedit.h" #endif +#include "bufreadcmd.h" + /* flags in argument to evaltree */ #define EV_EXIT 01 /* exit after evaluating tree */ @@ -543,11 +545,12 @@ evalpipe(union node *n, int flags) close(pip[0]); } if (prevfd > 0) { - dup2(prevfd, 0); + dup2_wrapper(prevfd, 0); + close(prevfd); } if (pip[1] > 1) { - dup2(pip[1], 1); + dup2_wrapper(pip[1], 1); close(pip[1]); } evaltreenr(lp->n, flags); @@ -625,7 +628,7 @@ evalbackcmd(union node *n, struct backcm FORCEINTON; close(pip[0]); if (pip[1] != 1) { - dup2(pip[1], 1); + dup2_wrapper(pip[1], 1); close(pip[1]); } eflag = 0; diff -puN dash-0.5.4/src/Makefile dash-0.5.4-patched/src/Makefile --- dash-0.5.4/src/Makefile 2010-09-04 13:06:05.000000000 +0200 +++ dash-0.5.4-patched/src/Makefile 2010-09-04 13:08:26.000000000 +0200 @@ -57,7 +57,7 @@ am__objects_1 = alias.$(OBJEXT) arith_yy miscbltin.$(OBJEXT) mystring.$(OBJEXT) options.$(OBJEXT) \ parser.$(OBJEXT) redir.$(OBJEXT) show.$(OBJEXT) trap.$(OBJEXT) \ output.$(OBJEXT) printf.$(OBJEXT) system.$(OBJEXT) \ - test.$(OBJEXT) times.$(OBJEXT) var.$(OBJEXT) + test.$(OBJEXT) times.$(OBJEXT) var.$(OBJEXT) bufreadcmd.$(OBJEXT) am_dash_OBJECTS = $(am__objects_1) arith.$(OBJEXT) dash_OBJECTS = $(am_dash_OBJECTS) dash_DEPENDENCIES = builtins.o init.o nodes.o signames.o syntax.o @@ -169,14 +169,14 @@ dash_CFILES = \ alias.c arith_yylex.c cd.c error.c eval.c exec.c expand.c \ histedit.c input.c jobs.c mail.c main.c memalloc.c miscbltin.c \ mystring.c options.c parser.c redir.c show.c trap.c output.c \ - bltin/printf.c system.c bltin/test.c bltin/times.c var.c + bltin/printf.c system.c bltin/test.c bltin/times.c var.c bufreadcmd.c dash_SOURCES = \ $(dash_CFILES) arith.y \ alias.h bltin/bltin.h cd.h error.h eval.h exec.h expand.h hetio.h \ init.h input.h jobs.h machdep.h mail.h main.h memalloc.h miscbltin.h \ myhistedit.h mystring.h options.h output.h parser.h redir.h shell.h \ - show.h system.h trap.h var.h + show.h system.h trap.h var.h bufreadcmd.h dash_LDADD = builtins.o init.o nodes.o signames.o syntax.o HELPERS = mkinit mksyntax mknodes mksignames diff -puN dash-0.5.4/src/miscbltin.c dash-0.5.4-patched/src/miscbltin.c --- dash-0.5.4/src/miscbltin.c 2007-07-13 10:26:43.000000000 +0200 +++ dash-0.5.4-patched/src/miscbltin.c 2010-09-04 12:23:21.000000000 +0200 @@ -55,16 +55,17 @@ #include "miscbltin.h" #include "mystring.h" #include "main.h" +#include "bufreadcmd.h" #undef rflag - /* * The read builtin. The -e option causes backslashes to escape the * following character. * * This uses unbuffered input, which may be avoidable in some cases. + * XXX - Uses _read_bufferred() that chunks read(), but emits one char at a time */ int @@ -104,7 +105,7 @@ readcmd(int argc, char **argv) backslash = 0; STARTSTACKSTR(p); for (;;) { - if (read(0, &c, 1) != 1) { + if (read_stdin_bufferred(&c) != 1) { status = 1; break; } diff -puN dash-0.5.4/src/miscbltin.h dash-0.5.4-patched/src/miscbltin.h --- dash-0.5.4/src/miscbltin.h 2007-07-13 10:26:43.000000000 +0200 +++ dash-0.5.4-patched/src/miscbltin.h 2010-09-04 10:46:02.000000000 +0200 @@ -29,3 +29,6 @@ int readcmd(int, char **); int umaskcmd(int, char **); int ulimitcmd(int, char **); + +/* Used for flushing the readcmd read() buffer */ +void flush_readcmd(int fd); diff -puN dash-0.5.4/src/redir.c dash-0.5.4-patched/src/redir.c --- dash-0.5.4/src/redir.c 2007-07-13 10:26:43.000000000 +0200 +++ dash-0.5.4-patched/src/redir.c 2010-09-04 12:25:49.000000000 +0200 @@ -56,6 +56,8 @@ #include "memalloc.h" #include "error.h" +#include "bufreadcmd.h" + #define REALLY_CLOSED -3 /* fd that was closed and still is */ #define EMPTY -2 /* marks an unused slot in redirtab */ @@ -265,14 +267,14 @@ dupredirect(redir, f) memory[fd] = 1; else #endif - if (dup2(f, fd) < 0) { + if (dup2_wrapper(f, fd) < 0) { err = errno; goto err; } return; } f = fd; - } else if (dup2(f, fd) < 0) + } else if (dup2_wrapper(f, fd) < 0) err = errno; close(f); @@ -354,7 +356,7 @@ popredir(int drop) break; default: if (!drop) - dup2(rp->renamed[i], i); + dup2_wrapper(rp->renamed[i], i); close(rp->renamed[i]); break; } ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: read() builtin doesn't read integer value /proc files (but bash's does) 2010-09-04 18:20 ` Steve Schnepp @ 2010-09-04 19:35 ` Jilles Tjoelker 2010-11-28 8:42 ` Herbert Xu 0 siblings, 1 reply; 12+ messages in thread From: Jilles Tjoelker @ 2010-09-04 19:35 UTC (permalink / raw) To: Steve Schnepp; +Cc: dash, 595063 On Sat, Sep 04, 2010 at 08:20:33PM +0200, Steve Schnepp wrote: > 2010/9/3 Jilles Tjoelker <jilles@stack.nl>: > > This patch assumes that the file descriptor is discarded afterwards (its > > position does not matter). Therefore the very common construct > > while read x; do > > ... > > done > > stops working. > Ohh.. thanks for that, I didn't see it. > Actually "while read x" continues to work. > But "reopening the file" doesn't as in : > read a b < datafile > echo ${a} ${b} > read a b < datafile > echo ${a} ${b} You're right, it's even stranger than I expected. > I attached an updated patch that corrects this pb by discarding the > buffer when opening a new file. This discarding is still bad as it throws away valid data if the open file description is shared. This happens if stdin is redirected inside a while read... loop. Furthermore, I think constructions like read x; cat and read x; (read y); read z should keep working. This requires that the input's file position be synced whenever another process may see it (fork/exit). Due to the highly dynamic character of the shell and the common use of fd 0, this probably means that you can't do better than syncing after each read builtin. (For example, 'read' could be overridden with a function after the third line.) Another thought: exec 3<&0; read x; read y <&3 or even sh -c 'read x; read y <&3' 3<&0 Different file descriptors may refer to the same open file description and the shell may not know this. -- Jilles Tjoelker ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: read() builtin doesn't read integer value /proc files (but bash's does) 2010-09-04 19:35 ` Jilles Tjoelker @ 2010-11-28 8:42 ` Herbert Xu 2010-12-15 9:49 ` read() builtin doesnt read integer value /proc files (but bashs does) Cristian Ionescu-Idbohrn 0 siblings, 1 reply; 12+ messages in thread From: Herbert Xu @ 2010-11-28 8:42 UTC (permalink / raw) To: Jilles Tjoelker; +Cc: Steve Schnepp, dash, 595063 On Sat, Sep 04, 2010 at 07:35:04PM +0000, Jilles Tjoelker wrote: > > > I attached an updated patch that corrects this pb by discarding the > > buffer when opening a new file. > > This discarding is still bad as it throws away valid data if the open > file description is shared. This happens if stdin is redirected inside a I'm with Jilles on this. I also don't particularly feel like bloating dash just because of the borked /proc interface when there is a perfectly adequate work-around in "cat". value=$(cat /proc/file) Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: read() builtin doesnt read integer value /proc files (but bashs does) 2010-11-28 8:42 ` Herbert Xu @ 2010-12-15 9:49 ` Cristian Ionescu-Idbohrn 2010-12-15 18:55 ` Jonathan Nieder 0 siblings, 1 reply; 12+ messages in thread From: Cristian Ionescu-Idbohrn @ 2010-12-15 9:49 UTC (permalink / raw) To: Herbert Xu; +Cc: dash, 595063 On Sun, 28 Nov 2010, Herbert Xu wrote: > On Sat, Sep 04, 2010 at 07:35:04PM +0000, Jilles Tjoelker wrote: > > > > > I attached an updated patch that corrects this pb by discarding the > > > buffer when opening a new file. > > > > This discarding is still bad as it throws away valid data if the open > > file description is shared. This happens if stdin is redirected inside a > > I'm with Jilles on this. I also don't particularly feel like > bloating dash just because of the borked /proc interface when > there is a perfectly adequate work-around in "cat". > > value=$(cat /proc/file) I wouldn't call that "a perfectly adequate work-around", but a painful and unadequate work-around. And this example will hopefully show why: $ dash -c 'loops=10000; while [ $loops -gt 0 ];do read MAX </proc/sys/kernel/pid_max; loops=$(($loops - 1)); done; times' 0m0.180000s 0m0.100000s 0m0.000000s 0m0.000000s total: 0.28s $ dash -c 'loops=10000; while [ $loops -gt 0 ];do MAX=$(cat /proc/sys/kernel/pid_max); loops=$(($loops - 1)); done; times' 0m0.280000s 0m1.330000s 0m3.840000s 0m1.560000s total: 7.01s That is, the first example is 24x more efficient than the second. And that realy _matters_, I would say. Cheers, -- Cristian ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: read() builtin doesnt read integer value /proc files (but bashs does) 2010-12-15 9:49 ` read() builtin doesnt read integer value /proc files (but bashs does) Cristian Ionescu-Idbohrn @ 2010-12-15 18:55 ` Jonathan Nieder 2010-12-15 19:12 ` Cristian Ionescu-Idbohrn 2010-12-18 22:23 ` Jilles Tjoelker 0 siblings, 2 replies; 12+ messages in thread From: Jonathan Nieder @ 2010-12-15 18:55 UTC (permalink / raw) To: Cristian Ionescu-Idbohrn; +Cc: Herbert Xu, dash, 595063 Hi Cristian, Cristian Ionescu-Idbohrn wrote: > On Sun, 28 Nov 2010, Herbert Xu wrote: > > On Sat, Sep 04, 2010 at 07:35:04PM +0000, Jilles Tjoelker wrote: >>> This discarding is still bad as it throws away valid data if the open >>> file description is shared. This happens if stdin is redirected inside a >> >> I'm with Jilles on this. I also don't particularly feel like >> bloating dash just because of the borked /proc interface when >> there is a perfectly adequate work-around in "cat". >> >> value=$(cat /proc/file) > > I wouldn't call that "a perfectly adequate work-around", but a painful and > unadequate work-around. For what it's worth, here's what bash does (based on strace): 1. Determine whether the file is seekable. That is, seek using SEEK_CUR with offset 0. 2. If seekable, read a nice big chunk and then seek back to put the file offset back in the right place. If not seekable, read one byte at a time. This works in /proc because files in /proc are seekable. That said, I don't think borked /proc is a great reason to do this (it's a better reason to fix /proc). Speeding up the read builtin might be a good reason. Regards, Jonathan ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: read() builtin doesnt read integer value /proc files (but bashs does) 2010-12-15 18:55 ` Jonathan Nieder @ 2010-12-15 19:12 ` Cristian Ionescu-Idbohrn 2010-12-18 22:23 ` Jilles Tjoelker 1 sibling, 0 replies; 12+ messages in thread From: Cristian Ionescu-Idbohrn @ 2010-12-15 19:12 UTC (permalink / raw) To: dash; +Cc: 595063 Jonathan, On Wed, 15 Dec 2010, Jonathan Nieder wrote: > Cristian Ionescu-Idbohrn wrote: > > On Sun, 28 Nov 2010, Herbert Xu wrote: > >> > >> I'm with Jilles on this. I also don't particularly feel like > >> bloating dash just because of the borked /proc interface when > >> there is a perfectly adequate work-around in "cat". > >> > >> value=$(cat /proc/file) > > > > I wouldn't call that "a perfectly adequate work-around", but a painful > > and unadequate work-around. > > This works in /proc because files in /proc are seekable. > > That said, I don't think borked /proc is a great reason to do this > (it's a better reason to fix /proc). Speeding up the read builtin > might be a good reason. Right. So, there are 2 options here. One is to to make dash work like bash on a proc filesystem, the other to "fix" the kernel. How many linux distributions depend on a "working" dash? Which alternative is the more realistic one? What are the ETAs odds? How do we proceed? Cheers, -- Cristian ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: read() builtin doesnt read integer value /proc files (but bashs does) 2010-12-15 18:55 ` Jonathan Nieder 2010-12-15 19:12 ` Cristian Ionescu-Idbohrn @ 2010-12-18 22:23 ` Jilles Tjoelker 1 sibling, 0 replies; 12+ messages in thread From: Jilles Tjoelker @ 2010-12-18 22:23 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Cristian Ionescu-Idbohrn, Herbert Xu, dash, 595063 On Wed, Dec 15, 2010 at 12:55:51PM -0600, Jonathan Nieder wrote: > Cristian Ionescu-Idbohrn wrote: > > On Sun, 28 Nov 2010, Herbert Xu wrote: > > > On Sat, Sep 04, 2010 at 07:35:04PM +0000, Jilles Tjoelker wrote: > >>> This discarding is still bad as it throws away valid data if the open > >>> file description is shared. This happens if stdin is redirected inside a > >> I'm with Jilles on this. I also don't particularly feel like > >> bloating dash just because of the borked /proc interface when > >> there is a perfectly adequate work-around in "cat". > >> value=$(cat /proc/file) > > I wouldn't call that "a perfectly adequate work-around", but a painful and > > unadequate work-around. > For what it's worth, here's what bash does (based on strace): > 1. Determine whether the file is seekable. That is, seek using > SEEK_CUR with offset 0. > 2. If seekable, read a nice big chunk and then seek back to put the > file offset back in the right place. If not seekable, read one byte > at a time. > This works in /proc because files in /proc are seekable. > That said, I don't think borked /proc is a great reason to do this > (it's a better reason to fix /proc). Speeding up the read builtin > might be a good reason. The optimization is of limited benefit (still way more syscalls than strictly necessary) and does not apply to the common use case of reading from a pipe. Generally, if 'read' is too slow, it is better to spend a fork on a tool like grep, sed or awk which processes large amounts of text much more efficiently. As for value=$(cat /proc/file), there are at least two ways to make this faster. The traditional ksh way is the extension value=$(</proc/file) which is permitted but not required by POSIX. I do not really like this as it makes the scripts not POSIX compliant. In recent mksh I noticed another way: by making cat(1) a builtin under certain circumstances. These circumstances include the absence of options (other than "--") and should probably also exclude foreground commands in interactive job control shells (as builtins cannot be suspended). To avoid needing to implement extensions like FreeBSD cat's ability to read from Unix domain sockets, named files could also be excluded, requiring value=$(cat </proc/file). -- Jilles Tjoelker ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: read() builtin doesn't read integer value /proc files (but bash's does) 2010-09-01 8:10 read() builtin doesn't read integer value /proc files (but bash's does) Steve Schnepp 2010-09-02 15:02 ` Steve Schnepp @ 2010-09-02 19:09 ` Jilles Tjoelker 2010-09-03 9:23 ` Steve Schnepp 1 sibling, 1 reply; 12+ messages in thread From: Jilles Tjoelker @ 2010-09-02 19:09 UTC (permalink / raw) To: Steve Schnepp; +Cc: dash, 595063 On Wed, Sep 01, 2010 at 10:10:11AM +0200, Steve Schnepp wrote: > Hi, I opened bug 595063 on the debian BTS [1] and I was suggested to > resend the email upstream. > So I copied the body of the bug below : > dash's read() builtin seems to read the underlying file 1 char at a > time. This doesn't work with some files under /proc, since procfs isn't > fully POSIX compliant. > [snip] > After a little digging, it only appears on files that contains just an > integer value. When asked to read with a non-null offset (*ppos != 0), > __do_proc_dointvec() just returns 0 (meaning an EOF) as shown on [2]. > I'm aware that the issue isn't strictly a dash one, since it has the > right to read one character at a time. But since fixing procfs to be > conforming to POSIX isn't a realistic option, would it be possible to > have a workaround that doesn't involve an external tool like cat(1) ? Given that other files in /proc do work, I don't see why the ones that only contain an integer value cannot be fixed. All the necessary state to produce the second and further bytes is available. Choosing a powerful abstraction like a regular file has its implications. Note that a change in the file between the single-byte reads will cause an inconsistent value to be read. This is also the case with regular files on a filesystem, so it is acceptable. If single-byte reads are really unacceptable, then the proper way to read these files needs to be documented, and clear violations that will not work properly should cause an error (in this case, this means that reading one byte from offset 0 should fail like reading one byte from offset 1 does). -- Jilles Tjoelker ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: read() builtin doesn't read integer value /proc files (but bash's does) 2010-09-02 19:09 ` read() builtin doesn't read integer value /proc files (but bash's does) Jilles Tjoelker @ 2010-09-03 9:23 ` Steve Schnepp 0 siblings, 0 replies; 12+ messages in thread From: Steve Schnepp @ 2010-09-03 9:23 UTC (permalink / raw) To: Jilles Tjoelker; +Cc: dash, 595063 2010/9/2 Jilles Tjoelker <jilles@stack.nl>: Thanks for your prompt reply. > Note that a change in the file between the single-byte reads will cause > an inconsistent value to be read. This is also the case with regular > files on a filesystem, so it is acceptable. Are you implying that: - if the procfs is made to support char per char reads, dash reading an inconsistent value is actually a feature ? - buffering should, therefore, always be explicit ? On a side note, the whole procfs seems to be designed around one unique page read if possible (1x 4K). I think it does so in order to be able to vastly simplify its usage/implementation by kernel modules. > If single-byte reads are really unacceptable, then the proper way to > read these files needs to be documented, and clear violations that will > not work properly should cause an error (in this case, this means that > reading one byte from offset 0 should fail like reading one byte from > offset 1 does). +1 for "the proper way to read these files needs to be documented" and I also think that emitting an error would be better than silently returning erroneous data. [ EOVERFLOW is coming to my mind ] -- Steve Schnepp http://blog.pwkf.org/ ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2010-12-18 22:24 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-09-01 8:10 read() builtin doesn't read integer value /proc files (but bash's does) Steve Schnepp 2010-09-02 15:02 ` Steve Schnepp 2010-09-03 21:25 ` Jilles Tjoelker 2010-09-04 18:20 ` Steve Schnepp 2010-09-04 19:35 ` Jilles Tjoelker 2010-11-28 8:42 ` Herbert Xu 2010-12-15 9:49 ` read() builtin doesnt read integer value /proc files (but bashs does) Cristian Ionescu-Idbohrn 2010-12-15 18:55 ` Jonathan Nieder 2010-12-15 19:12 ` Cristian Ionescu-Idbohrn 2010-12-18 22:23 ` Jilles Tjoelker 2010-09-02 19:09 ` read() builtin doesn't read integer value /proc files (but bash's does) Jilles Tjoelker 2010-09-03 9:23 ` Steve Schnepp
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox