* [PATCH 1/6] fs: fix proc_handler for sysctl_nr_open
2024-11-23 18:08 [PATCH 0/6] Maintain the relative size of fs.file-max and fs.nr_open Jinliang Zheng
@ 2024-11-23 18:11 ` Jinliang Zheng
2024-11-23 18:11 ` [PATCH 2/6] fs: make files_stat globally visible Jinliang Zheng
` (5 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Jinliang Zheng @ 2024-11-23 18:11 UTC (permalink / raw)
To: alexjlzheng
Cc: adobriyan, alexjlzheng, brauner, flyingpeng, jack, joel.granados,
kees, linux-fsdevel, linux-kernel, mcgrof, viro
Use proc_douintvec_minmax() instead of proc_dointvec_minmax() to handle
sysctl_nr_open, because its data type is unsigned int, not int.
Fixes: 9b80a184eaad ("fs/file: more unsigned file descriptors")
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
---
fs/file_table.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/file_table.c b/fs/file_table.c
index 976736be47cb..502b81f614d9 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -128,7 +128,7 @@ static struct ctl_table fs_stat_sysctls[] = {
.data = &sysctl_nr_open,
.maxlen = sizeof(unsigned int),
.mode = 0644,
- .proc_handler = proc_dointvec_minmax,
+ .proc_handler = proc_douintvec_minmax,
.extra1 = &sysctl_nr_open_min,
.extra2 = &sysctl_nr_open_max,
},
--
2.41.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/6] fs: make files_stat globally visible
2024-11-23 18:08 [PATCH 0/6] Maintain the relative size of fs.file-max and fs.nr_open Jinliang Zheng
2024-11-23 18:11 ` [PATCH 1/6] fs: fix proc_handler for sysctl_nr_open Jinliang Zheng
@ 2024-11-23 18:11 ` Jinliang Zheng
2024-11-23 18:12 ` [PATCH 3/6] sysctl: refactor __do_proc_doulongvec_minmax() Jinliang Zheng
` (4 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Jinliang Zheng @ 2024-11-23 18:11 UTC (permalink / raw)
To: alexjlzheng
Cc: adobriyan, alexjlzheng, brauner, flyingpeng, jack, joel.granados,
kees, linux-fsdevel, linux-kernel, mcgrof, viro
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
---
fs/file_table.c | 2 +-
include/linux/fs.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/file_table.c b/fs/file_table.c
index 502b81f614d9..db3d3a9cb421 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -33,7 +33,7 @@
#include "internal.h"
/* sysctl tunables... */
-static struct files_stat_struct files_stat = {
+struct files_stat_struct files_stat = {
.max_files = NR_FILE
};
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7e29433c5ecc..931076faadde 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -89,6 +89,7 @@ extern void __init files_maxfiles_init(void);
extern unsigned long get_max_files(void);
extern unsigned int sysctl_nr_open;
+extern struct files_stat_struct files_stat;
typedef __kernel_rwf_t rwf_t;
--
2.41.1
* [PATCH 3/6] sysctl: refactor __do_proc_doulongvec_minmax()
2024-11-23 18:08 [PATCH 0/6] Maintain the relative size of fs.file-max and fs.nr_open Jinliang Zheng
2024-11-23 18:11 ` [PATCH 1/6] fs: fix proc_handler for sysctl_nr_open Jinliang Zheng
2024-11-23 18:11 ` [PATCH 2/6] fs: make files_stat globally visible Jinliang Zheng
@ 2024-11-23 18:12 ` Jinliang Zheng
2024-11-23 18:12 ` [PATCH 4/6] sysctl: ensure files_stat.max_files is not less than sysctl_nr_open Jinliang Zheng
` (3 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Jinliang Zheng @ 2024-11-23 18:12 UTC (permalink / raw)
To: alexjlzheng
Cc: adobriyan, alexjlzheng, brauner, flyingpeng, jack, joel.granados,
kees, linux-fsdevel, linux-kernel, mcgrof, viro
Turn the local variables min and max of __do_proc_doulongvec_minmax()
into parameters so that subsequent patches can reuse the function with
different bounds. No functional change.
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
---
kernel/sysctl.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 79e6cb1d5c48..05b48b204ed4 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1020,9 +1020,10 @@ static int sysrq_sysctl_handler(const struct ctl_table *table, int write,
static int __do_proc_doulongvec_minmax(void *data,
const struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos,
- unsigned long convmul, unsigned long convdiv)
+ unsigned long convmul, unsigned long convdiv,
+ unsigned long *min, unsigned long *max)
{
- unsigned long *i, *min, *max;
+ unsigned long *i;
int vleft, first = 1, err = 0;
size_t left;
char *p;
@@ -1033,8 +1034,6 @@ static int __do_proc_doulongvec_minmax(void *data,
}
i = data;
- min = table->extra1;
- max = table->extra2;
vleft = table->maxlen / sizeof(unsigned long);
left = *lenp;
@@ -1095,8 +1094,10 @@ static int do_proc_doulongvec_minmax(const struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos, unsigned long convmul,
unsigned long convdiv)
{
+ unsigned long *min = table->extra1;
+ unsigned long *max = table->extra2;
return __do_proc_doulongvec_minmax(table->data, table, write,
- buffer, lenp, ppos, convmul, convdiv);
+ buffer, lenp, ppos, convmul, convdiv, min, max);
}
/**
--
2.41.1
* [PATCH 4/6] sysctl: ensure files_stat.max_files is not less than sysctl_nr_open
2024-11-23 18:08 [PATCH 0/6] Maintain the relative size of fs.file-max and fs.nr_open Jinliang Zheng
` (2 preceding siblings ...)
2024-11-23 18:12 ` [PATCH 3/6] sysctl: refactor __do_proc_doulongvec_minmax() Jinliang Zheng
@ 2024-11-23 18:12 ` Jinliang Zheng
2024-11-23 18:13 ` [PATCH 5/6] sysctl: ensure sysctl_nr_open is not greater than files_stat.max_files Jinliang Zheng
` (2 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Jinliang Zheng @ 2024-11-23 18:12 UTC (permalink / raw)
To: alexjlzheng
Cc: adobriyan, alexjlzheng, brauner, flyingpeng, jack, joel.granados,
kees, linux-fsdevel, linux-kernel, mcgrof, viro
Introduce proc_doulongvec_maxfiles_minmax() to ensure that a write to
files_stat.max_files cannot set it below sysctl_nr_open.
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
---
fs/file_table.c | 2 +-
include/linux/sysctl.h | 2 ++
kernel/sysctl.c | 17 +++++++++++++++++
3 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/fs/file_table.c b/fs/file_table.c
index db3d3a9cb421..01faa9c2869e 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -119,7 +119,7 @@ static struct ctl_table fs_stat_sysctls[] = {
.data = &files_stat.max_files,
.maxlen = sizeof(files_stat.max_files),
.mode = 0644,
- .proc_handler = proc_doulongvec_minmax,
+ .proc_handler = proc_doulongvec_maxfiles_minmax,
.extra1 = SYSCTL_LONG_ZERO,
.extra2 = SYSCTL_LONG_MAX,
},
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index aa4c6d44aaa0..4ecf945de956 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -82,6 +82,8 @@ int proc_dointvec_userhz_jiffies(const struct ctl_table *, int, void *, size_t *
int proc_dointvec_ms_jiffies(const struct ctl_table *, int, void *, size_t *,
loff_t *);
int proc_doulongvec_minmax(const struct ctl_table *, int, void *, size_t *, loff_t *);
+int proc_doulongvec_maxfiles_minmax(const struct ctl_table *, int, void *,
+ size_t *, loff_t *);
int proc_doulongvec_ms_jiffies_minmax(const struct ctl_table *table, int, void *,
size_t *, loff_t *);
int proc_do_large_bitmap(const struct ctl_table *, int, void *, size_t *, loff_t *);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 05b48b204ed4..5ee2bfc7fcbe 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1122,6 +1122,23 @@ int proc_doulongvec_minmax(const struct ctl_table *table, int write,
return do_proc_doulongvec_minmax(table, write, buffer, lenp, ppos, 1l, 1l);
}
+/*
+ * Used for 'sysctl -w fs.file-max', ensuring its value will not be less
+ * than sysctl_nr_open.
+ */
+int proc_doulongvec_maxfiles_minmax(const struct ctl_table *table, int write,
+ void *buffer, size_t *lenp, loff_t *ppos)
+{
+ unsigned long *min = table->extra1;
+ unsigned long *max = table->extra2;
+ unsigned long nr_open = sysctl_nr_open;
+
+ if (write)
+ min = &nr_open;
+ return __do_proc_doulongvec_minmax(table->data, table, write,
+ buffer, lenp, ppos, 1l, 1l, min, max);
+}
+
/**
* proc_doulongvec_ms_jiffies_minmax - read a vector of millisecond values with min/max values
* @table: the sysctl table
--
2.41.1
* [PATCH 5/6] sysctl: ensure sysctl_nr_open is not greater than files_stat.max_files
2024-11-23 18:08 [PATCH 0/6] Maintain the relative size of fs.file-max and fs.nr_open Jinliang Zheng
` (3 preceding siblings ...)
2024-11-23 18:12 ` [PATCH 4/6] sysctl: ensure files_stat.max_files is not less than sysctl_nr_open Jinliang Zheng
@ 2024-11-23 18:13 ` Jinliang Zheng
2024-11-23 18:13 ` [PATCH 6/6] fs: synchronize the access of fs.file-max and fs.nr_open Jinliang Zheng
2024-11-23 18:27 ` [PATCH 0/6] Maintain the relative size " Al Viro
6 siblings, 0 replies; 12+ messages in thread
From: Jinliang Zheng @ 2024-11-23 18:13 UTC (permalink / raw)
To: alexjlzheng
Cc: adobriyan, alexjlzheng, brauner, flyingpeng, jack, joel.granados,
kees, linux-fsdevel, linux-kernel, mcgrof, viro
Introduce proc_douintvec_nropen_minmax() to ensure that a write to
sysctl_nr_open cannot set it above files_stat.max_files.
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
---
fs/file_table.c | 2 +-
include/linux/sysctl.h | 2 ++
kernel/sysctl.c | 21 +++++++++++++++++++++
3 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/fs/file_table.c b/fs/file_table.c
index 01faa9c2869e..43838354ce6d 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -128,7 +128,7 @@ static struct ctl_table fs_stat_sysctls[] = {
.data = &sysctl_nr_open,
.maxlen = sizeof(unsigned int),
.mode = 0644,
- .proc_handler = proc_douintvec_minmax,
+ .proc_handler = proc_douintvec_nropen_minmax,
.extra1 = &sysctl_nr_open_min,
.extra2 = &sysctl_nr_open_max,
},
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 4ecf945de956..ed7400841f82 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -72,6 +72,8 @@ int proc_douintvec(const struct ctl_table *, int, void *, size_t *, loff_t *);
int proc_dointvec_minmax(const struct ctl_table *, int, void *, size_t *, loff_t *);
int proc_douintvec_minmax(const struct ctl_table *table, int write, void *buffer,
size_t *lenp, loff_t *ppos);
+int proc_douintvec_nropen_minmax(const struct ctl_table *, int, void *,
+ size_t *, loff_t *);
int proc_dou8vec_minmax(const struct ctl_table *table, int write, void *buffer,
size_t *lenp, loff_t *ppos);
int proc_dointvec_jiffies(const struct ctl_table *, int, void *, size_t *, loff_t *);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5ee2bfc7fcbe..d8ce18368ab3 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -944,6 +944,27 @@ int proc_douintvec_minmax(const struct ctl_table *table, int write,
do_proc_douintvec_minmax_conv, &param);
}
+/*
+ * Used for 'sysctl -w fs.nr_open', ensuring its value will not be greater
+ * than files_stat.max_files.
+ */
+int proc_douintvec_nropen_minmax(const struct ctl_table *table, int write,
+ void *buffer, size_t *lenp, loff_t *ppos)
+{
+ unsigned int file_max;
+ struct do_proc_douintvec_minmax_conv_param param = {
+ .min = (unsigned int *) table->extra1,
+ .max = (unsigned int *) table->extra2,
+ };
+
+ file_max = min_t(unsigned int, files_stat.max_files,
+ *(unsigned int *)table->extra2);
+ if (write)
+ param.max = &file_max;
+ return do_proc_douintvec(table, write, buffer, lenp, ppos,
+ do_proc_douintvec_minmax_conv, &param);
+}
+
/**
* proc_dou8vec_minmax - read a vector of unsigned chars with min/max values
* @table: the sysctl table
--
2.41.1
* [PATCH 6/6] fs: synchronize the access of fs.file-max and fs.nr_open
2024-11-23 18:08 [PATCH 0/6] Maintain the relative size of fs.file-max and fs.nr_open Jinliang Zheng
` (4 preceding siblings ...)
2024-11-23 18:13 ` [PATCH 5/6] sysctl: ensure sysctl_nr_open is not greater than files_stat.max_files Jinliang Zheng
@ 2024-11-23 18:13 ` Jinliang Zheng
2024-11-23 18:27 ` [PATCH 0/6] Maintain the relative size " Al Viro
6 siblings, 0 replies; 12+ messages in thread
From: Jinliang Zheng @ 2024-11-23 18:13 UTC (permalink / raw)
To: alexjlzheng
Cc: adobriyan, alexjlzheng, brauner, flyingpeng, jack, joel.granados,
kees, linux-fsdevel, linux-kernel, mcgrof, viro
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
---
fs/file_table.c | 9 ++++++++-
include/linux/fs.h | 1 +
kernel/sysctl.c | 18 ++++++++++++++----
3 files changed, 23 insertions(+), 5 deletions(-)
diff --git a/fs/file_table.c b/fs/file_table.c
index 43838354ce6d..4c0113912d9b 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -37,6 +37,8 @@ struct files_stat_struct files_stat = {
.max_files = NR_FILE
};
+DECLARE_RWSEM(file_number_sem);
+
/* SLAB cache for file structures */
static struct kmem_cache *filp_cachep __ro_after_init;
static struct kmem_cache *bfilp_cachep __ro_after_init;
@@ -102,8 +104,13 @@ EXPORT_SYMBOL_GPL(get_max_files);
static int proc_nr_files(const struct ctl_table *table, int write, void *buffer,
size_t *lenp, loff_t *ppos)
{
+ int ret;
+
+ down_read(&file_number_sem);
files_stat.nr_files = get_nr_files();
- return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
+ ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
+ up_read(&file_number_sem);
+ return ret;
}
static struct ctl_table fs_stat_sysctls[] = {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 931076faadde..f8f983e5dde6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -90,6 +90,7 @@ extern void __init files_maxfiles_init(void);
extern unsigned long get_max_files(void);
extern unsigned int sysctl_nr_open;
extern struct files_stat_struct files_stat;
+extern struct rw_semaphore file_number_sem;
typedef __kernel_rwf_t rwf_t;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index d8ce18368ab3..cf860d0e2c8b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -951,18 +951,22 @@ int proc_douintvec_minmax(const struct ctl_table *table, int write,
int proc_douintvec_nropen_minmax(const struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
+ int ret;
unsigned int file_max;
struct do_proc_douintvec_minmax_conv_param param = {
.min = (unsigned int *) table->extra1,
.max = (unsigned int *) table->extra2,
};
+ down_write(&file_number_sem);
file_max = min_t(unsigned int, files_stat.max_files,
*(unsigned int *)table->extra2);
if (write)
param.max = &file_max;
- return do_proc_douintvec(table, write, buffer, lenp, ppos,
+ ret = do_proc_douintvec(table, write, buffer, lenp, ppos,
do_proc_douintvec_minmax_conv, &param);
+ up_write(&file_number_sem);
+ return ret;
}
/**
@@ -1150,14 +1154,20 @@ int proc_doulongvec_minmax(const struct ctl_table *table, int write,
int proc_doulongvec_maxfiles_minmax(const struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
+ int ret;
unsigned long *min = table->extra1;
unsigned long *max = table->extra2;
- unsigned long nr_open = sysctl_nr_open;
+ unsigned long nr_open;
- if (write)
+ down_write(&file_number_sem);
+ if (write) {
+ nr_open = sysctl_nr_open;
min = &nr_open;
- return __do_proc_doulongvec_minmax(table->data, table, write,
+ }
+ ret = __do_proc_doulongvec_minmax(table->data, table, write,
buffer, lenp, ppos, 1l, 1l, min, max);
+ up_write(&file_number_sem);
+ return ret;
}
/**
--
2.41.1
* Re: [PATCH 0/6] Maintain the relative size of fs.file-max and fs.nr_open
2024-11-23 18:08 [PATCH 0/6] Maintain the relative size of fs.file-max and fs.nr_open Jinliang Zheng
` (5 preceding siblings ...)
2024-11-23 18:13 ` [PATCH 6/6] fs: synchronize the access of fs.file-max and fs.nr_open Jinliang Zheng
@ 2024-11-23 18:27 ` Al Viro
2024-11-23 19:32 ` Al Viro
6 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2024-11-23 18:27 UTC (permalink / raw)
To: Jinliang Zheng
Cc: brauner, jack, mcgrof, kees, joel.granados, adobriyan,
linux-fsdevel, linux-kernel, flyingpeng, Jinliang Zheng
On Sun, Nov 24, 2024 at 02:08:55AM +0800, Jinliang Zheng wrote:
> According to Documentation/admin-guide/sysctl/fs.rst, fs.nr_open and
> fs.file-max represent the number of file-handles that can be opened
> by each process and the entire system, respectively.
>
> Therefore, it's necessary to maintain a relative size between them,
> meaning we should ensure that files_stat.max_files is not less than
> sysctl_nr_open.
NAK.
You are confusing descriptors (nr_open) and open IO channels (max_files).
We very well _CAN_ have more of the former. For further details,
RTFM dup(2) or any introductory Unix textbook.
* Re: [PATCH 0/6] Maintain the relative size of fs.file-max and fs.nr_open
2024-11-23 18:27 ` [PATCH 0/6] Maintain the relative size " Al Viro
@ 2024-11-23 19:32 ` Al Viro
2024-11-24 2:30 ` Theodore Ts'o
2024-11-24 9:48 ` Jinliang Zheng
0 siblings, 2 replies; 12+ messages in thread
From: Al Viro @ 2024-11-23 19:32 UTC (permalink / raw)
To: Jinliang Zheng
Cc: brauner, jack, mcgrof, kees, joel.granados, adobriyan,
linux-fsdevel, linux-kernel, flyingpeng, Jinliang Zheng
On Sat, Nov 23, 2024 at 06:27:30PM +0000, Al Viro wrote:
> On Sun, Nov 24, 2024 at 02:08:55AM +0800, Jinliang Zheng wrote:
> > According to Documentation/admin-guide/sysctl/fs.rst, fs.nr_open and
> > fs.file-max represent the number of file-handles that can be opened
> > by each process and the entire system, respectively.
> >
> > Therefore, it's necessary to maintain a relative size between them,
> > meaning we should ensure that files_stat.max_files is not less than
> > sysctl_nr_open.
>
> NAK.
>
> You are confusing descriptors (nr_open) and open IO channels (max_files).
>
> We very well _CAN_ have more of the former. For further details,
> RTFM dup(2) or any introductory Unix textbook.
Short version: there are 3 different notions -
1) file as a collection of data kept by filesystem. Such things as
contents, ownership, permissions, timestamps belong there.
2) IO channel used to access one of (1). open(2) creates such;
things like current position in file, whether it's read-only or read-write
open, etc. belong there. It does not belong to a process - after fork(),
child has access to all open channels parent had when it had spawned
a child. If you open a file in parent, read 10 bytes from it, then spawn
a child that reads 10 more bytes and exits, then have parent read another
5 bytes, the first read by parent will have read bytes 0 to 9, read by
child - bytes 10 to 19 and the second read by parent - bytes 20 to 24.
Position is a property of IO channel; it belongs neither to underlying
file (otherwise another process opening the file and reading from it
would play havoc on your process) nor to process (otherwise reads done
by child would not have affected the parent and the second read from
parent would have gotten bytes 10 to 14). Same goes for access mode -
it belongs to IO channel.
3) file descriptor - a number that has a meaning only in context
of a process and refers to IO channel. That's what system calls use
to identify the IO channel to operate upon; open() picks a descriptor
unused by the calling process, associates the new channel with it and
returns that descriptor (a number) to caller. Multiple descriptors can
refer to the same IO channel; e.g. dup(fd) grabs a new descriptor and
associates it with the same IO channel fd currently refers to.
IO channels are not directly exposed to userland, but they are
very much present in Unix-style IO API. Note that results of e.g.
int fd1 = open("/etc/issue", 0);
int fd2 = open("/etc/issue", 0);
and
int fd1 = open("/etc/issue", 0);
int fd2 = dup(fd1);
are not identical, even though in both cases fd1 and fd2 are opened
descriptors and reading from them will access the contents of the
/etc/issue; in the former case the positions being accessed by read from
fd1 and fd2 will be independent, in the latter they will be shared.
It's really quite basic - Unix Programming 101 stuff. It's not
just that POSIX requires that and that any Unix behaves that way,
anything even remotely Unix-like will be like that.
You won't find the words 'IO channel' in POSIX, but I refuse
to use the term they have chosen instead - 'file description'. Yes,
alongside with 'file descriptor', in the contexts where the distinction
between these notions is quite important. I would rather not say what
I really think of those unsung geniuses, lest CoC gets overexcited...
Anyway, in casual conversations the expression 'opened file'
usually refers to that thing. Which is somewhat clumsy (sounds like
'file on filesystem that happens to be opened'), but usually it's
good enough. If you need to be pedantic (e.g. when explaining that
material in aforementioned Unix Programming 101 class), 'IO channel'
works well enough, IME.
* Re: [PATCH 0/6] Maintain the relative size of fs.file-max and fs.nr_open
2024-11-23 19:32 ` Al Viro
@ 2024-11-24 2:30 ` Theodore Ts'o
2024-11-24 9:48 ` Jinliang Zheng
1 sibling, 0 replies; 12+ messages in thread
From: Theodore Ts'o @ 2024-11-24 2:30 UTC (permalink / raw)
To: Al Viro
Cc: Jinliang Zheng, brauner, jack, mcgrof, kees, joel.granados,
adobriyan, linux-fsdevel, linux-kernel, flyingpeng,
Jinliang Zheng
On Sat, Nov 23, 2024 at 07:32:27PM +0000, Al Viro wrote:
>
> You won't find the words 'IO channel' in POSIX, but I refuse
> to use the term they have chosen instead - 'file description'. Yes,
> alongside with 'file descriptor', in the contexts where the distinction
> between these notions is quite important.
What I tend to do is use the term "struct file" instead. The "file
descriptor" literally is an integer index into an array of "struct
file" pointers.
"struct file" is how things are actually implemented in Linux and most
Unix systems. And while it's admittedly ugly to use an implementation
detail as an abstract term, it's infinitely less ugly than Posix's
"file description". :-)
- Ted
* Re: [PATCH 0/6] Maintain the relative size of fs.file-max and fs.nr_open
2024-11-23 19:32 ` Al Viro
2024-11-24 2:30 ` Theodore Ts'o
@ 2024-11-24 9:48 ` Jinliang Zheng
2024-11-24 15:59 ` Theodore Ts'o
1 sibling, 1 reply; 12+ messages in thread
From: Jinliang Zheng @ 2024-11-24 9:48 UTC (permalink / raw)
To: viro
Cc: adobriyan, alexjlzheng, alexjlzheng, brauner, flyingpeng, jack,
joel.granados, kees, linux-fsdevel, linux-kernel, mcgrof
On Sat, 23 Nov 2024 19:32:27 +0000, Al Viro wrote:
> On Sat, Nov 23, 2024 at 06:27:30PM +0000, Al Viro wrote:
> > On Sun, Nov 24, 2024 at 02:08:55AM +0800, Jinliang Zheng wrote:
> > > According to Documentation/admin-guide/sysctl/fs.rst, fs.nr_open and
> > > fs.file-max represent the number of file-handles that can be opened
> > > by each process and the entire system, respectively.
> > >
> > > Therefore, it's necessary to maintain a relative size between them,
> > > meaning we should ensure that files_stat.max_files is not less than
> > > sysctl_nr_open.
> >
> > NAK.
> >
> > You are confusing descriptors (nr_open) and open IO channels (max_files).
> >
> > We very well _CAN_ have more of the former. For further details,
> > RTFM dup(2) or any introductory Unix textbook.
>
> Short version: there are 3 different notions -
> 1) file as a collection of data kept by filesystem. Such things as
> contents, ownership, permissions, timestamps belong there.
> 2) IO channel used to access one of (1). open(2) creates such;
> things like current position in file, whether it's read-only or read-write
> open, etc. belong there. It does not belong to a process - after fork(),
> child has access to all open channels parent had when it had spawned
> a child. If you open a file in parent, read 10 bytes from it, then spawn
> a child that reads 10 more bytes and exits, then have parent read another
> 5 bytes, the first read by parent will have read bytes 0 to 9, read by
> child - bytes 10 to 19 and the second read by parent - bytes 20 to 24.
> Position is a property of IO channel; it belongs neither to underlying
> file (otherwise another process opening the file and reading from it
> would play havoc on your process) nor to process (otherwise reads done
> by child would not have affected the parent and the second read from
> parent would have gotten bytes 10 to 14). Same goes for access mode -
> it belongs to IO channel.
I'm sorry that I don't know much about the implementation of UNIX, but
specific to the implementation of Linux, struct file is more like a
combination of what you said 1) and 2).
But I see your point, I missed the dup() case. dup() takes up another
slot in the fdtable->fd array, but does not create a new struct file.
Thank you.
Jinliang Zheng
> 3) file descriptor - a number that has a meaning only in context
> of a process and refers to IO channel. That's what system calls use
> to identify the IO channel to operate upon; open() picks a descriptor
> unused by the calling process, associates the new channel with it and
> returns that descriptor (a number) to caller. Multiple descriptors can
> refer to the same IO channel; e.g. dup(fd) grabs a new descriptor and
> associates it with the same IO channel fd currently refers to.
>
> IO channels are not directly exposed to userland, but they are
> very much present in Unix-style IO API. Note that results of e.g.
> int fd1 = open("/etc/issue", 0);
> int fd2 = open("/etc/issue", 0);
> and
> int fd1 = open("/etc/issue", 0);
> int fd2 = dup(fd1);
> are not identical, even though in both cases fd1 and fd2 are opened
> descriptors and reading from them will access the contents of the
> /etc/issue; in the former case the positions being accessed by read from
> fd1 and fd2 will be independent, in the latter they will be shared.
>
> It's really quite basic - Unix Programming 101 stuff. It's not
> just that POSIX requires that and that any Unix behaves that way,
> anything even remotely Unix-like will be like that.
>
> You won't find the words 'IO channel' in POSIX, but I refuse
> to use the term they have chosen instead - 'file description'. Yes,
> alongside with 'file descriptor', in the contexts where the distinction
> between these notions is quite important. I would rather not say what
> I really think of those unsung geniuses, lest CoC gets overexcited...
>
> Anyway, in casual conversations the expression 'opened file'
> usually refers to that thing. Which is somewhat clumsy (sounds like
> 'file on filesystem that happens to be opened'), but usually it's
> good enough. If you need to be pedantic (e.g. when explaining that
> material in aforementioned Unix Programming 101 class), 'IO channel'
> works well enough, IME.
* Re: [PATCH 0/6] Maintain the relative size of fs.file-max and fs.nr_open
2024-11-24 9:48 ` Jinliang Zheng
@ 2024-11-24 15:59 ` Theodore Ts'o
0 siblings, 0 replies; 12+ messages in thread
From: Theodore Ts'o @ 2024-11-24 15:59 UTC (permalink / raw)
To: Jinliang Zheng
Cc: viro, adobriyan, alexjlzheng, brauner, flyingpeng, jack,
joel.granados, kees, linux-fsdevel, linux-kernel, mcgrof
On Sun, Nov 24, 2024 at 05:48:13PM +0800, Jinliang Zheng wrote:
> >
> > Short version: there are 3 different notions -
> > 1) file as a collection of data kept by filesystem. Such things as
> > contents, ownership, permissions, timestamps belong there.
> > 2) IO channel used to access one of (1). open(2) creates such;
> > things like current position in file, whether it's read-only or read-write
> > open, etc. belong there. It does not belong to a process - after fork(),
> > ...
>
> I'm sorry that I don't know much about the implementation of UNIX, but
> specific to the implementation of Linux, struct file is more like a
> combination of what you said 1) and 2).
This is incorrect. In Linux (and historical implementations of Unix)
struct file is precisely (2). The struct file has a pointer to a
struct dentry, which in turn has a pointer to a struct inode. So a
struct file *refers* to (1), but it is *not* (1).
- Ted