Discussion:
[fuse-devel] [RFC PATCH 0/5] fuse: make maximum read/write request size tunable
Nikolaus Rath
2012-07-05 13:04:40 UTC
Hi,
This patch series makes the maximum read/write request size tunable in FUSE.
Currently, it is limited to FUSE_MAX_PAGES_PER_REQ, which is equal
to 32 pages. This limit needs to be changeable in order to improve
throughput, since the optimal value depends on various factors such
as the type and version of the local filesystems used, hardware specs, etc.
This truly is a joyful week for FUSE :-).

Are these patches compatible with the fuse write-back patch series
posted by Pavel a few days ago?


Thanks,

-Nikolaus
--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C
Liu Yuan
2012-07-06 05:53:46 UTC
One of the ways to solve this is to make them tunable.
In this series, the new sysfs parameter max_pages_per_req is introduced.
It limits the maximum read/write size in fuse request and it can be
changed from 32 to 256 pages in current implementations. When the
max_read/max_write mount option is specified, FUSE request size is set
per mount. (The size is rounded-up to page size and limited up to
max_pages_per_req.)
Why a maximum of 256 pages? Since we are here, we can go further: most
object storage systems have object sizes from several to dozens of
megabytes, so I think 1M is probably too small. Our distributed storage
system uses 4M per object, so I think the maximum size should be at least 4M.

Thanks,
Yuan
Han-Wen Nienhuys
2012-07-06 13:58:07 UTC
Post by Liu Yuan
One of the ways to solve this is to make them tunable.
In this series, the new sysfs parameter max_pages_per_req is introduced.
It limits the maximum read/write size in fuse request and it can be
changed from 32 to 256 pages in current implementations. When the
max_read/max_write mount option is specified, FUSE request size is set
per mount. (The size is rounded-up to page size and limited up to
max_pages_per_req.)
Why maxim 256 pages? If we are here, we can go further: most of object
storage system has object size of multiple to dozens of megabytes. So I
think probably 1M is too small. Our distribution storage system has 4M
per object, so I think at least maxim size could be bigger than 4M.
The maximum pipe size on my system is 1M, so if you go beyond that,
splicing from the FD won't work.

Also, the userspace client must reserve a buffer this size so it can
receive a write, which is a waste since most requests are much
smaller.
--
Han-Wen Nienhuys - ***@xs4all.nl - http://www.xs4all.nl/~hanwen
HAYASAKA Mitsuo
2012-07-12 05:58:29 UTC
Hi Yuan and Han-Wen,

Thank you for your comments.
Post by Han-Wen Nienhuys
Post by Liu Yuan
One of the ways to solve this is to make them tunable.
In this series, the new sysfs parameter max_pages_per_req is introduced.
It limits the maximum read/write size in fuse request and it can be
changed from 32 to 256 pages in current implementations. When the
max_read/max_write mount option is specified, FUSE request size is set
per mount. (The size is rounded-up to page size and limited up to
max_pages_per_req.)
Why maxim 256 pages? If we are here, we can go further: most of object
storage system has object size of multiple to dozens of megabytes. So I
think probably 1M is too small. Our distribution storage system has 4M
per object, so I think at least maxim size could be bigger than 4M.
The maximum pipe size on my system is 1M, so if you go beyond that,
splicing from the FD won't work.
Also, the userspace client must reserve a buffer this size so it can
receive a write, which is a waste since most requests are much
smaller.
I checked: the size of a pipe can be changed with fcntl(2), and the
system-wide limit can be changed through /proc/sys/fs/pipe-max-size.
So it is clearly not a fixed value.
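
For reference, the per-pipe capacity mentioned above can be inspected and changed from userspace roughly as follows. This is a minimal sketch, assuming Linux 2.6.35 or later; pipe_capacity() and pipe_resize() are hypothetical helper names, and the numeric fallback definitions are the Linux values of the two fcntl commands, included only in case the headers do not expose them.

```c
#define _GNU_SOURCE		/* exposes F_GETPIPE_SZ / F_SETPIPE_SZ */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Fallback: Linux values of the two commands if the headers hide them. */
#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031
#define F_GETPIPE_SZ 1032
#endif

/* Current capacity in bytes of the pipe behind fd, or -1 on error. */
long pipe_capacity(int fd)
{
	return fcntl(fd, F_GETPIPE_SZ);
}

/*
 * Ask the kernel to make the pipe behind fd at least `want` bytes large.
 * Returns the resulting capacity, or -1 on error (for example when `want`
 * exceeds /proc/sys/fs/pipe-max-size for an unprivileged caller).
 */
long pipe_resize(int fd, long want)
{
	assert(want > 0);
	if (fcntl(fd, F_SETPIPE_SZ, (int)want) < 0)
		return -1;
	return pipe_capacity(fd);
}
```

A caller that wants to splice a whole request would compare the value returned by pipe_resize() with the request size and fall back to a plain read when the pipe cannot be made large enough.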

Also, it seems that there is a request for raising the maximum size
of a fuse request to 4M (1024 pages). One of the reasons for
introducing the sysfs max_pages_per_req parameter is to let the
administrator adjust this limit dynamically as needed; only root
can change it.

So, when requests must not exceed pipe-max-size, the administrator
should set max_pages_per_req with that in mind. The upper limit of
the parameter itself does not seem to need to be tied to
pipe-max-size.

I'm planning to cap max_pages_per_req at 1024 pages and add the
following text to Documentation/filesystems/fuse.txt.

"The sysfs max_pages_per_req parameter can be set from 32 to 1024.
The default is 32 pages. Generally, the pipe-max-size is 1M (256 pages)
and it is better to set it to not more than the pipe-max-size."

This is just a plan and any comments are appreciated.

Thanks,
Liu Yuan
2012-07-12 06:13:05 UTC
Post by HAYASAKA Mitsuo
Hi Yuan and Han-Wen,
Thank you for your comments.
Post by Han-Wen Nienhuys
Post by Liu Yuan
One of the ways to solve this is to make them tunable.
In this series, the new sysfs parameter max_pages_per_req is introduced.
It limits the maximum read/write size in fuse request and it can be
changed from 32 to 256 pages in current implementations. When the
max_read/max_write mount option is specified, FUSE request size is set
per mount. (The size is rounded-up to page size and limited up to
max_pages_per_req.)
Why maxim 256 pages? If we are here, we can go further: most of object
storage system has object size of multiple to dozens of megabytes. So I
think probably 1M is too small. Our distribution storage system has 4M
per object, so I think at least maxim size could be bigger than 4M.
The maximum pipe size on my system is 1M, so if you go beyond that,
splicing from the FD won't work.
Also, the userspace client must reserve a buffer this size so it can
receive a write, which is a waste since most requests are much
smaller.
I checked the maximum pipe size can be changed using fcntl(2) or
/proc/sys/fs/pipe-max-size. It is clear that it is not a fixed value.
Also, it seems that there is a request for setting the maximum number
of pages per fuse request to 4M (1024 pages). One of the reasons to
introduce the sysfs max_pages_per_req parameter is to set a threshold
of the maximum number of pages dynamically according to the
administrator's demand, and root can only change it.
So, when the maximum value is required to be set to not more than the
pipe-max-size, the max_pages_per_req should be changed considering it.
It seems that the upper limit of this parameter does not have to be
not more than it.
I'm planning to limit max_pages_per_req up to 1024 pages and add the
document to /Documentation/filesystems/fuse.txt, as follows.
"the sysfs max_pages_per_req parameter can be changed from 32 to 1024.
The default is 32 pages. Generally, the pipe-max-size is 1M (256 pages)
and it is better to set it to not more than the pipe-max-size."
This is just a plan and any comments are appreciated.
This looks reasonable to me. We should try our best to raise the
upper ceiling to deal with various kinds of demands.

Thanks for your work, Mitsuo. As a user of FUSE, I'd vote +1 for your
patch set.

Thanks,
Yuan
Miklos Szeredi
2012-07-12 10:13:53 UTC
Post by HAYASAKA Mitsuo
Hi Yuan and Han-Wen,
Thank you for your comments.
Post by Han-Wen Nienhuys
Post by Liu Yuan
One of the ways to solve this is to make them tunable.
In this series, the new sysfs parameter max_pages_per_req is introduced.
It limits the maximum read/write size in fuse request and it can be
changed from 32 to 256 pages in current implementations. When the
max_read/max_write mount option is specified, FUSE request size is set
per mount. (The size is rounded-up to page size and limited up to
max_pages_per_req.)
Why maxim 256 pages? If we are here, we can go further: most of object
storage system has object size of multiple to dozens of megabytes. So I
think probably 1M is too small. Our distribution storage system has 4M
per object, so I think at least maxim size could be bigger than 4M.
The maximum pipe size on my system is 1M, so if you go beyond that,
splicing from the FD won't work.
Also, the userspace client must reserve a buffer this size so it can
receive a write, which is a waste since most requests are much
smaller.
I checked the maximum pipe size can be changed using fcntl(2) or
/proc/sys/fs/pipe-max-size. It is clear that it is not a fixed value.
Also, it seems that there is a request for setting the maximum number
of pages per fuse request to 4M (1024 pages). One of the reasons to
introduce the sysfs max_pages_per_req parameter is to set a threshold
of the maximum number of pages dynamically according to the
administrator's demand, and root can only change it.
So, when the maximum value is required to be set to not more than the
pipe-max-size, the max_pages_per_req should be changed considering it.
It seems that the upper limit of this parameter does not have to be
not more than it.
I'm planning to limit max_pages_per_req up to 1024 pages and add the
document to /Documentation/filesystems/fuse.txt, as follows.
"the sysfs max_pages_per_req parameter can be changed from 32 to 1024.
The default is 32 pages. Generally, the pipe-max-size is 1M (256 pages)
and it is better to set it to not more than the pipe-max-size."
Can't we just use pipe-max-size for the limit?

Then we'll use the minimum of pipe-max-size and max_read/max_write for
sizing the requests.
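
The sizing rule suggested here could look roughly like the sketch below. It is only an illustration in userspace C, not the kernel code: req_pages() and DEFAULT_PAGES are hypothetical names, the 32-page floor is taken from the default discussed in this thread, and letting the pipe limit win when it is smaller than that floor is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_PAGES 32u	/* default limit discussed in the thread */

/*
 * Pages to allow per request: the max_read/max_write value (in bytes)
 * rounded up to whole pages, raised to the default if it is smaller,
 * then capped by what fits into pipe_max_bytes.
 */
size_t req_pages(size_t mount_max_bytes, size_t pipe_max_bytes,
		 size_t page_size)
{
	size_t pages, pipe_pages;

	assert(page_size > 0);
	pages = (mount_max_bytes + page_size - 1) / page_size;
	pipe_pages = pipe_max_bytes / page_size;

	if (pages < DEFAULT_PAGES)
		pages = DEFAULT_PAGES;
	if (pages > pipe_pages)
		pages = pipe_pages;	/* assumption: the pipe cap wins */
	return pages;
}
```

With 4 KiB pages and a 1M pipe-max-size, a max_write of 4M is cut down to 256 pages, while a max_write of 64K stays at the 32-page default.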

Another comment: do we really need to allocate each and every request
with space for the pages? I don't think that makes sense. Let's leave
some small number of pages inline in the request and allocate a separate
array if the number of pages is too large. There may even be some utilities
in the kernel to handle dynamically sized page arrays (I haven't looked
but I suspect there is).
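
The idea of keeping a few page slots inline and allocating a separate array only for large requests can be sketched in plain userspace C as below. This is an illustration of the technique only, not the fuse implementation: struct req, INLINE_PAGES, req_alloc() and req_free() are hypothetical names, and the inline capacity of 8 is an arbitrary choice.

```c
#include <assert.h>
#include <stdlib.h>

#define INLINE_PAGES 8		/* hypothetical inline capacity */

struct req {
	unsigned max_pages;
	void **pages;			/* inline_pages or a heap array */
	void *inline_pages[INLINE_PAGES];
};

/* Allocate a request able to hold max_pages page pointers. */
struct req *req_alloc(unsigned max_pages)
{
	struct req *r;

	assert(max_pages > 0);
	r = calloc(1, sizeof(*r));
	if (!r)
		return NULL;
	r->max_pages = max_pages;
	if (max_pages <= INLINE_PAGES) {
		r->pages = r->inline_pages;	/* small request: no extra allocation */
	} else {
		r->pages = calloc(max_pages, sizeof(*r->pages));
		if (!r->pages) {
			free(r);
			return NULL;
		}
	}
	return r;
}

/* Free a request, including the separate array if one was allocated. */
void req_free(struct req *r)
{
	if (!r)
		return;
	if (r->pages != r->inline_pages)
		free(r->pages);
	free(r);
}
```

The request structure keeps a fixed size, so small requests cost one allocation and only large reads or writes pay for the second one.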

Thanks,
Miklos
HAYASAKA Mitsuo
2012-07-13 07:30:07 UTC
Hi Miklos,

Thank you for your comments.
Post by Miklos Szeredi
Post by HAYASAKA Mitsuo
Hi Yuan and Han-Wen,
Thank you for your comments.
Post by Han-Wen Nienhuys
Post by Liu Yuan
One of the ways to solve this is to make them tunable.
In this series, the new sysfs parameter max_pages_per_req is introduced.
It limits the maximum read/write size in fuse request and it can be
changed from 32 to 256 pages in current implementations. When the
max_read/max_write mount option is specified, FUSE request size is set
per mount. (The size is rounded-up to page size and limited up to
max_pages_per_req.)
Why maxim 256 pages? If we are here, we can go further: most of object
storage system has object size of multiple to dozens of megabytes. So I
think probably 1M is too small. Our distribution storage system has 4M
per object, so I think at least maxim size could be bigger than 4M.
The maximum pipe size on my system is 1M, so if you go beyond that,
splicing from the FD won't work.
Also, the userspace client must reserve a buffer this size so it can
receive a write, which is a waste since most requests are much
smaller.
I checked the maximum pipe size can be changed using fcntl(2) or
/proc/sys/fs/pipe-max-size. It is clear that it is not a fixed value.
Also, it seems that there is a request for setting the maximum number
of pages per fuse request to 4M (1024 pages). One of the reasons to
introduce the sysfs max_pages_per_req parameter is to set a threshold
of the maximum number of pages dynamically according to the
administrator's demand, and root can only change it.
So, when the maximum value is required to be set to not more than the
pipe-max-size, the max_pages_per_req should be changed considering it.
It seems that the upper limit of this parameter does not have to be
not more than it.
I'm planning to limit max_pages_per_req up to 1024 pages and add the
document to /Documentation/filesystems/fuse.txt, as follows.
"the sysfs max_pages_per_req parameter can be changed from 32 to 1024.
The default is 32 pages. Generally, the pipe-max-size is 1M (256 pages)
and it is better to set it to not more than the pipe-max-size."
Can't we just use pipe-max-size for the limit?
This is great!
I'd like to change this patch to use the pipe-max-size as the upper
limit of the max_pages_per_req sysfs parameter, and resubmit it.
Post by Miklos Szeredi
Then we'll use the minimum of pipe-max-size and max_read/max_write for
sizing the requests.
Another comment: do we really need to allocate each and every request
with space for the pages? I don't think that makes sense. Let's leave
some small number of pages inline in the request and allocate a separate
array if the number of pages is too large. There may even be some utilities
in the kernel to handle dynamically sized page arrays (I haven't looked
but I suspect there is).
This is interesting and could dramatically reduce the number of page
allocations and frees. However, I need to investigate whether it is
feasible.

Thanks,
Post by Miklos Szeredi
Thanks,
Miklos
HAYASAKA Mitsuo
2012-07-06 10:09:15 UTC
Hi Nikolaus,

Thank you for your comments.
Post by Nikolaus Rath
Hi,
This patch series makes the maximum read/write request size tunable in FUSE.
Currently, it is limited to FUSE_MAX_PAGES_PER_REQ, which is equal
to 32 pages. This limit needs to be changeable in order to improve
throughput, since the optimal value depends on various factors such
as the type and version of the local filesystems used, hardware specs, etc.
This truly is a joyful week for FUSE :-).
Are these patches compatible with the fuse write-back patch series
posted by Pavel a few days ago?
I applied this patch series to the latest upstream kernel and measured
the read/write throughput with it. So, I have not tried Pavel's patches yet.

However, I think it is compatible with his write-back patch series
since it just makes the maximum limit of the fuse request size tunable.
My patch will also be effective for direct I/O.

Thanks,
Post by Nikolaus Rath
Thanks,
-Nikolaus
Mitsuo Hayasaka
2012-07-05 10:50:50 UTC
fuse_req_cachep was used for request allocation in fuse.
This patch removes it, since requests are now allocated with kmalloc()
because their size varies with the tunable read/write request size.

Signed-off-by: Mitsuo Hayasaka <***@hitachi.com>
Cc: Miklos Szeredi <***@szeredi.hu>
---

fs/fuse/dev.c | 21 +--------------------
1 files changed, 1 insertions(+), 20 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 511560b..4087ff4 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -23,8 +23,6 @@
 MODULE_ALIAS_MISCDEV(FUSE_MINOR);
 MODULE_ALIAS("devname:fuse");

-static struct kmem_cache *fuse_req_cachep;
-
 static struct fuse_conn *fuse_get_conn(struct file *file)
 {
 	/*
@@ -2075,27 +2073,10 @@ static struct miscdevice fuse_miscdevice = {

 int __init fuse_dev_init(void)
 {
-	int err = -ENOMEM;
-	fuse_req_cachep = kmem_cache_create("fuse_request",
-					    sizeof(struct fuse_req),
-					    0, 0, NULL);
-	if (!fuse_req_cachep)
-		goto out;
-
-	err = misc_register(&fuse_miscdevice);
-	if (err)
-		goto out_cache_clean;
-
-	return 0;
-
- out_cache_clean:
-	kmem_cache_destroy(fuse_req_cachep);
- out:
-	return err;
+	return misc_register(&fuse_miscdevice);
 }

 void fuse_dev_cleanup(void)
 {
 	misc_deregister(&fuse_miscdevice);
-	kmem_cache_destroy(fuse_req_cachep);
 }
Mitsuo Hayasaka
2012-07-05 10:51:15 UTC
Rob Landley
2012-07-06 12:54:22 UTC
Post by Mitsuo Hayasaka
Add an explanation of the sysfs parameter that limits the
maximum read/write request size.
---
Documentation/filesystems/fuse.txt | 17 ++++++++++++++++-
1 files changed, 16 insertions(+), 1 deletions(-)
diff --git a/Documentation/filesystems/fuse.txt b/Documentation/filesystems/fuse.txt
index 13af4a4..e6ffba3 100644
--- a/Documentation/filesystems/fuse.txt
+++ b/Documentation/filesystems/fuse.txt
@@ -108,13 +108,28 @@ Mount options
With this option the maximum size of read operations can be set.
The default is infinite. Note that the size of read requests is
- limited anyway to 32 pages (which is 128kbyte on i386).
+ limited to 32 pages (which is 128kbyte on i386) if direct_io
+ option is not specified. When direct_io option is specified,
+ the request size is limited to max_pages_per_req sysfs parameter.
"Note that the maximum size of read requests defaults to 32 pages (128k
on i386), use max_pages_per_req to change this default."

And then describe max_pages_per_req sufficiently thoroughly below, all in
one place.

(By the way, has anybody actually tested it with a single page as the
limit? Does that work?)
Post by Mitsuo Hayasaka
'blksize=N'
Set the block size for the filesystem. The default is 512. This
option is only valid for 'fuseblk' type mounts.
+Sysfs parameter
+~~~~~~~~~~~~~~~
+
+The sysfs parameter max_pages_per_req limits the maximum page size per
+FUSE request.
No, it limits the maximum size of a data request and the units are
decimal number of pages. It doesn't change the size of memory pages in
the system.

Also, your first hunk implies this setting only takes effect if they
mounted with "-o direct_io", is that true?
Post by Mitsuo Hayasaka
+ /sys/fs/fuse/max_pages_per_req
+
+The default is 32 pages. It can be changed from 32 to 256 pages, which
+may improve the read/write throughput optimizing it. This change is
+effective per mount. Therefore, the re-mounting of FUSE filesystem
+is required after changing it.
I'd say "Changing it to 256 pages may improve read/write throughput on
systems with enough memory. Existing FUSE mounts must be remounted for
this change to take effect."

I.E. don't imply 32 and 256 are the only options unless they are. (Is
there some requirement that it be a power of 2, or just a good idea?)

And per-mount sounds like you're setting it for a specific mount point,
so if I have three mounts there would be three entries under
/sys/fs/fuse, which does not seem to be the case. (Which is odd, because
you'd think there would be an "-o max_pages_per_req=128" that _would_
set this per-mount if the value actually used is cached in the
superblock, but I'm not seeing one...)

Rob
--
GNU/Linux isn't: Linux=GPLv2, GNU=GPLv3+, they can't share code.
Either it's "mere aggregation", or a license violation. Pick one.
HAYASAKA Mitsuo
2012-07-12 13:13:19 UTC
Hi Rob,

Thank you for your comments.
Post by Rob Landley
Post by Mitsuo Hayasaka
Add an explanation of the sysfs parameter that limits the
maximum read/write request size.
---
Documentation/filesystems/fuse.txt | 17 ++++++++++++++++-
1 files changed, 16 insertions(+), 1 deletions(-)
diff --git a/Documentation/filesystems/fuse.txt b/Documentation/filesystems/fuse.txt
index 13af4a4..e6ffba3 100644
--- a/Documentation/filesystems/fuse.txt
+++ b/Documentation/filesystems/fuse.txt
@@ -108,13 +108,28 @@ Mount options
With this option the maximum size of read operations can be set.
The default is infinite. Note that the size of read requests is
- limited anyway to 32 pages (which is 128kbyte on i386).
+ limited to 32 pages (which is 128kbyte on i386) if direct_io
+ option is not specified. When direct_io option is specified,
+ the request size is limited to max_pages_per_req sysfs parameter.
"Note that the maximum size of read requests defaults to 32 pages (128k
on i386), use max_pages_per_req to change this default."
And then describe max_page_per_req sufficiently thoroughly below, all in
one place.
OK, I will revise it.
Post by Rob Landley
(By the way, has anybody actually tested it with a single page as the
limit? Does that work?)
This patch series allows the maximum request size to be set to any
number of pages from 32 to 256; it cannot be set below 32 pages.
Post by Rob Landley
Post by Mitsuo Hayasaka
'blksize=N'
Set the block size for the filesystem. The default is 512. This
option is only valid for 'fuseblk' type mounts.
+Sysfs parameter
+~~~~~~~~~~~~~~~
+
+The sysfs parameter max_pages_per_req limits the maximum page size per
+FUSE request.
No, it limits the maximum size of a data request and the units are
decimal number of pages. It doesn't change the size of memory pages in
the system.
You are right. I will revise it.
Post by Rob Landley
Also, your first hunk implies this setting only takes effect if they
mounted with "-o direct_io", is that true?
max_pages_per_req increases the request size for read operations with
direct_io, and for write operations both with and without direct_io. It
does not change the request size for read operations without direct_io.
So, it is true as far as read operations are concerned.
Post by Rob Landley
Post by Mitsuo Hayasaka
+ /sys/fs/fuse/max_pages_per_req
+
+The default is 32 pages. It can be changed from 32 to 256 pages, which
+may improve the read/write throughput optimizing it. This change is
+effective per mount. Therefore, the re-mounting of FUSE filesystem
+is required after changing it.
I'd say "Changing it to 256 pages may improve read/write throguhput on
systems with enough memory. Existing FUSE mounts must be remounted for
this change to take effect."
I.E. don't imply 32 and 256 are the only options unless they are. (Is
there some requirement that it be a power of 2, or just a good idea?)
Here, I wanted to say that max_pages_per_req can be set to an
arbitrary number from 32 to 256. I will revise it since this explanation
is misleading.

Also, there is no requirement that it be a power of 2, although that is a
good idea as far as kmalloc() is concerned. One of the reasons for
introducing the max_pages_per_req sysfs parameter is to let libfuse read
the current maximum request size and adjust its MIN_BUFSIZE limit
accordingly, to avoid wasting memory in userspace.
Post by Rob Landley
And per-mount sounds like you're setting it for a specific mount point,
so if I have three mounts there would be three entries under
/sys/fs/fuse, which does not seem to be the case. (Which is odd, because
you'd think there would be an "-o max_pages_per_req=128" that _would_
set this per-mount if the value actually used is cached in the
superblock, but I'm not seeing one...)
max_pages_per_req is a system-wide limit controlled by the administrator.
The actual number of pages allocated per request can be changed with the
max_read/max_write mount options, within this system-wide limit.

I will revise and resubmit this patch series soon.

Thanks,
Post by Rob Landley
Rob
Mitsuo Hayasaka
2012-07-05 10:51:00 UTC
The default global limits for congestion threshold and backgrounded
requests are calculated using size of fuse_req structure, which is
variable due to the tunable read/write request size. This patch sets
them to their minimum values by default in order to avoid the variable
and unstable limits per mount.
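
The effect of this change on the default limits can be worked through with a small userspace sketch. It mirrors the arithmetic in sanitize_global_limit() (physical memory in bytes shifted right by 13, divided by the request size, capped at 65535); default_limit() is a hypothetical name, and the 256-byte base size and 8-byte pointer size used in the note after the code are illustrative values, not measured ones.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Default global limit for a machine with ram_bytes of memory when one
 * request occupies req_size bytes.
 */
unsigned default_limit(unsigned long long ram_bytes, size_t req_size)
{
	unsigned long long limit;

	assert(req_size > 0);
	limit = (ram_bytes >> 13) / req_size;
	if (limit >= (1u << 16))
		limit = (1u << 16) - 1;	/* same cap as the kernel code */
	return (unsigned)limit;
}
```

With 1 GiB of RAM and a 256-byte request, the default works out to 512. Sizing the request for 256 page pointers (256 + 256 * 8 = 2304 bytes) lowers it to 56, which is why computing with the maximum request size yields the minimum limits.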

Signed-off-by: Mitsuo Hayasaka <***@hitachi.com>
Cc: Miklos Szeredi <***@szeredi.hu>
---

fs/fuse/fuse_i.h | 5 +++++
fs/fuse/inode.c | 2 +-
2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index c96dc5f..72210a8 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -29,6 +29,11 @@
 /** Default number of pages that can be used in a single read/write request */
 #define FUSE_DEFAULT_MAX_PAGES_PER_REQ 32

+/** Maximum size of struct fuse_req */
+#define FUSE_MAX_FUSE_REQ_SIZE (sizeof(struct fuse_req) + \
+				FUSE_MAX_PAGES_PER_REQ * \
+				sizeof(struct page *))
+
 /** Bias for fi->writectr, meaning new writepages must not be sent */
 #define FUSE_NOWRITE INT_MIN

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index aadf157..d8d302a 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -758,7 +758,7 @@ static void sanitize_global_limit(unsigned *limit)
 {
 	if (*limit == 0)
 		*limit = ((num_physpages << PAGE_SHIFT) >> 13) /
-			 sizeof(struct fuse_req);
+			 FUSE_MAX_FUSE_REQ_SIZE;

 	if (*limit >= 1 << 16)
 		*limit = (1 << 16) - 1;
Mitsuo Hayasaka
2012-07-05 10:50:17 UTC
Mitsuo Hayasaka
2012-07-05 10:51:07 UTC
The tunable maximum read/write request size changes the size of
fuse request when max_read/max_write mount option is specified.

The libfuse should change the current MIN_BUFSIZE limitation
according to the current maximum request size. If not, the
libfuse must always set MIN_BUFSIZE to the maximum request limit
(= [256 pages * 4KB + 0x1000] in current implementation), which
leads to waste of memory. So, it is necessary to get it from
userspace.

This patch adds a sysfs parameter to achieve it. It can be
set from 32 to 256 pages; the default is 32 pages.

To increase the maximum request size, this parameter must be
changed before mounting FUSE filesystems.
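
On the userspace side, the buffer sizing described above could be sketched as follows. This is an assumption-laden illustration, not libfuse code: min_bufsize() and read_max_pages() are hypothetical helpers, the 0x1000 bytes of extra room come from the formula in this description, and the sysfs path would be the /sys/fs/fuse/max_pages_per_req file proposed by this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define HEADER_ROOM 0x1000	/* extra room from the formula above */

/* Receive buffer size needed for requests of up to max_pages pages. */
size_t min_bufsize(unsigned max_pages, size_t page_size)
{
	assert(max_pages > 0 && page_size > 0);
	return (size_t)max_pages * page_size + HEADER_ROOM;
}

/*
 * Read the page limit from `path`; return `fallback` when the file is
 * missing or unparsable (for example on a kernel without this patch).
 */
unsigned read_max_pages(const char *path, unsigned fallback)
{
	unsigned value;
	FILE *f = fopen(path, "r");

	if (!f)
		return fallback;
	if (fscanf(f, "%u", &value) != 1)
		value = fallback;
	fclose(f);
	return value;
}
```

With 4 KiB pages, the 32-page default needs 0x21000 bytes, while always sizing for 256 pages would need 0x101000 bytes per buffer.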

Signed-off-by: Mitsuo Hayasaka <***@hitachi.com>
Cc: Miklos Szeredi <***@szeredi.hu>
---

fs/fuse/inode.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++--
1 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index d8d302a..50b78c6 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -47,6 +47,13 @@ MODULE_PARM_DESC(max_user_congthresh,
  "Global limit for the maximum congestion threshold an "
  "unprivileged user can set");

+/**
+ * Maximum number of pages allocated for struct fuse_req.
+ * It can be changed via sysfs from FUSE_DEFAULT_MAX_PAGES_PER_REQ
+ * to FUSE_MAX_PAGES_PER_REQ.
+ */
+static unsigned sysfs_max_req_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ;
+
 #define FUSE_SUPER_MAGIC 0x65735546

 #define FUSE_DEFAULT_BLKSIZE 512
@@ -780,8 +787,7 @@ static int set_global_limit(const char *val, struct kernel_param *kp)
 static void set_conn_max_pages(struct fuse_conn *fc, unsigned max_pages)
 {
 	if (max_pages > fc->max_pages) {
-		fc->max_pages = min_t(unsigned, FUSE_MAX_PAGES_PER_REQ,
-				      max_pages);
+		fc->max_pages = min_t(unsigned, sysfs_max_req_pages, max_pages);
 		fc->fuse_req_size = sizeof(struct fuse_req) +
 				    fc->max_pages * sizeof(struct page *);
 	}
@@ -1203,6 +1209,43 @@ static void fuse_fs_cleanup(void)
 static struct kobject *fuse_kobj;
 static struct kobject *connections_kobj;

+static ssize_t max_req_pages_show(struct kobject *kobj,
+				  struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%u\n", sysfs_max_req_pages);
+}
+
+static ssize_t max_req_pages_store(struct kobject *kobj,
+				   struct kobj_attribute *attr,
+				   const char *buf, size_t count)
+{
+	int err;
+	unsigned long t;
+
+	err = kstrtoul(skip_spaces(buf), 0, &t);
+	if (err)
+		return err;
+
+	t = max_t(unsigned long, t, FUSE_DEFAULT_MAX_PAGES_PER_REQ);
+	t = min_t(unsigned long, t, FUSE_MAX_PAGES_PER_REQ);
+
+	sysfs_max_req_pages = t;
+	return count;
+}
+
+static struct kobj_attribute max_req_pages_attr =
+	__ATTR(max_pages_per_req, 0644, max_req_pages_show,
+	       max_req_pages_store);
+
+static struct attribute *fuse_attrs[] = {
+	&max_req_pages_attr.attr,
+	NULL,
+};
+
+static struct attribute_group fuse_attr_grp = {
+	.attrs = fuse_attrs,
+};
+
 static int fuse_sysfs_init(void)
 {
 	int err;
@@ -1219,8 +1262,14 @@ static int fuse_sysfs_init(void)
 		goto out_fuse_unregister;
 	}

+	err = sysfs_create_group(fuse_kobj, &fuse_attr_grp);
+	if (err)
+		goto out_conn_unregister;
+
 	return 0;

+ out_conn_unregister:
+	kobject_put(connections_kobj);
  out_fuse_unregister:
 	kobject_put(fuse_kobj);
  out_err:
Mitsuo Hayasaka
2012-07-05 10:50:36 UTC
Currently, the maximum read/write request size is limited to
FUSE_MAX_PAGES_PER_REQ, which is equal to 32 pages. This limit needs to be
changeable in order to maximize throughput, since the optimal value
depends on various factors such as the type and version of the local
filesystems used, hardware specs, etc.

In addition, FUSE has recently become widely used as a gateway to
cloud storage services and distributed filesystems. Larger data might be
stored in them over the network via FUSE, and the overhead might affect
the read/write throughput.

This patch makes it tunable from 32 to 256 pages per mount.
The max_read and max_write mount options affect it. Without these
options, 32 pages are used by default.

Signed-off-by: Mitsuo Hayasaka <***@hitachi.com>
Cc: Miklos Szeredi <***@szeredi.hu>
---

fs/fuse/dev.c | 27 ++++++++++++++-------------
fs/fuse/file.c | 32 +++++++++++++++++---------------
fs/fuse/fuse_i.h | 29 +++++++++++++++++++----------
fs/fuse/inode.c | 40 +++++++++++++++++++++++++++++++++-------
4 files changed, 83 insertions(+), 45 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 7df2b5e..511560b 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -34,35 +34,36 @@ static struct fuse_conn *fuse_get_conn(struct file *file)
 	return file->private_data;
 }

-static void fuse_request_init(struct fuse_req *req)
+static void fuse_request_init(struct fuse_conn *fc, struct fuse_req *req)
 {
-	memset(req, 0, sizeof(*req));
+	memset(req, 0, fc->fuse_req_size);
 	INIT_LIST_HEAD(&req->list);
 	INIT_LIST_HEAD(&req->intr_entry);
 	init_waitqueue_head(&req->waitq);
 	atomic_set(&req->count, 1);
 }

-struct fuse_req *fuse_request_alloc(void)
+struct fuse_req *fuse_request_alloc(struct fuse_conn *fc)
 {
-	struct fuse_req *req = kmem_cache_alloc(fuse_req_cachep, GFP_KERNEL);
+	struct fuse_req *req = kmalloc(fc->fuse_req_size, GFP_KERNEL);
+
 	if (req)
-		fuse_request_init(req);
+		fuse_request_init(fc, req);
 	return req;
 }
 EXPORT_SYMBOL_GPL(fuse_request_alloc);

-struct fuse_req *fuse_request_alloc_nofs(void)
+struct fuse_req *fuse_request_alloc_nofs(struct fuse_conn *fc)
 {
-	struct fuse_req *req = kmem_cache_alloc(fuse_req_cachep, GFP_NOFS);
+	struct fuse_req *req = kmalloc(fc->fuse_req_size, GFP_NOFS);
 	if (req)
-		fuse_request_init(req);
+		fuse_request_init(fc, req);
 	return req;
 }

 void fuse_request_free(struct fuse_req *req)
 {
-	kmem_cache_free(fuse_req_cachep, req);
+	kfree(req);
 }

 static void block_sigs(sigset_t *oldset)
@@ -116,7 +117,7 @@ struct fuse_req *fuse_get_req(struct fuse_conn *fc)
 	if (!fc->connected)
 		goto out;

-	req = fuse_request_alloc();
+	req = fuse_request_alloc(fc);
 	err = -ENOMEM;
 	if (!req)
 		goto out;
@@ -166,7 +167,7 @@ static void put_reserved_req(struct fuse_conn *fc, struct fuse_req *req)
 	struct fuse_file *ff = file->private_data;

 	spin_lock(&fc->lock);
-	fuse_request_init(req);
+	fuse_request_init(fc, req);
 	BUG_ON(ff->reserved_req);
 	ff->reserved_req = req;
 	wake_up_all(&fc->reserved_req_waitq);
@@ -193,7 +194,7 @@ struct fuse_req *fuse_get_req_nofail(struct fuse_conn *fc, struct file *file)

 	atomic_inc(&fc->num_waiting);
 	wait_event(fc->blocked_waitq, !fc->blocked);
-	req = fuse_request_alloc();
+	req = fuse_request_alloc(fc);
 	if (!req)
 		req = get_reserved_req(fc, file);

@@ -1564,7 +1565,7 @@ static int fuse_retrieve(struct fuse_conn *fc, struct inode *inode,
 	else if (outarg->offset + num > file_size)
 		num = file_size - outarg->offset;

-	while (num && req->num_pages < FUSE_MAX_PAGES_PER_REQ) {
+	while (num && req->num_pages < fc->max_pages) {
 		struct page *page;
 		unsigned int this_num;

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b321a68..7b96b00 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -57,7 +57,7 @@ struct fuse_file *fuse_file_alloc(struct fuse_conn *fc)
 		return NULL;

 	ff->fc = fc;
-	ff->reserved_req = fuse_request_alloc();
+	ff->reserved_req = fuse_request_alloc(fc);
 	if (unlikely(!ff->reserved_req)) {
 		kfree(ff);
 		return NULL;
@@ -653,7 +653,7 @@ static int fuse_readpages_fill(void *_data, struct page *page)
 	fuse_wait_on_page_writeback(inode, page->index);

 	if (req->num_pages &&
-	    (req->num_pages == FUSE_MAX_PAGES_PER_REQ ||
+	    (req->num_pages == fc->max_pages ||
 	     (req->num_pages + 1) * PAGE_CACHE_SIZE > fc->max_read ||
 	     req->pages[req->num_pages - 1]->index + 1 != page->index)) {
 		fuse_send_readpages(req, data->file);
@@ -866,7 +866,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_req *req,
 		if (!fc->big_writes)
 			break;
 	} while (iov_iter_count(ii) && count < fc->max_write &&
-		 req->num_pages < FUSE_MAX_PAGES_PER_REQ && offset == 0);
+		 req->num_pages < fc->max_pages && offset == 0);

 	return count > 0 ? count : err;
 }
@@ -1020,8 +1020,9 @@ static void fuse_release_user_pages(struct fuse_req *req, int write)
 	}
 }

-static int fuse_get_user_pages(struct fuse_req *req, const char __user *buf,
-			       size_t *nbytesp, int write)
+static int fuse_get_user_pages(struct fuse_conn *fc, struct fuse_req *req,
+			       const char __user *buf, size_t *nbytesp,
+			       int write)
 {
 	size_t nbytes = *nbytesp;
 	unsigned long user_addr = (unsigned long) buf;
@@ -1038,9 +1039,9 @@ static int fuse_get_user_pages(struct fuse_req *req, const char __user *buf,
 		return 0;
 	}

-	nbytes = min_t(size_t, nbytes, FUSE_MAX_PAGES_PER_REQ << PAGE_SHIFT);
+	nbytes = min_t(size_t, nbytes, fc->max_pages << PAGE_SHIFT);
 	npages = (nbytes + offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	npages = clamp(npages, 1, FUSE_MAX_PAGES_PER_REQ);
+	npages = clamp(npages, 1, (int)fc->max_pages);
 	npages = get_user_pages_fast(user_addr, npages, !write, req->pages);
 	if (npages < 0)
 		return npages;
@@ -1077,7 +1078,7 @@ ssize_t fuse_direct_io(struct file *file, const char __user *buf,
 		size_t nres;
 		fl_owner_t owner = current->files;
 		size_t nbytes = min(count, nmax);
-		int err = fuse_get_user_pages(req, buf, &nbytes, write);
+		int err = fuse_get_user_pages(fc, req, buf, &nbytes, write);
 		if (err) {
 			res = err;
 			break;
@@ -1269,7 +1270,7 @@ static int fuse_writepage_locked(struct page *page)

 	set_page_writeback(page);

-	req = fuse_request_alloc_nofs();
+	req = fuse_request_alloc_nofs(fc);
 	if (!req)
 		goto err;

@@ -1695,10 +1696,11 @@ static int fuse_copy_ioctl_iovec_old(struct iovec *dst, void *src,
 }

 /* Make sure iov_length() won't overflow */
-static int fuse_verify_ioctl_iov(struct iovec *iov, size_t count)
+static int fuse_verify_ioctl_iov(struct fuse_conn *fc, struct iovec *iov,
+				 size_t count)
 {
 	size_t n;
-	u32 max = FUSE_MAX_PAGES_PER_REQ << PAGE_SHIFT;
+	u32 max = fc->max_pages << PAGE_SHIFT;

 	for (n = 0; n < count; n++) {
 		if (iov->iov_len > (size_t) max)
@@ -1821,7 +1823,7 @@ long fuse_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg,
 	BUILD_BUG_ON(sizeof(struct fuse_ioctl_iovec) * FUSE_IOCTL_MAX_IOV > PAGE_SIZE);

 	err = -ENOMEM;
-	pages = kcalloc(FUSE_MAX_PAGES_PER_REQ, sizeof(pages[0]), GFP_KERNEL);
+	pages = kcalloc(fc->max_pages, sizeof(pages[0]), GFP_KERNEL);
 	iov_page = (struct iovec *) __get_free_page(GFP_KERNEL);
if (!pages || !iov_page)
goto out;
@@ -1860,7 +1862,7 @@ long fuse_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg,

/* make sure there are enough buffer pages and init request with them */
err = -ENOMEM;
- if (max_pages > FUSE_MAX_PAGES_PER_REQ)
+ if (max_pages > fc->max_pages)
goto out;
while (num_pages < max_pages) {
pages[num_pages] = alloc_page(GFP_KERNEL | __GFP_HIGHMEM);
@@ -1943,11 +1945,11 @@ long fuse_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg,
in_iov = iov_page;
out_iov = in_iov + in_iovs;

- err = fuse_verify_ioctl_iov(in_iov, in_iovs);
+ err = fuse_verify_ioctl_iov(fc, in_iov, in_iovs);
if (err)
goto out;

- err = fuse_verify_ioctl_iov(out_iov, out_iovs);
+ err = fuse_verify_ioctl_iov(fc, out_iov, out_iovs);
if (err)
goto out;

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 771fb63..c96dc5f 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -23,8 +23,11 @@
#include <linux/poll.h>
#include <linux/workqueue.h>

-/** Max number of pages that can be used in a single read request */
-#define FUSE_MAX_PAGES_PER_REQ 32
+/** Maximum number of pages that can be used in a single read/write request */
+#define FUSE_MAX_PAGES_PER_REQ 256
+
+/** Default number of pages that can be used in a single read/write request */
+#define FUSE_DEFAULT_MAX_PAGES_PER_REQ 32

/** Bias for fi->writectr, meaning new writepages must not be sent */
#define FUSE_NOWRITE INT_MIN
@@ -290,12 +293,6 @@ struct fuse_req {
struct fuse_lk_in lk_in;
} misc;

- /** page vector */
- struct page *pages[FUSE_MAX_PAGES_PER_REQ];
-
- /** number of pages in vector */
- unsigned num_pages;
-
/** offset of data on first page */
unsigned page_offset;

@@ -313,6 +310,12 @@ struct fuse_req {

/** Request is stolen from fuse_file->reserved_req */
struct file *stolen_file;
+
+ /** number of pages in vector */
+ unsigned num_pages;
+
+ /** page vector */
+ struct page *pages[0];
};

/**
@@ -347,6 +350,12 @@ struct fuse_conn {
/** Maximum write size */
unsigned max_write;

+ /** Maximum number of pages per req */
+ unsigned max_pages;
+
+ /** fuse_req size per connection */
+ unsigned fuse_req_size;
+
/** Readers of the connection are waiting on this */
wait_queue_head_t waitq;

@@ -655,9 +664,9 @@ void fuse_ctl_cleanup(void);
/**
* Allocate a request
*/
-struct fuse_req *fuse_request_alloc(void);
+struct fuse_req *fuse_request_alloc(struct fuse_conn *fc);

-struct fuse_req *fuse_request_alloc_nofs(void);
+struct fuse_req *fuse_request_alloc_nofs(struct fuse_conn *fc);

/**
* Free a request
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 1cd6165..aadf157 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -550,6 +550,9 @@ void fuse_conn_init(struct fuse_conn *fc)
atomic_set(&fc->num_waiting, 0);
fc->max_background = FUSE_DEFAULT_MAX_BACKGROUND;
fc->congestion_threshold = FUSE_DEFAULT_CONGESTION_THRESHOLD;
+ fc->max_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ;
+ fc->fuse_req_size = sizeof(struct fuse_req) +
+ fc->max_pages * sizeof(struct page *);
fc->khctr = 0;
fc->polled_files = RB_ROOT;
fc->reqctr = 0;
@@ -774,6 +777,16 @@ static int set_global_limit(const char *val, struct kernel_param *kp)
return 0;
}

+static void set_conn_max_pages(struct fuse_conn *fc, unsigned max_pages)
+{
+ if (max_pages > fc->max_pages) {
+ fc->max_pages = min_t(unsigned, FUSE_MAX_PAGES_PER_REQ,
+ max_pages);
+ fc->fuse_req_size = sizeof(struct fuse_req) +
+ fc->max_pages * sizeof(struct page *);
+ }
+}
+
static void process_init_limits(struct fuse_conn *fc, struct fuse_init_out *arg)
{
int cap_sys_admin = capable(CAP_SYS_ADMIN);
@@ -807,6 +820,7 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_req *req)
fc->conn_error = 1;
else {
unsigned long ra_pages;
+ unsigned max_pages;

process_init_limits(fc, arg);

@@ -844,6 +858,8 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_req *req)
fc->minor = arg->minor;
fc->max_write = arg->minor < 5 ? 4096 : arg->max_write;
fc->max_write = max_t(unsigned, 4096, fc->max_write);
+ max_pages = DIV_ROUND_UP(fc->max_write, PAGE_SIZE);
+ set_conn_max_pages(fc, max_pages);
fc->conn_init = 1;
}
fc->blocked = 0;
@@ -880,6 +896,20 @@ static void fuse_free_conn(struct fuse_conn *fc)
kfree(fc);
}

+static void fuse_conn_setup(struct fuse_conn *fc,
+ struct fuse_mount_data *d)
+{
+ unsigned max_pages;
+
+ fc->release = fuse_free_conn;
+ fc->flags = d->flags;
+ fc->user_id = d->user_id;
+ fc->group_id = d->group_id;
+ fc->max_read = max_t(unsigned, 4096, d->max_read);
+ max_pages = DIV_ROUND_UP(fc->max_read, PAGE_SIZE);
+ set_conn_max_pages(fc, max_pages);
+}
+
static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb)
{
int err;
@@ -986,11 +1016,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
fc->dont_mask = 1;
sb->s_flags |= MS_POSIXACL;

- fc->release = fuse_free_conn;
- fc->flags = d.flags;
- fc->user_id = d.user_id;
- fc->group_id = d.group_id;
- fc->max_read = max_t(unsigned, 4096, d.max_read);
+ fuse_conn_setup(fc, &d);

/* Used by get_root_inode() */
sb->s_fs_info = fc;
@@ -1003,12 +1029,12 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
/* only now - we want root dentry with NULL ->d_op */
sb->s_d_op = &fuse_dentry_operations;

- init_req = fuse_request_alloc();
+ init_req = fuse_request_alloc(fc);
if (!init_req)
goto err_put_root;

if (is_bdev) {
- fc->destroy_req = fuse_request_alloc();
+ fc->destroy_req = fuse_request_alloc(fc);
if (!fc->destroy_req)
goto err_free_init_req;
}
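
For readers following the sizing logic in this patch, here is a minimal user-space sketch of how the per-connection page limit and request size appear to be derived. It is not kernel code: conn_model, req_model, pages_for_bytes() and req_alloc() are hypothetical stand-ins for fuse_conn, fuse_req, the DIV_ROUND_UP() calls and fuse_request_alloc(), and a 4096-byte page is assumed.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u                       /* assumption: 4 KiB pages */
#define FUSE_MAX_PAGES_PER_REQ 256u           /* upper bound from the patch */
#define FUSE_DEFAULT_MAX_PAGES_PER_REQ 32u    /* default from the patch */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

struct page;                                  /* opaque, as in the kernel */

/* Hypothetical stand-in for struct fuse_req: the page vector is last. */
struct req_model {
	unsigned num_pages;
	struct page *pages[];                 /* sized per connection */
};

/* Hypothetical stand-in for the two new fuse_conn fields. */
struct conn_model {
	unsigned max_pages;
	size_t req_size;
};

static size_t req_size_for(unsigned max_pages)
{
	return sizeof(struct req_model) + max_pages * sizeof(struct page *);
}

/* Mirrors the additions to fuse_conn_init(). */
static void conn_init(struct conn_model *c)
{
	c->max_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ;
	c->req_size = req_size_for(c->max_pages);
}

/* Mirrors set_conn_max_pages(): the limit only grows, capped at 256. */
static void set_conn_max_pages(struct conn_model *c, unsigned max_pages)
{
	if (max_pages > c->max_pages) {
		c->max_pages = max_pages < FUSE_MAX_PAGES_PER_REQ ?
			       max_pages : FUSE_MAX_PAGES_PER_REQ;
		c->req_size = req_size_for(c->max_pages);
	}
}

/* Bytes (max_read or max_write) rounded up to whole pages. */
static unsigned pages_for_bytes(unsigned bytes)
{
	return DIV_ROUND_UP(bytes, PAGE_SIZE);
}

/* Allocate a zeroed request sized for this connection. */
static struct req_model *req_alloc(const struct conn_model *c)
{
	assert(c->max_pages <= FUSE_MAX_PAGES_PER_REQ);
	return calloc(1, c->req_size);
}
```

In this model, a connection that asks for max_write=4M gets pages_for_bytes() = 1024, which set_conn_max_pages() caps at 256 pages (1M with 4 KiB pages), while anything at or below 128K leaves the default of 32 pages in place.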