Discussion:
[fuse-devel] How could fuse support parallel operation
Xing Jing
2011-12-17 09:06:24 UTC
Hi, everyone, glad to meet you here.

I've been using fuse to implement a distributed file system for about
three years. Fuse is really very good: it is easy to learn and use,
and a user-mode file system is much easier to develop.

At present, I have a problem with the performance of the client in my
distributed file system. Although I use multiple threads to read/write
or create files, only one request from the client can be submitted to
the server at a time. This means that even if we configure a very
powerful computer as the client and multiple servers to provide
service, the whole system can still only process requests one by one
because of the limitation of the client.

Then I checked the source code of fuse and found that most file system
calls sent from the fuse kernel module to user mode are sent in
blocking mode. Could there be several queues to process the blocking
requests and provide higher performance?

Has anyone else met the same problem? If you have a solution for
this, would you please tell me? Thank you very much.

Jing
Dec.17th
Goswin von Brederlow
2011-12-18 18:15:24 UTC
Fuse can operate in 3 modes:

1) single threaded

Every request blocks until it is finished.

2) multi threaded

Libfuse starts a new thread (or reuses an idle one) for every request,
and multiple requests can be processed in parallel, one per thread.
There is a limit on the number of pending requests, and some operations
block others. The former can be changed, iirc, while the latter is
required for correct behaviour and nothing can be done there.

3) asynchronous

Libfuse doesn't have a ready-to-use main loop for this. But you can
include the fuse FD in your select/poll/epoll loop and call the receive
and process functions in libfuse yourself. You can read requests from
the FD and reply to them in any order. The kernel doesn't care. The
limit on the number of pending requests and the locking between some
operations remain.

This is useful with non-blocking IO like you have in a networked
filesystem, when your FS is just a repeater and doesn't need the cpu
power of multiple cores. The parallelism comes from interleaving the
requests in a single thread.
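
A rough illustration of the interleaving described in 3), written as a
self-contained Python simulation rather than against libfuse. Everything
in it is an assumption made for the example: the request ids, the
simulated server latencies, and the names handle_request, serve and
run_demo. asyncio.sleep() stands in for a non-blocking round trip to a
storage server, and appending to a list stands in for replying to the
kernel.

```python
import asyncio


async def handle_request(req_id, server_latency, replies):
    # Stand-in for a non-blocking round trip to a storage server.
    await asyncio.sleep(server_latency)
    # Stand-in for replying to the kernel; replies can go out in any
    # order because each one is matched to its request by id.
    replies.append(req_id)


async def serve(requests):
    replies = []
    tasks = []
    for req_id, latency in requests:
        # Start each request as soon as it is read instead of waiting
        # for the previous one to finish.
        tasks.append(
            asyncio.create_task(handle_request(req_id, latency, replies))
        )
    await asyncio.gather(*tasks)
    return replies


def run_demo():
    # (request id, simulated server latency in seconds), in arrival order.
    requests = [(1, 0.15), (2, 0.05), (3, 0.10)]
    return asyncio.run(serve(requests))


if __name__ == "__main__":
    print(run_demo())
```

All three requests are in flight at once on a single thread, and the
replies come back in completion order rather than arrival order. A real
filesystem would read requests from the fuse FD in place of the
hard-coded list.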


You also have the choice of using read/write or splice operations.
Especially with splicing, the kernel side will not be the bottleneck. A
single queue is more than fast enough to splice all the requests you
want in no time. Even with read/write, which means memcpy() calls, I
doubt you can make your filesystem faster than the kernel can memcpy()
from a single queue.

In your case I bet splice would be applicable.

MfG
Goswin
Xing Jing
2011-12-19 16:06:34 UTC
Dear Goswin,
Thank you very much for your help; your detailed answer helps me
a lot. It seems that I can achieve my goals in user mode, and the
2nd and 3rd methods will be easier for me. I will try them first and
report the results later. Thank you a million :)

Best Wishes

Jing
Dec.20th
Xing Jing
2011-12-25 09:44:10 UTC
Hi, Goswin,
I've checked the source code of fuse. I find that the default
processing mode is the multi-threaded mode. However, it still cannot
provide enough concurrency. As dev.c and file.c in kernel/fs/fuse
show, readpages requests are submitted in background mode to provide
higher concurrency, while write requests are submitted one by one.
Write requests can't be added to the pending queue that is submitted
in the background. At first I thought that if I could submit write
requests in the background I would get higher performance. So I
changed fuse_send_write from request_send to the
request_send_request_background way and down a sem in the request to
wait for the reply from user mode. It works. But the concurrency isn't
improved very much. At most 2 requests are sent to user mode at any
one moment. What can I do next?

You mentioned that I can use select/poll to receive requests from the
kernel. Do you mean I have to create multiple channels in user mode to
receive the requests? If so, that may mean a lot of work.

Looking forward to your reply.

Best wishes and Merry Christmas

Jing
Miklos Szeredi
2012-01-02 15:52:02 UTC
You need not wait for the reply to the write request from the server,
which means many write requests can be in flight in parallel. ->fsync()
needs to wait for outstanding write requests, and I/O errors can be
returned from ->flush() or ->fsync().

Sshfs does such async writes (unless the -osshfs_sync option is
given). It shouldn't be difficult to implement in your filesystem and
definitely doesn't need any kernel changes.
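
A minimal sketch of that write-behind idea, in Python for brevity and
not taken from sshfs. The class WriteBehindFile, the send_to_server
callback and the demo data are all hypothetical; a thread pool stands in
for whatever transport the filesystem uses. write() returns as soon as
the transfer is queued, and fsync()/flush() wait for the outstanding
writes and report the first error.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class WriteBehindFile:
    """Hypothetical per-open-file state for a userspace filesystem."""

    def __init__(self, send_to_server, max_workers=8):
        # Assumed blocking call that ships (offset, data) to the server.
        self._send = send_to_server
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending = []
        self._lock = threading.Lock()

    def write(self, offset, data):
        # Queue the transfer and return at once, so the next write
        # request can be accepted while this one is still in flight.
        # Note: writes to overlapping ranges may complete out of order.
        future = self._pool.submit(self._send, offset, data)
        with self._lock:
            self._pending.append(future)
        return len(data)

    def fsync(self):
        # Wait for every outstanding write, then report the first error.
        with self._lock:
            pending, self._pending = self._pending, []
        first_error = None
        for future in pending:
            error = future.exception()  # blocks until this write is done
            if error is not None and first_error is None:
                first_error = error
        if first_error is not None:
            raise first_error

    # flush() reports deferred errors in the same way in this sketch.
    flush = fsync

    def close(self):
        self._pool.shutdown(wait=True)


def demo():
    stored = {}

    def send(offset, data):
        # Simulated server: the write at offset 8 fails.
        if offset == 8:
            raise OSError(5, "simulated I/O error")
        stored[offset] = data

    f = WriteBehindFile(send)
    results = [f.write(0, b"abcd"), f.write(4, b"efgh"), f.write(8, b"ijkl")]
    try:
        f.fsync()
        error = None
    except OSError as exc:
        error = exc.errno
    f.close()
    return results, error, stored
```

In demo() all three write() calls report success immediately, and the
simulated failure only surfaces when fsync() is called. That is the
trade-off of this approach: the application sees write errors late.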

Thanks,
Miklos
Xing Jing
2012-01-11 12:14:20 UTC
Thank you very much for your reply.

Actually, I've modified the code as you said. I changed several lines
in fs/fuse/file.c, fs/fuse/dev.c and fs/fuse/fuse_i.h to let most
write requests be sent in the background way, and sem_down the sem to
block the thread until the reply is returned from userspace. I've also
made some modifications to my client code to let multiple clients
work together on a single machine, and it does help to improve the
throughput of a client node.

Best wishes
Jing