Discussion:
Fuse File System Struck in the Read System Call when Write back Cache is Enable
(too old to reply)
Akil Raza
2017-03-07 04:22:01 UTC
Permalink
Hi
I am benchmarking the Read Write Performance using Iozone on the FUSE File
system using 3.0 Library. I could see the Fuse File system is struck in the
read /Splice Call.
#0 0x00007fd3f4409043 in splice () from /lib64/libc.so.6
#1 0x00007fd3f4f1ef00 in fuse_session_receive_buf_int (se=0x211f810,
buf=***@entry=0x7fd398000900, ch=0x0) at fuse_lowlevel.c:2649
#2 0x00007fd3f4f1a1cf in fuse_do_work (data=0x7fd3980008e0) at
fuse_loop_mt.c:123
#3 0x00007fd3f5148dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fd3f440873d in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7fd3dbfff700 (LWP 13334)):
#0 0x00007fd3f4409043 in splice () from /lib64/libc.so.6
#1 0x00007fd3f4f1ef00 in fuse_session_receive_buf_int (se=0x211f810,
buf=***@entry=0x7fd3bc000900, ch=0x0) at fuse_lowlevel.c:2649
#2 0x00007fd3f4f1a1cf in fuse_do_work (data=0x7fd3bc0008e0) at
fuse_loop_mt.c:123
#3 0x00007fd3f5148dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fd3f440873d in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7fd3db7fe700 (LWP 13336)):
#0 0x00007fd3f4409043 in splice () from /lib64/libc.so.6
#1 0x00007fd3f4f1ef00 in fuse_session_receive_buf_int (se=0x211f810,
buf=***@entry=0x7fd398000960, ch=0x0) at fuse_lowlevel.c:2649
#2 0x00007fd3f4f1a1cf in fuse_do_work (data=0x7fd398000940) at
fuse_loop_mt.c:123
#3 0x00007fd3f5148dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fd3f440873d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7fd3da7fc700 (LWP 13338)):
#0 0x00007fd3f4409043 in splice () from /lib64/libc.so.6
#1 0x00007fd3f4f1ef00 in fuse_session_receive_buf_int (se=0x211f810,
buf=***@entry=0x7fd3c8000ed0, ch=0x0) at fuse_lowlevel.c:2649
#2 0x00007fd3f4f1a1cf in fuse_do_work (data=0x7fd3c8000eb0) at
fuse_loop_mt.c:123
#3 0x00007fd3f5148dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fd3f440873d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7fd3b7fff700 (LWP 13339)):
#0 0x00007fd3f4409043 in splice () from /lib64/libc.so.6
#1 0x00007fd3f4f1ef00 in fuse_session_receive_buf_int (se=0x211f810,
buf=***@entry=0x7fd3cc000900, ch=0x0) at fuse_lowlevel.c:2649
#2 0x00007fd3f4f1a1cf in fuse_do_work (data=0x7fd3cc0008e0) at
fuse_loop_mt.c:123
#3 0x00007fd3f5148dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fd3f440873d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7fd3f0e87700 (LWP 13342)):
#0 0x00007fd3f4409043 in splice () from /lib64/libc.so.6
#1 0x00007fd3f4f1ef00 in fuse_session_receive_buf_int (se=0x211f810,
buf=***@entry=0x7fd3b00009c0, ch=0x0) at fuse_lowlevel.c:2649
#2 0x00007fd3f4f1a1cf in fuse_do_work (data=0x7fd3b00009a0) at
fuse_loop_mt.c:123
#3 0x00007fd3f5148dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fd3f440873d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fd3f578c740 (LWP 12443)):
Underlying Kernel is 4.10.1.
This problem is seen when different file size is being kept for performance
measurement.
Fuse is initialize with the below call
#ifdef FUSE_3
/*conn->capable |= FUSE_CAP_ASYNC_READ;*/
conn->want |= FUSE_CAP_WRITEBACK_CACHE;
/*conn->want &= ~(FUSE_CAP_SPLICE_READ);*/
cfg->nullpath_ok = 1;
The Problem is not seen on Virtual Machine but clearly visible when run on
the Physical Machine .
Can I get some inputs to overcome this problem ??
--
Potentially Secured from threat!!!
Miklos Szeredi
2017-03-07 08:37:30 UTC
Permalink
Post by Akil Raza
Hi
I am benchmarking the Read Write Performance using Iozone on the FUSE File
system using 3.0 Library. I could see the Fuse File system is struck in the
read /Splice Call.
#0 0x00007fd3f4409043 in splice () from /lib64/libc.so.6
#1 0x00007fd3f4f1ef00 in fuse_session_receive_buf_int (se=0x211f810,
#2 0x00007fd3f4f1a1cf in fuse_do_work (data=0x7fd3980008e0) at
fuse_loop_mt.c:123
#3 0x00007fd3f5148dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fd3f440873d in clone () from /lib64/libc.so.6
#0 0x00007fd3f4409043 in splice () from /lib64/libc.so.6
#1 0x00007fd3f4f1ef00 in fuse_session_receive_buf_int (se=0x211f810,
#2 0x00007fd3f4f1a1cf in fuse_do_work (data=0x7fd3bc0008e0) at
fuse_loop_mt.c:123
#3 0x00007fd3f5148dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fd3f440873d in clone () from /lib64/libc.so.6
#0 0x00007fd3f4409043 in splice () from /lib64/libc.so.6
#1 0x00007fd3f4f1ef00 in fuse_session_receive_buf_int (se=0x211f810,
#2 0x00007fd3f4f1a1cf in fuse_do_work (data=0x7fd398000940) at
fuse_loop_mt.c:123
#3 0x00007fd3f5148dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fd3f440873d in clone () from /lib64/libc.so.6
#0 0x00007fd3f4409043 in splice () from /lib64/libc.so.6
#1 0x00007fd3f4f1ef00 in fuse_session_receive_buf_int (se=0x211f810,
#2 0x00007fd3f4f1a1cf in fuse_do_work (data=0x7fd3c8000eb0) at
fuse_loop_mt.c:123
#3 0x00007fd3f5148dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fd3f440873d in clone () from /lib64/libc.so.6
#0 0x00007fd3f4409043 in splice () from /lib64/libc.so.6
#1 0x00007fd3f4f1ef00 in fuse_session_receive_buf_int (se=0x211f810,
#2 0x00007fd3f4f1a1cf in fuse_do_work (data=0x7fd3cc0008e0) at
fuse_loop_mt.c:123
#3 0x00007fd3f5148dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fd3f440873d in clone () from /lib64/libc.so.6
#0 0x00007fd3f4409043 in splice () from /lib64/libc.so.6
#1 0x00007fd3f4f1ef00 in fuse_session_receive_buf_int (se=0x211f810,
#2 0x00007fd3f4f1a1cf in fuse_do_work (data=0x7fd3b00009a0) at
fuse_loop_mt.c:123
#3 0x00007fd3f5148dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fd3f440873d in clone () from /lib64/libc.so.6
Underlying Kernel is 4.10.1.
This problem is seen when different file size is being kept for performance
measurement.
Fuse is initialize with the below call
#ifdef FUSE_3
/*conn->capable |= FUSE_CAP_ASYNC_READ;*/
conn->want |= FUSE_CAP_WRITEBACK_CACHE;
/*conn->want &= ~(FUSE_CAP_SPLICE_READ);*/
cfg->nullpath_ok = 1;
The Problem is not seen on Virtual Machine but clearly visible when run on
the Physical Machine .
Can I get some inputs to overcome this problem ??
What's the kernel version?
Try and get a kernel trace when things are stuck:

echo t > /proc/sysrq-trigger
dmesg > /tmp/dmesg

Thanks,
Miklos
Post by Akil Raza
--
Potentially Secured from threat!!!
------------------------------------------------------------------------------
Announcing the Oxford Dictionaries API! The API offers world-renowned
dictionary content that is easy and intuitive to access. Sign up for an
account today to start using our lexical data to power your apps and
projects. Get started today and enter our developer competition.
http://sdm.link/oxford
--
fuse-devel mailing list
To unsubscribe or subscribe, visit
https://lists.sourceforge.net/lists/listinfo/fuse-devel
------------------------------------------------------------------------------
Announcing the Oxford Dictionaries API! The API offers world-renowned
dictionary content that is easy and intuitive to access. Sign up for an
account today to start using our lexical data to power your apps and
projects. Get started today and enter our developer competition.
http://sdm.link/oxford
--
fuse-devel mailing list
To unsubscribe or subscribe, visit https://lists.sourceforge.net/lists/listinfo/fuse-devel
Miklos Szeredi
2017-03-09 13:30:42 UTC
Permalink
Hi
I had detailed the System set up below
root 8954 8953 0 07:59 pts/0 00:00:00 iozone -i 0 -i 1 -i 2 -r
128k -s 8g -t 1 -b output_encrypted_8g_1t.xls
root 8955 8954 0 07:59 pts/0 00:00:00 iozone -i 0 -i 1 -i 2 -r
128k -s 8g -t 1 -b output_encrypted_8g_1t.xls
root 9346 999 0 08:14 pts/1 00:00:00 grep --color=auto iozone
Linux labcentos7base.localdomain 4.10.1-1.el7.elrepo.x86_64 #1 SMP Sun Feb
26 19:47:48 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
total used free shared buff/cache
available
Mem: 125G 978M 124G 66M 762M
123G
Swap: 4.0G 0B 4.0G
Attach is the dmesg file having the kernel trace
dfp_* are all user space daemon ( those instance are present in the dmesg
file )
I dint see any problem with the 2.9 Lib . The Problem is seen when the write
back cache is enabled in 3.0 Lib
Kernel side of things look fine in untruncated dmesg.

Possibly a libfuse-3 issue.

Can you please get a fuse debug log (-d)?

Thanks,
Miklos

------------------------------------------------------------------------------
Announcing the Oxford Dictionaries API! The API offers world-renowned
dictionary content that is easy and intuitive to access. Sign up for an
account today to start using our lexical data to power your apps and
projects. Get started today and enter our developer competition.
http://sdm.link/oxford
--
fuse-devel mailing list
To unsubscribe or subscribe, visit https://lists.sourceforge.net/lists/listinfo/fuse-devel
Akil Raza
2017-05-23 13:18:39 UTC
Permalink
Write Back Cache Issue. Write Throttling issue seen
Kernel Version 3.18.4 . FUSE Lib 3.0
From the Stap Logs collected I could see the io_schedule_time is happening
. The BDI Queing is Proper , but the fuse_writepages is invoke with the
delay. Attached is the log (nws1) . System is having 128G of RAM .
One of the Write Back Related fix is delivered in Kernel 4.6 , using the
Kernel 4.10 I could see the same problem . Since I was facing some issue
while upgrading the kernel from 3.x to 4.x I am debugging the Problem on
kernel 3.x.
Hi
I had detailed the System set up below
root 8954 8953 0 07:59 pts/0 00:00:00 iozone -i 0 -i 1 -i 2 -r
128k -s 8g -t 1 -b output_encrypted_8g_1t.xls
root 8955 8954 0 07:59 pts/0 00:00:00 iozone -i 0 -i 1 -i 2 -r
128k -s 8g -t 1 -b output_encrypted_8g_1t.xls
root 9346 999 0 08:14 pts/1 00:00:00 grep --color=auto iozone
Linux labcentos7base.localdomain 4.10.1-1.el7.elrepo.x86_64 #1 SMP Sun
Feb
26 19:47:48 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
total used free shared buff/cache
available
Mem: 125G 978M 124G 66M 762M
123G
Swap: 4.0G 0B 4.0G
Attach is the dmesg file having the kernel trace
dfp_* are all user space daemon ( those instance are present in the dmesg
file )
I dint see any problem with the 2.9 Lib . The Problem is seen when the
write
back cache is enabled in 3.0 Lib
Kernel side of things look fine in untruncated dmesg.
Possibly a libfuse-3 issue.
Can you please get a fuse debug log (-d)?
Thanks,
Miklos
--
Potentially Secured from threat!!!
Akil Raza
2017-03-09 04:23:45 UTC
Permalink
More Detailed on the Problem
Below is the Stack Trace
root 8955 8954 0 Mar07 pts/0 00:00:44 iozone -i 0 -i 1 -i 2 -r
128k -s 8g -t 1 -b output_encrypted_8g_1t.xls
[***@labcentos7base ~]# cat /proc/8955/stack
[<ffffffff81232c7a>] balance_dirty_pages.isra.30+0x69e/0x764
[<ffffffff811b3bf7>] balance_dirty_pages_ratelimited+0x267/0x3b0
[<ffffffff811a5958>] generic_perform_write+0x138/0x1c0
[<ffffffff811a8a9b>] __generic_file_write_iter+0x18b/0x1e0
[<ffffffff811a8bca>] generic_file_write_iter+0xda/0x1c0
[<ffffffffa082d416>] fuse_file_write_iter+0x116/0x2d0 [fuse]
[<ffffffff81236ca2>] __vfs_write+0xe2/0x140
[<ffffffff81237ab2>] vfs_write+0xb2/0x1b0
[<ffffffff81238f35>] SyS_write+0x55/0xc0
[<ffffffff81003a47>] do_syscall_64+0x67/0x180
[<ffffffff817848ab>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff

Kernel Version
[***@labcentos7base ~]# uname -a
Linux labcentos7base.localdomain 4.10.1-1.el7.elrepo.x86_64 #1 SMP Sun Feb
26 19:47:48 EST 2017 x86_64 x86_64 x86_64 GNU/L

[***@labcentos7base ~]# free -mh
total used free shared buff/cache
available
Mem: 125G 1.1G 120G 98M 4.6G
123G
Swap: 4.0G 0B 4.0G
--
Potentially Secured from threat!!!
Miklos Szeredi
2017-03-09 08:32:39 UTC
Permalink
Post by Akil Raza
More Detailed on the Problem
Below is the Stack Trace
root 8955 8954 0 Mar07 pts/0 00:00:44 iozone -i 0 -i 1 -i 2 -r
128k -s 8g -t 1 -b output_encrypted_8g_1t.xls
[<ffffffff81232c7a>] balance_dirty_pages.isra.30+0x69e/0x764
[<ffffffff811b3bf7>] balance_dirty_pages_ratelimited+0x267/0x3b0
[<ffffffff811a5958>] generic_perform_write+0x138/0x1c0
[<ffffffff811a8a9b>] __generic_file_write_iter+0x18b/0x1e0
[<ffffffff811a8bca>] generic_file_write_iter+0xda/0x1c0
[<ffffffffa082d416>] fuse_file_write_iter+0x116/0x2d0 [fuse]
[<ffffffff81236ca2>] __vfs_write+0xe2/0x140
[<ffffffff81237ab2>] vfs_write+0xb2/0x1b0
[<ffffffff81238f35>] SyS_write+0x55/0xc0
[<ffffffff81003a47>] do_syscall_64+0x67/0x180
[<ffffffff817848ab>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff
balance_dirty_pages() is stuck because writing stalled. Possibly
because the filesystem daemon is buggy and not answering a write
request.

Unfortunately the interesting part of the dmesg is truncated, because
of too small dmesg buffer size.

Please reboot with log_buf_len=4M added to the boot parameters.

Thanks,
Miklos

------------------------------------------------------------------------------
Announcing the Oxford Dictionaries API! The API offers world-renowned
dictionary content that is easy and intuitive to access. Sign up for an
account today to start using our lexical data to power your apps and
projects. Get started today and enter our developer competition.
http://sdm.link/oxford
--
fuse-devel mailing list
To unsubscribe or subscribe, visit https://lists.sourceforge.net/lists/listinfo/fuse-devel
Loading...