Need help troubleshooting slow I/O on Synology NAS device
Perhaps a bit of an unusual topic for this forum, but since a Synology NAS basically is a ‘stripped down’ Linux machine, I figured it wouldn’t hurt to try.
I’ve received a Synology DS218Play NAS from a friend of mine to fix. It died completely from one day to the other. Lights would blink, but he was unable to reach it via the web interface most of the time and if it worked, it was painfully slow.
The device has 2 HDDs (spinning rust) in RAID and I found one of these to be defective. It would basically hang up the entire system, so as soon as I removed that drive, it seemed to work fine. Until I tried to do some I/O intensive tasks (like downloading a file); I/O would cap out at 8 megabytes per second. Even for spinning rust on an ARM based machine that’s slow. So, I swapped the drive for a WD RED Plus (CMR) HDD. Still slow… now what?
I reset the NAS to factory settings, formatted the drive and started off fresh but still; as soon it needs to do some I/O intensive tasks (like restore a backup for example) it craps out. What could this be causing?
I ran some disk benchmarks using a script I found here: https://gist.github.com/gxfxyz/0d81429fc48a82ef6c2402b178a82566 but I’m having trouble interpreting the results:
sudo hdparm -Tt /dev/md3
/dev/md3:
Timing cached reads: 1114 MB in 2.00 seconds = 556.39 MB/sec
Timing buffered disk reads: 432 MB in 3.00 seconds = 143.82 MB/sec
# create a 1 GiB test file with random data (it can take minutes)
head -c 1G </dev/urandom >test
# clear cache
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
# read from test file, read 1 MiB each time, read 1024 times, total read 1 GiB
dd if=test of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 6.85157 s, 157 MB/s
# write 0 to test file, write 1 GiB each time, write 2 times, total write 2 GiB
dd if=/dev/zero of=test bs=1G count=2 oflag=dsync
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 67.5115 s, 31.8 MB/s
# write 0 to test file, write 128 MiB each time, write 8 times, total write 1 GiB
dd if=/dev/zero of=test bs=128M count=8 oflag=dsync
8+0 records in
8+0 records out
1073741824 bytes (1.1 GB) copied, 8.26623 s, 130 MB/s
# write 0 to test file, write 1 MiB each time, write 1024 times, total write 1 GiB
dd if=/dev/zero of=test bs=1M count=1024 oflag=dsync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 22.2873 s, 48.2 MB/s
# write 0 to test file, write 128 KiB each time, write 1024 times, total write 128 MiB
dd if=/dev/zero of=test bs=128k count=1024 oflag=dsync
1024+0 records in
1024+0 records out
134217728 bytes (134 MB) copied, 15.573 s, 8.6 MB/s
# write 0 to test file, write 4 KiB each time, write 1024 times, total write 4 MiB
dd if=/dev/zero of=test bs=4k count=1024 oflag=dsync
1024+0 records in
1024+0 records out
4194304 bytes (4.2 MB) copied, 15.5282 s, 270 kB/s
# write 0 to test file, write 512 bytes each time, write 1024 times, total write 512 KiB
dd if=/dev/zero of=test bs=512 count=1024 oflag=dsync
1024+0 records in
1024+0 records out
524288 bytes (524 kB) copied, 17.2119 s, 30.5 kB/s
fio xfio.conf
seq-read-1m: (g=0): rw=read, bs=1M-1M/1M-1M/1M-1M, ioengine=psync, iodepth=1
seq-write-1m: (g=1): rw=write, bs=1M-1M/1M-1M/1M-1M, ioengine=psync, iodepth=1
seq-read-4k: (g=2): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
seq-write-4k: (g=3): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
rand-read-1m: (g=4): rw=randread, bs=1M-1M/1M-1M/1M-1M, ioengine=psync, iodepth=1
rand-write-1m: (g=5): rw=randwrite, bs=1M-1M/1M-1M/1M-1M, ioengine=psync, iodepth=1
rand-read-4k: (g=6): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
rand-write-4k: (g=7): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.13
Starting 8 processes
seq-read-1m: Laying out IO file(s) (1 file(s) / 5120MB)
seq-read-1m: (groupid=0, jobs=1): err= 0: pid=32244: Tue Jan 5 13:44:52 2021
read : io=5120.0MB, bw=162974KB/s, iops=159, runt= 32170msec
clat (msec): min=2, max=69, avg= 6.28, stdev= 2.14
lat (msec): min=2, max=69, avg= 6.28, stdev= 2.14
clat percentiles (usec):
| 1.00th=[ 5408], 5.00th=[ 5472], 10.00th=[ 5600], 20.00th=[ 5664],
| 30.00th=[ 5792], 40.00th=[ 6048], 50.00th=[ 6240], 60.00th=[ 6304],
| 70.00th=[ 6368], 80.00th=[ 6496], 90.00th=[ 6688], 95.00th=[ 6944],
| 99.00th=[ 8160], 99.50th=[12096], 99.90th=[47872], 99.95th=[50944],
| 99.99th=[69120]
bw (KB /s): min=117375, max=174384, per=98.83%, avg=161061.86, stdev=11020.53
lat (msec) : 4=0.16%, 10=99.34%, 20=0.20%, 50=0.25%, 100=0.06%
cpu : usr=0.09%, sys=4.48%, ctx=5128, majf=0, minf=274
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=5120/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
seq-write-1m: (groupid=1, jobs=1): err= 0: pid=32321: Tue Jan 5 13:44:52 2021
write: io=5120.0MB, bw=164819KB/s, iops=160, runt= 31810msec
clat (msec): min=2, max=48, avg= 6.05, stdev= 1.38
lat (msec): min=2, max=48, avg= 6.21, stdev= 1.38
clat percentiles (usec):
| 1.00th=[ 5280], 5.00th=[ 5344], 10.00th=[ 5408], 20.00th=[ 5472],
| 30.00th=[ 5600], 40.00th=[ 5920], 50.00th=[ 6048], 60.00th=[ 6176],
| 70.00th=[ 6240], 80.00th=[ 6368], 90.00th=[ 6560], 95.00th=[ 6816],
| 99.00th=[ 6944], 99.50th=[ 8256], 99.90th=[21632], 99.95th=[47872],
| 99.99th=[48384]
bw (KB /s): min=122529, max=164259, per=85.70%, avg=141244.06, stdev=6802.38
lat (msec) : 4=0.23%, 10=99.43%, 20=0.21%, 50=0.12%
cpu : usr=2.96%, sys=1.82%, ctx=5154, majf=0, minf=17
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=5120/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
seq-read-4k: (groupid=2, jobs=1): err= 0: pid=32375: Tue Jan 5 13:44:52 2021
read : io=1024.0MB, bw=76316KB/s, iops=19078, runt= 13740msec
clat (usec): min=45, max=41219, avg=48.55, stdev=159.47
lat (usec): min=45, max=41220, avg=49.08, stdev=159.47
clat percentiles (usec):
| 1.00th=[ 45], 5.00th=[ 45], 10.00th=[ 46], 20.00th=[ 46],
| 30.00th=[ 46], 40.00th=[ 46], 50.00th=[ 46], 60.00th=[ 46],
| 70.00th=[ 46], 80.00th=[ 47], 90.00th=[ 47], 95.00th=[ 49],
| 99.00th=[ 76], 99.50th=[ 78], 99.90th=[ 89], 99.95th=[ 95],
| 99.99th=[ 1512]
bw (KB /s): min=42861, max=74722, per=91.51%, avg=69832.15, stdev=6667.91
lat (usec) : 50=95.49%, 100=4.47%, 250=0.02%, 500=0.01%, 750=0.01%
lat (usec) : 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
cpu : usr=9.10%, sys=39.67%, ctx=262179, majf=0, minf=19
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=262144/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
seq-write-4k: (groupid=3, jobs=1): err= 0: pid=32406: Tue Jan 5 13:44:52 2021
write: io=1024.0MB, bw=70097KB/s, iops=17524, runt= 14959msec
clat (usec): min=50, max=39594, avg=53.15, stdev=119.00
lat (usec): min=50, max=39596, avg=53.78, stdev=119.01
clat percentiles (usec):
| 1.00th=[ 50], 5.00th=[ 51], 10.00th=[ 51], 20.00th=[ 51],
| 30.00th=[ 51], 40.00th=[ 51], 50.00th=[ 51], 60.00th=[ 51],
| 70.00th=[ 52], 80.00th=[ 52], 90.00th=[ 53], 95.00th=[ 55],
| 99.00th=[ 81], 99.50th=[ 89], 99.90th=[ 131], 99.95th=[ 145],
| 99.99th=[ 1288]
bw (KB /s): min=53700, max=67940, per=93.76%, avg=65722.48, stdev=3553.70
lat (usec) : 100=99.83%, 250=0.15%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
cpu : usr=9.09%, sys=26.41%, ctx=262191, majf=0, minf=23
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=262144/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
rand-read-1m: (groupid=4, jobs=1): err= 0: pid=32439: Tue Jan 5 13:44:52 2021
read : io=1024.0MB, bw=76869KB/s, iops=75, runt= 13641msec
clat (msec): min=2, max=83, avg=13.31, stdev= 5.13
lat (msec): min=2, max=83, avg=13.31, stdev= 5.13
clat percentiles (usec):
| 1.00th=[ 2160], 5.00th=[ 6368], 10.00th=[ 8512], 20.00th=[ 9792],
| 30.00th=[11072], 40.00th=[12224], 50.00th=[13376], 60.00th=[14272],
| 70.00th=[15552], 80.00th=[16768], 90.00th=[17792], 95.00th=[18304],
| 99.00th=[25728], 99.50th=[36608], 99.90th=[58624], 99.95th=[83456],
| 99.99th=[83456]
bw (KB /s): min=61188, max=86383, per=88.10%, avg=67719.48, stdev=5093.78
lat (msec) : 4=2.83%, 10=19.53%, 20=76.17%, 50=1.17%, 100=0.29%
cpu : usr=0.22%, sys=1.91%, ctx=1027, majf=0, minf=272
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=1024/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
rand-write-1m: (groupid=5, jobs=1): err= 0: pid=32465: Tue Jan 5 13:44:52 2021
write: io=1024.0MB, bw=119523KB/s, iops=116, runt= 8773msec
clat (msec): min=2, max=49, avg= 8.39, stdev= 5.32
lat (msec): min=2, max=49, avg= 8.56, stdev= 5.32
clat percentiles (usec):
| 1.00th=[ 2288], 5.00th=[ 2288], 10.00th=[ 2288], 20.00th=[ 2384],
| 30.00th=[ 6048], 40.00th=[ 7520], 50.00th=[ 7968], 60.00th=[ 8512],
| 70.00th=[ 9664], 80.00th=[11456], 90.00th=[15040], 95.00th=[17792],
| 99.00th=[24448], 99.50th=[27520], 99.90th=[48896], 99.95th=[49408],
| 99.99th=[49408]
bw (KB /s): min=76487, max=176390, per=77.51%, avg=92647.88, stdev=22157.86
lat (msec) : 4=24.32%, 10=48.05%, 20=25.00%, 50=2.64%
cpu : usr=2.05%, sys=1.60%, ctx=1029, majf=0, minf=17
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=1024/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
rand-read-4k: (groupid=6, jobs=1): err= 0: pid=32489: Tue Jan 5 13:44:52 2021
read : io=102400KB, bw=1140.2KB/s, iops=285, runt= 89812msec
clat (usec): min=70, max=327343, avg=3502.41, stdev=4495.53
lat (usec): min=70, max=327345, avg=3503.04, stdev=4495.54
clat percentiles (usec):
| 1.00th=[ 74], 5.00th=[ 83], 10.00th=[ 87], 20.00th=[ 90],
| 30.00th=[ 92], 40.00th=[ 94], 50.00th=[ 1768], 60.00th=[ 3856],
| 70.00th=[ 5792], 80.00th=[ 7840], 90.00th=[ 9664], 95.00th=[10816],
| 99.00th=[11584], 99.50th=[11712], 99.90th=[12352], 99.95th=[36096],
| 99.99th=[52992]
bw (KB /s): min= 333, max= 4192, per=75.94%, avg=865.74, stdev=313.03
lat (usec) : 100=43.69%, 250=3.53%, 500=0.46%, 750=0.04%, 1000=0.04%
lat (msec) : 2=3.30%, 4=9.80%, 10=30.63%, 20=8.44%, 50=0.06%
lat (msec) : 100=0.01%, 500=0.01%
cpu : usr=0.21%, sys=0.80%, ctx=25613, majf=0, minf=20
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=25600/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
rand-write-4k: (groupid=7, jobs=1): err= 0: pid=32698: Tue Jan 5 13:44:52 2021
write: io=102400KB, bw=2853.5KB/s, iops=713, runt= 35887msec
clat (usec): min=77, max=66798, avg=1396.62, stdev=2198.08
lat (usec): min=78, max=66799, avg=1397.36, stdev=2198.08
clat percentiles (usec):
| 1.00th=[ 86], 5.00th=[ 94], 10.00th=[ 97], 20.00th=[ 104],
| 30.00th=[ 109], 40.00th=[ 118], 50.00th=[ 1128], 60.00th=[ 1288],
| 70.00th=[ 1496], 80.00th=[ 2008], 90.00th=[ 3248], 95.00th=[ 4512],
| 99.00th=[11456], 99.50th=[13888], 99.90th=[22400], 99.95th=[23936],
| 99.99th=[41728]
bw (KB /s): min= 2232, max= 5128, per=98.43%, avg=2808.11, stdev=400.27
lat (usec) : 100=14.48%, 250=30.50%, 500=0.16%, 750=0.06%, 1000=0.03%
lat (msec) : 2=34.76%, 4=13.58%, 10=5.06%, 20=1.18%, 50=0.18%
lat (msec) : 100=0.01%
cpu : usr=0.47%, sys=1.31%, ctx=25615, majf=0, minf=16
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=25600/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: io=5120.0MB, aggrb=162974KB/s, minb=162974KB/s, maxb=162974KB/s, mint=32170msec, maxt=32170msec
Run status group 1 (all jobs):
WRITE: io=5120.0MB, aggrb=164818KB/s, minb=164818KB/s, maxb=164818KB/s, mint=31810msec, maxt=31810msec
Run status group 2 (all jobs):
READ: io=1024.0MB, aggrb=76315KB/s, minb=76315KB/s, maxb=76315KB/s, mint=13740msec, maxt=13740msec
Run status group 3 (all jobs):
WRITE: io=1024.0MB, aggrb=70096KB/s, minb=70096KB/s, maxb=70096KB/s, mint=14959msec, maxt=14959msec
Run status group 4 (all jobs):
READ: io=1024.0MB, aggrb=76869KB/s, minb=76869KB/s, maxb=76869KB/s, mint=13641msec, maxt=13641msec
Run status group 5 (all jobs):
WRITE: io=1024.0MB, aggrb=119523KB/s, minb=119523KB/s, maxb=119523KB/s, mint=8773msec, maxt=8773msec
Run status group 6 (all jobs):
READ: io=102400KB, aggrb=1140KB/s, minb=1140KB/s, maxb=1140KB/s, mint=89812msec, maxt=89812msec
Run status group 7 (all jobs):
WRITE: io=102400KB, aggrb=2853KB/s, minb=2853KB/s, maxb=2853KB/s, mint=35887msec, maxt=35887msec
Disk stats (read/write):
md3: ios=300032/299963, merge=0/0, ticks=177520/109410, in_queue=285050, util=93.63%, aggrios=300056/300101, aggrmerge=4/21, aggrticks=178200/108570, aggrin_queue=286380, aggrutil=93.39%
sda: ios=300056/300101, merge=4/21, ticks=178200/108570, in_queue=286380, util=93.39%
For me, as a some-what noob, these look ‘fine’. Am I missing the bigger picture?
All help and pointers are welcome ?
Thanks!
Comments
Are you running ZFS? This has considerable overhead, especially if you are running it with compression and de-duplication.
Cheap dedis are my drug, and I'm too far gone to turn back.
Hey, What are the Temperatures of the Subject in pain?
I had one time, that the Dumb thing ran hot and down clock itself to a death machine.
After I blasted it with a compressor it ran like a charm again. It may or may not help it in this case.
Grab your vmWare toblerone today.
No, I'm afraid not. I'm running ext4.
Good one! But according to DSM the temps are 'Normal' (40 degrees celcius).