Friday, 3 August 2012

RAIDs and LVMs

Testing and looking for the best approaches to working with RAID arrays.

GIVEN:
A server with an SRCSASPH16i adapter and 8 SATA disks (2TB, 7200rpm, 32MB cache).

Which is the better choice: RAID6, RAID10, or 4 x RAID1 with LVM on top? As always, both fault tolerance and speed matter.
In the end I leaned towards 4 RAID1 volumes, each made of two disks, with striped LVM on top. My thinking is that if a single disk fails, it will be much easier and faster to tell LVM which arrays to use than to hope that a spare disk resynchronizes, as in the RAID10 or RAID6 cases.
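To put numbers on the capacity side of that trade-off, here is a minimal sketch. The helper name usable_tb is hypothetical (it is not an mdadm feature), and it assumes the usual capacity rules: RAID6 gives up two disks' worth of space to parity, while RAID10 and RAID1 store every block twice.

```shell
#!/bin/sh
# usable_tb LEVEL DISKS SIZE_TB
# Hypothetical helper: usable capacity in TB for one array.
usable_tb() {
    level=$1
    disks=$2
    size=$3
    case "$level" in
        raid6)  echo $(( ($disks - 2) * $size )) ;;  # two disks' worth of parity
        raid10) echo $(( $disks / 2 * $size )) ;;    # every block is stored twice
        raid1)  echo "$size" ;;                      # a mirror holds one disk's worth
        *)      echo "unknown level: $level" >&2; return 1 ;;
    esac
}

echo "RAID6, 8 disks:      $(usable_tb raid6 8 2) TB"
echo "RAID10, 8 disks:     $(usable_tb raid10 8 2) TB"
echo "4 x RAID1 under LVM: $(( 4 * $(usable_tb raid1 2 2) )) TB"
```

So with these 8 disks RAID6 offers 12 TB, while RAID10 and the 4 x RAID1 layout both offer 8 TB. The choice between the last two is about manageability and speed, not space.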

The disks are configured in the RAID controller as separate JBODs, which the OS shows as:
/dev/sd{e,f,g,h,m,n,o,p}

1. RAID6


mdadm --create --verbose /dev/md6 --level=raid6  --raid-devices=8 /dev/sd{e,f,g,h,m,n,o,p}
Wait until the synchronization finishes (>10h)
mkfs.ext4 /dev/md6
mount /dev/md6 /mnt/

TIOTEST script (on CentOS 6.3, tiobench fails with a division-by-zero error message)

for i in 1 2 4 8 ; do echo ================================= ; echo THREADS  $i ; echo SIZE PER THREAD $((8000/$i)) ; tiotest -d /mnt/ -f $((8000/$i))  -t $i  ; done
=================================
THREADS 1
SIZE PER THREAD 8000
Tiotest results for 1 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        8000 MBs |   46.2 s | 173.178 MB/s |   1.3 %  |  32.2 % |
| Random Write    4 MBs |    2.5 s |   1.536 MB/s |   0.0 %  |   0.9 % |
| Read         8000 MBs |   10.3 s | 774.033 MB/s |   4.1 %  |  59.6 % |
| Random Read     4 MBs |    3.6 s |   1.090 MB/s |   0.0 %  |   0.2 % |
`----------------------------------------------------------------------'
=================================
THREADS 2
SIZE PER THREAD 4000
Tiotest results for 2 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        8000 MBs |   62.0 s | 129.005 MB/s |   2.0 %  |  70.6 % |
| Random Write    8 MBs |    5.5 s |   1.415 MB/s |   0.0 %  |   0.4 % |
| Read         8000 MBs |   15.9 s | 504.067 MB/s |   5.5 %  |  76.5 % |
| Random Read     8 MBs |    4.0 s |   1.933 MB/s |   0.1 %  |   0.0 % |
`----------------------------------------------------------------------'
=================================
THREADS 4
SIZE PER THREAD 2000
Tiotest results for 4 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        8000 MBs |  158.1 s |  50.591 MB/s |   2.6 %  | 136.7 % |
| Random Write   16 MBs |   11.7 s |   1.333 MB/s |   0.1 %  |   0.0 % |
| Read         8000 MBs |   16.5 s | 484.436 MB/s |   9.8 %  | 143.8 % |
| Random Read    16 MBs |    4.6 s |   3.421 MB/s |   0.0 %  |   0.0 % |
`----------------------------------------------------------------------'

=================================
THREADS 8
SIZE PER THREAD 1000
Tiotest results for 8 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        8000 MBs |  208.1 s |  38.449 MB/s |   4.4 %  | 351.5 % |
| Random Write   31 MBs |   21.5 s |   1.454 MB/s |   0.2 %  |   9.2 % |
| Read         8000 MBs |   16.3 s | 491.984 MB/s |  21.2 %  | 284.4 % |
| Random Read    31 MBs |    5.5 s |   5.714 MB/s |   0.4 %  |   0.0 % |
`----------------------------------------------------------------------'
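Comparing these tables by eye gets tedious, so here is a small sketch that pulls the item name and the rate out of tiotest's table output. The function name tiotest_rates is my own, and the parsing assumes the exact table layout shown above.

```shell
#!/bin/sh
# tiotest_rates: read a tiotest result table on stdin and print
# "<item><TAB><rate in MB/s>" for every data row.
tiotest_rates() {
    awk -F'|' '$4 ~ /MB\/s/ {
        item = $2
        sub(/ +[0-9]+ MBs.*/, "", item)   # drop the size column
        sub(/^ +/, "", item)              # trim leading blanks
        rate = $4
        gsub(/[^0-9.]/, "", rate)         # keep only the number
        printf "%s\t%s\n", item, rate
    }'
}

# Sample input: the 1-thread table from the run above.
tiotest_rates <<'EOF'
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        8000 MBs |   46.2 s | 173.178 MB/s |   1.3 %  |  32.2 % |
| Random Write    4 MBs |    2.5 s |   1.536 MB/s |   0.0 %  |   0.9 % |
| Read         8000 MBs |   10.3 s | 774.033 MB/s |   4.1 %  |  59.6 % |
| Random Read     4 MBs |    3.6 s |   1.090 MB/s |   0.0 %  |   0.2 % |
EOF
```

In practice the function would be fed from the benchmark loop, for example by piping the tiotest output through it.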

========
========
========

time  for D in `seq 1000 1999` ; do echo $D ; mkdir $D ; for F in `seq 1000 1234 1000000` ; do echo $D $F ; dd if=/dev/zero  bs=$F count=1 of=/mnt/A/$D/$F.txt ; done ; done
real    50m29.980s
user    4m2.111s
sys     36m16.997s
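As a sanity check on how much data that loop actually writes, the file count and total size can be computed from the same seq arguments. This is only arithmetic on the loop parameters; the variable names are mine.

```shell
#!/bin/sh
# Each directory gets one file per value of `seq 1000 1234 1000000`,
# and each file is exactly as many bytes long as its name says.
files_per_dir=$(seq 1000 1234 1000000 | wc -l | tr -d ' ')
bytes_per_dir=$(seq 1000 1234 1000000 | awk '{ s += $1 } END { printf "%d\n", s }')
dirs=1000   # `seq 1000 1999` yields 1000 directories

total_files=$(( $files_per_dir * $dirs ))
total_bytes=$(( $bytes_per_dir * $dirs ))

echo "files per directory: $files_per_dir"
echo "total files:         $total_files"
echo "total bytes:         $total_bytes"
```

That comes to 810 files per directory, 810000 files in total, and about 405 GB of payload, which agrees with the byte count rsync reports for the same tree.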

========
========
========


rsync -a --stats /mnt/A /mnt/C
..
sent 405221629673 bytes  received 15394018 bytes  52844366.39 bytes/sec
..

========
========
========

time cp -al /mnt/A /mnt/B 

real    1m0.912s
user    0m3.396s
sys     0m43.982s

========
========
========

time rm -rf /mnt/A  ; time rm -rf /mnt/B ; time rm -rf /mnt/C

real    0m49.380s
user    0m0.330s
sys     0m8.737s

real    0m59.611s
user    0m0.517s
sys     0m27.437s

real    1m15.621s
user    0m0.545s
sys     0m35.994s
=========================

RAID 10 

Preparation:

umount /mnt
mdadm -S /dev/md6
mdadm --remove /dev/md6
mdadm --zero-superblock /dev/sd{e,f,g,h,m,n,o,p}
mdadm --create --verbose /dev/md6 --level=raid10  --raid-devices=8 /dev/sd{e,f,g,h,m,n,o,p}
## Wait for the synchronization to finish
mkfs.ext4 /dev/md6
mount /dev/md6 /mnt

Testing:

=================================
THREADS 1
SIZE PER THREAD 8000
Tiotest results for 1 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        8000 MBs |   56.3 s | 142.088 MB/s |   1.0 %  |  26.9 % |
| Random Write    4 MBs |    0.2 s |  18.616 MB/s |   0.5 %  |   6.7 % |
| Read         8000 MBs |   17.2 s | 466.319 MB/s |   2.7 %  |  37.4 % |
| Random Read     4 MBs |    4.0 s |   0.976 MB/s |   0.0 %  |   0.2 % |
`----------------------------------------------------------------------'
=================================
THREADS 2
SIZE PER THREAD 4000
Tiotest results for 2 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        8000 MBs |   54.0 s | 148.037 MB/s |   2.4 %  |  67.7 % |
| Random Write    8 MBs |    0.7 s |  11.882 MB/s |   0.3 %  |   0.0 % |
| Read         8000 MBs |   14.8 s | 540.108 MB/s |   5.5 %  |  88.9 % |
| Random Read     8 MBs |    4.2 s |   1.882 MB/s |   0.0 %  |   0.0 % |
`----------------------------------------------------------------------'
=================================
THREADS 4
SIZE PER THREAD 2000
Tiotest results for 4 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        8000 MBs |   54.2 s | 147.615 MB/s |   8.0 %  | 188.9 % |
| Random Write   16 MBs |    3.5 s |   4.501 MB/s |   0.5 %  |   0.0 % |
| Read         8000 MBs |   16.1 s | 496.289 MB/s |   7.2 %  | 158.6 % |
| Random Read    16 MBs |    4.8 s |   3.236 MB/s |   0.0 %  |   0.0 % |
`----------------------------------------------------------------------'
=================================
THREADS 8
SIZE PER THREAD 1000
Tiotest results for 8 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        8000 MBs |   56.7 s | 141.097 MB/s |  20.4 %  | 490.5 % |
| Random Write   31 MBs |    5.6 s |   5.541 MB/s |   1.3 %  |   0.0 % |
| Read         8000 MBs |   16.3 s | 492.223 MB/s |  12.2 %  | 301.0 % |
| Random Read    31 MBs |    5.6 s |   5.559 MB/s |   1.4 %  |   0.0 % |
`----------------------------------------------------------------------'

File creation:
========
========
========

time  for D in `seq 1000 1999` ; do echo $D ; mkdir $D ; for F in `seq 1000 1234 1000000` ; do echo $D $F ; dd if=/dev/zero  bs=$F count=1 of=/mnt/A/$D/$F.txt ; done ; done

real    42m58.039s
user    3m52.933s
sys     26m18.912s

========
========
========
rsync -a --stats /mnt/A /mnt/C
...
sent 405221480087 bytes  received 15394018 bytes  51606096.67 bytes/sec
...

========
========
========
time cp -al /mnt/A /mnt/B 

real    1m21.329s
user    0m3.292s
sys     1m8.058s

========
========
========

for i in A B C ; do time rm -rf /mnt/$i ; done 

real    0m12.508s
user    0m0.362s
sys     0m7.445s

real    0m36.581s
user    0m0.463s
sys     0m25.834s

real    1m1.690s
user    0m0.570s
sys     0m47.764s

LVM on RAID1 arrays:
mdadm -S /dev/md6
mdadm --remove /dev/md6
mdadm --zero-superblock /dev/sd{e,f,g,h,m,n,o,p}
mdadm --create --verbose /dev/md11 --level=raid1 --raid-devices=2 /dev/sd{e,m}
mdadm --create --verbose /dev/md12 --level=raid1 --raid-devices=2 /dev/sd{f,n}
mdadm --create --verbose /dev/md13 --level=raid1 --raid-devices=2 /dev/sd{g,o}
mdadm --create --verbose /dev/md14 --level=raid1 --raid-devices=2 /dev/sd{h,p}
Create the PVs and the VG:
pvcreate /dev/md11
pvcreate /dev/md12
pvcreate /dev/md13
pvcreate /dev/md14

vgcreate test /dev/md11 /dev/md12 /dev/md13 /dev/md14
Create 3 LVs, one without striping and two with stripes:
lvcreate      -L1T -n tests test
lvcreate -i 3 -L1T -n tests3 test
lvcreate -i 4 -L1T -n tests4 test
Testing: without striping:
mkfs.ext4 /dev/test/tests 

real    8m52.703s
user    0m1.248s
sys     0m17.725s


Unit information
================
File size = megabytes
Blk Size  = bytes
Rate      = megabytes per second
CPU%      = percentage of CPU used during the test
Latency   = milliseconds
Lat%      = percent of requests that took longer than X seconds
CPU Eff   = Rate divided by CPU% - throughput per cpu load

Sequential Reads
2.6.32-279.2.1.el6.x86_64     8000  4096    1  133.91 12.61%     0.029     1308.61   0.00000  0.00000  1062
2.6.32-279.2.1.el6.x86_64     8000  4096    2  307.13 60.66%     0.025     1069.12   0.00000  0.00000   506
2.6.32-279.2.1.el6.x86_64     8000  4096    4  178.61 58.89%     0.080     1439.95   0.00000  0.00000   303
2.6.32-279.2.1.el6.x86_64     8000  4096    8  159.53 82.49%     0.159     2918.88   0.00005  0.00000   193

Random Reads
2.6.32-279.2.1.el6.x86_64     8000  4096    1    0.96 0.263%     4.076       17.82   0.00000  0.00000   363
2.6.32-279.2.1.el6.x86_64     8000  4096    2    1.94 0.074%     4.013       19.58   0.00000  0.00000  2604
2.6.32-279.2.1.el6.x86_64     8000  4096    4    2.31 0.340%     6.088       89.15   0.00000  0.00000   679
2.6.32-279.2.1.el6.x86_64     8000  4096    8    2.64 0.135%     8.743      144.99   0.00000  0.00000  1953

Sequential Writes
2.6.32-279.2.1.el6.x86_64     8000  4096    1   38.24 7.624%     0.095     1826.67   0.00000  0.00000   502
2.6.32-279.2.1.el6.x86_64     8000  4096    2   42.03 22.86%     0.172     8895.27   0.00093  0.00000   184
2.6.32-279.2.1.el6.x86_64     8000  4096    4   50.89 84.20%     0.268    12996.51   0.00200  0.00034    60
2.6.32-279.2.1.el6.x86_64     8000  4096    8   49.56 254.4%     0.541    18589.47   0.00474  0.00142    19

Random Writes
2.6.32-279.2.1.el6.x86_64     8000  4096    1    1.29 0.733%     0.005        0.03   0.00000  0.00000   176
2.6.32-279.2.1.el6.x86_64     8000  4096    2    1.44 0.312%     0.007        0.04   0.00000  0.00000   460
2.6.32-279.2.1.el6.x86_64     8000  4096    4    1.27 0.146%     0.009        0.05   0.00000  0.00000   868
2.6.32-279.2.1.el6.x86_64     8000  4096    8    1.46 0.223%     0.012       10.15   0.00000  0.00000   651
3 stripes:
time  mkfs.ext4 /dev/test/tests3

real    3m19.238s
user    0m1.241s
sys     0m17.049s


Unit information
================
File size = megabytes
Blk Size  = bytes
Rate      = megabytes per second
CPU%      = percentage of CPU used during the test
Latency   = milliseconds
Lat%      = percent of requests that took longer than X seconds
CPU Eff   = Rate divided by CPU% - throughput per cpu load

Sequential Reads
2.6.32-279.2.1.el6.x86_64     8000  4096    1  287.61 28.51%     0.013      472.71   0.00000  0.00000  1009
2.6.32-279.2.1.el6.x86_64     8000  4096    2  292.54 61.72%     0.026      788.40   0.00000  0.00000   474
2.6.32-279.2.1.el6.x86_64     8000  4096    4  245.74 101.4%     0.063     1150.85   0.00000  0.00000   242
2.6.32-279.2.1.el6.x86_64     8000  4096    8  239.19 195.8%     0.125      597.14   0.00000  0.00000   122

Random Reads
2.6.32-279.2.1.el6.x86_64     8000  4096    1    1.04 0.434%     3.742       35.00   0.00000  0.00000   240
2.6.32-279.2.1.el6.x86_64     8000  4096    2    2.06 1.239%     3.668       30.53   0.00000  0.00000   166
2.6.32-279.2.1.el6.x86_64     8000  4096    4    3.68 0.895%     4.156      156.84   0.00000  0.00000   411
2.6.32-279.2.1.el6.x86_64     8000  4096    8    5.31 0.271%     5.207       60.89   0.00000  0.00000  1953

Sequential Writes
2.6.32-279.2.1.el6.x86_64     8000  4096    1  108.41 21.86%     0.033     2525.84   0.00015  0.00000   496
2.6.32-279.2.1.el6.x86_64     8000  4096    2   97.02 45.85%     0.073     3191.88   0.00015  0.00000   212
2.6.32-279.2.1.el6.x86_64     8000  4096    4  104.68 125.0%     0.134     6294.07   0.00103  0.00000    84
2.6.32-279.2.1.el6.x86_64     8000  4096    8   96.39 268.3%     0.272     9232.73   0.00229  0.00000    36

Random Writes
2.6.32-279.2.1.el6.x86_64     8000  4096    1    5.35 2.907%     0.005        0.03   0.00000  0.00000   184
2.6.32-279.2.1.el6.x86_64     8000  4096    2    5.31 5.636%     0.007        0.05   0.00000  0.00000    94
2.6.32-279.2.1.el6.x86_64     8000  4096    4    4.54 3.838%     0.009        0.06   0.00000  0.00000   118
2.6.32-279.2.1.el6.x86_64     8000  4096    8    5.40 1.106%     0.011        6.28   0.00000  0.00000   488
4 stripes:
[root@bfsa ~]# cat t.4
mkfs.ext4 /dev/test/tests4

real    2m28.422s
user    0m1.211s
sys     0m17.627s


Unit information
================
File size = megabytes
Blk Size  = bytes
Rate      = megabytes per second
CPU%      = percentage of CPU used during the test
Latency   = milliseconds
Lat%      = percent of requests that took longer than X seconds
CPU Eff   = Rate divided by CPU% - throughput per cpu load

Sequential Reads
2.6.32-279.2.1.el6.x86_64     8000  4096    1  352.56 34.89%     0.011      361.91   0.00000  0.00000  1010
2.6.32-279.2.1.el6.x86_64     8000  4096    2  300.66 61.63%     0.025      550.33   0.00000  0.00000   488
2.6.32-279.2.1.el6.x86_64     8000  4096    4  297.59 121.8%     0.052      594.57   0.00000  0.00000   244
2.6.32-279.2.1.el6.x86_64     8000  4096    8  297.59 242.5%     0.103      498.79   0.00000  0.00000   123

Random Reads
2.6.32-279.2.1.el6.x86_64     8000  4096    1    1.07 0.431%     3.651       37.60   0.00000  0.00000   248
2.6.32-279.2.1.el6.x86_64     8000  4096    2    2.02 0.710%     3.767       31.34   0.00000  0.00000   284
2.6.32-279.2.1.el6.x86_64     8000  4096    4    3.77 0.289%     3.991       38.33   0.00000  0.00000  1302
2.6.32-279.2.1.el6.x86_64     8000  4096    8    5.74 1.614%     5.079       68.38   0.00000  0.00000   355

Sequential Writes
2.6.32-279.2.1.el6.x86_64     8000  4096    1  140.81 28.79%     0.026     1226.49   0.00000  0.00000   489
2.6.32-279.2.1.el6.x86_64     8000  4096    2  160.22 76.74%     0.045     1535.99   0.00000  0.00000   209
2.6.32-279.2.1.el6.x86_64     8000  4096    4  160.26 208.7%     0.087     3242.12   0.00068  0.00000    77
2.6.32-279.2.1.el6.x86_64     8000  4096    8  143.09 462.8%     0.179     5060.46   0.00161  0.00000    31

Random Writes
2.6.32-279.2.1.el6.x86_64     8000  4096    1    7.26 3.252%     0.005        0.04   0.00000  0.00000   223
2.6.32-279.2.1.el6.x86_64     8000  4096    2    6.93 2.839%     0.007        0.04   0.00000  0.00000   244
2.6.32-279.2.1.el6.x86_64     8000  4096    4    7.16 0.549%     0.009        0.05   0.00000  0.00000  1302
2.6.32-279.2.1.el6.x86_64     8000  4096    8    5.77 0.590%     0.008        0.06   0.00000  0.00000   977
File creation test (creating half as many files as in the previous tests!):
for i in 4 3 1 ; do echo  ===== CREATING FILES $i ==== ; time  for D in `seq 1000 1499` ; do  mkdir /mnt/$i/A/$D ; for F in `seq 1000 1234 1000000` ; do  dd  status=noxfer if=/dev/zero  bs=$F count=1 of=/mnt/$i/A/$D/$F.txt 2>/dev/null ; done ; done ; done

===== CREATING FILES 4 (split 4) ====
real    22m9.756s
user    1m55.964s
sys     14m11.965s

===== CREATING FILES 3 (split 3) ====
real    30m37.752s
user    1m48.219s
sys     13m24.232s

===== CREATING FILES 1 (no split) ====
real    78m11.036s
user    1m45.851s
sys     12m59.008s
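To see how much striping helps in this test, the three wall-clock times can be turned into ratios. A small sketch follows; the helper to_seconds is hypothetical and only understands the NmS.SSSs format that time prints here.

```shell
#!/bin/sh
# to_seconds 22m9.756s -> 1329.756
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

linear=$(to_seconds 78m11.036s)   # the LV without striping

for entry in "4-stripes:22m9.756s" "3-stripes:30m37.752s"; do
    name=${entry%%:*}
    secs=$(to_seconds "${entry#*:}")
    awk -v n="$name" -v a="$linear" -v b="$secs" \
        'BEGIN { printf "%s: %.2fx faster than linear\n", n, a / b }'
done
```

Keep in mind that this run created only half as many files as the RAID6 and RAID10 runs, so the absolute times are not directly comparable with those.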
OK, it seems LVM with 3 or 4 stripes would do. The current volume layout:
lvs -o +seg_pe_ranges --segments 
  LV     VG   Attr     #Str Type    SSize PE Ranges                                                                              
  tests  test -wi-a---    1 linear  1,00t /dev/md11:0-262143                                                                     
  tests3 test -wi-a---    3 striped 1,00t /dev/md11:262144-349525 /dev/md12:0-87381 /dev/md13:0-87381                            
  tests4 test -wi-a---    4 striped 1,00t /dev/md11:349526-415061 /dev/md12:87382-152917 /dev/md13:87382-152917 /dev/md14:0-65535
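The PE ranges in that listing follow from simple extent arithmetic. A sketch, assuming the 4 MiB extent size implied by a 1 TiB linear LV occupying extents 0-262143 (the helper name is mine):

```shell
#!/bin/sh
# extents_per_stripe TOTAL_EXTENTS STRIPES
# Each stripe needs the same number of extents, so the total is
# rounded up to a multiple of the stripe count.
extents_per_stripe() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

extent_mib=4                                  # assumption: 4 MiB extents
lv_extents=$(( 1024 * 1024 / $extent_mib ))   # 1 TiB expressed in extents

echo "1 TiB = $lv_extents extents"
echo "3 stripes: $(extents_per_stripe "$lv_extents" 3) extents per PV"
echo "4 stripes: $(extents_per_stripe "$lv_extents" 4) extents per PV"
```

This gives 87382 extents per PV for tests3 (ranges such as /dev/md12:0-87381) and 65536 per PV for tests4 (such as /dev/md14:0-65535), matching the lvs output.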
So let's try to remove one array from the VG:

# vgreduce -d -v test /dev/md14
    Finding volume group "test"
    Using physical volume(s) on command line
  Physical volume "/dev/md14" still in use

:(
Clear enough: the volume test/tests4 uses /dev/md14. Let's try to move it:
# pvmove -v /dev/md14
    Finding volume group "test"
    Archiving volume group "test" metadata (seqno 4).
    Creating logical volume pvmove0
    Moving 65536 extents of logical volume test/tests4
  Insufficient suitable allocatable extents for logical volume : 65536 more required
  Unable to allocate mirror extents for pvmove0.
  Failed to convert pvmove LV to mirrored
Judging by everything, the inability to place several stripes on a single PV is not a bug but a feature, which has been fixed in RHEL6/CentOS6 ( https://bugzilla.redhat.com/show_bug.cgi?id=580155 ). So we have to try reducing the stripe count of the LV tests4 from 4 to 3:
## Find out the size in extents
lvdisplay /dev/test/tests4 | grep LE
  Current LE             262144

## Change the stripe count while keeping the volume size (perhaps the PVs to use can be specified right away?)
time lvextend -v -i 3 -l 262144 /dev/test/tests4 
    Finding volume group test
  New size (262144 extents) matches existing size (262144 extents)
  Run `lvextend --help' for more information.

real    0m2.186s
user    0m0.127s
sys     0m0.024s
[root@bfsa ~]# time lvextend -v -i 3 -l 262146 /dev/test/tests4 
    Finding volume group test
  Using stripesize of last segment 64,00 KiB
  Rounding size (262146 extents) up to stripe boundary size for segment (262147 extents)
    Archiving volume group "test" metadata (seqno 4).
  Extending logical volume tests4 to 1,00 TiB
    Found volume group "test"
    Found volume group "test"
    Loading test-tests4 table (253:2)
    Suspending test-tests4 (253:2) with device flush
    Found volume group "test"
    Resuming test-tests4 (253:2)
    Creating volume group backup "/etc/lvm/backup/test" (seqno 5).
  Logical volume tests4 successfully resized

real    0m2.626s
user    0m0.133s
sys     0m0.047s
Let's try to move the disk:
pvmove -v /dev/md14 /dev/md13
    Finding volume group "test"
    Archiving volume group "test" metadata (seqno 5).
    Creating logical volume pvmove0
    Moving 65536 extents of logical volume test/tests4
  Insufficient suitable allocatable extents for logical volume : 65536 more required
  Unable to allocate mirror extents for pvmove0.
  Failed to convert pvmove LV to mirrored


:(
# Let's see what is going on:
lvs -o +seg_pe_ranges --segments
  LV     VG   Attr     #Str Type    SSize  PE Ranges                                                                              
  tests  test -wi-a---    1 linear   1,00t /dev/md11:0-262143                                                                     
  tests3 test -wi-a---    3 striped  1,00t /dev/md11:262144-349525 /dev/md12:0-87381 /dev/md13:0-87381                            
  tests4 test -wi-a---    4 striped  1,00t /dev/md11:349526-415061 /dev/md12:87382-152917 /dev/md13:87382-152917 /dev/md14:0-65535
  tests4 test -wi-a---    3 striped 12,00m /dev/md11:415062-415062 /dev/md12:152918-152918 /dev/md13:152918-152918
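The listing explains the failure: lvextend -i 3 did not restripe the existing data, it only appended a new 12 MiB segment with 3 stripes, so the original 4-stripe segment of tests4 still sits on /dev/md14. A small sketch for spotting that from the lvs output; the function name segments_on_pv is my own, and it assumes the column order shown above.

```shell
#!/bin/sh
# segments_on_pv PV: read `lvs -o +seg_pe_ranges --segments` output on stdin
# and print "<LV> <#Str> <Type>" for every segment that has extents on PV.
segments_on_pv() {
    awk -v pv="$1" 'index($0, pv ":") { print $1, $4, $5 }'
}

# Sample input: the listing above, without the header line.
lvs_output='  tests  test -wi-a---    1 linear   1,00t /dev/md11:0-262143
  tests3 test -wi-a---    3 striped  1,00t /dev/md11:262144-349525 /dev/md12:0-87381 /dev/md13:0-87381
  tests4 test -wi-a---    4 striped  1,00t /dev/md11:349526-415061 /dev/md12:87382-152917 /dev/md13:87382-152917 /dev/md14:0-65535
  tests4 test -wi-a---    3 striped 12,00m /dev/md11:415062-415062 /dev/md12:152918-152918 /dev/md13:152918-152918'

printf '%s\n' "$lvs_output" | segments_on_pv /dev/md14
```

For /dev/md14 this prints only the 4-stripe segment of tests4, which is exactly the segment pvmove refuses to relocate onto PVs that already hold its other stripes.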