Running a small set of benchmarks to find out what my ceph cluster is capable of.
Motivation
I have been running a ceph cluster in my homelab for about 2 years now, but I have never properly benchmarked it, let alone written down my findings or any conclusions. I still don't know what kind of performance to expect or what would be considered good for my setup, but having numbers without context is still better than having no numbers at all.
Benchmark Setup
This covers all the steps I took to set up my benchmarking, so that anyone can follow along and so that I can refer back to it later to repeat the benchmarks properly.
Create Benchmark User in Ceph
On a machine that is already part of the ceph cluster:
- Generate a minimal ceph config using
  ceph config generate-minimal-conf
- Create a user for benchmarking
  - Create a new user
    ceph auth add client.linux_pc
  - Edit the caps for my use-case
    mgr: profile rbd pool=test-rbd
    mon: profile rbd
    osd: profile rbd pool=test-rbd
  - Get the keyring configuration using
    ceph auth get client.linux_pc
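Creating the user and setting its caps can probably be done in a single command. The following is a minimal sketch, assuming the same user name (client.linux_pc) and pool (test-rbd) as above. `ceph auth get-or-create` with `profile rbd` caps is the form used in the Ceph RBD documentation; the `DRY_RUN` switch and the function name are hypothetical and only here for illustration. By default the script only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: create the benchmark user with its caps in one step.
# Assumption: user and pool names match the ones used in this post.
CLIENT="client.linux_pc"
POOL="test-rbd"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = only print the command

MON_CAPS="profile rbd"
OSD_CAPS="profile rbd pool=$POOL"
MGR_CAPS="profile rbd pool=$POOL"

create_bench_user() {
    if [ "$DRY_RUN" = "1" ]; then
        # Print the command with quoting so it can be copied as-is.
        printf "ceph auth get-or-create %s mon '%s' osd '%s' mgr '%s'\n" \
            "$CLIENT" "$MON_CAPS" "$OSD_CAPS" "$MGR_CAPS"
    else
        # Needs to run on a node with admin access to the cluster.
        ceph auth get-or-create "$CLIENT" \
            mon "$MON_CAPS" osd "$OSD_CAPS" mgr "$MGR_CAPS"
    fi
}

create_bench_user
```

Running it with DRY_RUN=0 on a cluster node should create the user and print its keyring, which is the same information `ceph auth get client.linux_pc` returns.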
Configure Ceph as a client on the Benchmark machine
On the client machine doing the benchmarking:
- Install basic ceph tools
  apt-get install -y ceph-common
- Load the rbd kernel module
  modprobe rbd
- Set up the local ceph config
  - Copy the generated configuration to /etc/ceph/ceph.conf
    chmod 644 /etc/ceph/ceph.conf
- Set up the local ceph keyring
  - Copy the keyring configuration to /etc/ceph/ceph.client.linux_pc.keyring
    chmod 644 /etc/ceph/ceph.client.linux_pc.keyring
- Confirm your configuration is working by running
  ceph -s -n client.linux_pc
Setting up Benchmark on the Benchmark machine
Set up the benchmark itself:
- Create the volume
  rbd create -n client.linux_pc --size 10G --pool test-rbd bench-volume
- Map the volume (which should create a new block device, likely /dev/rbd0)
  rbd -n client.linux_pc device map --pool test-rbd bench-volume
- Create a filesystem on it
  mkfs.ext4 /dev/rbd0
- Mount it
  mkdir /mnt/bench
  mount /dev/rbd0 /mnt/bench
Benchmarks
All benchmarks are run with the same configuration, only changing the access patterns (read/write, random/sequential). Key configuration options are:
- using libaio
- direct io
- 1 job
fio config
[global]
ioengine=libaio
direct=1
size=4G
numjobs=1
runtime=60s
time_based
startdelay=5s
group_reporting
stonewall
name=write
rw=write
filename=bench
[1io_4k]
iodepth=1
bs=4k
[1io_8k]
iodepth=1
bs=8k
[1io_64k]
iodepth=1
bs=64k
[1io_4M]
iodepth=1
bs=4M
[32io_4k]
iodepth=32
bs=4k
[32io_8k]
iodepth=32
bs=8k
[32io_64k]
iodepth=32
bs=64k
[32io_4M]
iodepth=32
bs=4M
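Since only the access pattern changes between runs, the four runs could be driven from one job file. The following is a rough sketch, not the exact procedure used here. It assumes the config above is saved as bench.fio (a hypothetical file name) with `rw=write` replaced by `rw=${RW}`, which relies on fio expanding environment variables in job files. Only raw_random_write.json is a file name that appears in this post; the other output names are assumed to follow the same pattern. The script only prints the fio invocations.

```shell
#!/bin/sh
# Hypothetical job file: the config above with "rw=write" changed to "rw=${RW}".
JOBFILE="bench.fio"

# Map a result name to the matching fio rw= value.
rw_mode() {
    case "$1" in
        random_read)      echo randread ;;
        random_write)     echo randwrite ;;
        sequential_read)  echo read ;;
        sequential_write) echo write ;;
        *) return 1 ;;
    esac
}

# Print the fio invocation for one access pattern.
fio_cmd() {
    mode=$(rw_mode "$1") || return 1
    printf 'RW=%s fio --output-format=json --output=raw_%s.json %s\n' \
        "$mode" "$1" "$JOBFILE"
}

for pattern in random_read random_write sequential_read sequential_write; do
    fio_cmd "$pattern"
done
```

Because the config uses the relative `filename=bench`, the printed commands would need to be run from inside /mnt/bench so that the test file ends up on the rbd volume.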
Results
Random Reads
Random Writes
Sequential Reads
Sequential Writes
Conclusion
- Overall I am satisfied with the performance of the cluster for my current use-case
- There is a lot of room for improvement in the low queue-depth range
- The network is not really a limiting factor for the cluster nodes currently
  - None of the nodes in the cluster exceeded 500MiB/s of TX or RX, so there is plenty of room for growth
  - My client used for testing was limited by the network, as the highest speed achieved was ~1.2GB/s (~10Gb/s)
- My smallest node (the embedded EPYC) could be the limiting factor, as it reached 100% CPU usage in some benchmarks while my other nodes never exceeded 40%
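The unit conversions behind the network numbers can be checked quickly. This is just a small sketch of the arithmetic, assuming 1 GB = 10^9 bytes and 1 MiB = 2^20 bytes; the function names are made up for this example.

```shell
#!/bin/sh
# GB/s -> Gb/s: 1 byte = 8 bits.
gbytes_to_gbits() {
    awk -v v="$1" 'BEGIN { printf "%.1f\n", v * 8 }'
}

# MiB/s -> Gb/s: 1 MiB = 1048576 bytes, 1 Gb = 10^9 bits.
mib_to_gbits() {
    awk -v v="$1" 'BEGIN { printf "%.2f\n", v * 1048576 * 8 / 1000000000 }'
}

gbytes_to_gbits 1.2   # client peak: prints 9.6
mib_to_gbits 500      # per-node peak: prints 4.19
```

So the client peak of ~1.2GB/s is about 9.6Gb/s, which is close to the line rate of a 10 GbE link, while the 500MiB/s seen on the nodes is only around 4.2Gb/s.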
Extra Details
Cluster Hardware
- 10 GbE Networking between all nodes
- Node
  - Ryzen 5 5500
  - 64GB RAM
  - 4x 480GB enterprise SSD
- Node
  - Ryzen 5 3600
  - 64GB RAM
  - 4x 480GB enterprise SSD
- Node
  - EPYC 3151
  - 64GB RAM
  - 4x 480GB enterprise SSD
Command to convert raw data into data for visualisation
jq '[.jobs[] | { iodepth: ."job options".iodepth, bs: ."job options".bs, operations: { iops: .write.iops, bw_bytes: .write.bw_bytes } }]' content/ceph-benchmarking/benchmarks/raw_random_write.json | jq '
# collect unique block-size labels in first-seen order (keys_unsorted does not sort them)
(map({key:.bs,value:1})|from_entries|keys_unsorted) as $labels
|
{
labels: $labels,
iodepths:
(
group_by(.iodepth)
| map(
. as $group
| {
iodepth: ($group[0].iodepth | tonumber),
iops: [
$labels[] as $l
| ($group[] | select(.bs == $l) | .operations.iops) // null
],
bw: [
$labels[] as $l
| ($group[] | select(.bs == $l) | .operations.bw_bytes) // null
]
}
)
)
}
'
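The first stage of the command above reads `.write.iops` and `.write.bw_bytes`, so it only fits the write benchmarks. A possible way to reuse it for the read results is to pass the operation in as a jq variable. This is a sketch under that assumption; the function name `extract_results` and the file name in the usage comment are hypothetical, and it requires jq to be installed.

```shell
#!/bin/sh
# Sketch: first-stage extraction with the operation as a parameter.
# $1: operation key in the fio JSON ("read" or "write")
# $2: path to a fio JSON result file
extract_results() {
    jq --arg op "$1" '[.jobs[] | {
        iodepth: ."job options".iodepth,
        bs: ."job options".bs,
        operations: {
            iops: (.[$op].iops),
            bw_bytes: (.[$op].bw_bytes)
        }
    }]' "$2"
}

# Usage (hypothetical file name):
#   extract_results read raw_random_read.json
```

The output has the same shape as the first stage above, so it can be piped into the second jq stage unchanged.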
Future Work
- Try directly on the block device
- Try this using xfs instead of ext4
- Try this with and without drive caches