/r/ceph
Do I just mount the CephFS at /mnt/maildir and set the mail location to /mnt/maildir, or are there additional configurations?
mount -t ceph name@.fs_name=/ /mnt/maildir -o mon_addr=1.2.3.4
mail_location = maildir:/mnt/maildir
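To make the mount survive reboots, a hedged sketch of a matching fstab entry (the monitor address and the secret file path are placeholders taken from the example above, not verified values):
# /etc/fstab -- persistent CephFS mount for the maildir (adjust name, fsid, mon_addr and secretfile)
name@.fs_name=/    /mnt/maildir    ceph    mon_addr=1.2.3.4,secretfile=/etc/ceph/maildir.secret,_netdev    0 0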
Is there a way to change the standard command to this value:
smartctl -d cciss,0 -x --json=o /dev/sdg
Thank you in advance
Hi All,
I am trying to understand the output of 'ceph df'.
All of these pools, with the exception of "cephfs_data", are 3x replicated pools, but I don't understand why the 'STORED' and 'USED' values for the pools are exactly the same. We have another cluster where USED does show around 3x the STORED value, which is correct, but I'm not sure why this cluster shows identical values.
Secondly, I am confused why USED in the "RAW STORAGE" section shows 24 TiB, while the USED values in the pools section only sum to roughly ~1.5 TiB.
Can someone please explain or mention if I am doing something wrong?
Thanks!
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 894 TiB 873 TiB 21 TiB 21 TiB 2.35
ssd 265 TiB 262 TiB 3.3 TiB 3.3 TiB 1.26
TOTAL 1.1 PiB 1.1 PiB 24 TiB 24 TiB 2.10
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
device_health_metrics 1 1 263 MiB 148 263 MiB 0 83 TiB
vms 2 2048 902 GiB 163.61k 902 GiB 0.35 83 TiB
images 3 128 315 GiB 47.57k 315 GiB 0.12 83 TiB
backups 4 128 0 B 0 0 B 0 83 TiB
testbench 5 1024 0 B 0 0 B 0 83 TiB
cephfs_data 6 32 0 B 0 0 B 0 83 TiB
cephfs_metadata 7 32 5.4 KiB 22 5.4 KiB 0 83 TiB
To confirm, I can see for one pool that this is actually a 3x replicated pool
~# ceph osd pool get vms all
size: 3
min_size: 2
pg_num: 2048
pgp_num: 2048
crush_rule: SSD
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
fast_read: 0
pg_autoscale_mode: off
~# ceph osd crush rule dump SSD
{
    "rule_id": 1,
    "rule_name": "SSD",
    "ruleset": 1,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -2,
            "item_name": "default~ssd"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
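For completeness, a small loop (a sketch using standard ceph CLI calls) confirms the replication factor of every pool at once rather than one at a time:
# print the size (replication factor) for each pool in the cluster
for pool in $(ceph osd pool ls); do
  echo -n "$pool: "
  ceph osd pool get "$pool" size
done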
Hello,
I'm facing poor write performance (IOPS and TPS) on a Linux VM running MongoDB.
Cluster:
Nodes: 3
Hardware: HP Gen11
Disks: 4x PM1733 enterprise NVMe ## with the latest firmware and driver.
Network: Mellanox-connectx-6 25 gig
PVE Version: 8.2.4 , 6.8.8-2-pve
Ceph:
Version: 18.2.2 Reef.
4 OSD's per node.
PG: 512
Replica 2/1
Additional ceph config:
bluestore_min_alloc_size_ssd = 4096 ## tried also 8K
osd_memory_target = 8G
osd_op_num_threads_per_shard_ssd = 8
OSD disks cache configured as "write through" ## Ceph recommendation for better latency.
Apply/commit latency is below 1 ms.
Network:
MTU: 9000
TX \ RX Ring: 2046
VM:
Rocky 9 (tried also ubuntu 22):
boot: order=scsi0
cores: 32
cpu: host
memory: 4096
name: test-fio-2
net0: virtio=BC:24:11:F9:51:1A,bridge=vmbr2
numa: 0
ostype: l26
scsi0: Data-Pool-1:vm-102-disk-0,size=50G ## OS
scsihw: virtio-scsi-pci
smbios1: uuid=5cbef167-8339-4e76-b412-4fea905e87cd
sockets: 2
tags: templatae
virtio0: sa:vm-103-disk-0,backup=0,cache=writeback,discard=on,iothread=1,size=33G ### Local disk - same NVME
virtio2: db-pool:vm-103-disk-0,backup=0,cache=writeback,discard=on,iothread=1,size=34G ### Ceph - same NVME
virtio23 db-pool:vm-104-disk-0,backup=0,cache=unsafe,discard=on,iothread=1,size=35G ### Ceph - same NVME
Disk1: Local nvme with iothread
Disk2: Ceph disk with Write Cache with iothread
Disk3: Ceph disk with Write Cache Unsafe with iothread
I ran an FIO test in one SSH session and iostat in a second session:
fio --filename=/dev/vda --sync=1 --rw=write --bs=64k --numjobs=1 --iodepth=1 --runtime=15 --time_based --name=fioa
Results:
Disk1 - Local nvme:
WRITE: bw=74.4MiB/s (78.0MB/s), 74.4MiB/s-74.4MiB/s (78.0MB/s-78.0MB/s), io=1116MiB (1170MB), run=15001-15001msec
TPS: 2500
Disk2 - Ceph disk with Write Cache:
WRITE: bw=18.6MiB/s (19.5MB/s), 18.6MiB/s-18.6MiB/s (19.5MB/s-19.5MB/s), io=279MiB (292MB), run=15002-15002msec
TPS: 550-600
Disk3 - Ceph disk with Write Cache Unsafe:
WRITE: bw=177MiB/s (186MB/s), 177MiB/s-177MiB/s (186MB/s-186MB/s), io=2658MiB (2788MB), run=15001-15001msec
TPS: 5000-8000
The VM disk cache is configured as "Write Cache".
The queue scheduler is configured as "none" (on the Ceph OSD disks as well).
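For comparison, a hedged sketch of a complementary fio run against the same test disk (parameters are illustrative, not a recommendation): the run above is a single-threaded sync write, which mostly measures per-operation latency, while a deeper queue shows how much aggregate throughput the RBD device can actually reach.
fio --filename=/dev/vda --ioengine=libaio --direct=1 --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 --runtime=15 --time_based --group_reporting --name=fio-qd32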
I'm also sharing rados bench results:
rados bench -p testpool 30 write --no-cleanup
Total time run: 30.0137
Total writes made: 28006
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 3732.42
Stddev Bandwidth: 166.574
Max bandwidth (MB/sec): 3892
Min bandwidth (MB/sec): 2900
Average IOPS: 933
Stddev IOPS: 41.6434
Max IOPS: 973
Min IOPS: 725
Average Latency(s): 0.0171387
Stddev Latency(s): 0.00626496
Max latency(s): 0.133125
Min latency(s): 0.00645552
I've also removed one of the OSDs and ran an FIO test directly against the raw NVMe:
fio --filename=/dev/nvme4n1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=20 --time_based --name=fioaa
WRITE: bw=297MiB/s (312MB/s), 297MiB/s-297MiB/s (312MB/s-312MB/s), io=5948MiB (6237MB), run=20001-20001msec
Very good results.
Any suggestions on how to improve the write speed within the VM?
How can I find the bottleneck?
Many Thanks.
Looks like I still haven't given up on using Ceph ...
I'd be running it at home, on a home budget, and with home constraints on space, noise, power, etc. I also work from home, and it would be for my work data first and foremost.
I want something that is bulletproof and has as few SPoF as is reasonably achievable. And I'd like to get away from the concept of a single, large, complex, and expensive server. I have nightmares about that server failing while I'm on a project. Yes, I have backups, but not the time to build a new server and restore them.
Like, ZFS made the failure of a disk an absolute non-issue for me over a decade ago; to the point that I use recertified disks as one half of each mirror now—now I kind of want that for the entire hardware stack.
AFAICS, Ceph's kind of the only game in town.
The base will be Debian/Proxmox, not so much because running VMs is the primary use case, but because it supports Ceph OOTB. There will be the usual home/homelab VMs, but the main event is file server duty, really. Not sure about cephfs vs an RBD-backed conventional file server VM running Samba or ksmbd or whatever; opinions welcome.
3 nodes to start with, 5 may be achievable, but not short term:
Medium term, I want two 10G switches for the redundancy, but for now 10G+1G will have to do. If the switch fails, it'll be slow, but it won't burn—or will it?
This far I can't change much, because I can get this stuff for cheap. If it isn't possible to build something sane with this, then I'll just scrap the plan and stick to ZFS with replication.
The board has 6x SATA and a measly PCIe 3.0 x1 M.2 slot. Could add an HBA in the x16, move the NIC to the x4, but then I'd need a bigger case to hold them as well, so it'd drive up the non-drive costs/node quite a bit—may still be worth it, you tell me.
I'd like to have an HDD pool as well, maybe 1–2 disks/node; this is just for archival. And because I get the impression that having the DB/WAL on something faster doesn't do that much on Bluestore, especially if it's just store (large) file / retrieve (large) file, I thought I'd try doing without that, keep it "simple".
The rest would go to an SSD pool of Samsung PM883 960GB. Best GB/€ right now, well, after the 8 TB models, but going with those would mean 1 OSD/node, and from what I've read, that's not a good idea. ^^
Usually I'd mirror the boot device, but I don't see how I can spare two ports just to boot. Does it make sense to have a third pool on the single possible NVMe? I'm not sure about segmenting the storage so much, it's small as it is, but booting from an NVMe feels like a waste.
How many SSDs can I get away with, to start with? I.e., is there any sweet spot in a 3-node cluster? Capacity-wise 6 SSD OSDs total would probably be enough, and the fewer 1 TB SSDs I buy now, the sooner I can buy a set of larger ones.
But no clue about performance. As long as the Samba shares and the VMs don't feel slow, I'll be happy.
Much as I like to play around with cool tech—and Ceph is that—if I'm going to do this, I need it to be viable, not just cool. Is this viable? (How) can it be made viable?
P.S. Sorry, not sorry for another 3-node thread. I figure if there's enough visible demand, maybe, just maybe, such tiny deployments will get some love as well.
Hello,
I have a 5-node Ceph cluster that has NVMe and "spinner" drives. I have the cluster connected to a 3-node Proxmox cluster. The Proxmox cluster connects via RBD.
I also have a 3-node VMware cluster that connects to a Dell SAN via iSCSI. The Dell SAN also has SSD and "spinner" drives.
I ran Crystalmark test on a Windows VM that is on the Proxmox cluster on the "HDD" (or "spinner") pool and then also ran it on a Windows VM that is on the non-SSD storage on the VMware cluster. Here are the results:
Proxmox + Ceph (Writeback enabled):
though it did take a *long* while for it to complete (I didn't time it, but it was at least 20 minutes, quite possibly more).
VMware + iSCSI:
This one finished fairly quickly (within 5 minutes or so)
I was expecting the Proxmox + Ceph result to be "worse" than the VMware + iSCSI, if for no other reason than the Ceph cluster needs to write three copies of any changes to the VM files, but despite taking longer to complete the tests, the results are *much* better than the VMware + iSCSI.
How trustworthy are these results?
I’m working on a POC to use Ceph as a storage provider and have no issues setting it up either standalone or within a k8s cluster using Rook. My challenge is creating a straightforward, local “development environment” for each developer.
Ideally, I’d like a setup where developers can quickly test object storage (e.g., uploading and retrieving an image) with minimal overhead—ideally just one command to start, without requiring multiple nodes, VMs, or large storage capacities. Is it possible to achieve this with a real Ceph instance (not a limited or emulated version)?
Any tips or recommendations would be appreciated!
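One hedged sketch of a minimal single-host "real Ceph" setup for developers (standard cephadm/ceph orch calls; the IP, the RGW service name and the user ID are placeholders):
# bootstrap a one-node cluster with relaxed single-host defaults
cephadm bootstrap --mon-ip 192.168.0.50 --single-host-defaults
# turn any free disks on the host into OSDs
ceph orch apply osd --all-available-devices
# deploy a RADOS Gateway so developers get an S3 endpoint
ceph orch apply rgw dev
# create a test S3 user and grab its keys
radosgw-admin user create --uid=dev --display-name="dev user"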
Hi, I'm trying to create a new Ceph cluster for testing purposes in VMware, but Ceph cannot find any disks.
sdb is the disk I'm trying to add as an OSD.
Any help would be appreciated. This is driving me insane.
System:
Ubuntu 24.04
Ceph Squid 19.2.0
root@ceph:~$ sudo lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 50G 0 disk
├─sda1 8:1 0 1G 0 part /boot/efi
├─sda2 8:2 0 2G 0 part /boot
└─sda3 8:3 0 21.9G 0 part
└─ubuntu--vg-ubuntu--lv 252:0 0 21.9G 0 lvm /
sdb 8:16 0 120G 0 disk
root@ceph:~$ sudo ceph orch apply osd --all-available-devices
Scheduled osd.all-available-devices update...
root@ceph:~$ sudo cephadm ceph-volume inventory
Inferring fsid e894abeb-9541-34gh-67fg-005056baa4cf
Using ceph image with id '37996728e013' and tag 'v19' created on 2024-09-27 18:08:21 -0400 EDT
quay.io/ceph/ceph@sha256:200087c35811bf28e8a8073b15fa86c07cce85c575f1ccd62d1d6ddbfdc6770a
Device Path Size Device nodes rotates available Model name
/dev/sdb 120.00 GB sdb True True Virtual disk
/dev/sda 50.00 GB sda True False Virtual disk
root@ceph:~$ sudo ceph device ls
DEVICE HOST:DEV DAEMONS WEAR LIFE EXPECTANCY
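A hedged set of checks that often helps here (the hostname "ceph" is taken from the prompt above; the zap is destructive, so only run it against the disk you intend to wipe): the orchestrator may simply have stale inventory, or the disk may carry leftover partitions or LVM signatures.
# force the orchestrator to rescan devices
ceph orch device ls --refresh
# check whether an OSD service spec is actually scheduled
ceph orch ls --service-type osd
# wipe any leftover signatures on the intended OSD disk (destructive!)
ceph orch device zap ceph /dev/sdb --force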
Hi all,
I just created my first 3-node cluster, primarily for Ceph.
Nodes: Asus S14NA-U12 MBO, AMD EPYC 8534P CPU, 256GB DDR5 RAM
CEPH NICs: 2 x 100G Mellanox ConnectX-4 - one for public and one for backend connection
I planned to use 4 NVMe drives but ended up with 2 per node for testing. I tested drive speed in a Windows VM, as I need to run 2 Windows VMs (RDS server and SQL server) in production.
ZFS, writeback cache DISABLED, mirror of 2 x Micron 7400 PRO 1.92TB (declared 4400/2000MB/s)
CEPH, writeback cache DISABLED, 6 x Micron 7400 PRO 1.92TB (declared 4400/2000MB/s) OSD total on 3 nodes (2 per node), 100G public network, 100G backend, KRBD disabled – after config tweaks
CEPH, writeback cache ENABLED, 6 x Micron 7400 PRO 1.92TB (declared 4400/2000MB/s) OSD total on 3 nodes (2 per node), 100G public network, 100G backend, KRBD disabled – after config tweaks
As you see, without the writeback cache the same drives are 3 times slower on Ceph than on ZFS. I don't know if this is normal, but I expected much faster speeds.
I tried a lot of tweaks, reinstalled 3 times, set MTU 9000, disabled auth and debug... but the Ceph disk speed always lands around the same result.
What do you think might be the problem?
I have a 3-node homelab cluster that is, and has been, great; however, I need more capacity. I have 6x 4 TB HDDs per node. Experience has shown that matching drive sizes (and specs) is very important to maintain performance, especially with a small cluster like mine.
I'm considering purchasing 3x 12 TB HDDs of a similar class to the drives I currently have. Adding them to the cluster would 'work', however I'd end up with an unbalanced number of PGs per OSD and limit performance significantly. This would be especially painful because HDDs + Ceph is pain ☺️. Unless the new drives are significantly more performant vs. the existing ones, they would get a disproportionate share of IO.
What I'm considering: create a RAID0 mdraid array out of 3 of my 4 TB drives, giving me a 12 TB volume, then use that for an OSD. From a Ceph perspective this device would be 'matched'. I'd actually have 2x 12 TB arrays per node (using all 6 drives). The 12 TB HDD would still be the bottleneck, but mostly because the array would be more performant rather than because of PG imbalance.
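A hedged sketch of that layering (device names are hypothetical; ceph-volume is the generic path, adjust to your deployment tooling):
# stripe three 4 TB disks into one 12 TB md device
mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
# hand the md device to Ceph as a single OSD
ceph-volume lvm create --data /dev/md0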
The big downsides I see are undue complexity (layering storage) and the 3x increase in risk of failure of the RAIDed OSD. With 3x replication I can safely lose a node, so I'm less concerned about the latter (and I have backups).
At some future date I could purchase additional 12 TB drives and decommission the arrays as I do, resulting in having only 12 TB HDDs (and more bays available).
Some limitations: I can't reasonably do more than 3 nodes or purchase more than a couple of 12 TB drives at a time, for cost, networking and space reasons. So I can't scale out or mass-replace my existing drives. I do have time, so letting CRUSH do its thing as I orchestrate this operation isn't a problem.
Hello, I am interested in a Ceph cluster on bare metal. Probably, in the 1 PB+ range in a server cabinet. I haven't found a clear explanation of where and when Ceph cluster bottlenecks occur.
I suppose the bottlenecks are as follows:
Are there some simple guidelines to follow in finding the bottlenecks to clarify which components to select for the bare metal cluster build?
Edit: CEPH -> Ceph
We recently made a testing cluster for Ceph with 3 nodes. Do you all have any experiments you were curious about that I could look at?
Spec for 3 nodes
100g networking
2.85->3.1ghz cpu
512g ram
24tb ssd
900tb hdd
Have been trying to design a new ~20PB cluster for a few months and things are well underway. Planning to just use Ceph Radosgw (S3), then fill it to the brim straight after it's released. We are therefore wondering:
What design choices regarding ceph/cephadm/osds/mons/mgrs/pools/crush-profiles/ec-profiles etc. are either irreversible, or just a real pain in the a#$ to undo, after your cluster is loaded with data?
I imagine EC profiles, Crush rules and Pool settings should be optimised, but what else might a noob forget?
I've been looking and I'm thinking an OptiPlex 5040 SFF plus 10GbE and a SATA card might make a decent node for homelab use. i5-6500, 16GB, and x16/x4 PCIe v3.0, just right for the cards. Setup 3 or 5 of these for the cluster along with one of the cheaper 10GbE switches. Not sure how many drives each, maybe three to start?
There are several things I want to use it for, but I think the most demanding would be to use it as the underlying storage to boot and run a diskless Windows machine and have it feel roughly like it's running from an attached SATA III drive.
Is this arrangement likely to behave how I hope?
immich-web-68bb98dc66-k8988:/cephfs# getfattr -n ceph.file.layout.pool check-cephfs-file
# file: check-cephfs-file
ceph.file.layout.pool="ceph-filesystem-unspecified"
immich-web-68bb98dc66-k8988:/cephfs# getfattr -n ceph.file.layout.pool_id check-cephfs-file
# file: check-cephfs-file
ceph.file.layout.pool_id="20"
immich-web-68bb98dc66-k8988:/cephfs# getfattr -n ceph.file.layout.pool_name check-cephfs-file
# file: check-cephfs-file
ceph.file.layout.pool_name="ceph-filesystem-unspecified"
immich-web-68bb98dc66-k8988:/cephfs# setfattr -n ceph.file.layout.pool_name -v ceph-filesystem-unspecified check-cephfs-file
setfattr: check-cephfs-file: Permission denied
immich-web-68bb98dc66-k8988:/cephfs# du -hd1 check-ceph
0 check-ceph
immich-web-68bb98dc66-k8988:/cephfs# du -hd1 check-ceph
immich-web-68bb98dc66-k8988:/cephfs# ceph-filesystem-unspecified
immich-web-68bb98dc66-k8988:/cephfs# setfattr -n ceph.dir.layout.pool -v ceph-filesystem-unspecified check-ceph
setfattr: check-ceph: Permission denied
immich-web-68bb98dc66-k8988:/cephfs#
I keep getting 'Permission denied'. I already checked:
- The pool exists (I even set the pool name to the one they already reside on)
- The dir and file are empty
- Ceph cannot be mounted with user_xattr
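Setting the ceph.* layout xattrs requires the client's MDS caps to include the 'p' flag, so a hedged check/fix (the client name and filesystem name below are placeholders; with Rook the relevant identity is the CephFS client used by the CSI mount):
# inspect the caps of the client identity used for this mount
ceph auth get client.immich
# authorize with the 'p' flag so layouts/quotas can be modified
ceph fs authorize ceph-filesystem client.immich / rwp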
Installed ceph via proxmox. Everything is the latest version as of a couple weeks ago.
At first, I thought everything was 10Gb, but performance was slow and I discovered the 10Gb NICs were actually showing as 1Gb. With a couple of commands the 10Gb NICs were performing as expected, and I verified the speed between all three of my hosts on the storage network with iperf3, which showed close to 10Gb as expected.
However, cephfs seems to be performing about the same.
I originally detected an issue when certain pods in kubernetes clusters restarted at the same time as cephfs did a write at about 50MiB. Seems the bandwidth was all used up slowing vms, causing the pods to crash.
After increasing the nics to 10gb I thought the issue would resolve but it hasn't and cephfs performance seems to be about the same. I'm not doing any special testing of the speed, I was just expecting the pods not to restart in response to a cephfs write.
Reads and writes seem to max out at about 100 MiB between them (same as when it was all 1Gb before).
It seems like everything is the same as when the nics were running at 1gb. Is it possible ceph still thinks they are at 1gb for some reason? Is there something ceph related that needs to be restarted / updated?
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.0.1.21/24
fsid = 1c3f0c6f-696c-4cc0-b29c-e68f73ba9e4b
mon_allow_pool_delete = true
mon_host = 10.0.0.21 10.0.0.23 10.0.0.22
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.0.0.21/24
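Since OSDs bind their public/cluster addresses at daemon startup, a hedged way to confirm which addresses (and therefore which NICs) they are actually using (osd.0 is just an example ID):
# show the public (front) and cluster (back) addresses osd.0 is bound to
ceph osd metadata 0 | grep -E '"front_addr"|"back_addr"'
# show the effective network settings of the running daemon
ceph config show osd.0 public_network
ceph config show osd.0 cluster_network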
I googled how to run a speed test (does this look like 10Gb NIC with NVMe performance?):
root@pve-a:~# ceph osd pool create testbench 100 100
pool 'testbench' created
root@pve-a:~# rados bench -p testbench 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_pve-a_423981
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 35 19 75.996 76 0.894092 0.348493
2 16 51 35 69.9949 64 0.828046 0.621269
3 16 58 42 55.9958 28 1.33079 0.738915
4 16 70 54 53.9958 48 0.267747 0.900903
5 16 77 61 48.7961 28 0.0664892 0.956315
6 16 82 66 43.9964 20 0.265331 0.930378
7 16 87 71 40.5681 20 0.331954 1.07364
8 16 100 84 41.9965 52 3.22042 1.38887
9 16 105 89 39.5522 20 1.93728 1.44535
10 16 106 90 35.9969 4 1.93704 1.45081
11 14 106 92 33.4517 8 1.52379 1.46059
Total time run: 11.5055
Total writes made: 106
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 36.8518
Stddev Bandwidth: 23.2739
Max bandwidth (MB/sec): 76
Min bandwidth (MB/sec): 4
Average IOPS: 9
Stddev IOPS: 5.81847
Max IOPS: 19
Min IOPS: 1
Average Latency(s): 1.70953
Stddev Latency(s): 1.33644
Max latency(s): 4.55411
Min latency(s): 0.026747
Hello, I need help with my Ceph cluster setup. I have a 6-node cluster with a replica size of 6 and a minimum size of 3. The problem started when 3 nodes, including Node 0, were placed in the basement, which flooded and damaged all 3.
I replaced the damaged nodes with new hardware and expected that giving them the same names would allow Proxmox to automatically reintegrate them into the cluster. However, this didn’t work. The cluster’s quorum is broken, and my efforts to restore it haven't been successful.
While I managed to re-establish quorum and add a new node, I still can't restore the original quorum with the replaced nodes. I find this process with Proxmox and Ceph to be more complicated than expected. I understand the need for a halt when quorum is broken, but I assumed that replacing nodes with similar hardware and the same server name would allow for seamless reintegration.
Where am I going wrong, and what steps can I take to fix this issue?
Hi All,
About to build my first 3 node Proxmox cluster and will be looking to use Ceph on the storage front. Each node will have a Mellanox ConnectX-4 10Gigabit Ethernet Card direct connected to each other in a mesh. Each node will have an LSI 9200-8E controller in IT mode.
For storage, each node will have 2 x Intel 1.6TB DC S3510 Series SATA SSDs connected via motherboard SATA ports and 8 x 1TB 7200RPM 2.5 inch drives. I also have some Micron 512GB SSDs which I had thought I might be able to use as a R/W cache for the spinning disks, however I'm not sure if that is possible.
My requirements won't be extreme, so was wondering about setting the 1.6TB SSDs as a mirror for my performance VMs/Containers and then have the 8 x 1TB drives for lesser I/O intensive tasks like archiving, email, etc.
What would be my best approach for configuration of this storage? Are there other questions I should be asking myself first?
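On the SSD-cache idea: a generic read/write cache tier isn't really the usual route anymore; the common way to use those Micron SSDs with the spinners is as dedicated DB/WAL devices. A hedged sketch of the Proxmox-side command (device names are placeholders; the GUI exposes the same option):
# create an HDD OSD with its RocksDB/WAL placed on a faster SSD
pveceph osd create /dev/sdX -db_dev /dev/sdY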
How do I deal with this without a) rebooting the client b) restarting the MDS daemon?
HEALTH_WARN 1 clients failing to respond to cache pressure
[WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
mds.cxxxvolume.cxxx-m18-33.lwbjtt(mds.4): Client ip113.xxxx failing to respond to cache pressure client_id: 413354
I know if I reboot the host, this error message will go away, but I can't really reboot it.
There are 15 users currently on this machine connecting to it via some RDP software.
unmounting the ceph cluster and remounting didn't help
Restarting the MDS daemon has bitten me in the ass a lot. One of the biggest problems I have is that the MDS daemon will restart, then another MDS daemon picks up as primary; all good so far. But the MDS that took over goes into a weird runaway memory cache mode and crashes the daemon, OOMs the host, and OUTs all of the OSDs in that host. This is a nightmare, because once the MDS host goes offline another MDS host picks up, and rinse and repeat.
The hosts have 256 gigs of ram, 24 CPU threads, 21 OSDS, 10 gig nics for public and cluster network.
ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)
Cephfs kernel driver
What I've tried so far is to unmount and remount, clear cache "echo 3 >/proc/sys/vm/drop_caches", blocked the IP (from the client) of the MDS host, hoping to timeout and clear the cache (no joy).
How do I prevent future warning messages like this? I want to make sure that I'm not experiencing some sort of networking or HBA (IT mode, 12Gb/s SAS) issue.
Thoughts?
Hi,
I've done some research but unfortunately without success.
I'm asking you if it's possible to have a 4-node cluster that can continue to provide storage service even if only one node remains active.
I did a quick test with microceph, on four machines, but as soon as I turned off two of them, the cluster was no longer available.
Would it theoretically be possible to configure a system like this?
Thanks
Hey all. I'm trying to get ceph running on three ubuntu servers, and am following along with the guide here.
I start by installing cephadm
apt install cephadm -y
It installs successfully. I then bootstrap a monitor and manager daemon on the same host:
cephadm bootstrap --mon-ip [host IP]
I copy the /etc/ceph/ceph.pub key to the OSD host, and am able to add the OSD host (ceph-osd01) to the cluster:
ceph orch host add ceph-osd01 192.168.0.10
But I cannot seem to deploy an osd daemon to the host.
Running "ceph orch daemon add osd ceph-osd01:/dev/sdb" results in the following:
root@ceph-mon01:/home/thing# ceph orch daemon add osd ceph-osd01:/dev/sdb
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 1862, in _handle_command
return self.handle_command(inbuf, cmd)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 184, in handle_command
return dispatch[cmd['prefix']].call(self, cmd, inbuf)
File "/usr/share/ceph/mgr/mgr_module.py", line 499, in call
return self.func(mgr, **kwargs)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 120, in <lambda>
wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs) # noqa: E731
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 109, in wrapper
return func(*args, **kwargs)
File "/usr/share/ceph/mgr/orchestrator/module.py", line 1374, in _daemon_add_osd
raise_if_exception(completion)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 241, in raise_if_exception
raise e
RuntimeError: cephadm exited with an error code: 1, stderr:Inferring config /var/lib/ceph/e6c69d42-8d67-11ef-bbe0-005056aa68a2/mon.ceph-osd01/config
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/var/lib/ceph/e6c69d42-8d67-11ef-bbe0-005056aa68a2/cephadm.a58127a8eed242cae13849ddbebcb9931d7a5410f406f2d264e3b1ed31d9605e/__main__.py", line 5579, in <module>
File "/var/lib/ceph/e6c69d42-8d67-11ef-bbe0-005056aa68a2/cephadm.a58127a8eed242cae13849ddbebcb9931d7a5410f406f2d264e3b1ed31d9605e/__main__.py", line 5567, in main
File "/var/lib/ceph/e6c69d42-8d67-11ef-bbe0-005056aa68a2/cephadm.a58127a8eed242cae13849ddbebcb9931d7a5410f406f2d264e3b1ed31d9605e/__main__.py", line 409, in _infer_config
File "/var/lib/ceph/e6c69d42-8d67-11ef-bbe0-005056aa68a2/cephadm.a58127a8eed242cae13849ddbebcb9931d7a5410f406f2d264e3b1ed31d9605e/__main__.py", line 324, in _infer_fsid
File "/var/lib/ceph/e6c69d42-8d67-11ef-bbe0-005056aa68a2/cephadm.a58127a8eed242cae13849ddbebcb9931d7a5410f406f2d264e3b1ed31d9605e/__main__.py", line 437, in _infer_image
File "/var/lib/ceph/e6c69d42-8d67-11ef-bbe0-005056aa68a2/cephadm.a58127a8eed242cae13849ddbebcb9931d7a5410f406f2d264e3b1ed31d9605e/__main__.py", line 311, in _validate_fsid
File "/var/lib/ceph/e6c69d42-8d67-11ef-bbe0-005056aa68a2/cephadm.a58127a8eed242cae13849ddbebcb9931d7a5410f406f2d264e3b1ed31d9605e/__main__.py", line 3288, in command_ceph_volume
File "/var/lib/ceph/e6c69d42-8d67-11ef-bbe0-005056aa68a2/cephadm.a58127a8eed242cae13849ddbebcb9931d7a5410f406f2d264e3b1ed31d9605e/__main__.py", line 918, in get_container_mounts_for_type
File "/var/lib/ceph/e6c69d42-8d67-11ef-bbe0-005056aa68a2/cephadm.a58127a8eed242cae13849ddbebcb9931d7a5410f406f2d264e3b1ed31d9605e/cephadmlib/daemons/ceph.py", line 422, in get_ceph_mounts_for_type
File "/var/lib/ceph/e6c69d42-8d67-11ef-bbe0-005056aa68a2/cephadm.a58127a8eed242cae13849ddbebcb9931d7a5410f406f2d264e3b1ed31d9605e/cephadmlib/host_facts.py", line 760, in selinux_enabled
File "/var/lib/ceph/e6c69d42-8d67-11ef-bbe0-005056aa68a2/cephadm.a58127a8eed242cae13849ddbebcb9931d7a5410f406f2d264e3b1ed31d9605e/cephadmlib/host_facts.py", line 743, in kernel_security
File "/var/lib/ceph/e6c69d42-8d67-11ef-bbe0-005056aa68a2/cephadm.a58127a8eed242cae13849ddbebcb9931d7a5410f406f2d264e3b1ed31d9605e/cephadmlib/host_facts.py", line 722, in _fetch_apparmor
ValueError: too many values to unpack (expected 2)
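The traceback dies in cephadmlib/host_facts.py's _fetch_apparmor, so a hedged first check (assuming that code parses the kernel's AppArmor profile list and expects two space-separated fields per line) is to look for oddly formatted entries:
# list loaded AppArmor profiles; lines that don't look like "name (mode)" are suspects
cat /sys/kernel/security/apparmor/profiles
awk 'NF != 2' /sys/kernel/security/apparmor/profiles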
I am able to see host lists:
root@ceph-mon01:/home/thing# ceph orch host ls
HOST ADDR LABELS STATUS
ceph-mon01 192.168.0.1 _admin
ceph-osd01 192.168.0.10 mon,mgr,osd
ceph-osd02 192.168.0.11 mon,mgr,osd
3 hosts in cluster
but not device lists:
root@ceph-mon01:/# ceph orch device ls
root@ceph-mon01:/#
wtf is going on here? :(
Hello community,
After 4 years of using Ceph, this is the first serious problem with data consistency. After some deep-scrubbing, one PG had an inconsistent status. I tried repairing it and deep-scrubbing it many times, but it always failed. I noticed that the primary OSD for this PG (4.e4) is osd.21. Restarting this OSD does not help. I checked dmesg and noticed a lot of write errors. My next idea was to change its crush weight to 0. After that, at the end of recovery/backfilling, all 3 OSDs holding this placement group (12, 25, 21) restarted, and the process started again. Below I attach some logs that I hope describe the problem.
osd.21 :
-3> 2024-10-18T11:25:25.005+0200 7fbb2e797700 10 osd.21 pg_epoch: 304540 pg[4.e4( v 304540'35368884 (304019'35365884,304540'35368884] local-lis/les=304539/304540 n=13702 ec=12199/43 lis/c=304539/303049 les/c/f=304540/303064/140355 sis=304539) [0,25]/[21,25] backfill=[0] r=0 lpr=304539 pi=[303049,304539)/7 crt=304540'35368884 lcod 304540'35368883 mlcod 304540'35368883 active+undersized+degraded+remapped+backfilling rops=1 mbc={}] get_object_context: 0x55f7729bdb80 4:274f1d06:::rbd_data.04e53058991b67.00000000000006da:151 rwstate(read n=1 w=0) oi: 4:274f1d06:::rbd_data.04e53058991b67.00000000000006da:151(6030'16132894 osd.6.0:93038977 dirty|data_digest|omap_digest s 4194304 uv 14398660 dd 14217e41 od ffffffff alloc_hint [0 0 0]) exists: 1 ssc: 0x55f75f7011e0 snapset: 4796=[]:{4796=[4796,4789,477b,476f,475b,4741,4733,3b5d,3b51,3b3e,3375,336b,3bc,13c]}
-2> 2024-10-18T11:25:25.005+0200 7fbb2e797700 10 osd.21 pg_epoch: 304540 pg[4.e4( v 304540'35368884 (304019'35365884,304540'35368884] local-lis/les=304539/304540 n=13702 ec=12199/43 lis/c=304539/303049 les/c/f=304540/303064/140355 sis=304539) [0,25]/[21,25] backfill=[0] r=0 lpr=304539 pi=[303049,304539)/7 crt=304540'35368884 lcod 304540'35368883 mlcod 304540'35368883 active+undersized+degraded+remapped+backfilling rops=1 mbc={}] add_object_context_to_pg_stat 4:274f1d06:::rbd_data.04e53058991b67.00000000000006da:151
-1> 2024-10-18T11:25:25.021+0200 7fbb2e797700 -1 ./src/osd/osd_types.cc: In function 'uint64_t SnapSet::get_clone_bytes(snapid_t) const' thread 7fbb2e797700 time 2024-10-18T11:25:25.008828+0200
./src/osd/osd_types.cc: 5888: FAILED ceph_assert(clone_overlap.count(clone))
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x124) [0x55f74c7c4fe8]
2: /usr/bin/ceph-osd(+0xc25186) [0x55f74c7c5186]
3: (SnapSet::get_clone_bytes(snapid_t) const+0xe3) [0x55f74cb08bc3]
4: (PrimaryLogPG::add_object_context_to_pg_stat(std::shared_ptr<ObjectContext>, pg_stat_t*)+0x23e) [0x55f74c9b2d6e]
5: (PrimaryLogPG::recover_backfill(unsigned long, ThreadPool::TPHandle&, bool*)+0x19f3) [0x55f74ca1d963]
6: (PrimaryLogPG::start_recovery_ops(unsigned long, ThreadPool::TPHandle&, unsigned long*)+0xf2a) [0x55f74ca2384a]
7: (OSD::do_recovery(PG*, unsigned int, unsigned long, ThreadPool::TPHandle&)+0x295) [0x55f74c8914f5]
8: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x19) [0x55f74cb4ce79]
9: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xad0) [0x55f74c8b2a80]
10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x41a) [0x55f74cf99f3a]
11: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55f74cf9c510]
12: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7fbb61ed3ea7]
13: clone()
osd.12
ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x13140) [0x7fe0cab71140]
2: signal()
3: abort()
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x17e) [0x5615e5acd77a]
5: /usr/bin/ceph-osd(+0xc278be) [0x5615e5acd8be]
6: (SnapSet::get_clone_bytes(snapid_t) const+0xe3) [0x5615e5e19113]
7: (PrimaryLogPG::add_object_context_to_pg_stat(std::shared_ptr<ObjectContext>, pg_stat_t*)+0x23e) [0x5615e5cbce2e]
8: (PrimaryLogPG::recover_backfill(unsigned long, ThreadPool::TPHandle&, bool*)+0x19f4) [0x5615e5d27af4]
9: (PrimaryLogPG::start_recovery_ops(unsigned long, ThreadPool::TPHandle&, unsigned long*)+0xf2a) [0x5615e5d2da3a]
10: (OSD::do_recovery(PG*, unsigned int, unsigned long, int, ThreadPool::TPHandle&)+0x2a5) [0x5615e5b9b445]
11: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0xcb) [0x5615e5e5d6db]
12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xaa8) [0x5615e5bba138]
13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x41a) [0x5615e62aac1a]
14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5615e62ad1f0]
15: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7fe0cab65ea7]
16: clone()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
I googled this problem and found this (translated from German) and this
I am afraid because, as far as I know, using ceph-objectstore-tool may cause damage and data loss. Has anyone had the same problem and resolved it, or can anyone confirm that the information in one of the above articles is correct? Is there any way to avoid losing data? Maybe back up the PG from the 3 OSDs that hold 4.e4?
As per the title, while trying to recover from a degraded cluster I marked one OSD as lost because I lost its WAL and DB. Since then, no writes have been made to the cluster, just backfills and recovery. My question is: if I manage to recover the WAL/DB device, is there a chance to get that data back into the cluster?
Hello everyone
I found out about Ceph 2 months ago via Proxmox and everything was amazing, especially with the live migrate function.
So I decided to empty server by server, add each to the Ceph cluster, and keep going until all the disks were set up as OSDs instead of RAID-10 local storage with VMs.
Now that I'm done, here's the current result:
I use P4510 / P4610 & other enterprise disks only with PLP.
I read having a lot of ram and fast CPU was good. I put 1-2 TB ram per server and used the EPYC Milan CPU just to be sure. Should be 32 cores free at all times per server.
I didn't have enough servers to begin with to start with EC 4 + 2. As I read it requires a minimum of 6 servers, 7 really because you want to have one spare in case of failure. Sooo when migrating the VM from local storage to Ceph, I just put them on the standard 3x REP.
However now we're there. I have 7 servers, finally!
There are around 600 VM running in the cluster on the 3x replication. It's just small VPN servers so as you see they don't use that much storage on 3x, and not a lot of IOPS either. Should be perfect for EC?
Here are the performance stats:
Does everything look good you think? I tried to follow as much as possible of what was "recommended" such as trying to keep storage balanced between nodes, using enterprise only disks, have + 1-2 extra spares, LACP bond to two switches, 25G network for latency (I really don't need the 100G throughput unless there's a rebuild).
Anything I should think about when going from 3x REP to 4 + 2 EC for my VM?
Is 7 servers enough or do I need to add an 8th server before going to 4 + 2?
What is my next step?
I'm thinking about relying on the RBD writeback cache for any bursts if needed. All servers have A/B power and UPS.
I don't mind keeping the current VMs on 3x replication if they're hard to migrate, but at least being able to deploy new VMs on the EC setup would be great so I don't blow through all of this NVMe.
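A hedged sketch of the Ceph-side steps for the new EC data pool (profile name, pool names and sizes are placeholders; RBD still needs a replicated pool for metadata, with the EC pool used as the data pool, and Proxmox lets you set that data pool on the storage definition):
# define the 4+2 profile with host as the failure domain
ceph osd erasure-code-profile set ec-4-2 k=4 m=2 crush-failure-domain=host
# create the EC data pool and allow RBD to do partial overwrites on it
ceph osd pool create vm-ec-data erasure ec-4-2
ceph osd pool set vm-ec-data allow_ec_overwrites true
ceph osd pool application enable vm-ec-data rbd
# new images: metadata in the existing replicated pool, data in the EC pool
rbd create vms/vm-200-disk-0 --size 100G --data-pool vm-ec-data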
Thanks!
The built-in Grafana dashboards that Cephadm comes with are excellent.
I am wondering though, if I want to put these onto another Grafana instance. Where would be a good place to download them from? (Ideally for my specific Cephadm version too!)
I've located a bunch of copies on the host that the containers are installed on, but copying them out of there just feels like a messy way to do this:
root@storage-13-09002:~# find / -name "*.json" | xargs grep -l "grafana"
/var/lib/docker/overlay2/d43fe8e11f978ce76013c7354fa545e8fbd87f27f3a03463b2c57f10f6540d90/merged/etc/grafana/dashboards/ceph-dashboard/osds-overview.json
/var/lib/docker/overlay2/d43fe8e11f978ce76013c7354fa545e8fbd87f27f3a03463b2c57f10f6540d90/merged/etc/grafana/dashboards/ceph-dashboard/cephfs-overview.json
/var/lib/docker/overlay2/d43fe8e11f978ce76013c7354fa545e8fbd87f27f3a03463b2c57f10f6540d90/merged/etc/grafana/dashboards/ceph-dashboard/radosgw-detail.json
/var/lib/docker/overlay2/d43fe8e11f978ce76013c7354fa545e8fbd87f27f3a03463b2c57f10f6540d90/merged/etc/grafana/dashboards/ceph-dashboard/pool-detail.json
...
Answer: This seems like a good place to fetch them from: https://github.com/ceph/ceph/tree/main/monitoring/ceph-mixin/dashboards_out
Dear All,
Any recommended Supermicro chassis for a brand-new Ceph setup? I would like to use NVMe U.2 for cost efficiency and 2x 100G ports for all the bandwidth needs, with a single CPU and 128 GB RAM.
We have many issues with our Ceph cluster, but what I'm struggling with the most is finding the useful data in the logs. We're running a stock setup logging-wise, yet I'm finding numerous logs that Ceph marks as [DBG], which sure look like debug logs to me (billions and billions of them), being sent to the journal at priority 3 (ERROR) or 5 (NOTICE) level.
The logging pages at docs.ceph.com only talk about increasing the log level, and I've confirmed that debug logs are disabled for every daemon. Can anyone point me at better docs, or share how they have tamed Ceph logging so that debug logs are not reported at high levels?
ETA: Specifically concerned with logs submitted to journald. I really need to be able to tune these down to appropriate priorities.
Examples:
{ "PRIORITY":"3", "MESSAGE":"system:0\n", "_CMDLINE":"/usr/bin/conmon --api-version 1 [...]", ...}
Really. You're telling me system:0 at priority level WARNING? Not useful.
{ "PRIORITY":"4", "MESSAGE":"log_channel(cluster) log [DBG] : fsmap [...]" }
These fsmap messages come by the thousands, and they don't say anything of use. They are even marked as DEBUG messages. So why are they logged at NOTICE level?
My goal is to have the primary on a specific host (since read replicas are not an option for non-RBD), and the replicas on any host (including the host already chosen), just not on the primary OSD.
My current CRUSH rule is
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class nvme
device 1 osd.1 class ssd
device 2 osd.2 class nvme
device 3 osd.3 class nvme
device 4 osd.4 class ssd
device 5 osd.5 class nvme
device 6 osd.6 class ssd
device 7 osd.7 class hdd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root
# buckets
host nanopc-cm3588-nas {
id -3
# do not change unnecessarily
id -4 class nvme
# do not change unnecessarily
id -5 class ssd
# do not change unnecessarily
id -26 class hdd
# do not change unnecessarily
# weight 3.06104
alg straw2
hash 0
# rjenkins1
item osd.0 weight 0.23288
item osd.2 weight 0.23288
item osd.5 weight 1.81940
item osd.7 weight 0.77588
}
host mbpcp {
id -7
# do not change unnecessarily
id -8 class nvme
# do not change unnecessarily
id -9 class ssd
# do not change unnecessarily
id -22 class hdd
# do not change unnecessarily
# weight 0.37560
alg straw2
hash 0
# rjenkins1
item osd.3 weight 0.37560
}
host mba {
id -10
# do not change unnecessarily
id -11 class nvme
# do not change unnecessarily
id -12 class ssd
# do not change unnecessarily
id -23 class hdd
# do not change unnecessarily
# weight 0.20340
alg straw2
hash 0
# rjenkins1
item osd.4 weight 0.20340
}
host mbpsp {
id -13
# do not change unnecessarily
id -14 class nvme
# do not change unnecessarily
id -15 class ssd
# do not change unnecessarily
id -24 class hdd
# do not change unnecessarily
# weight 0.37155
alg straw2
hash 0
# rjenkins1
item osd.1 weight 0.18578
item osd.6 weight 0.18578
}
root default {
id -1
# do not change unnecessarily
id -2 class nvme
# do not change unnecessarily
id -6 class ssd
# do not change unnecessarily
id -28 class hdd
# do not change unnecessarily
# weight 4.01160
alg straw2
hash 0
# rjenkins1
item nanopc-cm3588-nas weight 3.06104
item mbpcp weight 0.37560
item mba weight 0.20340
item mbpsp weight 0.37157
}
chassis chassis-nanopc {
id -16
# do not change unnecessarily
id -20 class nvme
# do not change unnecessarily
id -21 class ssd
# do not change unnecessarily
id -27 class hdd
# do not change unnecessarily
# weight 3.06104
alg straw2
hash 0
# rjenkins1
item nanopc-cm3588-nas weight 3.06104
}
chassis chassis-others {
id -17
# do not change unnecessarily
id -18 class nvme
# do not change unnecessarily
id -19 class ssd
# do not change unnecessarily
id -25 class hdd
# do not change unnecessarily
# weight 0.95056
alg straw2
hash 0
# rjenkins1
item mbpcp weight 0.37560
item mba weight 0.20340
item mbpsp weight 0.37157
}
# rules
rule replicated_rule {
id 0
type replicated
step take chassis-nanopc
step chooseleaf firstn 1 type host
step emit
step take default
step chooseleaf firstn 0 type osd
step emit
}
However, it resulted in pg dump like this:
version 14099
stamp 2024-10-13T11:46:25.490783+0000
last_osdmap_epoch 0
last_pg_scan 0
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG LOG_DUPS DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN LAST_SCRUB_DURATION SCRUB_SCHEDULING OBJECTS_SCRUBBED OBJECTS_TRIMMED
6.3f 3385 0 0 3385 0 8216139409 0 0 1732 3000 1732 active+clean+remapped 2024-10-13T02:21:07.580486+0000 5024'13409 5027:39551 [5,5] 5 [5,4] 5 4373'10387 2024-10-12T09:46:54.412039+0000 1599'106 2024-10-09T15:41:52.360255+0000 0 2 periodic scrub scheduled @ 2024-10-13T17:41:52.579122+0000 2245 0
6.3e 3217 0 0 3217 0 7806374402 0 0 1819 1345 1819 active+clean+remapped 2024-10-13T03:36:53.629380+0000 5025'13549 5027:36882 [7,7] 7 [7,4] 7 4373'10667 2024-10-12T09:46:51.075549+0000 0'0 2024-10-08T07:13:08.545820+0000 0 2 periodic scrub scheduled @ 2024-10-13T13:27:11.454963+0000 2132 0
6.3d 3256 0 0 3256 0 7780755159 0 0 1733 3000 1733 active+clean+remapped 2024-10-13T02:21:46.947129+0000 5024'13609 5027:28986 [5,5] 5 [5,4] 5 4371'11218 2024-10-12T09:39:44.502516+0000 0'0 2024-10-08T07:13:08.545820+0000 0 2 periodic scrub scheduled @ 2024-10-13T14:12:17.856811+0000 2202 0
See the [5,5]. Thus my cluster remains in a remapped state. Is there any way I can achieve the goal stated above?
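For what it's worth, a hedged variant of the rule (not verified against this map) that keeps the primary on the nanopc chassis but draws the remaining replicas from the other chassis, so the same OSD can never be picked twice; the trade-off is that the extra replicas can no longer land on the nanopc host itself:
rule primary-nanopc {
id 1
type replicated
step take chassis-nanopc
step chooseleaf firstn 1 type host
step emit
step take chassis-others
step chooseleaf firstn -1 type osd
step emit
}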
When I try to open block devices in the GUI, the mgr service keeps restarting (18.2.4). I cannot access the crash logs after the container restarts.
13:46:38 hostname bash[27409]: debug -17> 2024-10-11T11:46:38.401+0000 7f4e2f823640 5 librbd::io::Dispatcher: 0x55ff0ea65000 register_dispatch: dispatch_layer=6
Oct 11 13:46:38 hostname bash[27409]: debug -16> 2024-10-11T11:46:38.401+0000 7f4e1a47a640 5 asok(0x55ff051a8000) register_command rbd cache flush Images/10f3af9e-1766-4dbf-9cdb-416436027b23 hook 0x55ff13050f00
Oct 11 13:46:38 hostname bash[27409]: debug -15> 2024-10-11T11:46:38.401+0000 7f4e1a47a640 5 asok(0x55ff051a8000) register_command rbd cache invalidate Images/10f3af9e-1766-4dbf-9cdb-416436027b23 hook 0x55ff13050f00
Oct 11 13:46:38 hostname bash[27409]: debug -14> 2024-10-11T11:46:38.401+0000 7f4e1a47a640 5 librbd::ImageCtx: 0x55ff11696000: disabling zero-copy writes
Oct 11 13:46:38 hostname bash[27409]: debug -12> 2024-10-11T11:46:38.401+0000 7f4e1a47a640 5 librbd::cache::WriteAroundObjectDispatch: 0x55ff1253a900 init:
Oct 11 13:46:38 hostname bash[27409]: debug -11> 2024-10-11T11:46:38.401+0000 7f4e1a47a640 5 librbd::io::Dispatcher: 0x55ff0ea65000 register_dispatch: dispatch_layer=1
Oct 11 13:46:38 hostname bash[27409]: debug -10> 2024-10-11T11:46:38.405+0000 7f4e1ac7b640 5 librbd::io::SimpleSchedulerObjectDispatch: 0x55ff1304c6c0 SimpleSchedulerObjectDispatch: ictx=0x55ff11696000
Oct 11 13:46:38 hostname bash[27409]: debug -9> 2024-10-11T11:46:38.405+0000 7f4e1ac7b640 5 librbd::io::SimpleSchedulerObjectDispatch: 0x55ff1304c6c0 init:
Oct 11 13:46:38 hostname bash[27409]: debug -8> 2024-10-11T11:46:38.405+0000 7f4e1ac7b640 5 librbd::io::Dispatcher: 0x55ff0ea65000 register_dispatch: dispatch_layer=5
Oct 11 13:46:38 hostname bash[27409]: debug -6> 2024-10-11T11:46:38.405+0000 7f4e1ac7b640 5 librbd::io::Dispatcher: 0x55ff13076090 shut_down_dispatch: dispatch_layer=3
Oct 11 13:46:38 hostname bash[27409]: debug -5> 2024-10-11T11:46:38.405+0000 7f4e1a47a640 5 librbd::io::WriteBlockImageDispatch: 0x55ff0e6540a0 unblock_writes: 0x55ff11696000, num=0
Oct 11 13:46:38 hostname bash[27409]: debug -3> 2024-10-11T11:46:38.409+0000 7f4e1a47a640 5 librbd::io::WriteBlockImageDispatch: 0x55ff0e6540a0 unblock_writes: 0x55ff11696000, num=0
Oct 11 13:46:38 hostname bash[27409]: debug -2> 2024-10-11T11:46:38.409+0000 7f4e2f823640 5 librbd::DiffIterate: fast diff enabled
Oct 11 13:46:38 hostname bash[27409]: debug -1> 2024-10-11T11:46:38.409+0000 7f4e2f823640 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.4/rpm/el9/BUILD/ceph-18.2.4/src/librbd/api/DiffIterate.cc: In function 'int librbd::api::DiffIterate<ImageCtxT>::execute() [with ImageCtxT = librbd::ImageCtx]' thread 7f4e2f823640 time 2024-10-11T11:46:38.414077+0000
Oct 11 13:46:38 hostname bash[27409]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.4/rpm/el9/BUILD/ceph-18.2.4/src/librbd/api/DiffIterate.cc: 341: FAILED ceph_assert(object_diff_state.size() == end_object_no - start_object_no)
Oct 11 13:46:38 hostname bash[27409]: ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)
Oct 11 13:46:38 hostname bash[27409]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x12e) [0x7f51910e504d]
Oct 11 13:46:38 hostname bash[27409]: 4: /lib64/librbd.so.1(+0x51ada7) [0x7f5181bf1da7]
Oct 11 13:46:38 hostname bash[27409]: 6: /lib64/python3.9/site-packages/rbd.cpython-39-x86_64-linux-gnu.so(+0x630bc) [0x7f5181e7c0bc]
Oct 11 13:46:38 hostname bash[27409]: 8: PyVectorcall_Call()
Oct 11 13:46:38 hostname bash[27409]: 9: /lib64/python3.9/site-packages/rbd.cpython-39-x86_64-linux-gnu.so(+0x44d50) [0x7f5181e5dd50]
Oct 11 13:46:38 hostname bash[27409]: 10: _PyObject_MakeTpCall()
Oct 11 13:46:38 hostname bash[27409]: 11: /lib64/libpython3.9.so.1.0(+0x125133) [0x7f5191c0a133]
Oct 11 13:46:38 hostname bash[27409]: 12: _PyEval_EvalFrameDefault()
Oct 11 13:46:38 hostname bash[27409]: 14: _PyFunction_Vectorcall()
Oct 11 13:46:38 hostname bash[27409]: 17: /lib64/libpython3.9.so.1.0(+0x11cb73) [0x7f5191c01b73]
Oct 11 13:46:38 hostname bash[27409]: 18: /lib64/libpython3.9.so.1.0(+0x125031) [0x7f5191c0a031]
Oct 11 13:46:38 hostname bash[27409]: 19: _PyEval_EvalFrameDefault()
Oct 11 13:46:38 hostname bash[27409]: 20: /lib64/libpython3.9.so.1.0(+0x11cb73) [0x7f5191c01b73]
Oct 11 13:46:38 hostname bash[27409]: 21: /lib64/libpython3.9.so.1.0(+0x125031) [0x7f5191c0a031]
Oct 11 13:46:38 hostname bash[27409]: 22: _PyEval_EvalFrameDefault()
Oct 11 13:46:38 hostname bash[27409]: 23: /lib64/libpython3.9.so.1.0(+0x10ec35) [0x7f5191bf3c35]
Oct 11 13:46:38 hostname bash[27409]: 25: /lib64/libpython3.9.so.1.0(+0x125031) [0x7f5191c0a031]
Oct 11 13:46:38 hostname bash[27409]: 26: _PyEval_EvalFrameDefault()
Oct 11 13:46:38 hostname bash[27409]: 27: /lib64/libpython3.9.so.1.0(+0x10ec35) [0x7f5191bf3c35]
Oct 11 13:46:38 hostname bash[27409]: 29: /lib64/libpython3.9.so.1.0(+0x125031) [0x7f5191c0a031]
Oct 11 13:46:38 hostname bash[27409]: 30: _PyEval_EvalFrameDefault()
Oct 11 13:46:38 hostname bash[27409]: 31: /lib64/libpython3.9.so.1.0(+0x10ec35) [0x7f5191bf3c35]
Oct 11 13:46:38 hostname bash[27409]: debug 0> 2024-10-11T11:46:38.413+0000 7f4e2f823640 -1 *** Caught signal (Aborted) **
Oct 11 13:46:38 hostname bash[27409]: 1: /lib64/libc.so.6(+0x3e6f0) [0x7f5190a8e6f0]
Oct 11 13:46:38 hostname bash[27409]: 2: /lib64/libc.so.6(+0x8b94c) [0x7f5190adb94c]
Oct 11 13:46:38 hostname bash[27409]: 3: raise()
Oct 11 13:46:38 hostname bash[27409]: 4: abort()
Oct 11 13:46:38 hostname bash[27409]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x188) [0x7f51910e50a7]
Oct 11 13:46:38 hostname bash[27409]: 6: /usr/lib64/ceph/libceph-common.so.2(+0x16b20b) [0x7f51910e520b]
Oct 11 13:46:38 hostname bash[27409]: 7: /lib64/librbd.so.1(+0x193403) [0x7f518186a403]
Oct 11 13:46:38 hostname bash[27409]: 9: rbd_diff_iterate2()
Oct 11 13:46:38 hostname bash[27409]: 11: /lib64/libpython3.9.so.1.0(+0x11d7a1) [0x7f5191c027a1]
Oct 11 13:46:38 hostname bash[27409]: 13: /lib64/python3.9/site-packages/rbd.cpython-39-x86_64-linux-gnu.so(+0x44d50) [0x7f5181e5dd50]
Oct 11 13:46:38 hostname bash[27409]: 14: _PyObject_MakeTpCall()
Oct 11 13:46:38 hostname bash[27409]: 16: _PyEval_EvalFrameDefault()
Oct 11 13:46:38 hostname bash[27409]: 17: /lib64/libpython3.9.so.1.0(+0x10ec35) [0x7f5191bf3c35]
Oct 11 13:46:38 hostname bash[27409]: 18: _PyFunction_Vectorcall()
Oct 11 13:46:38 hostname bash[27409]: 19: /lib64/libpython3.9.so.1.0(+0x125031) [0x7f5191c0a031]
Oct 11 13:46:38 hostname bash[27409]: 20: _PyEval_EvalFrameDefault()
Oct 11 13:46:38 hostname bash[27409]: 21: /lib64/libpython3.9.so.1.0(+0x11cb73) [0x7f5191c01b73]
Oct 11 13:46:38 hostname bash[27409]: 22: /lib64/libpython3.9.so.1.0(+0x125031) [0x7f5191c0a031]
Oct 11 13:46:38 hostname bash[27409]: 24: /lib64/libpython3.9.so.1.0(+0x11cb73) [0x7f5191c01b73]
Oct 11 13:46:38 hostname bash[27409]: 26: _PyEval_EvalFrameDefault()
Oct 11 13:46:38 hostname bash[27409]: 27: /lib64/libpython3.9.so.1.0(+0x10ec35) [0x7f5191bf3c35]
Oct 11 13:46:38 hostname bash[27409]: 29: /lib64/libpython3.9.so.1.0(+0x125031) [0x7f5191c0a031]
Oct 11 13:46:38 hostname bash[27409]: 30: _PyEval_EvalFrameDefault()
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 none
Oct 11 13:46:38 hostname bash[27409]: 0/ 1 context
Oct 11 13:46:38 hostname bash[27409]: 1/ 5 mds_balancer
Oct 11 13:46:38 hostname bash[27409]: 1/ 5 mds_log
Oct 11 13:46:38 hostname bash[27409]: 1/ 5 mds_log_expire
Oct 11 13:46:38 hostname bash[27409]: 1/ 5 mds_migrator
Oct 11 13:46:38 hostname bash[27409]: 0/ 1 buffer
Oct 11 13:46:38 hostname bash[27409]: 0/ 1 timer
Oct 11 13:46:38 hostname bash[27409]: 0/ 1 objecter
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 rados
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 rbd_mirror
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 rbd_replay
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 rbd_pwl
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 journaler
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 immutable_obj_cache
Oct 11 13:46:38 hostname bash[27409]: 1/ 5 osd
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 objclass
Oct 11 13:46:38 hostname bash[27409]: 0/ 0 ms
Oct 11 13:46:38 hostname bash[27409]: 0/10 monc
Oct 11 13:46:38 hostname bash[27409]: 1/ 5 paxos
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 tp
Oct 11 13:46:38 hostname bash[27409]: 1/ 5 crypto
Oct 11 13:46:38 hostname bash[27409]: 1/ 5 heartbeatmap
Oct 11 13:46:38 hostname bash[27409]: 1/ 5 rgw_sync
Oct 11 13:46:38 hostname bash[27409]: 1/ 5 rgw_datacache
Oct 11 13:46:38 hostname bash[27409]: 1/ 5 rgw_flight
Oct 11 13:46:38 hostname bash[27409]: 1/ 5 asok
Oct 11 13:46:38 hostname bash[27409]: 1/ 1 throttle
Oct 11 13:46:38 hostname bash[27409]: 0/ 0 refs
Oct 11 13:46:38 hostname bash[27409]: 1/ 5 compressor
Oct 11 13:46:38 hostname bash[27409]: 1/ 5 bluestore
Oct 11 13:46:38 hostname bash[27409]: 1/ 5 kstore
Oct 11 13:46:38 hostname bash[27409]: 4/ 5 rocksdb
Oct 11 13:46:38 hostname bash[27409]: 1/ 5 fuse
Oct 11 13:46:38 hostname bash[27409]: 2/ 5 mgr
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 test
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 seastore
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 seastore_onode
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 seastore_odata
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 seastore_t
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 seastore_cleaner
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 seastore_epm
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 seastore_lba
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 seastore_fixedkv_tree
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 seastore_cache
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 seastore_device
Oct 11 13:46:38 hostname bash[27409]: 0/ 5 cyanstore
Oct 11 13:46:38 hostname bash[27409]: 1/ 5 ceph_exporter
Oct 11 13:46:38 hostname bash[27409]: -2/-2 (syslog threshold)
Oct 11 13:46:38 hostname bash[27409]: 99/99 (stderr threshold)
Oct 11 13:46:38 hostname bash[27409]: 7f4e1a47a640 / io_context_pool
Oct 11 13:46:38 hostname bash[27409]: 7f4e1b47c640 / safe_timer
Oct 11 13:46:38 hostname bash[27409]: 7f4e1e000640 / ms_dispatch
Oct 11 13:46:38 hostname bash[27409]: 7f4e1f803640 / io_context_pool
Oct 11 13:46:38 hostname bash[27409]: 7f4e2b01a640 / mgr-fin
Oct 11 13:46:38 hostname bash[27409]: 7f4e2c01c640 / dashboard
Oct 11 13:46:38 hostname bash[27409]: 7f4e2c81d640 / dashboard
Oct 11 13:46:38 hostname bash[27409]: 7f4e2e020640 / dashboard
Oct 11 13:46:38 hostname bash[27409]: 7f4e2e821640 / dashboard
Oct 11 13:46:38 hostname bash[27409]: 7f4e2f022640 / dashboard
Oct 11 13:46:38 hostname bash[27409]: 7f4e2f823640 / dashboard
Oct 11 13:46:38 hostname bash[27409]: 7f4e31827640 / mgr-fin
Oct 11 13:46:38 hostname bash[27409]: 7f4e32829640 / prometheus
Oct 11 13:46:38 hostname bash[27409]: 7f4e3402c640 / prometheus
Oct 11 13:46:38 hostname bash[27409]: 7f4e36030640 / prometheus
Oct 11 13:46:38 hostname bash[27409]: 7f4e36831640 / prometheus
Oct 11 13:46:38 hostname bash[27409]: 7f4e38034640 / mgr-fin
Oct 11 13:46:38 hostname bash[27409]: 7f4e3a9b8640 /
Oct 11 13:46:38 hostname bash[27409]: 7f4e3e1bf640 / mgr-fin
Oct 11 13:46:38 hostname bash[27409]: 7f4e42b08640 / safe_timer
Oct 11 13:46:38 hostname bash[27409]: 7f4e43b0a640 / ms_dispatch
Oct 11 13:46:38 hostname bash[27409]: 7f4e453cd640 / io_context_pool
Oct 11 13:46:38 hostname bash[27409]: 7f4e45c0e640 / mgr-fin
Oct 11 13:46:38 hostname bash[27409]: 7f4e47c12640 /
Oct 11 13:46:38 hostname bash[27409]: 7f4e4a417640 /
Oct 11 13:46:38 hostname bash[27409]: 7f4e4ac18640 / mgr-fin
Oct 11 13:46:38 hostname bash[27409]: 7f4e4c41b640 / safe_timer
Oct 11 13:46:38 hostname bash[27409]: 7f4e4d41d640 / ms_dispatch
Oct 11 13:46:38 hostname bash[27409]: 7f4e4ec20640 / io_context_pool
Oct 11 13:46:38 hostname bash[27409]: 7f4e51465640 / prometheus
Oct 11 13:46:38 hostname bash[27409]: 7f4e5552d640 / pg_autoscaler
Oct 11 13:46:38 hostname bash[27409]: 7f4e5652f640 /
Oct 11 13:46:38 hostname bash[27409]: 7f4e58d34640 /
Oct 11 13:46:38 hostname bash[27409]: 7f4e5a537640 /
Oct 11 13:46:38 hostname bash[27409]: 7f4e5bd7a640 / devicehealth
Oct 11 13:46:38 hostname bash[27409]: 7f4e5fd82640 / crash
Oct 11 13:46:38 hostname bash[27409]: 7f4e60d84640 / cephadm
Oct 11 13:46:38 hostname bash[27409]: 7f4e62587640 / mgr-fin
Oct 11 13:46:38 hostname bash[27409]: 7f4e64e0c640 / mgr-fin
Oct 11 13:46:38 hostname bash[27409]: 7f4e65e0e640 / mgr-fin
Oct 11 13:46:38 hostname bash[27409]: 7f4e66e10640 / mgr-fin
Oct 11 13:46:38 hostname bash[27409]: 7f4e67611640 / mgr-fin
Oct 11 13:46:38 hostname bash[27409]: 7f4e68613640 / mgr-fin
Oct 11 13:46:38 hostname bash[27409]: 7f4e68e14640 / mgr-fin
Oct 11 13:46:38 hostname bash[27409]: 7f4e69e96640 / balancer
Oct 11 13:46:38 hostname bash[27409]: 7f4e6dede640 / cmdfin
Oct 11 13:46:38 hostname bash[27409]: 7f4e6f6e1640 / ms_dispatch
Oct 11 13:46:38 hostname bash[27409]: 7f51864f4640 / safe_timer
Oct 11 13:46:38 hostname bash[27409]: 7f518acfd640 / ms_dispatch
Oct 11 13:46:38 hostname bash[27409]: 7f518e504640 / msgr-worker-1
Oct 11 13:46:38 hostname bash[27409]: 7f518ed05640 / msgr-worker-0
Oct 11 13:46:38 hostname bash[27409]: max_recent 10000
Oct 11 13:46:38 hostname bash[27409]: max_new 1000
Oct 11 13:46:38 hostname bash[27409]: log_file /var/lib/ceph/crash/2024-10-11T11:46:38.415833Z_b3978f24-6697-44f5-80dc-4915b5ec144d/log
Oct 11 13:46:38 hostname bash[27409]: --- end dump of recent events ---
Very rudimentary question: is an erasure coding scheme with 2 data chunks (k) and 4 coding chunks (m) able to withstand the loss of any 4 chunks, irrespective of whether both data chunks are lost, i.e. with just two coding chunks remaining? If yes, what kind of data is stored in the coding chunks such that the original data chunks can be reconstructed?
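Yes: with a systematic MDS code such as Reed-Solomon (what the default jerasure technique implements), any k of the k+m chunks suffice. A hedged worked sketch for k=2, m=4: the data chunks are $d_1, d_2$ and each coding chunk is an independent linear combination over a finite field,
$$c_j = a_j d_1 + b_j d_2, \qquad j = 1, \dots, 4,$$
with the coefficient pairs $(a_j, b_j)$ chosen (Vandermonde-style) so that any two rows of the resulting $6 \times 2$ generator matrix are linearly independent. If only $c_1$ and $c_2$ survive, the decoder solves the $2 \times 2$ system
$$\begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} \begin{pmatrix} d_1 \\ d_2 \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}$$
for $d_1, d_2$. So the coding chunks hold nothing "special", just these field-arithmetic combinations of the data chunks, and that is enough to rebuild them.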