lvmlockd sanlock

LVM + lvmlockd (clvm replacement)

source code

https://sourceware.org/git/?p=lvm2.git
https://gitlab.com/lvmteam/lvm2
https://github.com/lvmteam/lvm2

https://android.googlesource.com/platform/external/lvm2/+/d44af0be2c6f4652eafd90a70e7ba5f24c0f6d5a/lib/locking/lvmlockd.c

To see how lvmlockd talks to sanlock, look at the daemons/lvmlockd/ directory in the source tree.
• Core Logic: daemons/lvmlockd/lvmlockd-core.c
• Sanlock Adapter: daemons/lvmlockd/lvmlockd-sanlock.c
• The sanlock adapter file contains the C functions that translate LVM lock requests (lock_lv, lock_vg) into sanlock_acquire and sanlock_release calls.

🚨 prepare

  • ❗ After the operation, set any LVs that were left redundantly active to inactive

    # filter: list LVs in csi-lvm that are not inactive
    lvscan | grep '/csi-lvm/' | grep -v inactive

    # deactivate a redundantly active LV
    lvchange -an /dev/csi-lvm/pvc-xxx
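
The filter above can be wrapped in a small helper so that the list of still-active LVs can be reviewed before anything is deactivated. This is a minimal sketch, not part of the original procedure: the function name active_lvs_in_vg is hypothetical, and it assumes the usual lvscan line layout (a status word followed by the quoted device path).

```shell
#!/bin/sh
# Hypothetical helper: read lvscan-style output on stdin and print the device
# paths of LVs in the given VG whose status is not "inactive".
active_lvs_in_vg() {
    awk -v prefix="/dev/$1/" -v q="'" '
        $1 != "inactive" {
            # The device path is the first field that starts with a single quote.
            for (i = 2; i <= NF; i++) {
                if (substr($i, 1, 1) == q) {
                    path = $i
                    gsub(q, "", path)              # strip the surrounding quotes
                    if (index(path, prefix) == 1)  # keep only LVs of this VG
                        print path
                    break
                }
            }
        }'
}

# Dry run on a real node (prints paths only, changes nothing):
#   lvscan | active_lvs_in_vg csi-lvm
```

Each printed path can then be passed to lvchange -an once it is confirmed to be a redundant activation.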

install

# sanlock and lvmlockd
apt install lvm2 sanlock lvm2-lockd

lvmlockd

  • enable lvmlockd

    # vim /etc/lvm/lvm.conf (set this inside the global section)
    use_lvmlockd = 1
  • check

    lvmconfig --type current global/use_lvmlockd
  • restart lvmlockd

    systemctl restart lvmlockd

sanlock

  • config host id

    # vim /etc/lvm/lvmlocal.conf
    local {
    # Replace <nodeNum> with the unique integer for this node (e.g., 1, 2, 3...)
    host_id = <nodeNum>
    }
  • Restart

    systemctl restart sanlock
    sleep 8
    systemctl restart lvmlockd
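
Assigning host_id by hand on every node is easy to get wrong, so the value could be derived from the hostname instead. The sketch below assumes a naming convention in which each node's hostname ends in a unique number (for example ccit-k8s-worker-51); that convention and the function name host_id_from_hostname are assumptions, not something the setup above requires. The 1 to 2000 range check reflects the host_id limits documented for lvmlockd with sanlock.

```shell
#!/bin/sh
# Hypothetical helper: print the trailing number of a hostname as a host_id,
# or fail if there is no trailing number or it falls outside 1..2000.
host_id_from_hostname() {
    # Capture the run of digits at the very end of the name.
    digits=$(printf '%s\n' "$1" | sed -n 's/.*[^0-9]\([0-9][0-9]*\)$/\1/p')
    [ -n "$digits" ] || return 1
    # Drop leading zeros so that "007" becomes "7".
    id=$(printf '%s\n' "$digits" | sed 's/^0*//')
    [ -n "$id" ] || return 1
    [ "$id" -ge 1 ] && [ "$id" -le 2000 ] || return 1
    printf '%s\n' "$id"
}

# Example on a node (prints the id to put into /etc/lvm/lvmlocal.conf):
#   host_id_from_hostname "$(hostname)"
```

If two hostnames share the same trailing number, the ids collide, so the result should still be checked for uniqueness across the cluster.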

first node

Convert VG to sanlock

  • config sanlock

    # (optional) deactivate all LVs in the VG first
    lvchange -an csi-lvm
  • skip the global lock

    vgchange --lock-type sanlock --lockopt skipgl csi-lvm

    # output
    WARNING: skipping global lock in lvmlockd.
    Logical volume "lvmlock" created.
    device-mapper: remove ioctl on (253:9) failed: Device or resource busy
    Volume group "csi-lvm" successfully changed

Start the Lockspace

Start the lockspace for the newly converted VG. This allows the local lvmlockd to manage it.

vgchange --lock-start csi-lvm

# output
Skipping global lock: lockspace not found or started
VG csi-lvm starting sanlock lockspace
Starting locking. Waiting for sanlock may take 20 sec to 3 min...

Enable the Global Lock

# enable 
lvmlockctl --gl-enable csi-lvm

# verify
lvmlockctl -i

# expected output
VG csi-lvm lock_type=sanlock UShLSg-JLRf-Agkb-YRQK-asNV-GSFn-IRqbDH
LS sanlock lvm_csi-lvm
LK VG un ver 0 🚨 right after initialization this is 0; a number appears once an LV has been created (lvcreate -L 10G -n lv-temp-1 csi-lvm)
LK GL un ver 0

# Verify the Global Lock
sanlock client status

# verify
vgs -o vg_name,lock_type
# or
vgs -o+lock_type,lock_args
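
The manual check of the lvmlockctl -i output above can be scripted. The following is a rough sketch that assumes the info dump has the shape shown above (a "VG <name> ..." line followed by "LS" and "LK" lines for that VG); the function name gl_enabled_for_vg is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `lvmlockctl -i` style output on stdin and succeed
# only if the block for the given VG contains a global lock ("LK GL") line.
gl_enabled_for_vg() {
    awk -v vg="$1" '
        $1 == "VG" { in_vg = ($2 == vg) }              # entering a VG block
        in_vg && $1 == "LK" && $2 == "GL" { found = 1 }
        END { exit(found ? 0 : 1) }'
}

# Example on a node:
#   lvmlockctl -i | gl_enabled_for_vg csi-lvm && echo "global lock present"
```

A non-zero exit status here only means that no "LK GL" line was seen for that VG in the dump; it does not say why.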

other node

Start the Lockspace

vgchange --lock-start csi-lvm

# Verify the Global Lock
sanlock client status

vgs -o vg_name,lock_type

init lock

lvcreate -L 10G -n lv-temp-1 csi-lvm

best practices

View the locks held by the current node

sanlock client status

check lvmlockd status

systemctl status lvmlockd

Configure a newly added node

  • step1: configure the nvmeof-connect service
  • step2: configure the sanlock and lvmlockd services
  • step3: start locking for csi-lvm

troubleshooting

lvscan reports Skipping global lock: storage failed for sanlock leases

issue: after the NVMe-oF configuration on the server was reset while the client was still running the sanlock and lvmlockd services, lvscan reports:

Skipping global lock: storage failed for sanlock leases
VG csi-lvm lock skipped: storage failed for sanlock leases

issue: checking the sanlock service with systemctl status sanlock shows:

Feb 28 10:38:28 ccit-k8s-worker-51 sanlock[2977932]: 2026-02-28 10:38:28 8280483 [2978414]: s1 delta_renew read rv -2 offset 0 /dev/mapper/csi--lvm-lvmlock
Feb 28 10:38:28 ccit-k8s-worker-51 sanlock[2977932]: 2026-02-28 10:38:28 8280483 [2978414]: s1 renewal error -2 delta_length 10 last_success 7122795
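
To spot this condition without reading the journal by eye, the renewal errors can be pulled out of the log text. The sketch below assumes the log line layout shown above, where the lockspace tag (s1) precedes the words "renewal error" and the return value follows them; the function name sanlock_renewal_errors is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read sanlock log lines on stdin and print
# "<lockspace-tag> <return-value>" for every "renewal error" entry.
sanlock_renewal_errors() {
    awk '{
        for (i = 2; i + 2 <= NF; i++) {
            if ($i == "renewal" && $(i + 1) == "error") {
                print $(i - 1), $(i + 2)   # tag before, return value after
                break
            }
        }
    }'
}

# Example on a node:
#   journalctl -u sanlock --no-pager | sanlock_renewal_errors
```

Repeated entries for the same tag suggest that the lease renewals keep failing and that the steps in the solution are needed.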

solution:

# stop sanlock service
systemctl stop sanlock # this often fails to stop it
# force-kill the sanlock process; find its PID with ps -ef | grep sanlock or systemctl status sanlock
kill -9 <sanlockPid>

# start sanlock service
systemctl start sanlock

# List DM devices
dmsetup ls

# Force Remove
dmsetup remove --force csi--lvm-lvmlock

# Drop the Stale Lockspace
lvmlockctl --drop csi-lvm

# Enable Locking
vgchange --lock-start csi-lvm
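
The name passed to dmsetup remove above follows the device-mapper naming rule for LVM volumes: the VG and LV names are joined with a single hyphen, and any hyphen inside either name is doubled. That is how VG csi-lvm and LV lvmlock become csi--lvm-lvmlock. A small sketch of that rule follows; the function names dm_escape and dm_name are hypothetical.

```shell
#!/bin/sh
# Double every hyphen in a VG or LV name, as device-mapper names do.
dm_escape() {
    printf '%s\n' "$1" | sed 's/-/--/g'
}

# Build the device-mapper name for a given VG and LV.
dm_name() {
    printf '%s-%s\n' "$(dm_escape "$1")" "$(dm_escape "$2")"
}

# Example: find the lock LV of a VG in the dmsetup listing
#   dmsetup ls | grep "^$(dm_name csi-lvm lvmlock)"
```

Computing the name this way avoids removing the wrong mapping when several VGs with similar names exist on the node.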

lvscan reports Global lock failed: error -221

# restart sanlock and lvmlockd; if this fails, see the section below on a failed restart of the sanlock and lvmlockd services
systemctl restart sanlock
sleep 5
systemctl restart lvmlockd

systemctl status sanlock lvmlockd

# if lock-start fails, see the section below on "lockspace not found or started"
vgchange --lock-start csi-lvm

trigger troubleshooting

restart of the sanlock and lvmlockd services failed

issue

Feb 10 16:25:33 ccit-k8s-worker-7 systemd[1]: sanlock.service: Unit process 1116433 (sanlock) remains running after unit stopped.
Feb 10 16:25:33 ccit-k8s-worker-7 systemd[1]: Failed to start Shared Storage Lease Manager.

Feb 10 16:30:29 ccit-k8s-worker-51 systemd[1]: lvmlockd.service: Unit process 1122393 (lvmlockd) remains running after unit stopped.
Feb 10 16:30:29 ccit-k8s-worker-51 systemd[1]: Failed to start LVM lock daemon.

solution

# kill sanlock and lvmlockd process
kill -9 <lvmlockdPid>
kill -9 <sanlockPid>

# reset failed
systemctl reset-failed sanlock lvmlockd
# check
systemctl status sanlock lvmlockd

# start sanlock lvmlockd
systemctl start sanlock
sleep 8
systemctl start lvmlockd

# Enable Locking
vgchange --lock-start csi-lvm
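
The fixed sleep 8 between starting sanlock and starting lvmlockd could be replaced by polling until sanlock is actually reported active. This is only a sketch of that idea under the assumption that "active" in systemd is a good enough readiness signal here; the helper name wait_until and the retry counts are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to <tries> times, sleeping <delay>
# seconds after each failure; succeed as soon as the command succeeds.
wait_until() {
    tries="$1"
    delay="$2"
    shift 2
    i=0
    while [ "$i" -lt "$tries" ]; do
        "$@" && return 0
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example on a node (start lvmlockd only once sanlock is reported active):
#   systemctl start sanlock
#   wait_until 10 1 systemctl is-active --quiet sanlock && systemctl start lvmlockd
```

If the polling times out, lvmlockd is not started, which makes the failure visible at this step instead of later at lock-start.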

lock-start reports lockspace not found or started…

issue: Skipping global lock: lockspace not found or started, VG csi-lvm start failed: lock manager sanlock is not running

solution: Clean Up Stale Device Mapper Entries

# List DM devices
dmsetup ls

# Force Remove
dmsetup remove --force csi--lvm-lvmlock

# Drop the Stale Lockspace
lvmlockctl --drop csi-lvm

# Enable Locking
vgchange --lock-start csi-lvm