nvme-of

prepare

apt install nvme-cli

# If you hit: modprobe: FATAL: Module nvme_tcp not found in directory /lib/modules/5.15.0-122-generic
# install the extra kernel modules package: apt install linux-modules-extra-5.15.0-122-generic
modprobe nvme_tcp

modprobe nvme_fabrics
# nvme connect -t tcp -a 10.220.8.57 -s 4420 -n nqn.2025-11.io.spdk:cnode-test-volume
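
To verify the connection afterwards, nvme-cli's discovery and listing commands can be used (a sketch; the address and NQN are the example values from above):

# query the target for the subsystems it exports
nvme discover -t tcp -a 10.220.8.57 -s 4420
# list the namespaces now visible as local block devices
nvme list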

install

system service

  • step1: configure NVMe-oF target info

    vim /etc/nvme/discovery.conf

    -t tcp -a 10.220.32.14 -s 4420 -n nqn.2025-11.io.spdk:cnode-test-volume
  • step2: create systemd unit

    vim /etc/systemd/system/nvmeof-connect.service

    [Unit]
    Description=NVMe-oF TCP Auto Connect
    After=network-online.target
    Wants=network-online.target

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/usr/sbin/nvme connect \
        -t tcp \
        -a 10.220.32.14 \
        -s 4420 \
        -n nqn.2025-11.io.spdk:cnode-test-volume \
        --keep-alive-tmo=30
    ExecStop=/usr/sbin/nvme disconnect \
        -n nqn.2025-11.io.spdk:cnode-test-volume
    TimeoutSec=60

    [Install]
    WantedBy=multi-user.target
  • step3: enable service

    systemctl daemon-reload
    systemctl enable nvmeof-connect.service
    systemctl start nvmeof-connect.service
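
Once started, the connection can be checked like any other unit (a sketch):

# verify the unit ran and the fabric session is up
systemctl status nvmeof-connect.service
nvme list-subsys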

command

pv

# rescan devices and refresh the LVM metadata cache
pvscan --cache
lvscan

tags

# Show LV name + tags for all LVs in a VG (add --noheadings if you don't want the header line)
lvs csi-lvm -o lv_name,lv_tags

# Check a specific LV's tags
lvs -o tags /dev/vg_name/lv_name

# Delete a tag
lvchange --deltag node/xxx vg_name/lv_name
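
Tags are added with the matching --addtag flag (a sketch; the tag value node/worker-1 is a made-up example):

lvchange --addtag node/worker-1 vg_name/lv_name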

active

# activate specific multiple LVs explicitly
lvchange -ay vg_data/lv1 vg_data/lv2 vg_data/lv3

# deactivate specific multiple LVs explicitly
lvchange -an vg_data/lv1 vg_data/lv2 vg_data/lv3
  • Inactive LV → no device-mapper node → no /dev/mapper/* file
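
To check activation state, the fifth character of the lv_attr field shows it (a sketch):

# 'a' in the 5th attr character means active, '-' means inactive
lvs -o lv_name,lv_attr vg_data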

expand

# device
# tell LVM to use the new space. This updates the PV metadata to claim the new free space.
sudo pvresize /dev/nvmexxx

# lv
# adds 10GB to the logical volume
lvextend -L +10G /dev/vg_name/lv_name

# expands the LV to exactly xxx total
lvextend -L xxxG /dev/vg_name/lv_name
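
Note that growing the LV does not grow the filesystem on it; lvextend can do both in one step (a sketch, assuming a resizable filesystem such as ext4 or xfs on the LV):

# -r / --resizefs grows the filesystem together with the LV
lvextend -r -L +10G /dev/vg_name/lv_name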

troubleshooting

LV mapper entries not cleaned up on a node

Although lvscan no longer shows the LV, lsblk -f still shows the LV's information.

The cause: the deletelv pod is scheduled to a random node. Although it does contain dmsetup remove logic, that logic only runs on the node the pod happens to land on, so mappings on other nodes are left behind.

# Symptom
nvme0n1
│ LVM2_m LVM2 7CKk4S-JMin-iUPj-iTmS-1Qqc-pNDf-UkRH6E
└─csi--lvm-pvc--115f834a--c2bd--4206--852d--0141f6ee4937
ext4 1.0 40c1ff24-7a27-4085-9ed3-9a1b2e5daacb

# Correct procedure (wipe the old LV's metadata first)
# Recreation: when you recreate the LV, LVM allocates physical blocks on the disk. If it
# allocates the exact same blocks (very likely right after delete + recreate), the new LV
# points to the exact same physical location.

# The actual data, including the filesystem superblock that stores the UUID, is still
# physically written on the storage disk.
# 1. Wipe the filesystem signature (removing the UUID)
wipefs -a /dev/csi-lvm/pvc-115f834a...
# 2. Now remove the mapping
dmsetup remove csi--lvm-pvc--115f834a...
# Wrong procedure
# 'dmsetup remove' only removes the "mapping" (the virtual device path in the kernel) that
# lets you access the data. It is like unplugging a USB cable: the device disappears from
# the OS view, but the data stays on the stick.
# dmsetup remove only removes the kernel device-mapper entry. The logical volume definition
# typically still exists in LVM metadata, and the data is definitely still on disk.
dmsetup remove csi--lvm-pvc--115f834a--c2bd--4206--852d--0141f6ee4937
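
After cleanup it is worth confirming on every node that the mapping is really gone (a sketch):

# no output means no stale csi-lvm mappings remain on this node
dmsetup ls | grep csi--lvm
lsblk -f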

lsblk -f does not show the new device

Force Client to See the New Volume

  1. Find the NVMe Device Name:
    Run ls /dev/nvme*. You are looking for the controller character device, likely /dev/nvme0 or /dev/nvme1 (not ending in n1).

  2. Rescan:

    # Replace with your actual controller device
    nvme ns-rescan /dev/nvmex
  3. Check Again:
    Now run lsblk. The new namespace device (e.g. nvmeXnY) should appear.
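
nvme list is another quick way to confirm the namespace is visible (a sketch):

# lists every NVMe namespace with its device node and size
nvme list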

lvs Checksum error

With NVMe-oF, only one node may operate on the data at a time; if multiple nodes write to the same namespace simultaneously, the result is conflicting (corrupted) LVM metadata.

issue

/dev/nvme4n4: Checksum error at offset 372224
WARNING: invalid metadata text from /dev/nvme4n4 at 372224.
WARNING: metadata on /dev/nvme4n4 at 372224 has invalid summary for VG.
WARNING: bad metadata text on /dev/nvme4n4 in mda1
WARNING: scanning /dev/nvme4n4 mda1 failed to read metadata summary.
WARNING: repair VG metadata on /dev/nvme4n4 with vgck --updatemetadata.
WARNING: scan failed to get metadata summary from /dev/nvme4n4 PVID 0y2OB6itmv7xauOycVUpvAiAarPLUI6o

solution

  • step1: Dump the Metadata to a File

    # Since the system cannot read the VG, use the low-level `pvck` tool to extract the configuration text stored on the drive.
    pvck --dump metadata_search /dev/nvme4n4 > /tmp/csi-lvm_search.vg

    csi-lvm_search.vg as below

    Searching for metadata at offset 4096 size 1044480
    metadata at 17408 length 31448 crc 9fb93ed6 vg csi-lvm seqno 31081 id UShLSg-JLRf-Agkb-YRQK-asNV-GSFn-IRqbDH
    ......
    metadata at 339456 length 32289 crc 4cc1a315 vg csi-lvm seqno 31091 id UShLSg-JLRf-Agkb-YRQK-asNV-GSFn-IRqbDH
    🚨metadata at 372224 length 32281 crc 3ab9a3d8 vg csi-lvm seqno 31092 id UShLSg-JLRf-Agkb-YRQK-asNV-GSFn-IRqbDH
    ......

    Key finding:

    • The corrupted one is at 372224.
    • A newer valid one (higher seqno 31093) exists at 404992.
    • We must extract the copy at 404992.
  • step2: Stop all I/O to this NVMe-oF LUN

    Make sure no pod/host is using the LV/VG, and this NVMe namespace is not attached read-write from two nodes at the same time.

  • step3: Extract the Raw Metadata Text

    # Run this command to copy exactly 31,840 bytes from offset 404992:
    dd if=/dev/nvme4n4 bs=1 skip=404992 count=31840 of=/tmp/csi-lvm_fixed.vg

    csi-lvm_fixed.vg as below

    csi-lvm {
        id = "UShLSg-JLRf-Agkb-YRQK-asNV-GSFn-IRqbDH"
        seqno = 31093
        format = "lvm2"
        status = ["RESIZEABLE", "READ", "WRITE"]
        ...

        physical_volumes {

            pv0 {
                id = "0y2OB6-itmv-7xau-OycV-UpvA-iAar-PLUI6o"
                device = "/dev/nvme4n4"
                ...
            }
        }

        logical_volumes {

            pvc-a75bd3c3-b2f8-4ae1-904b-af95c5ed1ad6 {
                id = "RwMGsf-R7GE-GiyI-jw3V-J4To-S8aX-wrrFTr"
                status = ["READ", "WRITE", "VISIBLE"]
                ...

            pvc-6cee2171-b630-4756-8f05-2727f2f53d09 {
                id = "jg1OEJ-eLzj-FS41-3Z2L-3kyT-IJtV-Ur7x59"
                status = ["READ", "WRITE", "VISIBLE"]
                ...
  • step4: Wipe the Corrupted Signature

    We will wipe the first 1MB of the disk to remove the broken LVM label and metadata. Do not worry: this only removes the LVM headers (which are already broken); your data is safely located further down (at PE start 2048 sectors = 1MB).

    dd if=/dev/zero of=/dev/nvme4n4 bs=1M count=1
  • step5: Re-create the PV Header

    This fixes the “No metadata areas” error by creating a new, valid metadata area.
    This rewrites the header (label) and creates a fresh metadata area without touching your data.

    pvcreate --uuid "<YOUR-UUID>" --restorefile /tmp/csi-lvm_fixed.vg /dev/nvme4n4
    # actual: pvcreate --uuid "0y2OB6-itmv-7xau-OycV-UpvA-iAar-PLUI6o" --restorefile /tmp/csi-lvm_fixed.vg /dev/nvme4n4
  • step6: Restore the VG Configuration

    vgcfgrestore -f /tmp/csi-lvm_fixed.vg csi-lvm
  • step7: Verify

    lvs
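
If lvs is clean, the VG metadata can be double-checked and the LVs reactivated (a sketch; vgck --updatemetadata is the repair command suggested by the original warning):

# repair/rewrite any remaining bad on-disk metadata copies
vgck --updatemetadata csi-lvm
# reactivate all LVs in the volume group
vgchange -ay csi-lvm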

best practices

discard the space after removing the LV

method1: Automatic (applies whenever lvremove /dev/vg_name/lv_name is executed)

  1. Edit the config file /etc/lvm/lvm.conf

  2. Set issue_discards = 1 (the default is usually 0), as shown below.
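
The option lives in the devices section of lvm.conf:

# /etc/lvm/lvm.conf
devices {
    # send TRIM/discard to the underlying PV when LV space is released
    issue_discards = 1
}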

method2: Manual

# Manually send the TRIM command while the volume still exists, then remove it.
# This achieves the same result without changing lvm.conf.
blkdiscard /dev/vg_name/lv_name
lvremove /dev/vg_name/lv_name

wipefs vs dmsetup vs lvremove

  • wipefs : Just removes labels. Data remains. Space is not freed on the server.
  • blkdiscard : Wipes data. Sends TRIM. Space is freed on the server (if supported).
  • dmsetup remove : Only removes the kernel device-mapper entry. LVM metadata and data remain.
  • lvremove : Removes the LV definition from VG metadata. Data is not discarded unless issue_discards = 1.
┌─────────────────────────────────────────────────────────────┐
│                      Application Layer                      │
│                  (mount, read/write files)                  │
└──────────────────────────────┬──────────────────────────────┘
                               │
                     ┌─────────┴─────────┐
                     │   wipefs -a xxx   │ ← Erases filesystem signature
                     └─────────┬─────────┘   (XFS/ext4 magic bytes)
                               │
┌──────────────────────────────┴──────────────────────────────┐
│                      Filesystem Layer                       │
│             (XFS superblock, UUID, LABEL, etc.)             │
└──────────────────────────────┬──────────────────────────────┘
                               │
┌──────────────────────────────┴──────────────────────────────┐
│                     Block Device Layer                      │
│                      /dev/mapper/vg-lv                      │
│                          /dev/dm-X                          │
└──────────────────────────────┬──────────────────────────────┘
                               │
                     ┌─────────┴─────────┐
                     │ dmsetup remove xxx│ ← Removes device mapper entry
                     └─────────┬─────────┘   (kernel memory cleanup)
                               │
┌──────────────────────────────┴──────────────────────────────┐
│                     Device Mapper Layer                     │
│            (Kernel creates /dev/mapper devices)             │
└──────────────────────────────┬──────────────────────────────┘
                               │
                     ┌─────────┴─────────┐
                     │   lvremove xxx    │ ← Removes LVM metadata
                     └─────────┬─────────┘   (VG metadata update)
                               │
┌──────────────────────────────┴──────────────────────────────┐
│                     LVM Metadata Layer                      │
│              (PV/VG/LV configuration on disk)               │
└──────────────────────────────┬──────────────────────────────┘
                               │
┌──────────────────────────────┴──────────────────────────────┐
│                   Physical Storage Layer                    │
│               (NVMe-oF device, disk sectors)                │
└─────────────────────────────────────────────────────────────┘