linux file command

basic

awk

  • awk -v FS="输入分隔符" -v OFS='输出分隔符' '{if($1==$5) print $1,$5,$10}' filename

    Search filename (whose columns are separated by the given input separator) for lines where field 1 equals field 5, and print fields 1, 5, and 10 joined by the output separator. If FS and OFS are not set, both default to whitespace.

  • exclude a column with awk, e.g. print every column except the 5th

    awk '{ $5=""; print }' file   # note: leaves an empty field where column 5 was
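A runnable sketch of both recipes above, on made-up colon-separated data (file path and contents are illustrative):

```shell
# Made-up data: 10 colon-separated fields; only the first line has $1 == $5
printf 'a:b:c:d:a:f:g:h:i:j\nx:b:c:d:y:f:g:h:i:j\n' > /tmp/awk_demo.txt

# Print fields 1, 5 and 10 of matching lines, comma-separated
awk -v FS=':' -v OFS=',' '{if($1==$5) print $1,$5,$10}' /tmp/awk_demo.txt
# prints: a,a,j

# Drop column 2 from space-separated input (an empty field remains)
echo 'one two three' | awk '{ $2=""; print }'
# prints: one  three  (two spaces where column 2 was)
```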

base64

  • Decode

    echo [base64-encoded-string] | base64 --decode

  • Encode

    echo "your string" | base64
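A round trip showing that decoding inverts encoding. Note that printf avoids the trailing newline that echo would otherwise include in the encoded bytes:

```shell
msg="your string"
encoded=$(printf '%s' "$msg" | base64)
decoded=$(printf '%s' "$encoded" | base64 --decode)
echo "$encoded"   # eW91ciBzdHJpbmc=
echo "$decoded"   # your string
```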

chmod

Make a file directly executable: chmod +x filename

Give all users read/write permission on a directory: sudo chmod ugo+rw /opt

r=4, w=2, x=1
rwx = 4+2+1 = 7
rw- = 4+2 = 6
r-x = 4+1 = 5
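The digit sums above map one digit to each of the three rwx triplets (owner, group, others). A quick check with GNU stat on a throwaway file:

```shell
touch /tmp/perm_demo
chmod 750 /tmp/perm_demo        # owner rwx (7), group r-x (5), others --- (0)
stat -c '%a %A' /tmp/perm_demo
# prints: 750 -rwxr-x---
```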

copy

rsync -aP <sourceDir> <targetDir>

cp -r <sourceDir> <targetDir>
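One behavioral detail worth remembering: whether the source directory itself or only its contents land in the target. With cp this is controlled by appending /. to the source; rsync uses a trailing slash the same way. A sketch with made-up paths:

```shell
mkdir -p /tmp/cpdemo/src /tmp/cpdemo/dst1 /tmp/cpdemo/dst2
touch /tmp/cpdemo/src/a.txt

cp -r /tmp/cpdemo/src   /tmp/cpdemo/dst1   # -> dst1/src/a.txt (directory itself)
cp -r /tmp/cpdemo/src/. /tmp/cpdemo/dst2   # -> dst2/a.txt     (contents only)
# rsync equivalent: "rsync -aP src/ dst/" copies contents, "rsync -aP src dst/" copies the dir
```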

cut

echo "hello_world" | cut -c1-5   # Output: hello
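Besides character ranges (-c), cut can select delimited fields with -d and -f:

```shell
echo "alice,30,admin" | cut -d',' -f2     # 30
echo "alice,30,admin" | cut -d',' -f1,3   # alice,admin
```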

rclone

Built-in retry and performance options

By default, rclone copy only transfers files and does not create empty directories at the destination. To copy all directories, including empty ones, add the --create-empty-src-dirs flag.

# dir1 to dir2: note this copies the contents of dir1 into dir2 (not the dir1 directory itself)
rclone copy dir1 dir2 \
--create-empty-src-dirs \
--transfers=32 --checkers=32 \
--fast-list --no-traverse \
--log-file=rclone_copy.log --progress

# one big file (a comment after "\" breaks the command, so the flag notes go below)
rclone copy \
--progress \
--create-empty-src-dirs \
--buffer-size=512M \
--transfers=10 \
--multi-thread-streams=10 \
--retries=10 \
--low-level-retries=20 \
--fast-list \
/path/to/source/file \
remote:destination/path
# --progress: monitor the transfer
# --buffer-size=512M: increase the buffer size (adjust based on RAM)
# --transfers=10: parallel file transfers (uses CPU cores)
# --multi-thread-streams=10: split the file into concurrent streams
# --retries=10: retry on failures
# --low-level-retries=20: retry on transient errors
# --fast-list: speed up directory listings (if applicable)

# e.g. big file
rclone copy \
--progress \
--create-empty-src-dirs \
--buffer-size=512M \
--transfers=5 \
--multi-thread-streams=5 \
--retries=5 \
--low-level-retries=10 \
--fast-list \
/mnt/tsinghua/newzip/omniobject3d.zip \
/mnt/disk2/tsinghua

# many small files (flag notes below; a comment after "\" would break the command)
rclone copy /mnt/source/ /mnt/destination/ \
--progress \
--create-empty-src-dirs \
--transfers=30 \
--checkers=30 \
--buffer-size=16M \
--multi-thread-streams=0 \
--fast-list \
--no-traverse \
--max-backlog=100000 \
--retries=10 \
--low-level-retries=20 \
--log-level=INFO \
--log-file=rclone.log \
--drive-use-trash=false \
--checksum
# --transfers=30: max parallel file transfers
# --checkers=30: parallel file checks/listing
# --buffer-size=16M: smaller buffers suit small files
# --multi-thread-streams=0: disable chunking (small files)
# --fast-list: reduce memory for directory listings
# --no-traverse: skip scanning the destination
# --max-backlog=100000: keep the pipeline full
# --drive-use-trash=false: Google Drive only (optional)
# --checksum: keep for integrity (or drop for speed)

parallel

find /path/to/source/ -maxdepth 1 -type d | parallel -j 8 'rclone copy {} /path/to/target/{/} --transfers=32 --checkers=32 --fast-list --buffer-size=512M --max-backlog=100000 --use-mmap --no-traverse --stats=30s --log-file=rclone_{/}.log'
  • find Part

    -maxdepth 1 -type d: finds top-level directories under the source path (excludes files at this level and the source directory itself).
  • parallel Part

    -j 8: Runs 8 parallel rclone jobs, one per subdirectory.
    {}: Placeholder for each directory from find.
    {/}: Basename of the directory (e.g., subdir1 from the full path).
  • rclone copy Part

    --transfers=32: 32 concurrent file transfers per rclone job.
    --checkers=32: 32 concurrent file checks (e.g., verifying existence or integrity).
    --fast-list: Uses a single API call to list files (faster for remote filesystems like CephFS).
    --buffer-size=512M: 512 MB buffer per transfer (reduces disk I/O waits).
    --max-backlog=100000: Allows up to 100,000 files to queue before processing.
    --use-mmap: Uses memory-mapped I/O for reads (can speed up large files).
    --no-traverse: Skips directory traversal for remote sources, relying on listings.
    --stats=30s: Shows progress every 30 seconds.
    --log-file=rclone_{/}.log: Logs each job to a file named after the subdirectory (e.g., rclone_subdir1.log).

delete

1. To delete all files in a directory except filename (the !() syntax is bash extglob; enable it with shopt -s extglob), type the command below:

rm -v !("filename")

2. To delete all files with the exception of filename1 and filename2:

rm -v !("filename1"|"filename2") 

3. The example below shows how to remove all files other than all .zip files interactively:

rm -i !(*.zip)

4. Next, you can delete all files in a directory apart from all .zip and .odt files as follows, while displaying what is being done:

rm -v !(*.zip|*.odt)
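The !(...) patterns above only work with bash's extglob option enabled. In a non-interactive shell, enable it before the line is parsed, e.g. with bash -O extglob. A throwaway-directory sketch (paths are illustrative):

```shell
mkdir -p /tmp/extglob_demo && cd /tmp/extglob_demo
touch keep.zip keep.odt junk.txt junk.log

# -O extglob turns the extended patterns on before the command line is parsed
bash -O extglob -c 'cd /tmp/extglob_demo && rm -v !(*.zip|*.odt)'
ls   # keep.odt  keep.zip
```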

5. To delete subdirectories older than a given number of days under a path, combine find with rm:

find /path/to/directory -type d -mtime +365 -exec rm -rf {} \;

6. To delete files older than one week under a path, combine find with rm:

find /path/to/directory -type f -mtime +7 -exec rm {} \;

or

find /path/to/directory -type f -mtime +7 -delete
  • A name pattern can also be specified

    find /var/log -name "*.log" -type f -mtime +30 
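A safe way to rehearse the -mtime filters before deleting anything, assuming GNU touch (whose -d option can backdate a file's modification time):

```shell
mkdir -p /tmp/mtime_demo
touch /tmp/mtime_demo/new.log
touch -d '40 days ago' /tmp/mtime_demo/old.log   # backdate mtime (GNU touch)

find /tmp/mtime_demo -type f -mtime +30           # lists only old.log
find /tmp/mtime_demo -type f -mtime +30 -delete   # then actually remove it
```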

du

du -h --max-depth=1 --exclude='proc' --exclude='home' --exclude='mnt'

# Check actual disk usage
du -sh /path/to/mountpoint

dd

dd if=<input> of=<output> [options]

  • if: Specifies the input file or device (e.g., /dev/sda or /path/to/file).

  • of: Specifies the output file or device (e.g., /dev/sdb or /path/to/file).

  • bs=SIZE: Block size, defining how much data to read and write at a time. It can be set to values like 512, 4M, etc. Example: bs=1M reads and writes 1 megabyte at a time.

  • count=N: Limits the number of blocks copied. For example, count=100 copies 100 blocks of the size specified by bs.

  • status=progress: Shows real-time progress while copying.

  • conv=notrunc: Prevents truncation of the output file (useful when appending).

  • conv=sync: Pads the input to the full block size with null bytes if necessary.

# Copy a specified file to a specified device/path
sudo dd if=/path/to/image.iso of=/dev/sdX bs=4M status=progress
#   if=/path/to/image.iso: input is an ISO file
#   of=/dev/sdX: output is the target USB device (e.g., /dev/sdb)
#   bs=4M: 4 MB block size for faster copying
#   status=progress: shows copy progress

# Clone a disk or partition to an image file
sudo dd if=/dev/sda of=/path/to/backup.img bs=1M status=progress
#   creates a raw image backup of /dev/sda (the entire disk)

# Restore the image created in the example above
sudo dd if=/path/to/backup.img of=/dev/sda bs=1M status=progress

# Erase a disk (overwrite with zeros)
sudo dd if=/dev/zero of=/dev/sdX bs=1M status=progress
#   if=/dev/zero: fills the disk with zeros
#   of=/dev/sdX: target disk to be wiped

# Create a file of a specific size
dd if=/dev/urandom of=randomfile.bin bs=1M count=1024   # 1 GB of random data
#   if=/dev/urandom: input is random data
#   of=randomfile.bin: output file
#   bs=1M count=1024: 1024 blocks of 1 MB each = 1 GB

# Test disk write speed by writing a 1 GB file
dd if=/dev/zero of=testfile bs=1M count=1024 conv=fdatasync status=progress
#   conv=fdatasync: data is fully flushed to disk before timing stops
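bs and count together determine the output size (bs × count). A harmless check against a temp file:

```shell
dd if=/dev/zero of=/tmp/dd_demo.bin bs=1K count=4 status=none
wc -c < /tmp/dd_demo.bin   # 4096  (4 blocks of 1 KiB)
```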

diff

# suppress lines that are identical in both files
diff -y --suppress-common-lines original.txt modified.txt
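A quick demonstration on throwaway files (diff exits with status 1 when the files differ, so || true keeps strict shells happy):

```shell
printf 'one\ntwo\nthree\n' > /tmp/orig.txt
printf 'one\nTWO\nthree\n' > /tmp/mod.txt

diff -y --suppress-common-lines /tmp/orig.txt /tmp/mod.txt || true
# only the differing line is shown:  two  |  TWO
```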

find

  • Find a file by name
find / -name <filename>
  • Count all files under a directory
find /path/to/directory -type f | wc -l
  • Find a directory
find </your/start/path> -type d -name <name> -exec dirname {} \;
  • Find files owned by a given user
find ./* -user <username>
  • Find files belonging to a given group
find ./* -group <groupname>
  • Match all files except a particular type and delete them (here via find's own -delete action)

    find . ! -name "*.txt" -delete
  • Exclude multiple patterns

    find . ! \( -name "log4j*" -o -name "flink*" \)
  • Filter files by age in days

    # files older than 30 days
    find <directory> -type f -name "*.tar.gz" -mtime +30

    # files modified within the last 30 days
    find <directory> -type f -name "*.tar.gz" -mtime -30

findmnt

List all mounted filesystems

# list all
findmnt

# check specify mnt
findmnt <path/to/mnt>

grep

  • Search file contents under a directory
grep -rn "info" *

# restrict the search to a directory and exclude subdirectories (--exclude-dir can be repeated)
grep -rn "info" /home/rdx --exclude-dir="log"
  • Search inside a large file
# piping one grep into another keeps only lines matching both the timestamp and the IP
grep -n -e "10.198.2.133" prometheus.log | grep -e "2019-09-24" | head -n 3

# -n prints the line number of each match
# -e gives a search pattern; repeat -e for multiple patterns
  • logical
# Match any of multiple strings (logical OR)
grep -E 'pattern1|pattern2|pattern3' filename

# Match lines containing all keywords (logical AND)
grep 'word1' file.txt | grep 'word2' | grep 'word3'

# Match lines exclude keywords (logical NOT)
grep -v "pattern" inputfile
  • Show lines of context around a match

    -A n # Show n lines after the match
    -B n # Show n lines before the match
    -C n # Show n lines around the match
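For example, with a match on line 3 of a five-line throwaway file:

```shell
printf 'a\nb\nERROR\nc\nd\n' > /tmp/grep_ctx.txt
grep -A1 'ERROR' /tmp/grep_ctx.txt   # ERROR, c
grep -B1 'ERROR' /tmp/grep_ctx.txt   # b, ERROR
grep -C1 'ERROR' /tmp/grep_ctx.txt   # b, ERROR, c
```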

ls

ls -lh shows file sizes in human-readable units (K, M, G)

  • Create a symbolic link

    ln -s <source_file_or_dir> <link_name>
  • Find symlinks under a directory (recursively)

    ls -alR | grep ^l
  • relink

    ln -sf <new_target> <link_name>
    # -s: Create a symbolic link.
    # -f: Force overwriting an existing link.
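A common use is repointing a "current" link between versions (names below are made up):

```shell
mkdir -p /tmp/lndemo && cd /tmp/lndemo
touch v1.conf v2.conf
ln -s  v1.conf current
readlink current         # v1.conf
ln -sf v2.conf current   # overwrite the existing link in place
readlink current         # v2.conf
```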

nbsp

Display nbsp characters

# method 1
xxd <file>

# method 2: inside vim
:set list
:set listchars+=nbsp:⦸  # actually typed as :set listchars+=nbsp:^Vu29b8
Character              Normal Space              Non-Breaking Space
Unicode codepoint      U+0020                    U+00A0
UTF-8 representation   0x20                      0xC2 0xA0
Visible?               No (looks like a space)   No (looks like a space)
Breaks line?           ✅ Yes                    ❌ No
YAML-safe?             ✅ Yes                    ❌ No (may cause parse errors)
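If xxd is not installed, od from coreutils shows the same bytes. The two throwaway files below look identical but differ at the byte level:

```shell
printf 'a b\n'        > /tmp/space.txt   # normal space (0x20)
printf 'a\xc2\xa0b\n' > /tmp/nbsp.txt    # non-breaking space (0xC2 0xA0)

od -An -tx1 /tmp/space.txt   #  61 20 62 0a
od -An -tx1 /tmp/nbsp.txt    #  61 c2 a0 62 0a
```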

sed

  • Replace characters

    Linux:

    sed -i 's/Search_String/Replacement_String/g' Input_File

    macOS (an explicit backup suffix is required, to guard against corrupting the file):

    sed -i .bak 's/Search_String/Replacement_String/g' Input_File
  • Delete a range of lines

    sed -i '1,5d' example.txt
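Both recipes in one pass on a throwaway file (GNU sed; on macOS add the backup suffix as noted above):

```shell
printf 'foo\nbar\nfoo bar\n' > /tmp/sed_demo.txt

sed -i 's/foo/baz/g' /tmp/sed_demo.txt   # replace everywhere
sed -i '1,2d'        /tmp/sed_demo.txt   # then drop lines 1-2
cat /tmp/sed_demo.txt   # baz bar
```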

sort

sort --parallel=8 -S 4G -T /data -k2,3 largefile.txt > sorted_file.txt

This sorts with 8 parallel threads and lets sort use up to 4 GB of memory for its buffer; -T /data stores temporary files under /data instead of the default location.

"-k2,3" sorts by column 2 first, then by column 3 when column 2 is equal.
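A small illustration of the -k column key (no parallel or -S options needed at this size):

```shell
printf 'x 2 b\ny 1 c\nz 1 a\n' > /tmp/sort_demo.txt
sort -k2,3 /tmp/sort_demo.txt
# z 1 a   <- column 2 ties (1 vs 1) are broken by column 3
# y 1 c
# x 2 b
```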

tar

  • c – Creates a new .tar archive file.

  • x — to untar or extract a tar file

  • v – Verbosely show the .tar file progress.

  • f – File name type of the archive file.

  • z — gzip archive file

  • j — bz2 feature compress and create archive file

  • t — to list the contents of tar archive file

# extract files to a specific destination directory
tar -xvf <archive.tar> -C <destination_directory>
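The flags compose; a create/list/extract round trip on throwaway paths:

```shell
mkdir -p /tmp/tardemo/src /tmp/tardemo/out
echo hello > /tmp/tardemo/src/f.txt

tar -czf /tmp/tardemo/a.tar.gz -C /tmp/tardemo src   # create (c) + gzip (z)
tar -tf  /tmp/tardemo/a.tar.gz                       # list: src/ src/f.txt
tar -xzf /tmp/tardemo/a.tar.gz -C /tmp/tardemo/out   # extract elsewhere with -C
cat /tmp/tardemo/out/src/f.txt   # hello
```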

tr

tr – translate or delete characters

  • Case conversion

    cat file | tr A-Z a-z 
    cat file | tr a-z A-Z

wc

  • Syntax
wc [options] file...
-c  count bytes
-l  count lines
-w  count words
# count all files under a directory
find /path/to/directory -type f | wc -l

wget

  • Download an entire directory tree
wget -r --no-parent http://abc.tamu.edu/projects/tzivi/repository/revisions/2/raw/tzivi/

scenario

filesystem

# Check filesystem type
df -T /path/to/mountpoint

create a file

# fallocate: preallocate space instantly
fallocate -l 1G fileName

dd

dd if=/dev/zero of=/path/to/directory/filename bs=block_size count=number_of_blocks
# Create a 100MB File in /tmp/
dd if=/dev/zero of=/tmp/testfile.img bs=1M count=100
  • if=/dev/zero: Input file (/dev/zero provides null bytes; use /dev/urandom for random data).
  • of=/path/to/directory/filename: Output file path and name.
  • bs: Block size (e.g., 1K, 1M, 1G).
  • count: Number of blocks to write

split files

split [-a] [-b] [-C] [-l] <file_to_split> <output_prefix>
--version  show version information
-l  split every N lines (for text files)
-b  split by size, with unit m or k
-C  like -b, but tries to keep lines intact
-d  use numeric instead of alphabetic suffixes
-a  suffix length (default 2)

Merge the split pieces back together

cat files_name_1 files_name_2 files_name_3 > files_name
  • Split by line count

    split -l 10000 bigfile.txt smallfile

    The split files can still be read normally.
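A round trip: split 25 lines into 10-line pieces, then reassemble and compare (file names are illustrative):

```shell
cd /tmp
seq 1 25 > big.txt
split -l 10 big.txt piece_       # piece_aa piece_ab piece_ac (10+10+5 lines)
wc -l piece_aa                   # 10
cat piece_* > merged.txt         # the shell glob sorts names, restoring order
cmp big.txt merged.txt && echo identical
```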

  • Count the bytes in a file. Note that multi-byte characters (such as Chinese) count as several bytes each.

    wc -c /path/to/file

    Building on that, report the size in KB:

    wc -c /path/to/file | awk '{print $1/1024}'
  • Sort a file by several specified columns

    awk '{print $0}' head_100.csv | sort -t ',' -k2,3 > head_100_sort.csv   # the awk stage is a no-op here (plain pass-through)

    sort does the ordering: -t sets the field separator (a comma here), and -k2,3 sorts by column 2 first, then by column 3 when column 2 ties.

  • Count the characters in a string

    echo -n <string> | wc -m   # -n drops echo's trailing newline so it is not counted

batch rename files

rename -n -e 's/<old_string>/<new_string>/' *.png   # Perl rename; -n previews without renaming

convert file encoding

  • Check the encoding

    # vim
    :set fileencoding

    # file
    file -I filename
  • Convert the encoding

Then use iconv to do the conversion, e.g. from UTF-8 to GBK:

$ iconv -f UTF-8 -t GBK input.file -o output.file
  • If you see "iconv: illegal input sequence at position xxxx", one workaround is to add -c, which skips invalid characters:

    iconv -c  -f gb2312 -t utf8 test.txt -o output.file
iconv -f gb18030 -t UTF-8 input.file -o output.file
# gb18030 is a superset of gb2312/GBK, so it often succeeds where those fail

format json

echo '{"kind": "Service", "apiVersion": "v1", "status": {"loadBalancer": true}}'|jq .

encryption

Encrypt a file with zip, and decompress it later

zip -re filename.zip filename 
# then enter the password twice at the prompt

markdown to word

pandoc -o output.docx -f markdown -t docx filename.md

troubleshooting

Reported disk space does not match actual usage

# check deleted process
sudo lsof | grep deleted

# Analyzing the Output
wsssr_def 1368270 1368507 wsssr_def root 7w REG 259,3 0 1180232 /usr/local/wsssr_defence_agent/agent_init (deleted)

# service
systemctl status wsssr.service
systemctl status wsssr_guard.service

# the role of wsssr_defence_service
The `wsssr_defence_service` is a Linux security agent process responsible for system protection, monitoring, and logging, running under root with broad visibility and control over system activities

Breaking it down:
Command: wsssr_def (shortened process name, likely wsssr_defence_agent or similar).
PID: 1368270 (process ID).
PPID: 1368507 (parent process ID).
User: root (running as root).
FD: 7w (file descriptor 7, open for writing).
Type: REG (regular file).
Device: 259,3 (device number, matching /dev/nvme0n1p2, the root partition).
Size: 0 (the reported size is 0 bytes, which can be misleading for deleted-but-open files).
Node: 1180232 (inode number).
File: /usr/local/wsssr_defence_agent/agent_init (deleted) (the deleted file).

# resolve
kill -9 1368270

After copying, spaces are encoded inconsistently

Replace non-breaking spaces (nbsp) with normal spaces

sed -i  's/\xC2\xA0/ /g' <file>
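A self-contained check of the substitution on a throwaway file (\xNN escapes are a GNU sed feature; LC_ALL=C forces byte-wise matching regardless of locale):

```shell
printf 'key:\xc2\xa0value\n' > /tmp/nbsp_fix.txt   # contains U+00A0
LC_ALL=C sed -i 's/\xC2\xA0/ /g' /tmp/nbsp_fix.txt
cat /tmp/nbsp_fix.txt   # key: value  (now a normal space)
```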

Clean up data left under a mount point

For example, data used to live at /mnt/disk0/data on the system disk (/, device vda); later a new disk vdb was mounted at /mnt/disk0. The old /mnt/disk0/data still occupies space on the system disk, hidden beneath the mount.

sudo mkdir /mnt/old_disk0
sudo mount --bind / /mnt/old_disk0
sudo rm -rf /mnt/old_disk0/mnt/disk0/*
sudo umount /mnt/old_disk0
sudo rmdir /mnt/old_disk0

tree command error

  • error

    .  [error opening dir]
  • locate

    dmesg

    # print
    [5309728.436569] audit: type=1400 audit(1754363848.580:151): apparmor="DENIED" operation="open" profile="snap.tree.tree" name="/mnt/dingofs/dingospeed/data/repos/repos/files/models/" pid=945507 comm="tree" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
  • resolve

    snap remove tree
    apt install tree
    # Refresh Bash’s command cache for the current shell
    hash -r