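Cluster Node Management
The cluster system currently supports the following node operations:
info: display the status of cluster nodes
del: delete a compute node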
[root@sonmi ~]# sonmictl node -h
Manage compute nodes
Usage:
sonmictl node [command]
Available Commands:
del Delete a compute node
info Display the status of cluster nodes
Flags:
-h, --help help for node
Use "sonmictl node [command] --help" for more information about a command.
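Viewing Cluster Node Status
Run sonmictl node info to view the status of all nodes in the cluster.
The columns of its output have the following meanings:
- NODENAME: name of the node
- CPU ALLOC/TOTAL: number of CPU cores allocated to users / total number of CPU cores on the node
- MEMORY FREE/TOTAL: free memory / total memory on the node
- CPU TEMPERATURE: CPU temperature of the node
- CPU LOAD: CPU load of the node
- STATE: state of the node, corresponding to the node state in Slurm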
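For scripting, the table can be reduced to just the node name and state. The following is a minimal sketch, not part of sonmictl: node_states is a hypothetical helper name, and it assumes the table layout shown in the sample session in this section. Because a CPU TEMPERATURE cell can itself contain a "|" character, the helper takes the state from the second-to-last field rather than from a fixed column number.

```shell
# node_states (hypothetical helper): read the table printed by
# `sonmictl node info` on stdin and print one "NODENAME STATE" pair per line.
node_states() {
  awk -F'|' '
    # Only table rows contain several "|" separators; border lines ("+---+")
    # and shell prompts do not, so they are skipped.
    NF > 2 {
      name = $2            # first cell after the leading "|"
      state = $(NF - 1)    # last cell before the trailing "|"
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", state)
      if (name != "NODENAME") print name, state   # drop the header row
    }
  '
}
```

Usage: sonmictl node info | node_states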
Deleting a Node
Run sonmictl node del <COMPUTE-NODE> to remove the node COMPUTE-NODE from the cluster. For example, the following command removes the node compute-0-0 from the cluster:
sonmictl node del compute-0-0
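How sonmictl node del behaves when given a name that is not in the cluster is not described here, so a script may want to check the node list first. The sketch below is one way to do that. The helpers node_listed and safe_node_del are hypothetical names, not sonmictl features, and they use only the two subcommands documented above.

```shell
# node_listed NAME (hypothetical helper): succeed if NAME appears in the
# NODENAME column of a `sonmictl node info` table read from stdin.
node_listed() {
  awk -F'|' -v want="$1" '
    NF > 2 {
      name = $2
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      if (name == want) found = 1
    }
    END { exit(found ? 0 : 1) }
  '
}

# safe_node_del NAME (hypothetical wrapper): delete NAME only if it is
# currently listed; otherwise print a message and return non-zero.
safe_node_del() {
  if sonmictl node info | node_listed "$1"; then
    sonmictl node del "$1"
  else
    echo "node $1 is not in the cluster; nothing to delete" >&2
    return 1
  fi
}
```

Usage: safe_node_del compute-0-0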
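Re-registering a Node
After a node has been removed from the cluster, you can register it again by logging in to the removed node and running the following command as the root user: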
sonmid-register
In the following example, the node compute-0-0 is removed from the cluster. We then log in to compute-0-0 over SSH and run sonmid-register to register it back into the cluster.
[root@sonmi ~]# sonmictl node info
+-------------+-----------------+-------------------+-----------------+----------+-------+
| NODENAME | CPU ALLOC/TOTAL | MEMORY FREE/TOTAL | CPU TEMPERATURE | CPU LOAD | STATE |
+-------------+-----------------+-------------------+-----------------+----------+-------+
| sonmi | 0 / 8 | 2.2GB / 3.6GB | 34.0°C|35.0°C | 0.00 | IDLE |
| compute-0-0 | 0 / 8 | 571MB / 1.7GB | 34.0°C|35.0°C | 2.00 | IDLE |
| compute-0-1 | 0 / 8 | 700MB / 1.7GB | 34.0°C|35.0°C | 2.00 | IDLE |
+-------------+-----------------+-------------------+-----------------+----------+-------+
[root@sonmi ~]# sonmictl node del compute-0-0
Delete compute node: compute-0-0
[root@sonmi ~]# sonmictl node info
+-------------+-----------------+-------------------+-----------------+----------+-------+
| NODENAME | CPU ALLOC/TOTAL | MEMORY FREE/TOTAL | CPU TEMPERATURE | CPU LOAD | STATE |
+-------------+-----------------+-------------------+-----------------+----------+-------+
| sonmi | 0 / 8 | 2.2GB / 3.6GB | 33.0°C|35.0°C | 0.00 | IDLE |
| compute-0-1 | 0 / 8 | 700MB / 1.7GB | 33.0°C|35.0°C | 2.00 | IDLE |
+-------------+-----------------+-------------------+-----------------+----------+-------+
[root@sonmi ~]# ssh 10.1.1.2
╔╗╔╗╔╗──╔╗──────────────╔╗────╔═══╗─────────╔╗─╔╦═══╦═══╗
║║║║║║──║║─────────────╔╝╚╗───║╔═╗║─────────║║─║║╔═╗║╔═╗║
║║║║║╠══╣║╔══╦══╦╗╔╦══╗╚╗╔╬══╗║╚══╦══╦═╗╔╗╔╦╣╚═╝║╚═╝║║─╚╝
║╚╝╚╝║║═╣║║╔═╣╔╗║╚╝║║═╣─║║║╔╗║╚══╗║╔╗║╔╗╣╚╝╠╣╔═╗║╔══╣║─╔╗
╚╗╔╗╔╣║═╣╚╣╚═╣╚╝║║║║║═╣─║╚╣╚╝║║╚═╝║╚╝║║║║║║║║║─║║║──║╚═╝║
─╚╝╚╝╚══╩═╩══╩══╩╩╩╩══╝─╚═╩══╝╚═══╩══╩╝╚╩╩╩╩╩╝─╚╩╝──╚═══╝
Activate the web console with: systemctl enable --now cockpit.socket
Last login: Tue Sep 26 14:37:09 2023 from 10.1.1.1
[root@compute-0-0 ~]# sonmid-register
[root@compute-0-0 ~]# exit
logout
Connection to 10.1.1.2 closed.
[root@sonmi ~]# sonmictl node info
+-------------+-----------------+-------------------+-----------------+----------+-------+
| NODENAME | CPU ALLOC/TOTAL | MEMORY FREE/TOTAL | CPU TEMPERATURE | CPU LOAD | STATE |
+-------------+-----------------+-------------------+-----------------+----------+-------+
| sonmi | 0 / 8 | 2.2GB / 3.6GB | 34.0°C|36.0°C | 0.00 | IDLE |
| compute-0-0 | 0 / 8 | 573MB / 1.7GB | 34.0°C|36.0°C | 2.00 | IDLE |
| compute-0-1 | 0 / 8 | 700MB / 1.7GB | 34.0°C|36.0°C | 2.00 | IDLE |
+-------------+-----------------+-------------------+-----------------+----------+-------+
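In the session above the node is listed again on the next sonmictl node info call. Whether that is always immediate is not stated here, so an automated workflow might poll until the node reappears. The following is a minimal sketch under that assumption: wait_for_node is a hypothetical helper, and its retry count and delay defaults are arbitrary.

```shell
# wait_for_node NAME [TRIES] [DELAY] (hypothetical helper): poll
# `sonmictl node info` until NAME shows up in the NODENAME column.
# Returns 0 once the node is listed, 1 after TRIES unsuccessful attempts.
wait_for_node() {
  name=$1 tries=${2:-10} delay=${3:-3}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if sonmictl node info | awk -F'|' -v want="$name" '
         NF > 2 { n = $2; gsub(/^[ \t]+|[ \t]+$/, "", n); if (n == want) found = 1 }
         END { exit(found ? 0 : 1) }'; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"   # wait DELAY seconds before the next attempt
  done
  return 1
}
```

Usage, run on the head node after sonmid-register has been executed on the compute node: wait_for_node compute-0-0 && echo "compute-0-0 is back"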