Would like to move to https://github.com/rug-cit-hpc/pg-playbooks, but the repository has large files...
Egon Rijpkema 5dc4274e96 Added new prometheus cert for knyft. 6 days ago
documentation Forgot role.. 3 years ago
files modified lustre.conf 2 years ago
group_vars/all Added a slurmdbd storage pass 6 months ago
promtools Made build work again. 1 month ago
roles Added new prometheus cert for knyft. 6 days ago
.gitignore don't commit pyc files 4 years ago
ansible.cfg Using a prometheus server on knyft instead of prox 3 years ago
apps.yml Added nhc and /apps 2 years ago
common.yml These tasks are for all peregrine hosts. 4 years ago
compute_node.yml change node 2 years ago
disablerepo.yml update 2 years ago
etc_hosts.yml '/etc/hosts' should only be put on the peregrine parts. 2 years ago
firmware.yml add network firmware 2 years ago
firmware630.yml update 2 years ago
firmware920.yml update 2 years ago
firmware7425.yml update 2 years ago
gpu.yml Switched to singular. 4 years ago
gpu_detector.yml initial commit of gpu_detector 2 years ago
haswell_sym.yml Create directory if not present 2 years ago
hosts update 2 years ago
hosts-dev recipy to build slurm docker 4 years ago
inefficient_jobs_detector.yml Add inefficient jobs detector 1 year ago
interactive.yml Refactored the touch alert to own role. 4 years ago
ipmi_exporter.yml Added ipmi monitoring 3 years ago
kernel.yml update 2 years ago
kill_memory_hogs.yml Changed pg-login to pg-node003 2 years ago
ldap_client.yml ldap and lustre-client roles. 3 years ago
login.yml Refactored the touch alert to own role. 4 years ago
lustre_client.yml ldap and lustre-client roles. 3 years ago
lustre_exporter.yml Also on metadata 3 years ago
metadata.yml This playbook is still needed for the metadara role. 3 years ago
node_exporter.yml no longher using proxy 3 years ago
nvidia-exporter.yml '/etc/hosts' should only be put on the peregrine parts. 2 years ago
nvidia_smi_exporter.yml Added nvidia_smi_exporter for prometheus 3 years ago
pg-packages.yml Modified name of Euclid CVMFS package 1 year ago
pg-tools.yml remove old packages 2 years ago
prom_sql.yml Added role for prometheus sql exporter. 2 years ago
prometheus.yml Using a prometheus server on knyft instead of prox 3 years ago
readme.md Explaned a little bit more. 3 years ago
remount_apps.py When is say 10 seconds it should be 10 seconds. 2 years ago
sandybridge_sym.yml Renamed file to make it similar to other files 2 years ago
site.yml Added skylake_sym.yml, renamed sandybridge file 2 years ago
skylake_sym.yml Make symlinks in /software for Skylake nodes 2 years ago
slurm.yml Corrected SLURM role for scheduler in slurm.yml 2 years ago
slurm_client.yml only run slurm client playbook on non-storage nodes 1 year ago
slurm_exporter.yml NO proxy client needed anymore. 3 years ago
sudo_lecture.yml Added more appropriate message 2 years ago
test-hosts '/etc/hosts' should only be put on the peregrine parts. 2 years ago
tmp '/etc/hosts' should only be put on the peregrine parts. 2 years ago
umount.yml update 2 years ago
update.yml update 2 years ago
updateeth.yml update 2 years ago

readme.md

Ansible playbooks for Peregrine

This repository contains an inventory and Ansible playbooks for the Peregrine cluster.

Install slurm.

To install the Slurm server:

ansible-playbook --vault-password-file=.vault_pass.txt slurm.yml

Skip building of docker images.

Building the Docker images takes a lot of time and is only necessary when the Dockerfile has changed. You can skip the build with the following command.

ansible-playbook --vault-password-file=.vault_pass.txt slurm.yml --skip-tags build

Furthermore, you can prevent the services from starting immediately by providing the --skip-tags start-service flag.
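
The two tags can also be combined, since `--skip-tags` accepts a comma-separated list. The helper below is only a sketch: `slurm_cmd` is a hypothetical name that is not part of this repository, and it prints the resulting command instead of running it so that the command can be checked first.

```shell
# Hypothetical helper (not part of this repository): print the slurm.yml
# command with the given tags skipped.
# Usage: slurm_cmd [tag ...]
slurm_cmd() {
    cmd="ansible-playbook --vault-password-file=.vault_pass.txt slurm.yml"
    if [ "$#" -gt 0 ]; then
        # Join the tag arguments with commas, the form --skip-tags expects.
        tags=$(IFS=,; printf '%s' "$*")
        cmd="$cmd --skip-tags $tags"
    fi
    printf '%s\n' "$cmd"
}
```

For example, `slurm_cmd build start-service` prints a command that skips both the image build and the service start. To see which tags a playbook actually defines, `ansible-playbook slurm.yml --list-tags` lists them without running any tasks.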

Setting the state of a single node.

To bring a node's configuration up to date, for example after it has been rolled out via xCAT, run the following command.
This configures all state for that node (the node exporter for Prometheus and, if it is a GPU node, GPU monitoring, etc.).

ansible-playbook --limit pg-node023 site.yml
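
The same approach should extend to several nodes, because `--limit` accepts a comma-separated list of hosts, and `--check --diff` gives a dry run that reports pending changes without applying them. The sketch below assumes this; `site_cmd` and the `CHECK` variable are hypothetical names, not part of this repository, and the helper only prints the command.

```shell
# Hypothetical helper (not part of this repository): print a site.yml
# command limited to the given nodes.
# Set CHECK=1 to add --check --diff for a dry run.
# Usage: site_cmd node [node ...]
site_cmd() {
    if [ "$#" -eq 0 ]; then
        echo "usage: site_cmd node [node ...]" >&2
        return 1
    fi
    # Join the node names with commas, the form --limit expects.
    hosts=$(IFS=,; printf '%s' "$*")
    cmd="ansible-playbook --limit $hosts site.yml"
    if [ "${CHECK:-0}" = "1" ]; then
        cmd="$cmd --check --diff"
    fi
    printf '%s\n' "$cmd"
}
```

For example, `CHECK=1 site_cmd pg-node023` prints a dry-run command for the node used above. Not every task supports check mode, so a dry run may skip or misreport some steps.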