Egon Rijpkema
|
9b3d8093a8
|
Changed minimum storage for trigger.
|
1 month ago |
Egon Rijpkema
|
bf0d61b21c
|
Added pg-lustre.yml
|
1 month ago |
Egon Rijpkema
|
c751276347
|
Added node exporter of pg lustre components.
|
1 month ago |
E.M.A. Rijpkema
|
493afa29ae
|
Merge pull request 'Remove ESX vulture nodes from Ansible hosts file' (#28) from fix/remove_vcpu_nodes into master
Reviewed-on: #28
|
4 months ago |
B.E. Droge
|
8833b57a68
|
remove esx nodes from vulture
|
4 months ago |
B.E. Droge
|
efe46a94f0
|
Merge pull request 'Removed vulture nodes that have been terminated.' (#27) from fix/remove-vulture into master
Reviewed-on: #27
|
4 months ago |
Egon Rijpkema
|
9790dc00ae
|
Removed vulture nodes that have been terminated.
|
4 months ago |
H. Meijering
|
002d629276
|
Merge pull request 'Remove ESX pg-nodes' (#26) from remove_esx_nodes into master
Reviewed-on: #26
|
5 months ago |
B.E. Droge
|
d06255b17f
|
remove esx pg-nodes
|
5 months ago |
Egon Rijpkema
|
ffd0540f0d
|
Removed nodes that are no longer in use.
|
5 months ago |
B.E. Droge
|
7da275e513
|
Merge pull request 'Added additional GPU nodes with ssd (labeled as nvme) disks.' (#25) from feature/gpu_nvme into master
Reviewed-on: #25
|
5 months ago |
F. Dijkstra
|
54657daab0
|
Added additional GPU nodes with ssd (labeled as nvme) disks.
|
5 months ago |
B.E. Droge
|
50539b1b2e
|
Merge pull request 'Changed login_checks.sh to the version used in production and modified the quota check.' (#24) from feature/login_script_default_quota into master
Reviewed-on: #24
|
5 months ago |
F. Dijkstra
|
4cfa01b162
|
Changed the quota check to also set the quota when the quota is very
small. This allows for setting a small default quota.
|
5 months ago |
F. Dijkstra
|
7ec294f30b
|
This is the actual login_checks.sh script which is in use on the
Peregrine cluster. It is unclear where the previous version came
from.
|
5 months ago |
Egon Rijpkema
|
53f9a22938
|
New extreme load alert.
|
5 months ago |
E.M.A. Rijpkema
|
b1c883dbd1
|
Merge pull request 'Moved nodes with broken IB to vulture partition.' (#23) from slurm_21.08 into master
Reviewed-on: #23
|
6 months ago |
F. Dijkstra
|
e7d0ac6708
|
Moved nodes with broken IB to vulture partition.
|
6 months ago |
B.E. Droge
|
df5dabb454
|
reqmem is now specified per job
|
6 months ago |
G.J.C. Strikwerda
|
86b74f07ad
|
Merge pull request 'Removed settings that are no longer available in Slurm 21.08' (#22) from slurm_21.08 into master
Reviewed-on: #22
|
7 months ago |
F. Dijkstra
|
697ac013b7
|
Removed users from PrivateData, as this affects the coordinator role.
|
7 months ago |
F. Dijkstra
|
120a0c4150
|
Removed settings that have been removed from Slurm 21.08
|
7 months ago |
B.E. Droge
|
3320c4d570
|
Merge pull request 'Increased timeout for not using the GPU to 4 hours' (#20) from feature/increased_gpu_timeout into master
Reviewed-on: #20
|
8 months ago |
F. Dijkstra
|
0e1fc73cca
|
Increased timeout for not using the GPU to 4 hours, since
AlphaFold needs several hours to initialize when reading its
data from Lustre, and we don't have alternative faster storage.
|
8 months ago |
Egon Rijpkema
|
700c7fd0a6
|
Updated prometheus documentation a little.
|
8 months ago |
E.M.A. Rijpkema
|
454061659a
|
Merge pull request 'Add PrivateData setting to slurmdbd.conf and slurm.conf' (#18) from feature/privatedata into master
Reviewed-on: #18
|
8 months ago |
F. Dijkstra
|
63d5c01d59
|
Added PrivateData setting to slurm.conf as setting it only in
slurmdbd.conf was not sufficient.
|
8 months ago |
F. Dijkstra
|
fe09b7faf5
|
Added users to PrivateData, as usage on its own did not have the
required effect.
|
8 months ago |
F. Dijkstra
|
80c0533eb6
|
Added the parameter PrivateData to prevent regular users from seeing
the cluster accounting data of other users.
|
8 months ago |
G.J.C. Strikwerda
|
32dc935e4c
|
Merge pull request 'Added tree to the list of tools.' (#17) from feature/tree into master
Reviewed-on: #17
|
9 months ago |
F. Dijkstra
|
2b0c012502
|
Added tree to the list of tools.
|
9 months ago |
Egon Rijpkema
|
5dc4274e96
|
Added new prometheus cert for knyft.
Not used in playbook.... yet...
|
10 months ago |
Egon Rijpkema
|
210c8a6911
|
Made build work again.
TODO: Find a better fork of lustre-exporter
|
11 months ago |
Egon Rijpkema
|
c70e4a4af9
|
Lustre exporter is extremely verbose.
We removed all the stdout logging.
|
11 months ago |
B.E. Droge
|
7e43402cb0
|
set pg-node247 and 269 to FUTURE
|
1 year ago |
B.E. Droge
|
3e775df7a7
|
remove dh-node11 and 19
|
1 year ago |
root
|
5074348f17
|
slurmd_restart should actually restart (not reload) slurmd
|
1 year ago |
root
|
6735ac1e69
|
add config tag to config-related steps, do restart of slurmd
|
1 year ago |
B.E. Droge
|
a4cb09cd33
|
Merge branch 'master' of ssh://git.web.rug.nl:222/HPC/pg-playbooks
|
1 year ago |
B.E. Droge
|
ecc56268c4
|
remove xdmod scripts
|
1 year ago |
B.E. Droge
|
39e4b8ad77
|
disable task affinity for cgroups
|
1 year ago |
B.E. Droge
|
df5090ca69
|
decrease tmpdisk values to a close power of 10
|
1 year ago |
B.E. Droge
|
c29670ceaf
|
Add TmpFS=/local and TmpDisk values for nodes
|
1 year ago |
root
|
41f075af42
|
fix syntax error
|
1 year ago |
root
|
1bb6ca0329
|
update db password
|
1 year ago |
root
|
b965d07018
|
split single slurm logrotate setting into two separate ones
|
1 year ago |
root
|
56ac7e9194
|
fix deprecation warning for loop in yum module
|
1 year ago |
root
|
a3eb7a3e72
|
bump slurm version
|
1 year ago |
B.E. Droge
|
c2fc2e779a
|
make slurm user owner of slurmdbd.conf
|
1 year ago |
B.E. Droge
|
e7cf23fb7d
|
change mode of slurmdbd.conf
|
1 year ago |