JASMIN update on forthcoming maintenance periods May-June 2022
Posted on May 16, 2022 (Last modified on October 19, 2023)
Dear users,
Please note the current/upcoming maintenance work affecting JASMIN:
This week: SLURM scheduler upgrade in progress
LOTUS batch processing cluster unavailable for most of this week (see previous notice)
Tape services affected by maintenance work 23/24 May 2022
Patching of backend systems associated with tape services (Elastic Tape, JDMA & Near-Line Archive) will take place on Tuesday 24th May. To facilitate this, these services will be unavailable from midday on Monday 23rd May, and should be back in operation on Wednesday 25th May.
Work is under way to remove old compute nodes and replace them with new ones, so the capacity of the LOTUS batch processing cluster is temporarily reduced (by 240 nodes/4800 cores), which may result in some jobs taking longer to complete. This work started in mid-to-late April and is expected to be completed by the end of May, by which time 90 new nodes/8640 new cores should be in operation, increasing overall capacity.
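As a rough illustration of the figures above, the net change in LOTUS capacity can be worked out as follows. This is only a sketch: it assumes the quoted node and core counts are the only changes, and the variable names are illustrative. The total size of the cluster is not stated in this notice, so no percentage change is calculated.

```python
# Figures quoted in the notice above.
removed_nodes, removed_cores = 240, 4800   # old nodes being retired
new_nodes, new_cores = 90, 8640            # new nodes being installed

# Net change once the replacement is complete
# (assumes no other changes to the cluster).
net_nodes = new_nodes - removed_nodes
net_cores = new_cores - removed_cores

# Average cores per node, old versus new hardware.
cores_per_old_node = removed_cores // removed_nodes
cores_per_new_node = new_cores // new_nodes

print(f"Net change: {net_nodes} nodes, {net_cores:+d} cores")
print(f"Cores per node: {cores_per_old_node} (old) vs {cores_per_new_node} (new)")
```

On these figures, the cluster ends up with fewer nodes but more cores overall, because each new node provides more cores than the nodes it replaces.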
Work to update storage systems, recently postponed from March/April, has been rescheduled to coincide with our quarterly maintenance day, now planned for Tuesday 14th June 2022.
On this occasion, due to the nature and extent of the work being carried out, users will not be able to access JASMIN for large parts of the day. This affects the LOTUS batch processing cluster, significant parts of JASMIN storage, and many virtual machines on which services may be running.
You are advised to plan your work accordingly to avoid this date.
On a regular (roughly quarterly) basis, important updates are applied to systems within the JASMIN infrastructure (which also hosts the CEDA Archive and associated services) in order to keep them up to date and secure. Servers may need to be rebooted in order for these changes to take effect, so there may be an interruption to JASMIN and CEDA services on this date.
Other system work is also scheduled for these dates in order to minimise disruption.
The LOTUS batch processing cluster will be unavailable for the duration of the work on the day, to avoid running jobs being adversely affected. A reservation will start at 06:00 on the day. Any job submitted before then whose requested run time would overlap the reservation period will not start until after the reservation has finished.
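The effect of the reservation on queued jobs can be illustrated with a minimal sketch. This is not the scheduler's actual logic, only the overlap rule described above: a job can start before the reservation only if its requested run time would end by 06:00 on the maintenance day. The function name and the example job times are hypothetical, and the end time of the reservation is not modelled because it is not stated in this notice.

```python
from datetime import datetime, timedelta


def can_start_before_reservation(start: datetime,
                                 time_limit: timedelta,
                                 reservation_start: datetime) -> bool:
    """Return True if a job starting at `start` with the requested
    `time_limit` would finish by the time the reservation begins."""
    return start + time_limit <= reservation_start


# Reservation start taken from the notice: 06:00 on Tuesday 14th June 2022.
RESERVATION_START = datetime(2022, 6, 14, 6, 0)

# Hypothetical job that could otherwise start at 20:00 the evening before.
possible_start = datetime(2022, 6, 13, 20, 0)

# An 8-hour job would end at 04:00, before the reservation begins.
short_job_ok = can_start_before_reservation(
    possible_start, timedelta(hours=8), RESERVATION_START)

# A 12-hour job would end at 08:00, overlapping the reservation.
long_job_ok = can_start_before_reservation(
    possible_start, timedelta(hours=12), RESERVATION_START)

print(short_job_ok, long_job_ok)
```

In practice this means that requesting a realistic (rather than maximal) run time gives a job a better chance of starting before the maintenance day.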
Over the weekend of 18/19 June, there will be NO ACCESS to the RAL network, in which JASMIN is hosted, so there will be no access to JASMIN and CEDA services. Work is expected to continue on Monday 20th & Tuesday 21st June affecting some or all parts of the network. This work is being carried out by STFC’s network team. We will provide further details as these become available.
We advise you to plan your work accordingly to minimise the impact, but please accept our apologies in advance for any inconvenience.
JASMIN Team