UCS Development Group Services¶
The University Computing Service (UCS) was one of the parent organisations that merged to form University Information Services (UIS).
The UCS development group wrote services, some of which are still in use today. As the UCS and its dev group no longer exist, responsibility for running these has been transferred to DevOps.
The services run by DevOps currently include:
- Lookup
- Password Changing Application
- Human Tissue Tracking Application
- Network Access Tokens Application
- Streaming Media Service
- University Training Booking System
Deploying changes to UCS development group services¶
Note
DevOps are currently in the process of migrating from SLES to RHEL. Please make sure you are using the correct method for the service's operating system.
Current Deployments¶
Application | Operating System | HA provider |
---|---|---|
Lookup | Red Hat 8 | Keepalived |
Password Changing Application | Red Hat 8 | Traffic Manager |
Human Tissue Tracking Application | Red Hat 8 | Traffic Manager |
Network Access Tokens Application | Red Hat 8 | Traffic Manager |
Streaming Media Service | SLES 11 | Manual |
University Training Booking System | Red Hat 8 | Traffic Manager |
Red Hat Deployments¶
New releases are deployed via Ansible. Each server in turn should be put into maintenance mode, and then run-ansible-playbook.sh should be run limited to the node in maintenance mode, as in the steps below.
Note
Which node is on standby can be determined by running curl -s https://www.lookup.cam.ac.uk/adm/status | grep 'HOSTNAME:\|Overall status\|Application' | sed 's/.*<pre>//g' or by looking at the service's /adm/status page.
- Put the first node into maintenance mode by touching /maintenance_mode. This file is read by /usr/share/[app]/[app]-monitor-rhel; if the file exists, the [host]/adm/liveness page returns a 500 result. Depending on the configuration, either keepalived or Traffic Manager polls [host]/adm/liveness and takes the server out of service if it returns anything other than a 200 result.
# On lookup-live1
touch /maintenance_mode
- On your client machine, you can now run the playbook, limited to the server that is currently out of service. Make sure you have completed the correct setup before running this command; see https://gitlab.developers.cam.ac.uk/uis/devops/grails-application-ansible-deployment#deployment for more details.
./run-ansible-playbook.sh -i ibis-production ibis-playbook.yml --diff --limit lookup-live1
- Once the deployment has completed, test the changes, then bring the node back into service:
# On lookup-live1
rm /maintenance_mode
- Repeat for the second node:
# On lookup-live2
touch /maintenance_mode
# On client machine
./run-ansible-playbook.sh -i ibis-production ibis-playbook.yml --diff --limit lookup-live2
# On lookup-live2
rm /maintenance_mode
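The maintenance-mode mechanism in the first step can be sketched as below. This is a minimal illustration of the decision described above. It is not the contents of /usr/share/[app]/[app]-monitor-rhel. The function names liveness_code and in_service are hypothetical. The flag-file path is a parameter (defaulting to /maintenance_mode), so the logic can be tried without touching the real file.

```shell
# Hypothetical sketch of the liveness decision described above; this is
# NOT the real /usr/share/[app]/[app]-monitor-rhel script.
# Prints the HTTP status code the liveness page would return.
liveness_code() {
    local flag_file="${1:-/maintenance_mode}"
    if [ -e "$flag_file" ]; then
        # Flag file present: the node is in maintenance mode, so report a
        # failure and keepalived or Traffic Manager takes it out of service.
        echo 500
    else
        echo 200
    fi
}

# Succeeds only when the node would be considered in service, i.e. the
# liveness code is exactly 200.
in_service() {
    [ "$(liveness_code "$1")" = "200" ]
}
```

For example, after touch /maintenance_mode on a node, liveness_code on that node should print 500. That is the signal the load balancer acts on.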
Confirm which node is in service¶
You can confirm which nodes are serving traffic with the following command:
curl -s https://www.lookup.cam.ac.uk/adm/status | \
grep 'HOSTNAME:\|Overall status\|Application' | sed 's/.*<pre>//g'
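The same pipeline can be wrapped in a pair of shell functions. This makes it easier to point the check at a single node's /adm/status page rather than at the service URL. This is a sketch only: filter_status and node_status are hypothetical names, and the filter assumes the status page still uses the labels matched above.

```shell
# Hypothetical helpers wrapping the status check above.
# filter_status reads a status page on stdin and keeps only the host name,
# overall status and application lines, stripping anything up to and
# including a <pre> tag (the same grep/sed pipeline as above).
filter_status() {
    grep 'HOSTNAME:\|Overall status\|Application' | sed 's/.*<pre>//g'
}

# node_status fetches a status page and filters it; the default URL is the
# Lookup service, but any node's /adm/status URL can be passed instead.
node_status() {
    local url="${1:-https://www.lookup.cam.ac.uk/adm/status}"
    curl -s "$url" | filter_status
}
```

For example, node_status https://[host]/adm/status, where [host] is one of the nodes. This assumes its status page is reachable from where you run the command.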
SLES Deployments¶
The majority of SLES deployments are clustered using Pacemaker.
To make a release without incurring downtime, the following steps can be taken.
The examples refer to the Lookup service (ucs-ibis package) and need substituting as appropriate.
Start with the standby node¶
Note
Which node is on standby can be determined by running crm_mon -1 or by looking at the service's /adm/status page.
- Put the cluster into maintenance mode. This ensures Pacemaker does not try to make changes to the cluster in response to a node being unavailable.
crm configure property maintenance-mode=true
# Check for "unmanaged" status
crm_mon -1
- Release the lock on the package.
# List locked packages
zypper ll
# Release lock on appropriate package
zypper rl ucs-ibis
- Run the software upgrade.
# Refresh repositories
zypper ref
# List updates available
zypper lu
# Update application specific package
zypper up ucs-ibis
# Restart tomcat (a single "restart" doesn't always work)
service tomcat6 stop
service tomcat6 start
- Check the service is running on the updated node:
  - see https://{node url}/adm/status
  - check application functionality
    - see
- Move the service out of maintenance mode.
crm configure property maintenance-mode=false
# Check for removal of "unmanaged" status
crm_mon -1
- Reapply the lock to the package.
# Add lock on appropriate package
zypper al ucs-ibis
# List locked packages to check
zypper ll
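The commands above refer to ucs-ibis and need substituting for other services. A small dry-run helper can print the standby-node sequence for a given package. This is a sketch only. print_standby_upgrade is a hypothetical name, and it assumes the service runs under the tomcat6 init script as in the Lookup example. It executes nothing, so each printed command still needs to be run by hand and its output checked.

```shell
# Hypothetical dry-run helper: prints the standby-node upgrade commands
# from the steps above with the package name substituted. Nothing is run.
# Assumes the application runs under the tomcat6 init script.
print_standby_upgrade() {
    local pkg="${1:?usage: print_standby_upgrade PACKAGE}"
    # Unquoted EOF so that ${pkg} is expanded inside the here-document.
    cat <<EOF
crm configure property maintenance-mode=true
crm_mon -1
zypper ll
zypper rl ${pkg}
zypper ref
zypper lu
zypper up ${pkg}
service tomcat6 stop
service tomcat6 start
# now check https://{node url}/adm/status and application functionality
crm configure property maintenance-mode=false
crm_mon -1
zypper al ${pkg}
zypper ll
EOF
}
```

Printing rather than running keeps a human in the loop at each step. This matters here because crm_mon -1 output has to be read for the "unmanaged" status before moving on.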
Move to current live node¶
- Move the service to the already-updated standby node:
crm configure edit
# shift node weights so that the current standby node is the preferred node for the service
# verify that the service has moved to the previous standby:
crm_mon -1
- Repeat steps 1 to 6 above on this node, i.e.:
  - put the service back in maintenance mode
  - unlock the package
  - update the package
  - check success
  - remove the service from maintenance mode
  - relock the package
- Move the service back to this node:
crm configure edit
# shift node weights so that this node is the preferred node for the service
# verify that the service has moved back:
crm_mon -1
Following the above should allow a software upgrade to be deployed without any downtime. These steps will not work if the upgrade includes a breaking change to an external data source, e.g. a database migration which is not compatible with the previous version of the software. In this case downtime may need to be scheduled.
TLS certificates on UCS development group services¶
Installation of TLS certificates on UCS dev group services is a manual process on SLES servers. On Red Hat-based servers, the certificates are managed by Ansible.
Certificate locations¶
Some services appear to have dedicated directories (ssl.crt and ssl.key) to hold the certificate and key files; others use the Tomcat config directory. Running grep -i certificate /srv/www/tomcat6/base/conf/server.xml should show the path.
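To see only the certificate-related attributes rather than every matching line, grep's -o flag can print just each attribute and its value. This is a minimal sketch. It assumes the attribute names contain "certificate", as certificateChainFile does in the intermediate-certificate steps. cert_paths is a hypothetical name.

```shell
# Hypothetical helper: list certificate-related attributes in a Tomcat
# server.xml, e.g. certificateChainFile="/path/to/file.crt".
cert_paths() {
    local conf="${1:-/srv/www/tomcat6/base/conf/server.xml}"
    # -i: case-insensitive, so SSLCertificateFile-style names also match.
    # -o: print only the attribute="value" part, not the whole line.
    grep -io '[a-z]*certificate[a-z]*="[^"]*"' "$conf"
}
```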
To install new certificates¶
- Obtain the new certificates from the TLS certificate application.
- Copy the new certificate and key files to the certificate location on the target system.
- Update the Tomcat configuration to use the new certificate by editing /srv/www/tomcat6/base/conf/server.xml.
- Ensure that the certificate and key have the correct ownership and permissions with chown ucstomcat <file> and chmod 600 <file>.
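The ownership and permission step can be applied and checked in one go. This sketch assumes GNU stat, as found on SLES and Red Hat. secure_cert_files and its OWNER override are hypothetical. Changing a file's owner to ucstomcat normally requires root.

```shell
# Hypothetical helper: apply the ownership and permissions described above
# to each file given, then print "mode owner name" so the result can be
# checked. OWNER defaults to ucstomcat; chown to another user needs root.
secure_cert_files() {
    local owner="${OWNER:-ucstomcat}"
    local f
    for f in "$@"; do
        chown "$owner" "$f" || return 1
        chmod 600 "$f" || return 1
        # GNU stat: %a = octal permissions, %U = owner name, %n = file name.
        stat -c '%a %U %n' "$f"
    done
}
```

For example, run from the certificate location: secure_cert_files ssl.crt/server.crt ssl.key/server.key. These file names are illustrative only.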
To update the intermediate certificate¶
- Create a new file, qvsslg3.crt, in the certificate location containing the new intermediate certificate, and remove any blank lines from it.
- Ensure that the new file has the correct ownership and permissions with chown ucstomcat <file> and chmod 600 <file>.
- In /srv/www/tomcat6/base/conf/server.xml, edit the line that says (the path might be different on your system):
certificateChainFile="/srv/www/tomcat6/base/conf/QuoVadisGlobalSSLICAG3.crt"
to:
certificateChainFile="/srv/www/tomcat6/base/conf/qvsslg3.crt"
- Restart Tomcat:
service tomcat6 restart
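The first three intermediate-certificate steps can be sketched as a single function. This is an illustration under several assumptions:

- install_intermediate is a hypothetical name.
- It expects the attribute to be spelled certificateChainFile exactly as shown above.
- It uses GNU sed's -i.bak to keep a copy of server.xml.
- It replaces whatever path the attribute currently holds.

Review the difference against server.xml.bak before restarting Tomcat.

```shell
# Hypothetical helper for the intermediate-certificate steps above.
# Arguments: downloaded intermediate cert, certificate location, and
# optionally the server.xml path. Does not restart Tomcat.
install_intermediate() {
    local src="$1"
    local cert_dir="$2"
    local conf="${3:-/srv/www/tomcat6/base/conf/server.xml}"
    local dest="$cert_dir/qvsslg3.crt"

    # Copy the certificate, dropping empty or whitespace-only lines.
    sed '/^[[:space:]]*$/d' "$src" > "$dest" || return 1

    # Ownership and permissions; OWNER defaults to ucstomcat (needs root).
    chown "${OWNER:-ucstomcat}" "$dest" && chmod 600 "$dest" || return 1

    # Point certificateChainFile at the new file, keeping server.xml.bak.
    sed -i.bak "s|certificateChainFile=\"[^\"]*\"|certificateChainFile=\"$dest\"|" "$conf"
}
```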
Database backup and restore on UCS dev group services¶
The examples refer to the Lookup service (ucs-ibis package) and need substituting as appropriate.
Database backups and restores are managed by a pair of scripts in /usr/share/ibis/bin: ibis-backup and ibis-restore.
Database backups¶
Generally ibis-backup is run from a cron job, /etc/cron.d/ibis, which passes in one of the arguments hour|day|week|month|year. The script then writes a bzip2-compressed database dump to the matching /usr/share/ibis/backup/hour|day|week|month|year directory. The ibis-backup script can also take the argument now, which creates a database dump in the current directory.
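To find the most recent dump from a given schedule, the matching backup directory can be sorted by modification time. This is a minimal sketch. latest_backup and the BACKUP_ROOT override are hypothetical, and it assumes the directory contains only dump files.

```shell
# Hypothetical helper: print the newest file in one of the backup
# directories described above. Returns non-zero if there is none.
latest_backup() {
    local period="${1:?usage: latest_backup hour|day|week|month|year}"
    local dir="${BACKUP_ROOT:-/usr/share/ibis/backup}/$period"
    local newest
    # ls -t sorts by modification time, newest first.
    newest=$(ls -t "$dir" 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    printf '%s\n' "$dir/$newest"
}
```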
Database restores¶
The ibis-restore script takes a bzip2-compressed backup file created by ibis-backup and, optionally, the name of the database to restore to.
ibis-restore does not restore to an existing database; to replace an existing database, we must do a DROP DATABASE first.
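Replacing a database means dropping it first, so it is worth confirming that the chosen dump decompresses cleanly before running DROP DATABASE. bzip2 -t tests the integrity of a compressed file without writing any output. The check_backup wrapper is a hypothetical name.

```shell
# Hypothetical pre-restore check: verify that a bzip2-compressed dump is
# intact before anything is dropped.
check_backup() {
    local file="${1:?usage: check_backup FILE}"
    # bzip2 -t decompresses in memory only, to test integrity.
    if bzip2 -t "$file" 2>/dev/null; then
        echo "OK: $file"
    else
        echo "CORRUPT or unreadable: $file" >&2
        return 1
    fi
}
```

Only if the check passes would you go on to drop the database and run ibis-restore on the file.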
Configuration¶
It's a common pattern across all the dev-group apps to have a /usr/share/<app>/conf/params.yml file containing most of the local config, and a page at http://<app>/adm/config that displays the raw and parsed contents of that file, along with a button to reload it from disk.
In most of the apps the page is publicly visible (though not advertised), but the reload button requires admin rights. In some apps, such as the Password Changing Application, the page itself is protected for added security.
Each of the apps is different in how it manages roles and permissions. Top-level admins are often listed explicitly in params.yml. Some apps use Lookup groups for finer-grained permissions, and some store permissions in their own databases. For example, the UTBS has quite a detailed system for managing fine-grained access controls across the different service providers.