abstract:farber:status

abstract:farber:status [2018-07-09 17:35] – [Node Status Notification] anita
abstract:farber:status [2020-04-03 11:24] (current) – [Ganglia Cluster Monitoring] anita
==== Node Status Notification for Farber ====

An opt-in node status notification service is available for Farber users.  This service sends an email notification when any node in your workgroup transitions from one of the following states to another:

  * offline: down and not reachable.
  * online: up and reachable, but queues are disabled.
  * accepting-jobs: up, reachable and queues are enabled.

The service currently checks nodes' statuses at the top of every hour and delivers email notifications at the bottom of the hour.  If you opt in and any of your workgroup's nodes have changed state, you will receive ONE email message detailing the status changes.

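As an illustration of what counts as a status change, the following minimal Python sketch compares two hourly snapshots of node states and lists the transitions. It is not the service's actual implementation: the function name ''find_transitions'' and the node names are hypothetical, and only the three state names come from the list above.

```python
# Illustrative sketch only: not the notification service's real code.
# The three state names come from the list above; node names are made up.
STATES = ("offline", "online", "accepting-jobs")


def find_transitions(previous, current):
    """Return (node, old_state, new_state) for every node whose state changed."""
    changes = []
    for node in sorted(current):
        new_state = current[node]
        if new_state not in STATES:
            raise ValueError(f"unknown state for {node}: {new_state}")
        old_state = previous.get(node)
        # A node seen for the first time has no earlier state to compare with.
        if old_state is not None and old_state != new_state:
            changes.append((node, old_state, new_state))
    return changes


# Hypothetical snapshots taken one hour apart.
last_hour = {"n000": "accepting-jobs", "n001": "online", "n002": "offline"}
this_hour = {"n000": "accepting-jobs", "n001": "accepting-jobs", "n002": "online"}

for node, old, new in find_transitions(last_hour, this_hour):
    print(f"{node}: {old} -> {new}")
```

In this sketch, a node whose state is unchanged between the two snapshots produces no entry, which mirrors the idea that a message is sent only when something has changed.
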
To opt in to the node status notification service for your workgroup(s), send an email to consult@udel.edu with the subject "Node notification opt-in Farber" and make the first line of the message body be
      Type=Cluster

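The opt-in request can be written in any mail client. For those who prefer to script it, here is a minimal Python sketch that builds the same message with the standard library. The helper name ''build_opt_in_message'' and the sender address are hypothetical; the recipient, subject and first body line are taken from the instructions above. The sketch only constructs the message and leaves delivery to whatever mail setup you already use.

```python
from email.message import EmailMessage


def build_opt_in_message(sender):
    """Build the opt-in request described above; `sender` is your own address."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "consult@udel.edu"
    msg["Subject"] = "Node notification opt-in Farber"
    # The first line of the message body must be exactly this.
    msg.set_content("Type=Cluster\n")
    return msg


# Hypothetical sender address, used only for illustration.
message = build_opt_in_message("your_username@udel.edu")
print(message)
```
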
==== Live Resources ====
[[http://farber.hpc.udel.edu|farber.hpc.udel.edu]] has live resources: system status, job stats, system alerts.

==== Machine Information ====
[[http://www.hpc.udel.edu/systems/farber|UD IT HPC]] has Farber machine information: attributes including a database of node information, milestones, offline nodes and nodes disabled for maintenance.

==== Ganglia Cluster Monitoring ====
[[http://farber.hpc.udel.edu/ganglia/|Cluster monitoring]] for Farber uses [[http://ganglia.sourceforge.net/|Ganglia]] to monitor its hardware components.

==== System Alerts ====
[[https://www.hpc.udel.edu/mantis/default/my_view_page.php|System alerts]]: Check here first if you are experiencing problems with the cluster.

==== Job Statistics ====
[[http://farber.hpc.udel.edu/jobstats/|Job statistics]]: Check here for the total number of jobs that ended on each day over a range (week, 2 weeks, month, 6 months, year) with an overlay of the total number of jobs which the job scheduler classified as "failed."