Sun Ultra5 scsi ! Some little problems (Sun Solaris Hardware forum, Solaris Operating System category)
[CUT]

> I think you will find this a bit more useful than all those flashing
> lights, which add heat. The code below will allow you to monitor disk
> temperature and take what action you fancy based on that.
>
> http://groups.google.co.uk/group/com...owse_frm/thread/5800e131547381a/eef20cf9806fcb25?rnum=21&hl=en&q=ultra+scsi+temperature+disk+solaris+work&_done=%2Fgroup%2Fcomp.sys.sun.hardware%2Fbrowse_frm%2Fthread%2F5800e131547381a%2Fcda131318d7ff786%3Flnk%3Dst%26q%3Dultra+scsi+temperature+disk+solaris+work%26rnum%3D1%26hl%3Den%26#doc_703f8f97188185f9

wow Dave, it's very very interesting !
Thanks ;-)
Mike
Michael Myers wrote:

> http://groups.google.co.uk/group/com...owse_frm/thread/5800e131547381a/eef20cf9806fcb25?rnum=21&hl=en&q=ultra+scsi+temperature+disk+solaris+work&_done=%2Fgroup%2Fcomp.sys.sun.hardware%2Fbrowse_frm%2Fthread%2F5800e131547381a%2Fcda131318d7ff786%3Flnk%3Dst%26q%3Dultra+scsi+temperature+disk+solaris+work%26rnum%3D1%26hl%3Den%26#doc_703f8f97188185f9
>
> wow Dave, it's very very interesting !
> Thanks ;-)
> Mike

If you find http://groups.google.co.uk/group/com...3f8f97188185f9 interesting, you might like to take a look at this, which is a shell script I wrote to check disk temperature. I call it from cron every 15 minutes. If a disk is within 4 degrees C of its maximum temperature, it issues a warning in syslog. If it is within 2 degrees C, it shuts the machine down.

It finds all the disks, as given in /dev/rdsk. You can exclude any, such as a CD-ROM drive. It calls a binary executable, the C source of which I have put in the shell script as a comment. Save the C to a file hdtemp.c (remove the # used as the comment marker), then

    gcc hdtemp.c -o hdtemp

There is a line in the script which says "THERE IS NO NEED TO CHANGE ANYTHING BELOW THIS LINE" but I just realised the location of the binary is hardcoded as /usr/local/bin, so that statement is not necessarily true. You might also want to change the fact it logs to syslog with local3. Choose whatever you want, assuming it still conforms to syslog protocol (RFC 3164).

As I said earlier, there are more complex utilities which are supposed to predict when hard drives will fail, but from what I have read, these don't actually work too well in practice, so I settled on temperature only.

On an old SS20, which tends to run very warm, it is not possible to use the ultrascsi interface. So I used a brute-force approach, with a bi-metallic strip.
http://www.g8wrb.org/useful-stuff/Su...20/index.shtml

Good luck,

dave k

#!/bin/sh

# This shell script calls a *C* program 'hdtemp'
# which outputs temperature given the raw disk

# Here is a typical output from 'hdtemp'
# /usr/local/bin/hdtemp /dev/rdsk/c1t1d0s2
# current temp = 38 C, trip temp = 65 C
#

PATH=/usr/bin:/usr/sbin

# This script will log to syslog disks that are too warm. If the
# disks are very warm, the system will be shut down.

# The C source code of 'hdtemp' is at the bottom of this
# script. You will need to save that in a file, after
# removing the hashes, which are used as comments.

# You *must* set any disks you want to exclude from testing. The CD-ROM
# is an obvious example of one to exclude. Also, exclude any that
# do not support the ability to measure temperature.

excludedisks="c0t6d0s2"

# Warnings are issued if there is less than
# 'warningsafetymargin' deg C between the actual temperature
# and the trip temperature.

warningsafetymargin=4

# The system is shut down if there is less than
# 'criticalsafetymargin' deg C between the actual temperature
# and the trip temperature.

criticalsafetymargin=2

#########################################################
#########################################################
## THERE IS NO NEED TO CHANGE ANYTHING BELOW THIS LINE ##
#########################################################
#########################################################

# Check all disks on the system, except any that are
# excluded.

for disk in `ls /dev/rdsk/*s2 | grep -v "$excludedisks"`
do
  # Read the disk temperature.
  temp=`/usr/local/bin/hdtemp $disk | awk '{print $4}'`

  # Read the maximum permitted temperature.
  maxtemp=`/usr/local/bin/hdtemp $disk | awk '{print $9}'`

  # Compute the difference between the actual temperature and the
  # maximum temperature.
  margin=`echo $maxtemp - $temp | bc`

  # If the disk is really warm, then log it as critical and shut the
  # system down.
  if [ "$margin" -lt "$criticalsafetymargin" ] ; then
    echo "Shutting down due to temperature problem on disk $disk." \
         "Temperature is $temp deg C." \
         "Max permissible temperature is $maxtemp deg C." \
         "System shut down since the safety margin is less than" \
         "$criticalsafetymargin deg C." | logger -p local3.crit
    shutdown -y -g 120 -i 5
    exit 1
  fi

  # Use syslog to log a warning if the disk is too warm.
  if [ "$margin" -lt "$warningsafetymargin" ] ; then
    echo "Temperature problem on disk $disk." \
         "Temperature is $temp deg C." \
         "Max permissible temperature is $maxtemp deg C." \
         "Warnings are issued when the safety margin is less than" \
         "$warningsafetymargin deg C." | logger -p local3.warn
  fi
done
exit 0

# To put in one place, here is the C source of 'hdtemp'
# Compile with gcc hdtemp.c -o hdtemp

##include <stdlib.h>
##include <stdio.h>
##include <sys/types.h>
##include <fcntl.h>
##include <unistd.h>
##include <errno.h>
##include <string.h>
##include <sys/scsi/scsi.h>
#
##define LOG_SENSE 0x4d
##define TEMPERATURE_PAGE 0x0d
#
#/* Issue a SCSI LOG SENSE for the given log page via the USCSI ioctl. */
#int
#scsi_log_sense(int fd, int pagenum, uint8_t *pbuf, size_t buflen,
#    size_t known_resp_len)
#{
#    struct uscsi_cmd ucmd;
#    struct scsi_extended_sense sense;
#    uint8_t cdb[10];
#    int status;
#
#    memset(&ucmd, 0, sizeof (ucmd));
#    memset(cdb, 0, sizeof (cdb));
#
#    cdb[0] = LOG_SENSE;
#    cdb[2] = 0x40 | (pagenum & 0x3f);
#    cdb[7] = known_resp_len >> 8;
#    cdb[8] = known_resp_len;
#
#    ucmd.uscsi_cdb = (caddr_t)cdb;
#    ucmd.uscsi_cdblen = sizeof (cdb);
#    ucmd.uscsi_bufaddr = (caddr_t)pbuf;
#    ucmd.uscsi_buflen = known_resp_len;
#    ucmd.uscsi_rqbuf = (caddr_t)&sense;
#    ucmd.uscsi_rqlen = sizeof (sense);
#    ucmd.uscsi_timeout = 15;
#    ucmd.uscsi_flags = USCSI_READ;
#
#    status = ioctl(fd, USCSICMD, &ucmd);
#
#    return (status);
#}
#
#int main(int argc, char *argv[])
#{
#    char *device;
#    int fd;
#    int err;
#    uint8_t tbuf[16];
#
#    if (argc != 2) {
#        fprintf(stderr, "usage: %s <device>\n", argv[0]);
#        exit (1);
#    }
#    device = argv[1];
#
#    fd = open(device, O_RDONLY | O_NONBLOCK);
#    if (fd < 0) {
#        perror(device);
#        exit(1);
#    }
#
#    memset(tbuf, 0, sizeof (tbuf));
#    err = scsi_log_sense(fd, TEMPERATURE_PAGE, tbuf, sizeof (tbuf),
#        sizeof (tbuf));
#    if (err != 0) {
#        perror("scsi_log_sense failed - disk might not exist, or be powered off");
#        /* On failure, report a current temp of 1 C (trip temp stays 0). */
#        tbuf[9]=1;
#    }
#    /* A reading of 255 is reported as 0. */
#    if (tbuf[9] == 255 )
#        tbuf[9]=0;
#    printf("current temp = %d C, trip temp = %d C\n", tbuf[9], tbuf[15]);
#
#    return (0);
#}

--
Dave K

http://www.southminster-branch-line.org.uk/

Please note my email address changes periodically to avoid spam.
It is always of the form: month-year@domain. Hitting reply will work
for a couple of months only. Later set it manually. The month is
always written in 3 letters (e.g. Jan, not January etc)
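The margin check in the script can be tried without any SCSI hardware by feeding it sample 'hdtemp' output lines. What follows is only a minimal sketch, assuming the output format shown in the script's comments ("current temp = 38 C, trip temp = 65 C"). The function name classify_hdtemp is hypothetical and not part of Dave's script, and it uses expr where the script uses bc.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): classify one line
# of hdtemp output against the warning and critical safety margins.
# Usage: classify_hdtemp "<hdtemp output line>" <warning margin> <critical margin>
classify_hdtemp() {
    line=$1
    warn=$2
    crit=$3

    # Fields 4 and 9 hold the current and trip temperatures, as in
    # "current temp = 38 C, trip temp = 65 C".
    temp=`echo "$line" | awk '{print $4}'`
    maxtemp=`echo "$line" | awk '{print $9}'`

    # expr stands in for the bc call used in the original script.
    margin=`expr "$maxtemp" - "$temp"`

    # Check the critical margin first, then the warning margin.
    if [ "$margin" -lt "$crit" ]; then
        echo critical
    elif [ "$margin" -lt "$warn" ]; then
        echo warn
    else
        echo ok
    fi
}

# Sample lines with margins of 27, 3 and 1 deg C respectively.
classify_hdtemp "current temp = 38 C, trip temp = 65 C" 4 2
classify_hdtemp "current temp = 62 C, trip temp = 65 C" 4 2
classify_hdtemp "current temp = 64 C, trip temp = 65 C" 4 2
```

One thing this makes easy to see: the line 'hdtemp' prints when the LOG SENSE call fails (current temp 1, trip temp 0) gives a margin of -1 and so classifies as critical, which is a good reason to list in excludedisks every disk that cannot report its temperature.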