= System Daily Maintenance =

Author: Thai Tran

== Exchange Server ==

=== Physical Environmental Checks ===

  * Verify that environmental conditions are tracked and maintained.
  * Check temperature and humidity to ensure that environmental systems, such as heating and air conditioning, are set within acceptable ranges and operate within the hardware manufacturer's specifications.
  * Ensure that your physical network and related hardware such as routers, switches, hubs, physical cables, and connectors are operational.

=== Check Backups ===

  * Make sure that the recommended minimum backup strategy of a daily online backup is completed.
  * Verify that the previous backup operation completed.
  * Analyze and respond to errors and warnings during the backup operation.
  * Verify that the transaction logs were successfully purged (if your backup type is purging logs).

=== Performance ===

Review these performance counters:

  * % Processor Time.
  * Available MBytes.
  * % Committed Bytes In Use.

=== Event Logs ===

  * Filter application and system logs on the Exchange server to see all errors.
  * Filter application and system logs on the Exchange server to see all warnings.
  * Note repetitive warning and error logs.
  * Respond to discovered failures and problems.

=== Exchange Database ===

  * Check the number of transaction logs generated since the last check. Is the number increasing at the “usual” rate?
  * Verify that databases are mounted.
  * Make sure that public folder replication is up-to-date.
  * If full-text indexing is enabled, verify that indexes are up-to-date.
  * Using a test mailbox, verify logon to each database and the send/receive capabilities.

=== MAPI Client Performance and Server Availability ===

  * Examine System Monitor counters.
  * Examine Event Viewer logs.
  * Verify that a test account can log on to the Exchange server and has send/receive capabilities.
  * Verify the Performance Monitor RPC counters against a baseline: RPC average latency, RPC requests, and RPC operations.

=== Check Queue Viewer ===

  * Check queues for each server using the Queue Viewer tool in the Exchange Management Console.
  * Record queue size.

=== Message Paths and Mail Flow ===

  * Send messages between internal servers using test accounts.
  * Check and verify that messages deliver successfully.
  * Send outgoing messages to non-local accounts.
  * Check and verify that outgoing messages deliver successfully. With the test account on the external host, verify that mail comes in.
  * Verify successful message transfer across connectors and routes.

=== Security Logs ===

  * Check //Mail Essential// and //Mail Security for Exchange// (these licenses expire on 12/31/2011).
  * View the security event log in Event Viewer and match security changes to known, authorized configuration changes.
  * Investigate unauthorized security changes discovered in the security event log.
  * Check security news for the latest viruses, worms, and vulnerabilities.
  * Update and fix discovered security problems and vulnerabilities.
  * Verify that SMTP does not relay anonymously, or that relaying is locked down to the specific servers that require it.
  * Verify that SSL is functioning for configured secure channels.
  * Update virus signatures daily.

Note: All the backups sync to the local hard drive.

== CRM, OnContact Server ==

  * Verify that SQL Services are running (SQL Agent).
  * Verify that SQL Agent jobs succeeded.
  * Verify that spindles have free space.
  * Verify that data and log files for each database have free space.

=== Check Backups ===

  * Make sure that the recommended minimum backup strategy of a daily online backup is completed.
  * Verify that the previous backup operation completed.
  * Verify that full backups succeeded.
  * Verify that transaction log backups succeeded.
  * Analyze and respond to errors and warnings during the backup operation.
  * Verify that the transaction logs were successfully purged (if your backup type is purging logs).

=== Performance ===

Review these performance counters:

  * % Processor Time.
  * Available MBytes.
  * % Committed Bytes In Use.

=== Event Logs ===

  * Filter application and system logs on the SQL server to see all errors.
  * Filter application and system logs on the SQL server to see all warnings.
  * Note repetitive warning and error logs.
  * Respond to discovered failures and problems.

Note: All the backups sync to the local hard drive.

== Infusion Server ==

  * Verify that SQL Services are running (i.e., SQL Agent).
  * Verify that SQL Agent jobs succeeded.
  * Verify that spindles have free space.
  * Verify that data and log files for each database have free space.

=== Check Backups ===

  * Make sure that the recommended minimum backup strategy of a daily online backup is completed.
  * Verify that the previous backup operation completed.
  * Verify that full backups succeeded.
  * Verify that transaction log backups succeeded.
  * Analyze and respond to errors and warnings during the backup operation.
  * Verify that the transaction logs were successfully purged (if your backup type is purging logs).

=== Performance ===

Review these performance counters:

  * % Processor Time.
  * Available MBytes.
  * % Committed Bytes In Use.

=== Event Logs ===

  * Filter application and system logs on the SQL server to see all errors.
  * Filter application and system logs on the SQL server to see all warnings.
  * Note repetitive warning and error logs.
  * Respond to discovered failures and problems.

Note: All the backups sync to the local hard drive.

== Time Clock Server ==

  * Clock communication – general items.
  * Clock communication – error messages.
  * Clock communication – situational error problems.
  * Make sure that the recommended minimum backup strategy of a daily online backup is completed.
  * Verify that the previous backup operation completed.
  * Verify that full backups succeeded.
  * Verify that transaction log backups succeeded.
  * Analyze and respond to errors and warnings during the backup operation.
  * Verify that the transaction logs were successfully purged (if your backup type is purging logs).

Note: All the backups sync to the local hard drive.

== File Server ==

  * Check application and system logs on the server to see all errors.
  * Check application and system logs on the server to see all warnings.
  * Note repetitive warning and error logs.
  * Respond to discovered failures and problems.
  * Use daily data from the event log and System Monitor.
  * Check disk usage.
  * Check memory and CPU usage.
  * Check uptime and availability.
  * List the top generated, resolved, and pending incidents.
  * Create solutions for unresolved incidents.
  * Check that anti-virus definitions are updated in a timely manner.
  * Check server and network status for the overall organization and segments.
  * Check organizational performance and availability.
  * Check risk analysis and evaluation including upcoming changes.
  * Check capacity, availability, and performance reviews.
  * Review items that have not met target objectives.

Note: The backup on this server syncs to the NAS.

== Spark Server ==

  * Check disk space availability.
  * Check status of backups.
  * Check that the ''pmon'' process is running.
  * Verify that there are no changes to ''/etc/passwd'', ''/etc/shadow'', ''/etc/hosts'', and ''/etc/group''.
  * Check the latest entries in the logs.

Note: Back up users/groups manually from the web GUI.

== Software Development Server (swdev) ==

  * Check disk space availability.
  * Check status of backups.
  * Check that the ''pmon'' process is running.
  * Verify that there are no changes to ''/etc/passwd'', ''/etc/shadow'', ''/etc/hosts'', and ''/etc/group''.
  * Check the latest entries in the logs.

== Web Server (www) ==

  * Check disk space availability.
  * Check status of backups.
  * Backup folder is ''/data/backup''.
  * Backup script is ''/etc/cron.daily/backup''.
  * Backup to mirror drive is ''/media/www/data''.
  * Backup script to mirror drive is ''/etc/cron.daily/backuptomirror''.
  * Backup of mirrored ''swdev.audina.net'' server is ''/data/mirror''. This backup is created using ''rsync'' (see script ''/root/rsync-swdev.sh'').

  #!/bin/bash
  rsync --daemon --config=/etc/rsyncd.conf

  root@www:~# cat rsync-swdev.sh
  #!/bin/bash
  #rsync --verbose --progress --stats --compress --rsh=/usr/bin/ssh \
  #  --recursive --times --perms --links --delete \
  #  --exclude "*bak" --exclude "*~" \
  #  192.168.0.160:webfiles /var/www/mirror
  
  # Website
  rsync --archive --verbose --progress --stats --rsh=/usr/bin/ssh \
    --recursive --times --perms --links --delete --exclude=stats \
    192.168.0.160::webfiles /data/mirror/swdev.audina.net/www
  
  # Databases
  rsync --archive --verbose --progress --stats --rsh=/usr/bin/ssh \
    --recursive --times --perms --links --delete \
    192.168.0.160::databases /data/mirror/swdev.audina.net/databases
  
  # Root user home
  rsync --archive --verbose --progress --stats --rsh=/usr/bin/ssh \
    --recursive --times --perms --links --delete \
    192.168.0.160::root /data/mirror/swdev.audina.net/root
  
  # Subserver Repositories
  rsync --archive --verbose --progress --stats --rsh=/usr/bin/ssh \
    --recursive --times --perms --links --delete \
    192.168.0.160::repos /data/mirror/swdev.audina.net/repos

  * Check that the ''pmon'' process is running.
  * Verify that there are no changes to ''/etc/passwd'', ''/etc/shadow'', ''/etc/hosts'', and ''/etc/group''.
  * Check the latest entries in the logs.

Note: Backup sync/mirror to the internal drive and NAS.

== System36 Client Emulator Server (Bosânova) ==

  * User manual and installation procedures: ''\\NAS\public\Software.apps\ES.server.Bosanova\DOCS''
  * Check that the emulator server services are running.
  * Check user connectivity.

== Router/Switches/Firewall/Gateway ==

  * Check system monitor, CPU usage, uptime, disk usage, system load, and performance.
  * Check web security, blacklist, custom sites, and policies.
  * Check and monitor remote user/VPN settings and logs.
  * Verify that the assigned IP addresses and related network configuration settings are correct, and adjust them where needed.
  * Check system logs, error messages, and system diagnostics to analyze network connectivity.

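For the Linux servers listed earlier (Spark, swdev, and www), the recurring checks for disk space, the ''pmon'' process, and unchanged ''/etc'' files could be scripted roughly as follows. This is only a sketch under assumptions that are not part of the procedures above: the process is assumed to be named exactly ''pmon'', the 90% disk threshold and the baseline path ''/root/etc-baseline.sha256'' are hypothetical, and all function names are made up for illustration.

```shell
#!/bin/bash
# Sketch of daily-check helpers for the Linux servers (Spark, swdev, www).
# Thresholds, the baseline path, and the process name "pmon" are assumptions.

# Print the used-space percentage (digits only) of the filesystem holding $1.
disk_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed if a process whose name is exactly $1 is running.
process_running() {
    pgrep -x "$1" > /dev/null 2>&1
}

# Write SHA-256 checksums of the files $2... into the baseline file $1.
save_baseline() {
    local baseline=$1
    shift
    sha256sum "$@" > "$baseline"
}

# Succeed only if every file recorded in baseline $1 still matches.
files_unchanged() {
    sha256sum -c "$1" > /dev/null 2>&1
}

# Run the three checks and print a warning for each one that fails.
daily_check() {
    local baseline=${BASELINE:-/root/etc-baseline.sha256}  # hypothetical path
    local limit=${LIMIT:-90}                               # assumed threshold
    local pct
    pct=$(disk_usage_pct /)
    [ "$pct" -lt "$limit" ] || echo "WARNING: / is ${pct}% full"
    process_running pmon || echo "WARNING: pmon is not running"
    files_unchanged "$baseline" || echo "WARNING: a tracked file changed"
    return 0
}
```

With these helpers, a baseline would be recorded once after an authorized change, for example ''save_baseline /root/etc-baseline.sha256 /etc/passwd /etc/shadow /etc/hosts /etc/group'', and ''daily_check'' could then be called from a daily cron script. Comparing checksums only detects that a file changed; deciding whether the change was authorized remains a manual step.
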
== Suggestions ==

  * Need to redesign the network infrastructure for better productivity and connectivity, and to eliminate downtime and points of failure.
  * All production servers need to be replaced at least once every five years.
  * Need to replace all the home-built servers: ''Infusion'', ''OnContact'', and ''TimeClock''. These servers lack the hardware redundancy needed for a production environment.
  * Need to rebuild and replace ''Fileserver'' because of hardware failure and running out of space.
  * Need to rebuild and upgrade the ''Exchange'' server to Exchange 2010 with backup and restore software licenses.
  * Need a new gateway router that can monitor Audina bandwidth, productivity, and threats from the outside world.
  * Need new network switches.
  * Need to re-wire the whole network infrastructure.
  * Need to install a patch panel.
  * Eliminate all the small network switches; they cause slowness and bottlenecks in the network.
  * Need to replace all QC computers except Sherry’s computer.
  * Need more Internet bandwidth for better productivity.

NOTE: These suggestions were put forward to management when I first started, from day one. Keep in mind: my intentions here are to protect Audina’s data.

 --- //[[ttran@audina.net|Thai Tran]] 2011/10/28 12:19//