Differences

This shows you the differences between two versions of the page.

audina:dailymaintenance:start [2011/10/28 12:13]
smayr created
audina:dailymaintenance:start [2011/11/29 12:59] (current)
smayr [Web Server (www)]
Line 1: Line 1:
 = System Daily Maintenance = = System Daily Maintenance =
 +Author: Thai Tran
  
-== Exchange server == +== Exchange Server ==
- Physical Environmental Checks +
-Verify that environmental conditions are tracked and maintained. +
-Check temperature and humidity to ensure that environmental systems such as heating and air conditioning settings are within acceptable conditions, and that they function within the hardware manufacturer's specifications. +
-Ensure that your physical network and related hardware such as routers, switches, hubs, physical cables, and connectors are operational.+
  
-Check Backups +=== Physical Environmental Checks === 
-Make sure that the recommended minimum backup strategy of a daily online backup is completed. +  * Verify that environmental conditions are tracked and maintained.
-Verify that the previous backup operation completed +  * Check temperature and humidity to ensure that environmental systems such as heating and air conditioning settings are within acceptable conditions, and that they function within the hardware manufacturer's specifications.
-Analyze and respond to errors and warnings during the backup operation +  * Ensure that your physical network and related hardware such as routers, switches, hubs, physical cables, and connectors are operational.
-Verify that the transaction logs were successfully purged (if your backup type is purging logs).+
  
-Performance +=== Check Backups === 
-% Processor Time +  * Make sure that the recommended minimum backup strategy of a daily online backup is completed. 
-Available MBs +  * Verify that the previous backup operation completed. 
-% Committed Bytes in Use+  * Analyze and respond to errors and warnings during the backup operation. 
 +  * Verify that the transaction logs were successfully purged (if your backup type is purging logs).
  
-Event Logs +=== Performance === 
-Filter application and system logs on the Exchange server to see all errors +  * % Processor Time.
-Filter application and system logs on the Exchange server to see all warnings +  * Available MBs.
-Note repetitive warning and error logs. +  * % Committed Bytes in Use.
-Respond to discovered failures and problems.+
  
-Exchange Database +=== Event Logs === 
-Check the number of transaction logs generated since the last check. Is the number increasing at the “usual” rate? +  * Filter application and system logs on the Exchange server to see all errors.
-Verify that databases are mounted. +  * Filter application and system logs on the Exchange server to see all warnings.
-Make sure that public folder replication is up-to-date +  * Note repetitive warning and error logs.
-If full-text indexing is enabled, verify that indexes are up-to-date +  * Respond to discovered failures and problems.
-Test mailbox, verify the logon of each database and the send/receive capabilities.+
  
-MAPI Client Performance and server availability +=== Exchange Database === 
-Examine System Monitor counters. +  * Check the number of transaction logs generated since the last check. Is the number increasing at the “usual” rate?
-Examine Event Viewer logs. +  * Verify that databases are mounted.
-Verify that a test account can log on to the Exchange server and has send/receive capabilities +  * Make sure that public folder replication is up-to-date.
-Verify your Performance monitor RPC counters against a baseline - RPC average latency/RPC requests/RPC operations.+  * If full-text indexing is enabled, verify that indexes are up-to-date. 
 +  * Test mailbox, verify the logon of each database and the send/receive capabilities.
  
-Check Queue viewer +=== MAPI Client Performance and server availability === 
-Check queues for each server using the Queue Viewer tool in the Exchange Management Console +  * Examine System Monitor counters.
-Record queue size. +  * Examine Event Viewer logs.
 +  * Verify that a test account can log on to the Exchange server and has send/receive capabilities.
 +  * Verify your Performance monitor RPC counters against a baseline - RPC average latency/RPC requests/RPC operations.
  
-Message Paths and Mail flow +=== Check Queue viewer ===
-Send messages between internal servers using test accounts. +  * Check queues for each server using the Queue Viewer tool in the Exchange Management Console.
-Check and verify that messages deliver successfully. +  * Record queue size.
-Send outgoing messages to non-local accounts. +
-Check and verify that outgoing messages deliver successfully. With the test account on the external host, verify that mail comes in +
-Verify successful message transfer across connectors and routes.+
  
-Security Logs +=== Message Paths and Mail flow === 
-Mail Essential and Mail Security for exchange (these licenses are expired in 12/31/2011)  +  * Send messages between internal servers using test accounts. 
-View the security event log on Event Viewer and match security changes to known, authorized configuration changes. +  * Check and verify that messages deliver successfully. 
-Investigate unauthorized security changes discovered in security event log. +  * Send outgoing messages to non-local accounts. 
-Check security news for latest virus, worm, and vulnerabilities. +  * Check and verify that outgoing messages deliver successfully. With the test account on the external host, verify that mail comes in. 
-Update and fix discovered security problems and vulnerabilities. +  * Verify successful message transfer across connectors and routes. 
-Verify that SMTP does not relay anonymously, or lock down to specific servers that require functionality. + 
-Verify that SSL is functioning for configured secure channels. +=== Security Logs === 
-Update virus signatures daily. +  * //Mail Essential// and //Mail Security for Exchange// (these licenses expire on 12/31/2011)
- Note: All the backups sync to the local hard drive. +  * View the security event log in Event Viewer and match security changes to known, authorized configuration changes.
 +  * Investigate unauthorized security changes discovered in the security event log.
 +  * Check security news for the latest viruses, worms, and vulnerabilities.
 +  * Update and fix discovered security problems and vulnerabilities.
 +  * Verify that SMTP does not relay anonymously, or lock down to specific servers that require functionality.
 +  * Verify that SSL is functioning for configured secure channels.
 +  * Update virus signatures daily.
 + 
 +Note: All the backups sync to the local hard drive.
  
 == CRM, OnContact Server == == CRM, OnContact Server ==
-Verify that SQL Services are running (SQL Agent) +  * Verify that SQL Services are running (SQL Agent). 
-Verify that SQL Agent jobs succeeded +  * Verify that SQL Agent jobs succeeded.
-Verify that spindles have free space +  * Verify that spindles have free space.
-Verify that data and log files for each database have free space +  * Verify that data and log files for each database have free space.
  
-Check Backups +=== Check Backups ===
-Make sure that the recommended minimum backup strategy of a daily online backup is completed. +  * Make sure that the recommended minimum backup strategy of a daily online backup is completed.
-Verify that the previous backup operation completed. +  * Verify that the previous backup operation completed.
-Verify that full backups succeeded +  * Verify that full backups succeeded.
-Verify that transactional log Backups succeeded +  * Verify that transaction log backups succeeded.
-Analyze and respond to errors and warnings during the backup operation. +  * Analyze and respond to errors and warnings during the backup operation.
-Verify that the transaction logs were successfully purged (if your backup type is purging logs). +  * Verify that the transaction logs were successfully purged (if your backup type is purging logs).
  
-Performance +=== Performance ===
-% Processor Time +  * % Processor Time.
-Available MBs +  * Available MBs.
-% Committed Bytes in Use +  * % Committed Bytes in Use.
  
-Event Logs +=== Event Logs ===
-Filter application and system logs on the SQL to see all errors. +  * Filter application and system logs on the SQL server to see all errors.
-Filter application and system logs on the SQL server to see all warnings. +  * Filter application and system logs on the SQL server to see all warnings.
-Note repetitive warning and error logs. +  * Note repetitive warning and error logs.
-Respond to discovered failures and problems. +  * Respond to discovered failures and problems.
- Note: All the backups sync to the local hard drive.+ 
 +Note: All the backups sync to the local hard drive.
  
 == Infusion server == == Infusion server ==
-Verify that SQL Services are running (ie. SQL Agent) +  * Verify that SQL Services are running (i.e., SQL Agent).
-Verify that SQL Agent jobs succeeded +  * Verify that SQL Agent jobs succeeded.
-Verify that spindles have free space +  * Verify that spindles have free space.
-Verify that data and log files for each database have free space +  * Verify that data and log files for each database have free space.
  
-Check Backups +=== Check Backups ===
-Make sure that the recommended minimum backup strategy of a daily online backup is completed. +  * Make sure that the recommended minimum backup strategy of a daily online backup is completed.
-Verify that the previous backup operation completed. +  * Verify that the previous backup operation completed.
-Verify that full backups succeeded +  * Verify that full backups succeeded.
-Verify that transactional log Backups succeeded +  * Verify that transaction log backups succeeded.
-Analyze and respond to errors and warnings during the backup operation. +  * Analyze and respond to errors and warnings during the backup operation.
-Verify that the transaction logs were successfully purged (if your backup type is purging logs). +  * Verify that the transaction logs were successfully purged (if your backup type is purging logs).
  
-Performance +=== Performance ===
-% Processor Time +  * % Processor Time.
-Available MBs +  * Available MBs.
-% Committed Bytes in Use +  * % Committed Bytes in Use.
  
-Event Logs +=== Event Logs ===
-Filter application and system logs on the SQL to see all errors. +  * Filter application and system logs on the SQL server to see all errors.
-Filter application and system logs on the SQL server to see all warnings. +  * Filter application and system logs on the SQL server to see all warnings.
-Note repetitive warning and error logs. +  * Note repetitive warning and error logs.
-Respond to discovered failures and problems. +  * Respond to discovered failures and problems.
- Note: All the backups sync to the local hard drive.+ 
 +Note: All the backups sync to the local hard drive.
  
 == Time Clock Server == == Time Clock Server ==
-Clock communication – general items +  * Clock communication – general items. 
-Clock communication – error messages +  * Clock communication – error messages.
-Clock communication – error situational problems +  * Clock communication – error situational problems.
-Make sure that the recommended minimum backup strategy of a daily online backup is completed. +  * Make sure that the recommended minimum backup strategy of a daily online backup is completed.
-Verify that the previous backup operation completed. +  * Verify that the previous backup operation completed.
-Verify that full backups succeeded +  * Verify that full backups succeeded.
-Verify that transactional log Backups succeeded +  * Verify that transactional log backups succeeded.
-Analyze and respond to errors and warnings during the backup operation. +  * Analyze and respond to errors and warnings during the backup operation.
-Verify that the transaction logs were successfully purged (if your backup type is purging logs). +  * Verify that the transaction logs were successfully purged (if your backup type is purging logs).
 Note: All the backups sync to the local hard drive. Note: All the backups sync to the local hard drive.
  
 == File Server == == File Server ==
-Check application and system logs on the server to see all errors. +  * Check application and system logs on the server to see all errors. 
-Check application and system logs on the Exchange server to see all warnings. +  * Check application and system logs on the server to see all warnings.
-Note repetitive warning and error logs. +  * Note repetitive warning and error logs.
-Respond to discovered failures and problems. +  * Respond to discovered failures and problems.
-Use daily data from event log and System Monitor +  * Use daily data from the event log and System Monitor.
-Check on disk usage. +  * Check on disk usage.
-Check on memory and CPU usage. +  * Check on memory and CPU usage.
-Check uptime and availability. +  * Check uptime and availability.
-List the top generated, resolved, and pending incidents. +  * List the top generated, resolved, and pending incidents.
-Create solutions for unresolved incidents. +  * Create solutions for unresolved incidents.
-Check anti-virus definition updates timely. +  * Check that anti-virus definitions are updated in a timely manner.
-Check server and network status for the overall organization and segments. +  * Check server and network status for the overall organization and segments.
-Check organizational performance and availability. +  * Check organizational performance and availability.
-Check risk analysis and evaluation including upcoming changes. +  * Check risk analysis and evaluation including upcoming changes.
-Check capacity, availability, and performance reviews. +  * Check capacity, availability, and performance reviews.
-Review items that have not met target objectives. +  * Review items that have not met target objectives.
-Note: Backup on this server is sync to the NAS +
 +Note: Backup on this server is synced to the NAS.
  
 == Spark Server == == Spark Server ==
-Check disk space availability +  * Check disk space availability. 
-Check status of backups  +  * Check status of backups.
-Check that the pmon process is running  +  * Check that the pmon process is running.
-No changes to /etc/passwd /etc/shadow /etc/hosts /etc/group +  * No changes to ''/etc/passwd'', ''/etc/shadow'', ''/etc/hosts'', and ''/etc/group''.
-Check the latest entries in the logs +  * Check the latest entries in the logs.
 Note: Manual backup users/groups from the web GUI Note: Manual backup users/groups from the web GUI
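
The disk space and process checks in the list above could be scripted roughly as follows. This is a minimal sketch, not the script in use on these servers: the function names ''over_threshold'' and ''process_running'' and the 90% limit are illustrative assumptions. It also assumes mount points contain no spaces.

```bash
#!/bin/bash

# Hypothetical helper: reads `df -P` output on stdin and prints every
# mount point whose usage is at or above the given percentage.
over_threshold() {
    local limit="$1"
    awk -v limit="$limit" 'NR > 1 {
        use = $5
        sub(/%/, "", use)          # strip the trailing percent sign
        if (use + 0 >= limit) print $6
    }'
}

# Hypothetical helper: succeeds if a process with exactly this name is running.
process_running() {
    pgrep -x "$1" > /dev/null
}

# Example daily run (the 90% limit is an assumption, not a site policy):
#   df -P | over_threshold 90
#   process_running pmon || echo "pmon is not running"
```

Keeping the parsing in a function that reads from stdin means the same check can be pointed at ''df -P'' output collected from another host.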
  
-== SWdev Server (Software Development) == +== Software Development Server (swdev) == 
-Check disk space availability +  * Check disk space availability.
-Check status of backups  +  * Check status of backups.
-Check that the pmon process is running  +  * Check that the pmon process is running.
-No changes to /etc/passwd /etc/shadow /etc/hosts /etc/group +  * No changes to ''/etc/passwd'', ''/etc/shadow'', ''/etc/hosts'', ''/etc/group''.
-Check the latest entries in the logs +  * Check the latest entries in the logs.
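
One way to confirm that the account and host files listed above have not changed is to compare them against stored checksums. The sketch below assumes GNU ''sha256sum'' is available. The function names and the baseline path in the example are hypothetical and do not describe the existing setup.

```bash
#!/bin/bash

# Hypothetical helper: record SHA-256 checksums of the given files
# in a baseline file (first argument).
save_baseline() {
    local baseline="$1"
    shift
    sha256sum "$@" > "$baseline"
}

# Hypothetical helper: succeeds if every file still matches the baseline;
# otherwise prints the files that differ and returns non-zero.
check_baseline() {
    sha256sum --check --quiet "$1"
}

# Example (baseline path is an assumption; reading /etc/shadow requires root):
#   save_baseline /root/etc-baseline.sha256 /etc/passwd /etc/shadow /etc/hosts /etc/group
#   check_baseline /root/etc-baseline.sha256 || echo "account or host files changed"
```

The baseline has to be regenerated after every authorized change, otherwise the daily check keeps reporting the same difference.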
  
 == Web Server (www) == == Web Server (www) ==
-Check disk space availability +  * Check disk space availability. 
-Check status of backups  +  * Check status of backups.
-Check that the pmon process is running  +    * Backup folder is ''/data/backup'' 
-No changes to /etc/passwd /etc/shadow /etc/hosts /etc/group +    * Backup script is ''/etc/cron.daily/backup'' 
-Check the latest entries in the logs +    * Backup to mirror drive is ''/media/www/data'' 
-Note: Backup sync/mirror to the internal drive and NAS+    * Backup script to mirror drive is ''/etc/cron.daily/backuptomirror'' 
 +    * Backup of mirrored ''swdev.audina.net'' server is ''/data/mirror''. This backup is created using ''rsync'' (see script ''/root/rsync-swdev.sh''). <code bash>#!/bin/bash 
 +rsync --daemon --config=/etc/rsyncd.conf 
 +# Listing of /root/rsync-swdev.sh on www (output of: cat rsync-swdev.sh)
 +#!/bin/bash 
 + 
 +#rsync --verbose  --progress --stats --compress --rsh=/usr/bin/ssh \ 
 +#      --recursive --times --perms --links --delete \ 
 +#      --exclude "*bak" --exclude "*~"
 +#      192.168.0.160:webfiles /var/www/mirror 
 + 
 +# Website 
 +rsync --archive --verbose --progress --stats --rsh=/usr/bin/ssh \ 
 +--recursive --times --perms --links --delete --exclude=stats \ 
 +192.168.0.160::webfiles /data/mirror/swdev.audina.net/www 
 + 
 +# Databases 
 +rsync --archive --verbose --progress --stats --rsh=/usr/bin/ssh \ 
 +--recursive --times --perms --links --delete \ 
 +192.168.0.160::databases /data/mirror/swdev.audina.net/databases 
 + 
 +# Root user home 
 +rsync --archive --verbose --progress --stats --rsh=/usr/bin/ssh \ 
 +--recursive --times --perms --links --delete \ 
 +192.168.0.160::root /data/mirror/swdev.audina.net/root 
 + 
 +# Subserver Repositories 
 +rsync --archive --verbose --progress --stats --rsh=/usr/bin/ssh \ 
 +--recursive --times --perms --links --delete \ 
 +192.168.0.160::repos /data/mirror/swdev.audina.net/repos 
 +</code> 
 +  * Check that the ''pmon'' process is running. 
 +  * No changes to ''/etc/passwd'', ''/etc/shadow'', ''/etc/hosts'', ''/etc/group''.
 +  * Check the latest entries in the logs.
 + 
 +Note: Backup sync/mirror to the internal drive and NAS
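
The "check status of backups" item for this server could be partly automated by testing whether anything was written to the backup folder recently. This is a sketch under assumptions: ''backup_is_fresh'' is a hypothetical helper, the 24-hour window is a guess at what "daily" means here, and GNU ''find'' is assumed.

```bash
#!/bin/bash

# Hypothetical helper: succeeds if DIR contains at least one regular file
# modified within the last MAX_AGE_MIN minutes (default: 1440, i.e. 24 hours).
backup_is_fresh() {
    local dir="$1"
    local max_age_min="${2:-1440}"
    # -print -quit stops at the first match, so large folders stay cheap to scan.
    [ -n "$(find "$dir" -type f -mmin "-$max_age_min" -print -quit)" ]
}

# Example:
#   backup_is_fresh /data/backup || echo "no file written to /data/backup in the last 24 hours"
```

A recent file only shows that the backup job ran. It does not prove the archive is complete or restorable, so the log check in the list above still applies.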
 + 
 +== System36 Client Emulator Server (Bosânova) == 
 +  * User manual and installation procedures: ''\\NAS\public\Software.apps\ES.server.Bosanova\DOCS'' 
 +  * Check that the emulator server services are running.
 +  * Check users’ connectivity.
  
-== Emulator Server (BoSanova) == +== Router/Switches/Firewall/Gateway == 
-User manual and installation procedures: \\NAS\public\Software.apps\ES.server.Bosanova\DOCS +  * Check system monitor, CPU usage, uptime, disk usage, system load, and performance. 
-Check for emulator server services are running +  * Check web security, black list, custom sites, and policies.
-Check for users’ connectivity +  * Check and monitor remote user/VPN settings and logs.
 +  * Assign and adjust network configuration settings so that the given IP address requirements are met.
 +  * Check system logs, error messages, and system diagnostics to analyze the network connectivity.
  
-== Router/Switches/Firewall gateway == +== Suggestions == 
-Check system monitor, CPU usage, uptime, disk usage, system load, and performance +  * Need to re-design a new network infrastructure for better productivity, connectivity, eliminate downtime, and point of failures.
-Check web security, black list, custom sites, and policies +  * All production servers need to be replaced at least once every five years.
-Check and monitor remote user/VPN settings and logs +  * Need to replace all the home built servers: ''Infusion'', ''OnContact'', and ''TimeClock''. These servers do not have hardware redundant functionality to handle production environment.
-Assign and adjust network configuration settings related to the IP addresses were given are met +  * Need to rebuild and replace ''Fileserver'' because of hardware failure and running out of space.
-Check for system logs, error messages, and system diagnostics to analyze the network connectivity. +  * Need to rebuild and upgrade ''Exchange'' server to Exchange 2010 with backup and restore software licenses.
 +  * Need a new gateway router that can monitor Audina bandwidth, productivity, and threats from the outside world.
 +  * Need new network switches. 
 +  * Need to re-wire the whole network infrastructure. 
 +  * Need to install a patch panel. 
 +  * Eliminate all the small network switches; they cause slowness and bottlenecks in the network.
 +  * Need to replace all QC computers except Sherry’s computer. 
 +  * Need to have a better Internet bandwidth for better productivity.
  
-== Suggestion == +NOTE: These suggestions were put forward to management when I first started, from day one. Keep in mind, my intentions here are to protect Audina’s data.  --- //[[ttran@audina.net|Thai Tran]] 2011/10/28 12:19//
-Need to re-design a new network infrastructure for better productivity, connectivity, eliminate downtime, and point of failures. +
-All production servers need to be replaced at least once every five years +
-Need to replace all the home build servers: Infusion, Oncontact, and Timeclock. These servers do not have hardware redundant functionality to handle production environment. +
-Need to rebuild and replace fileserver because of hardware failure and running out of space +
-Need to rebuild and upgrade exchange server to exchange 2010 with backup and restore software licenses. +
-Need a new gateway router that can monitor Audina bandwidth, productivity, and threats from the outside world. +
-Need new network switches +
-Need to re-wire the whole network infrastructure +
-Need to install a patch panel +
-Eliminate all the small network switches, this will cause the slowness and bottleneck of the network +
-Need to replace all QC computers except Sherry’s computer +
-Need to have a better Internet bandwidth for better productivity +
-NOTE: These suggestions had been told and mentioned when I (Thai) first started from day one. Keep in mind, my intentions here are to protect Audina’s data. +