System Daily Maintenance
Author: Thai Tran
Exchange Server
Physical Environmental Checks
- Verify that environmental conditions are tracked and maintained.
- Check temperature and humidity to ensure that environmental systems, such as heating and air conditioning, keep conditions within acceptable ranges and within the hardware manufacturer's specifications.
- Ensure that your physical network and related hardware such as routers, switches, hubs, physical cables, and connectors are operational.
Check Backups
- Make sure that the daily online backup, the recommended minimum backup strategy, has completed.
- Verify that the previous backup operation completed.
- Analyze and respond to errors and warnings during the backup operation.
- Verify that the transaction logs were successfully purged (if your backup type is purging logs).
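A minimal sketch of a backup freshness check, assuming the backup files land in a local folder that a bash host can read, and that GNU find and date are available. The function names and the 24-hour limit are illustrative and are not part of the existing procedure.

```bash
# Print the age, in whole hours, of the newest file under DIR.
# Returns 1 if DIR contains no files. Needs GNU find (-printf).
newest_backup_age_hours() {
  local dir=$1 newest
  newest=$(find "$dir" -type f -printf '%T@\n' | sort -n | tail -n 1)
  [ -n "$newest" ] || return 1
  echo $(( ( $(date +%s) - ${newest%.*} ) / 3600 ))
}

# Report OK or ALERT depending on whether the newest backup is older
# than MAX_HOURS (default 24, i.e. one daily backup).
check_backup_fresh() {
  local dir=$1 max=${2:-24} age
  if ! age=$(newest_backup_age_hours "$dir"); then
    echo "ALERT no backup files in $dir"
    return 1
  fi
  if [ "$age" -gt "$max" ]; then
    echo "ALERT newest backup is ${age}h old"
    return 1
  fi
  echo "OK newest backup is ${age}h old"
}

# Example (path is hypothetical):
#   check_backup_fresh /mnt/backups/exchange 24
```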
Performance
- Processor\% Processor Time.
- Memory\Available MBytes.
- Memory\% Committed Bytes In Use.
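You can sample the counters above on the Exchange server with the built-in typeperf tool and then check them against limits. This sketch assumes the CSV output is read by bash, either after being copied to another host or under a bash port on Windows. The limits you pass are your own and are not vendor guidance. The function names are hypothetical.

```bash
# Sample collection on the Windows server (one sample per counter), e.g.:
#   typeperf "\Processor(_Total)\% Processor Time" -sc 1 > cpu.csv
#   typeperf "\Memory\Available MBytes" -sc 1 > mem.csv

# Print the value from the last data row of typeperf CSV on stdin.
# Data rows look like: "10/28/2011 12:19:00.123","12.345678"
last_sample() {
  tr -d '\r' | grep -E '^"[0-9]' | tail -n 1 |
    awk -F'","' '{ gsub(/"/, "", $2); print $2 }'
}

# Succeed when A is numerically greater than B (decimals allowed).
greater_than() {
  awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 > b + 0) }'
}

# check_counter NAME LIMIT [max|min]
#   max: alert when the sample is above LIMIT (e.g. % Processor Time)
#   min: alert when the sample is below LIMIT (e.g. Available MBytes)
check_counter() {
  local name=$1 limit=$2 mode=${3:-max} value
  value=$(last_sample)
  if { [ "$mode" = max ] && greater_than "$value" "$limit"; } ||
     { [ "$mode" = min ] && greater_than "$limit" "$value"; }; then
    echo "ALERT $name=$value ($mode $limit)"
  else
    echo "OK $name=$value"
  fi
}

# Example: check_counter cpu 85 max < cpu.csv
```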
Event Logs
- Filter application and system logs on the Exchange server to see all errors.
- Filter application and system logs on the Exchange server to see all warnings.
- Note repeated warnings and errors.
- Respond to discovered failures and problems.
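Exporting and counting the filtered events makes repeated errors and warnings stand out. This is a minimal sketch. It assumes the events are queried with wevtutil (Level 2 is Error, Level 3 is Warning) and then reduced to a two-column Source,EventID list. That intermediate CSV format and the function name are assumptions; Exchange does not produce this format on its own.

```bash
# Query on the server, e.g. the newest 200 errors and warnings in the System log:
#   wevtutil qe System /q:"*[System[(Level=2 or Level=3)]]" /c:200 /rd:true /f:text
# Then reduce the export to lines of the form: Source,EventID

# repeated_events [MIN]: print "COUNT SOURCE EVENTID" for every pair that
# occurs at least MIN times (default 3), most frequent first.
repeated_events() {
  local min=${1:-3}
  tr -d '\r' | awk -F',' -v min="$min" '
    NF >= 2 { n[$1 "," $2]++ }
    END {
      for (k in n)
        if (n[k] >= min) { split(k, p, ","); print n[k], p[1], p[2] }
    }' | sort -rn
}
```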
Exchange Database
- Check the number of transaction logs generated since the last check. Is the number increasing at the “usual” rate?
- Verify that databases are mounted.
- Make sure that public folder replication is up-to-date.
- If full-text indexing is enabled, verify that indexes are up-to-date.
- Using a test mailbox in each database, verify logon and send/receive capabilities.
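Turning the transaction log check into a number lets you compare it from day to day. This sketch assumes the log folder is reachable from a bash host and that the logs follow the E*.log naming. The state file and the function name are hypothetical.

```bash
# log_growth DIR STATE [USUAL_MAX]
# Counts transaction log files (E*.log) in DIR, compares the count with the
# one saved in STATE at the previous check, then saves the new count.
# A negative growth usually means logs were purged (e.g. by a backup).
log_growth() {
  local dir=$1 state=$2 usual=${3:-} now prev=0 growth
  now=$(find "$dir" -maxdepth 1 -type f -name 'E*.log' | wc -l)
  if [ -f "$state" ]; then prev=$(cat "$state"); fi
  echo "$now" > "$state"
  growth=$((now - prev))
  if [ -n "$usual" ] && [ "$growth" -gt "$usual" ]; then
    echo "ALERT growth=$growth (usual max $usual)"
  else
    echo "growth=$growth"
  fi
}
```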
MAPI Client Performance and Server Availability
- Examine System Monitor counters.
- Examine Event Viewer logs.
- Verify that a test account can log on to the Exchange server and has send/receive capabilities.
- Compare the Performance Monitor RPC counters (RPC average latency, RPC requests, and RPC operations) against a baseline.
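Comparing a counter with its baseline is simple arithmetic: percent change = (current - baseline) / baseline x 100. Below is a minimal sketch of that comparison. The 25% tolerance is an illustrative default, not a Microsoft recommendation. The values would come from Performance Monitor, and the function name is hypothetical.

```bash
# vs_baseline NAME CURRENT BASELINE [TOLERANCE_PCT]
# Prints OK or ALERT with the percent change from the baseline.
vs_baseline() {
  awk -v name="$1" -v v="$2" -v b="$3" -v tol="${4:-25}" 'BEGIN {
    if (b + 0 == 0) { print "SKIP " name ": baseline is zero"; exit }
    pct = (v - b) / b * 100
    status = (pct > tol + 0) ? "ALERT" : "OK"
    printf "%s %s=%s baseline=%s (%+.0f%%)\n", status, name, v, b, pct
  }'
}

# Example: vs_baseline rpc_avg_latency 60 40
```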
Check Queue Viewer
- Check queues for each server using the Queue Viewer tool in the Exchange Management Console.
- Record queue size.
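Queue sizes are most useful as a running history. This sketch appends each reading to a CSV file and reports the previous reading for the same queue. It assumes you enter the sizes shown in Queue Viewer by hand. The file name and queue names are hypothetical.

```bash
# record_queue LOGFILE QUEUE SIZE
# Appends "timestamp,queue,size" to LOGFILE and prints the previous size
# recorded for the same queue (or "none" on the first reading).
record_queue() {
  local log=$1 queue=$2 size=$3 prev=""
  if [ -f "$log" ]; then
    prev=$(awk -F',' -v q="$queue" '$2 == q { s = $3 } END { print s }' "$log")
  fi
  printf '%s,%s,%s\n' "$(date '+%Y-%m-%d %H:%M')" "$queue" "$size" >> "$log"
  echo "$queue size=$size previous=${prev:-none}"
}

# Example: record_queue /var/log/exchange-queues.csv Submission 4
```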
Message Paths and Mail Flow
- Send messages between internal servers using test accounts.
- Verify that the messages are delivered successfully.
- Send outgoing messages to non-local accounts.
- Verify that outgoing messages are delivered successfully. Using the test account on the external host, verify that inbound mail arrives.
- Verify successful message transfer across connectors and routes.
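Tagging each test message with a unique token makes it easy to confirm which message arrived. This sketch assumes a Linux host with a sendmail-compatible MTA for sending and a saved copy of the receiving mailbox for checking. The addresses are placeholders and the function names are hypothetical.

```bash
# make_test_message FROM TO TOKEN: print a minimal plain-text message
# whose subject carries TOKEN.
make_test_message() {
  printf 'From: %s\nTo: %s\nSubject: mailflow-test %s\n\nMail-flow test %s\n' \
    "$1" "$2" "$3" "$3"
}

# token_delivered MAILBOX_FILE TOKEN: succeed if the test subject is present.
token_delivered() {
  grep -qF "Subject: mailflow-test $2" "$1"
}

# Example (not run here):
#   token=$(date +%Y%m%d%H%M%S)
#   make_test_message test@example.com outside-test@example.net "$token" | sendmail -t
#   ...later, on the receiving side:
#   token_delivered /path/to/saved-mailbox "$token" && echo delivered
```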
Security Logs
- Check Mail Essential and Mail Security for Exchange (these licenses expire on 12/31/2011).
- View the security event log in Event Viewer and match security changes to known, authorized configuration changes.
- Investigate unauthorized security changes discovered in the security event log.
- Check security news for the latest viruses, worms, and vulnerabilities.
- Fix discovered security problems and patch vulnerabilities.
- Verify that SMTP does not allow anonymous relaying; restrict relaying to the specific servers that require it.
- Verify that SSL is functioning for configured secure channels.
- Update virus signatures daily.
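Both the relay check and the SSL check above come down to reading one server response. This sketch assumes two things. First, the SMTP reply to an external RCPT TO was captured from an unauthenticated session started outside the network. Second, GNU date is available to parse the certificate date that openssl prints. The function names and host names are hypothetical.

```bash
# classify_rcpt_reply REPLY
# REPLY is the server's answer to "RCPT TO:<someone@an-external-domain>"
# sent without authentication from outside. 250/251 suggests an open relay.
classify_rcpt_reply() {
  case $1 in
    250*|251*) echo "OPEN RELAY: external recipient accepted" ;;
    5[0-9][0-9]*) echo "OK: relay refused ($1)" ;;
    *) echo "INCONCLUSIVE: $1" ;;
  esac
}

# cert_days_left NOTAFTER [NOW_EPOCH]
# NOTAFTER is a line such as "notAfter=Dec 31 23:59:59 2030 GMT", e.g. from:
#   openssl s_client -connect mail.example.com:443 -servername mail.example.com \
#     </dev/null 2>/dev/null | openssl x509 -noout -enddate
# Needs GNU date for -d parsing.
cert_days_left() {
  local end=${1#notAfter=} now=${2:-$(date +%s)}
  echo $(( ( $(date -u -d "$end" +%s) - now ) / 86400 ))
}
```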
Note: All the backups sync to the local hard drive.
CRM, OnContact Server
- Verify that the SQL Server services, including SQL Server Agent, are running.
- Verify that SQL Agent jobs succeeded.
- Verify that each disk (spindle) has free space.
- Verify that data and log files for each database have free space.
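SQL Server Agent records each job outcome in msdb. This sketch of the failed-job check assumes sqlcmd is installed on the machine running the script and that Windows authentication (-E) is allowed. The server name in the example is a placeholder.

```bash
# T-SQL listing job outcomes (step_id = 0) that failed (run_status = 0)
# since yesterday. run_date is stored as an int in yyyymmdd form.
failed_jobs_sql() {
  cat <<'SQL'
SET NOCOUNT ON;
SELECT j.name, h.run_date, h.run_time
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs AS j ON j.job_id = h.job_id
WHERE h.step_id = 0
  AND h.run_status = 0
  AND h.run_date >= CONVERT(int, CONVERT(char(8), DATEADD(day, -1, GETDATE()), 112));
SQL
}

# Count non-blank result lines (one per failed job run).
count_failures() {
  tr -d '\r' | awk 'NF { n++ } END { print n + 0 }'
}

# Example (server name is hypothetical):
#   sqlcmd -S ONCONTACT-SQL -E -h -1 -W -Q "$(failed_jobs_sql)" | count_failures
```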
Check Backups
- Make sure that the daily online backup, the recommended minimum backup strategy, has completed.
- Verify that the previous backup operation completed.
- Verify that full backups succeeded.
- Verify that transaction log backups succeeded.
- Analyze and respond to errors and warnings during the backup operation.
- Verify that the transaction logs were successfully purged (if your backup type is purging logs).
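msdb also records every backup, so one query can cover both the full backup check and the transaction log backup check. This sketch assumes sqlcmd with '|' as the column separator. The 24-hour limits and the function names are illustrative. Databases in the SIMPLE recovery model never have log backups, so expect them to be reported by the log check.

```bash
# T-SQL: hours since the last full (type 'D') and log (type 'L') backup per database.
# NULL means no such backup is recorded in msdb.
backup_age_sql() {
  cat <<'SQL'
SET NOCOUNT ON;
SELECT d.name,
       DATEDIFF(hour, MAX(CASE WHEN b.type = 'D' THEN b.backup_finish_date END), GETDATE()),
       DATEDIFF(hour, MAX(CASE WHEN b.type = 'L' THEN b.backup_finish_date END), GETDATE())
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b ON b.database_name = d.name
WHERE d.name <> 'tempdb'
GROUP BY d.name;
SQL
}

# stale_backups [MAX_FULL_H] [MAX_LOG_H]: read "name|full_h|log_h" lines and
# print an ALERT for each missing or too-old backup.
stale_backups() {
  tr -d '\r' | awk -F'|' -v mf="${1:-24}" -v ml="${2:-24}" '
    function check(db, kind, h, max) {
      gsub(/^ +| +$/, "", h)
      if (h == "NULL") print "ALERT " db ": no " kind " backup recorded"
      else if (h + 0 > max + 0) print "ALERT " db ": last " kind " backup " h "h ago"
    }
    NF == 3 { check($1, "full", $2, mf); check($1, "log", $3, ml) }'
}

# Example (server name is hypothetical):
#   sqlcmd -S ONCONTACT-SQL -E -h -1 -W -s '|' -Q "$(backup_age_sql)" | stale_backups 24 24
```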
Performance
- Processor\% Processor Time.
- Memory\Available MBytes.
- Memory\% Committed Bytes In Use.
Event Logs
- Filter application and system logs on the SQL server to see all errors.
- Filter application and system logs on the SQL server to see all warnings.
- Note repeated warnings and errors.
- Respond to discovered failures and problems.
Note: All the backups sync to the local hard drive.
Infusion Server
- Verify that the SQL Server services, including SQL Server Agent, are running.
- Verify that SQL Agent jobs succeeded.
- Verify that each disk (spindle) has free space.
- Verify that data and log files for each database have free space.
Check Backups
- Make sure that the daily online backup, the recommended minimum backup strategy, has completed.
- Verify that the previous backup operation completed.
- Verify that full backups succeeded.
- Verify that transaction log backups succeeded.
- Analyze and respond to errors and warnings during the backup operation.
- Verify that the transaction logs were successfully purged (if your backup type is purging logs).
Performance
- Processor\% Processor Time.
- Memory\Available MBytes.
- Memory\% Committed Bytes In Use.
Event Logs
- Filter application and system logs on the SQL server to see all errors.
- Filter application and system logs on the SQL server to see all warnings.
- Note repeated warnings and errors.
- Respond to discovered failures and problems.
Note: All the backups sync to the local hard drive.
Time Clock Server
- Check clock communication: general items.
- Check clock communication: error messages.
- Check clock communication: situational errors.
Check Backups
- Make sure that the daily online backup, the recommended minimum backup strategy, has completed.
- Verify that the previous backup operation completed.
- Verify that full backups succeeded.
- Verify that transaction log backups succeeded.
- Analyze and respond to errors and warnings during the backup operation.
- Verify that the transaction logs were successfully purged (if your backup type is purging logs).
Note: All the backups sync to the local hard drive.
File Server
- Check application and system logs on the server to see all errors.
- Check application and system logs on the server to see all warnings.
- Note repeated warnings and errors.
- Respond to discovered failures and problems.
- Use the daily data from the event logs and System Monitor.
- Check disk usage.
- Check memory and CPU usage.
- Check uptime and availability.
- List the top generated, resolved, and pending incidents.
- Create solutions for unresolved incidents.
- Check that anti-virus definitions are updated on time.
- Check server and network status for the overall organization and segments.
- Check organizational performance and availability.
- Review the risk analysis and evaluation, including upcoming changes.
- Review capacity, availability, and performance.
- Review items that have not met target objectives.
Note: Backups on this server sync to the NAS.
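Uptime and availability are easiest to compare when expressed as a percentage of the period: availability = (period minutes - downtime minutes) / period minutes x 100. Below is a small helper for that arithmetic. The function name is hypothetical, and the downtime figure would come from the event log or monitoring data.

```bash
# availability_pct DOWNTIME_MINUTES PERIOD_DAYS
# Prints availability as a percentage with two decimals.
availability_pct() {
  awk -v down="$1" -v days="$2" 'BEGIN {
    total = days * 24 * 60
    printf "%.2f\n", (total - down) / total * 100
  }'
}

# Example: 43 minutes of downtime in a 30-day month (43200 minutes)
#   availability_pct 43 30   # prints 99.90
```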
Spark Server
- Check disk space availability.
- Check status of backups.
- Check that the pmon process is running.
- Verify that /etc/passwd, /etc/shadow, /etc/hosts, and /etc/group have not changed.
- Check the latest entries in the logs.
Note: Back up users/groups manually from the web GUI.
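On the Linux-based servers, you can script the disk space and process checks. This sketch assumes standard df (POSIX -P output) and pgrep from procps. The 90% limit is illustrative, pmon is the process name given above, and the function names are hypothetical.

```bash
# over_limit [PCT]: read `df -P` output on stdin and print "MOUNT USE%" for every
# filesystem at or above PCT percent used (default 90).
over_limit() {
  awk -v lim="${1:-90}" 'NR > 1 {
    use = $5; sub(/%/, "", use)
    if (use + 0 >= lim + 0) print $6, $5
  }'
}

# process_running NAME: succeed if a process with exactly this name is running.
process_running() {
  pgrep -x "$1" > /dev/null
}

# Example:
#   df -P | over_limit 90
#   process_running pmon || echo "ALERT pmon is not running"
```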
Software Development Server (swdev)
- Check disk space availability.
- Check status of backups.
- Check that the pmon process is running.
- Verify that /etc/passwd, /etc/shadow, /etc/hosts, and /etc/group have not changed.
- Check the latest entries in the logs.
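A simple way to confirm that /etc/passwd, /etc/shadow, /etc/hosts, and /etc/group have not changed is to compare them with a checksum baseline. This sketch uses GNU coreutils sha256sum. Run it as root, because shadow is readable only by root. Keep the baseline somewhere an intruder cannot quietly rewrite. The baseline path and the function names are hypothetical.

```bash
# make_baseline BASELINE FILE...: record SHA-256 checksums of the watched files.
# Re-run it after every authorized change.
make_baseline() {
  local baseline=$1
  shift
  sha256sum "$@" > "$baseline"
}

# changed_files BASELINE: print "FILE: FAILED" for each file that differs
# (missing files are also reported); returns non-zero if anything changed.
changed_files() {
  sha256sum -c --quiet "$1" 2>/dev/null
}

# Example:
#   make_baseline /root/etc-baseline.sha256 /etc/passwd /etc/shadow /etc/hosts /etc/group
#   changed_files /root/etc-baseline.sha256 || echo "ALERT: watched files changed"
```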
Web Server (www)
- Check disk space availability.
- Check status of backups.
- Backup folder is /data/backup.
- Backup script is /etc/cron.daily/backup.
- Backup to the mirror drive is /media/www/data.
- Backup script for the mirror drive is /etc/cron.daily/backuptomirror.
- Backup of the mirrored swdev.audina.net server is /data/mirror. This backup is created using rsync (see script /root/rsync-swdev.sh).

rsync daemon startup script:

```bash
#!/bin/bash
rsync --daemon --config=/etc/rsyncd.conf
```

/root/rsync-swdev.sh:

```bash
#!/bin/bash
# Mirror swdev (192.168.0.160) into /data/mirror/swdev.audina.net.
# --archive already implies --recursive, --links, --perms, and --times,
# so those flags are not repeated.
# Note: host::module combined with --rsh runs a single-use rsync daemon over
# ssh that reads rsyncd.conf from the remote user's home directory (see rsync(1)).

#rsync --verbose --progress --stats --compress --rsh=/usr/bin/ssh \
#  --recursive --times --perms --links --delete \
#  --exclude "*bak" --exclude "*~" \
#  192.168.0.160:webfiles /var/www/mirror

# Website
rsync --archive --verbose --progress --stats --rsh=/usr/bin/ssh --delete \
  --exclude=stats \
  192.168.0.160::webfiles /data/mirror/swdev.audina.net/www

# Databases
rsync --archive --verbose --progress --stats --rsh=/usr/bin/ssh --delete \
  192.168.0.160::databases /data/mirror/swdev.audina.net/databases

# Root user home
rsync --archive --verbose --progress --stats --rsh=/usr/bin/ssh --delete \
  192.168.0.160::root /data/mirror/swdev.audina.net/root

# Subversion repositories
rsync --archive --verbose --progress --stats --rsh=/usr/bin/ssh --delete \
  192.168.0.160::repos /data/mirror/swdev.audina.net/repos
```

- Check that the pmon process is running.
- Verify that /etc/passwd, /etc/shadow, /etc/hosts, and /etc/group have not changed.
- Check the latest entries in the logs.
Note: Backups sync/mirror to the internal drive and the NAS.
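You can confirm that the swdev mirror is current without copying anything: rsync can do a dry run and list what it would change. This is a sketch. Extra options such as --rsh=/usr/bin/ssh or --exclude=stats can be passed through so that the comparison matches the mirror script. The function name is hypothetical.

```bash
# pending_changes SRC DEST [RSYNC_OPTION...]
# Prints how many items rsync would still transfer or delete to make DEST
# match SRC; 0 means the mirror is current. --dry-run makes no changes.
pending_changes() {
  local src=$1 dest=$2
  shift 2
  rsync --archive --delete --dry-run --itemize-changes "$@" "$src/" "$dest/" | wc -l
}

# Example, matching the website mirror above:
#   pending_changes 192.168.0.160::webfiles /data/mirror/swdev.audina.net/www \
#     --rsh=/usr/bin/ssh --exclude=stats
```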
System36 Client Emulator Server (Bosânova)
- User manual and installation procedures: \\NAS\public\Software.apps\ES.server.Bosanova\DOCS
- Check that the emulator server services are running.
- Check users’ connectivity.
Router/Switches/Firewall/Gateway
- Check system monitor, CPU usage, uptime, disk usage, system load, and performance.
- Check web security, black list, custom sites, and policies.
- Check and monitor remote user/VPN settings and logs.
- Assign and adjust network configuration settings to match the given IP address assignments.
- Check system logs, error messages, and system diagnostics to analyze network connectivity.
Suggestions
- Need to redesign the network infrastructure to improve productivity and connectivity and to eliminate downtime and single points of failure.
- All production servers need to be replaced at least once every five years.
- Need to replace all the home-built servers: Infusion, OnContact, and TimeClock. These servers lack the hardware redundancy needed for a production environment.
- Need to rebuild and replace Fileserver because of hardware failure and lack of disk space.
- Need to rebuild the Exchange server and upgrade it to Exchange 2010, with backup and restore software licenses.
- Need a new gateway router that can monitor Audina’s bandwidth, productivity, and external threats.
- Need new network switches.
- Need to re-wire the whole network infrastructure.
- Need to install a patch panel.
- Eliminate all the small network switches, which cause slowness and bottlenecks on the network.
- Need to replace all QC computers except Sherry’s computer.
- Need more Internet bandwidth for better productivity.
NOTE: I have put these suggestions forward to management since my first day. Keep in mind: my intention here is to protect Audina’s data. (Thai Tran, 2011/10/28 12:19)