A couple of decades ago I came up with the idea for a storage server. This occurred while I was experimenting and working with a Hierarchical Storage Manager (HSM) product that ran on a Sun Solaris system and used tapes in an automated library to manage the contents of the files. At that time the idea of Content Addressable Storage (CAS) was several years away.
After spending several weeks working with the software and its new support for magneto optical disks (MOD) loaded in a small automated library, I decided to look for a better way to achieve similar results. After a few more weeks I came up with a competitive system which was considerably simpler and easier to understand.
Today several versions of the software, each with additional and improved features, have been released.
A few years ago, a network configuration issue at a particular customer site would cause the storage server dispatcher module to stop responding. For different reasons the cause of the problem was never addressed. The approach taken was to detect when the issue came up and to provide a mechanism, via an API, to restart the server. The issue would come up about three times a week.
During that time, a software developer designed and implemented a program that would ping the storage server using one of the available APIs. The program had a flaw. Under some circumstances it would lose track of state and start looping, making entries in the log file as fast as the runaway loop could. This gave the impression that a thread in the program would stop responding, and the log files would quickly fill up with incorrect messages.
It just so happens that a couple of years earlier I had developed a mechanism for the logger subsystem to detect when too many messages were being written. Once the condition was detected, a message indicating it would be written to the current log file and the next several messages would be ignored. After a time slot, the process would repeat. A couple of values may be configured via registry keys (the storage server may use registry keys while running on .NET, or environment variables while running on Linux or UNIX).
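To make the idea concrete, here is a minimal sketch in Python of that kind of flood guard. It is not the storage server's code. The class name, the `sink` callable, the default thresholds and the injectable `clock` are all assumptions made for illustration, and the sketch simply drops everything after the notice until the time slot ends.

```python
import time


class ThrottledLogger:
    """Hypothetical flood guard; names and defaults are illustrative only."""

    def __init__(self, sink, max_messages=100, window_seconds=1.0,
                 clock=time.monotonic):
        self.sink = sink                      # callable that receives each line
        self.max_messages = max_messages      # messages allowed per time slot
        self.window_seconds = window_seconds  # length of the time slot
        self.clock = clock                    # injectable so it can be tested
        self.window_start = clock()
        self.count = 0

    def log(self, message):
        """Write the message, or drop it if the time slot is flooded.

        Returns True when the message was written, False when dropped.
        """
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # The time slot is over, so the process repeats from scratch.
            self.window_start = now
            self.count = 0

        self.count += 1
        if self.count <= self.max_messages:
            self.sink(message)
            return True

        if self.count == self.max_messages + 1:
            # First message over the limit: record the condition once.
            self.sink("too many messages; ignoring the rest of this time slot")
        return False


if __name__ == "__main__":
    guard = ThrottledLogger(print, max_messages=3, window_seconds=5.0)
    for i in range(10):
        guard.log(f"runaway entry {i}")
```

A runaway loop like the one in the monitoring program would then leave a handful of entries plus one notice per time slot, instead of filling the log files.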
I learned about the issue with the monitoring software last week. The developer who created the code is no longer with the company. What is interesting is that the log feature which could have detected the issue was not part of the build in use. It is always a good idea to test the software and deploy it at short intervals. In my opinion, software updates should be deployed once a week.
The log mechanism allows all threads and processes on a single machine or a cluster of machines to write to a single set of log files. The facility allows a maximum number of log files per day. It also keeps all log files for a specified number of days. Each log entry has the date and time, the process and thread IDs, a custom message which may include the values of several variables, the line number and the source file name. These data allow technical support and software developers to quickly identify the source of an issue.
A few years ago I put together the architecture and basic design for a log system that would collect messages from different machines and write them to a set of centralized files. For some reason that feature was never made part of a release.
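As an illustration of that entry layout only, here is a small sketch using Python's standard `logging` module. It is not the storage server's logger. The function name `make_logger`, the logger name and the exact field order are assumptions, and the sketch does not cover the shared log files, the daily file limit or the retention period.

```python
import io
import logging


def make_logger(stream, name="storage_server_demo"):
    """Build a logger whose entries carry the fields described above.

    Hypothetical helper: the name and the layout are illustrative.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False      # keep entries out of the root logger
    logger.handlers.clear()       # avoid duplicate handlers on repeat calls

    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        # date and time, process ID, thread ID, message, line, source file
        "%(asctime)s pid=%(process)d tid=%(thread)d "
        "%(message)s line=%(lineno)d file=%(filename)s"
    ))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = make_logger(buffer)
    # The custom message includes the values of a couple of variables.
    log.info("request took %d ms for id=%s", 12, "abc")
    print(buffer.getvalue(), end="")
```

With the process ID, thread ID, line number and file name on every entry, a single line is usually enough to locate the code that wrote it.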
Each software system has different requirements implemented in different ways. The way a system is architected allows for ease of use, code simplicity and elegance. I am not aware of software developers who are constantly architecting or designing software. Most people architect, design, implement, test and repeat. Listening to customers and the development team is critical to the success of software projects.
If you have comments or questions regarding this or any other post in this blog, please do not hesitate to leave me a message. I will respond as soon as possible.
Follow me on Twitter: @john_canessa