Data streams have taken on an increasingly important role as both data collection and data sharing have grown. They allow systems and applications to receive data in real time instead of waiting for a nightly batch job to deliver it.

Consider a patient with a disease that is fatal unless treated promptly. The data about the collected specimens is critical too: it contains information on specimen quality, the percentage of malignant or dead cells, and the genetic profile of the disease (which reveals vulnerabilities to treatment options). You don't want the specimen analysis delayed for technology reasons, but without technological coordination, the information from multiple specimens quickly becomes difficult to track and integrate into something useful for treating the disease. The technology matters even more because the critical specimen analysis is done at centralized locations that receive specimens from hundreds of institutions, some of them in other countries.
Old Database-Centered Method
- Lab processes/analyzes the specimen
- Technicians enter data into a database
- The specimen might then have to go to another lab at the same institution for more processing
- That other lab might use a different system and database
- A nightly job sends the data to each institution, or possibly several jobs from multiple databases/systems (a minimal sketch of such a job appears after this list)
- Hopefully that institution integrates the data with their system as part of their nightly job process
- The data about the specimen is available the next day
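To make the old flow concrete, here is a minimal sketch of the kind of nightly export job described above, assuming a SQLite results table and one JSON export file per receiving institution. The table name, columns, and delivery mechanism are all illustrative assumptions, not any particular lab's implementation:

```python
import json
import sqlite3
from datetime import date, timedelta

DB_PATH = "lab_results.db"  # hypothetical lab database

# Export everything entered yesterday, batched per receiving institution.
yesterday = (date.today() - timedelta(days=1)).isoformat()

conn = sqlite3.connect(DB_PATH)
conn.row_factory = sqlite3.Row
rows = conn.execute(
    "SELECT institution, specimen_id, quality, malignant_pct, genetic_profile"
    " FROM specimen_results WHERE entered_on = ?",
    (yesterday,),
).fetchall()
conn.close()

# Group the day's results by destination, one export file each.
batches = {}
for row in rows:
    batches.setdefault(row["institution"], []).append(dict(row))

for institution, records in batches.items():
    # In practice the file would be pushed over SFTP or a similar channel,
    # and the receiving institution would load it in its own nightly job;
    # that is why the data only becomes available the next day.
    with open(f"export_{institution}_{yesterday}.json", "w") as f:
        json.dump(records, f)
```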
New Data Stream-Centered Method
As far as I know, this approach has only come into use within the past few years.
- Lab processes/analyzes the specimen
- Technicians enter data into a database
- The specimen might then have to go to another lab at the same institution for more processing
- That other lab might use a different system and database
- A near real-time data stream producer detects the new information in the database and sends it to each institution within seconds or minutes of entry (see the producer sketch after this list).
- A data stream consumer at the patient's institution subscribes to updates from the processing/analysis labs (see the consumer sketch after this list).
- It receives the specimen data and integrates it immediately into the hospital's system.
- Result: the patient gets treated 1-3 days sooner, and with a treatment approach based on the latest medical knowledge and genetic analysis technology, not just whatever happens to be available at the local hospital.
- The data is still retained in a database at both institutions, but the communication of the data update has moved away from the database to the data stream, using the right tool for the job.
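Here is a minimal sketch of the producer side, assuming Apache Kafka as the streaming platform and the kafka-python client. A production system would more likely use change data capture (reading the database's transaction log) instead of polling, and the topic name, connection details, and table schema are all illustrative assumptions:

```python
import json
import sqlite3
import time

from kafka import KafkaProducer  # kafka-python client; assumes a Kafka broker

TOPIC = "specimen-updates"   # hypothetical topic name
DB_PATH = "lab_results.db"   # hypothetical lab database

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

last_seen_id = 0  # high-water mark: only rows newer than this get published

while True:
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        "SELECT id, specimen_id, institution, quality, malignant_pct,"
        " genetic_profile FROM specimen_results WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    conn.close()

    for row in rows:
        # Keying by specimen keeps all updates for one specimen in order.
        producer.send(TOPIC, key=row["specimen_id"], value=dict(row))
        last_seen_id = row["id"]

    producer.flush()
    time.sleep(5)  # seconds between checks, not a nightly batch window
```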
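And a matching sketch of the consumer at the patient's institution, under the same assumptions. The integration function is a placeholder for whatever the hospital's own system does with the update:

```python
import json

from kafka import KafkaConsumer  # kafka-python client; assumes a Kafka broker

consumer = KafkaConsumer(
    "specimen-updates",               # same hypothetical topic as the producer
    bootstrap_servers="localhost:9092",
    group_id="patient-hospital",      # one consumer group per subscribing institution
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

def integrate_into_hospital_system(record: dict) -> None:
    # Placeholder: a real consumer would upsert the specimen data into the
    # hospital's own database so clinicians see it immediately.
    print(f"updating patient record with specimen {record['specimen_id']}")

for message in consumer:
    # Each update arrives seconds or minutes after the lab enters it,
    # instead of the next day after a batch job.
    integrate_into_hospital_system(message.value)
```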