Databases

Julius_Van_Der_Beak

Up the Wolves
Joined
Jul 24, 2008
Messages
19,454
MBTI Type
INTP
Enneagram
5w6
Instinctual Variant
sp/so
I feel like it's easy to overlook how ubiquitous they all are. Aren't all these posts in a database? It's just fascinating to me because we interact with this graphical overlay, but the "meat" of this place probably looks entirely different.
 

Oberon

Permabanned
Joined
Feb 24, 2019
Messages
151
MBTI Type
*NT*
Databases are fascinating in a way. I took a database class at community college - aced it. Some background - I've been indecisive between computer science, law, business, dream analysis (Jungian Analytical psychology). I went the business route but every year I reassess. So I took a database class (I know, I know - TMI).

Anyways...some big points (abstractions).

Tables are connected to other tables through keys: a foreign key in one table points at the primary key of another. Should databases ever be merged into a mega-database, theoretically your name and email address, together with some other personal data, could serve as a unique key. With this key, everything you ever did on the internet could be retrieved with a simple query.

Although that is quite impossible at the moment...who knows.

I don't really care though, got nothing to hide but embarrassing photos and random poems.

Do you study databases? SQL? Python?
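
To make the key idea above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and sample rows are all made up for illustration; the point is only that once two tables share a key, a single query pulls together everything tied to one email address.

```python
import sqlite3


def activity_for(email):
    """Return every recorded post linked to one email address."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (
            user_id INTEGER PRIMARY KEY,
            email   TEXT UNIQUE NOT NULL
        );
        CREATE TABLE posts (
            post_id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(user_id),  -- foreign key
            body    TEXT NOT NULL
        );
        INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
        INSERT INTO posts VALUES
            (10, 1, 'embarrassing photo'),
            (11, 2, 'database question'),
            (12, 1, 'random poem');
    """)
    # The join follows the foreign key from posts back to users.
    rows = conn.execute(
        """
        SELECT posts.body
        FROM users
        JOIN posts ON posts.user_id = users.user_id
        WHERE users.email = ?
        ORDER BY posts.post_id
        """,
        (email,),
    ).fetchall()
    conn.close()
    return [body for (body,) in rows]


print(activity_for("a@example.com"))  # ['embarrassing photo', 'random poem']
```

The hard part of the "mega-database" is not the query but agreeing on one key across systems that were never designed to share one.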
 

Coriolis

Si vis pacem, para bellum
Staff member
Joined
Apr 18, 2010
Messages
27,230
MBTI Type
INTJ
Enneagram
5w6
Instinctual Variant
sp/sx
Tables are connected to other tables through keys: a foreign key in one table points at the primary key of another. Should databases ever be merged into a mega-database, theoretically your name and email address, together with some other personal data, could serve as a unique key. With this key, everything you ever did on the internet could be retrieved with a simple query.

Although that is quite impossible at the moment...who knows.

I don't really care though, got nothing to hide but embarrassing photos and random poems.
That's what Martin Niemöller thought, too.

Do you study databases? SQL? Python?
I study Python, but it is a programming language, not a database.

 

JAVO

.
Joined
Apr 24, 2007
Messages
9,178
MBTI Type
eNTP
Data streams have taken on a growing role as both data collection and data sharing have increased. They allow systems and applications to receive data in real time instead of waiting for a nightly batch job to deliver it. For example, a patient has a disease which is fatal unless treated promptly, but the data about the collected specimens is critical too, as it contains information on specimen quality, percentage of malignant or dead cells, and genetic profiles of the disease (which reveal vulnerabilities to treatment options). You don't want the specimen analysis delayed for technology reasons, but at the same time, without technological coordination, the information from multiple specimens quickly becomes difficult to track and integrate into something useful for treating the disease. The technology becomes even more important because the key specimen analysis is done at centralized locations that receive specimens from hundreds of institutions, some of them in other countries.


Old Database-Centered Method

  1. Lab processes/analyzes specimen
  2. Technicians enter data into a database
  3. The specimen might then have to go to another lab at the same institution for more processing
  4. That other lab might use a different system and database
  5. A nightly job sends the data to each institution, or possibly several jobs from multiple databases/systems
  6. Hopefully that institution integrates the data with their system as part of their nightly job process
  7. The data about the specimen is available the next day


New Data Stream-Centered Method

As far as I know, this approach has just started to be used within the past few years.

  1. Lab processes/analyzes specimen
  2. Technicians enter data into a database
  3. The specimen might then have to go to another lab at the same institution for more processing
  4. That other lab might use a different system and database
  5. A near real-time data stream producer detects new information in the database and sends it to each institution within seconds or minutes of its entry.
  6. A data stream consumer at the patient's institution is subscribed to updates from the processing/analysis labs.
  7. It receives the specimen data, and integrates it immediately into the hospital's system.
  8. Result: the patient gets treated 1-3 days sooner, and with a treatment approach based on the latest medical knowledge and genetic analysis technology, not just whatever happens to be available at the local hospital.
  9. The data is still retained in a database at both institutions, but the communication of the data update has been moved away from the database to the data stream--using the right tool for the job.
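
The stream-centered steps above can be sketched in a few lines of Python. This is only a toy, in-process stand-in for a real message broker: the Stream class, the topic name, and the specimen fields are all invented for illustration. It shows the shape of the idea, which is that the lab keeps its own copy of the data while subscribers are notified the moment it is entered.

```python
from collections import defaultdict


class Stream:
    """Tiny in-process stand-in for a message broker (hypothetical)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver the message to every handler subscribed to this topic.
        for handler in self._subscribers[topic]:
            handler(message)


stream = Stream()
lab_database = []       # the analysis lab still keeps its own records
hospital_records = {}   # the patient's institution keeps its own copy too


def integrate(message):
    """Consumer side: fold the update into the hospital's system."""
    hospital_records[message["specimen_id"]] = message


def enter_result(result):
    """Producer side: store the result, then announce it on the stream."""
    lab_database.append(result)
    stream.publish("specimen-results", result)


stream.subscribe("specimen-results", integrate)
enter_result({"specimen_id": "S-001", "malignant_pct": 42})

print(hospital_records)  # {'S-001': {'specimen_id': 'S-001', 'malignant_pct': 42}}
```

In a real deployment the producer and consumer would sit at different institutions with a networked broker between them; the calls here are immediate only because everything runs in one process.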
 

Oberon

Permabanned
Joined
Feb 24, 2019
Messages
151
MBTI Type
*NT*
That's what Martin Niemöller thought, too.


I study Python, but it is a programming language, not a database.


Nice! Are you still studying it, or already using it fully and just improving?

I know Python isn't really a database language...but you can automate queries with it. You could also use PL/SQL if you're using an Oracle product, or T-SQL, the Microsoft equivalent.

I'm learning Python slowly as a hobby at the moment, hoping to find some use for it in what I do, but the gap is very large. Most sites sell Python on its use in finance or business, but the truth is the people who run these departments do not like employees automating things. They would rather pay consultants to design massive new software, or pair up with companies like SAP to automate it according to their own paradigms. That way they get the credit for it.
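
As a rough illustration of what "automating queries" with Python can look like, here is a small sketch that runs the same parameterized query for several months and writes the answers out as CSV. The invoices table, its columns, and the figures are invented for the example, and SQLite stands in for whatever database a department actually uses.

```python
import csv
import io
import sqlite3


def monthly_totals(conn, months):
    """Run one parameterized query per month and return the results as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["month", "total"])
    for month in months:
        # COALESCE turns the NULL that SUM gives for "no rows" into 0.
        (total,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM invoices WHERE month = ?",
            (month,),
        ).fetchone()
        writer.writerow([month, total])
    return out.getvalue()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (month TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [("2019-01", 100), ("2019-01", 50), ("2019-02", 75)],
)

print(monthly_totals(conn, ["2019-01", "2019-02", "2019-03"]))
```

Nothing here needs sign-off from a consultant: it is the kind of script that replaces copying numbers out of a query tool by hand.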
 

Oberon

Permabanned
Joined
Feb 24, 2019
Messages
151
MBTI Type
*NT*
Who produces the stream? Is it an automated process within the database administration software, an automated process in a separate system, or a human working with the latter on an ongoing basis?

The only problem I see with it, really, is that you risk errors propagating. The reason batch processing exists, besides being a necessary step in the evolution of technology, is that it serves as an internal control. Public companies are required by law to have internal controls in order to be traded on the market. Therefore, bypassing batch processing without a control in place to ensure data integrity would be tantamount to inviting massive fines and getting your company delisted.

There could be another control, of course, such as demonstrating a very low error rate, but that would still require independent technology auditors to tear open your systems and verify it, which would of course offset any economic gains from the new process. Hence we have automated driving, but won't see it for another fifty years - legal reasons as noted above, as well as a matter of practicality in business.
 

JAVO

.
Joined
Apr 24, 2007
Messages
9,178
MBTI Type
eNTP
Who produces the stream? Is it an automated process within the database administration software, an automated process in a separate system, or a human working with the latter on an ongoing basis?
I'm not sure if any databases have this built in. Generally it's a process specifically written to look for changes in the database. Apache Camel is often used to do this, sending messages to an Apache Kafka server.

The only problem I see with it, really, is that you risk errors propagating. The reason batch processing exists, besides being a necessary step in the evolution of technology, is that it serves as an internal control. Public companies are required by law to have internal controls in order to be traded on the market. Therefore, bypassing batch processing without a control in place to ensure data integrity would be tantamount to inviting massive fines and getting your company delisted.

There could be another control, of course, such as demonstrating a very low error rate, but that would still require independent technology auditors to tear open your systems and verify it, which would of course offset any economic gains from the new process. Hence we have automated driving, but won't see it for another fifty years - legal reasons as noted above, as well as a matter of practicality in business.

Yep, errors resulting from unexpected conditions still have to be dealt with. Sometimes that's through an automated alert which prompts a human to investigate. In many cases, the offending message can be automatically put on hold and then resent by the human once the error condition has been fixed. Another approach is to treat or mark real-time stream data as preliminary, with the understanding that it's for heads-up style interpretation, or to be used where acting on preliminary data is better than not acting on any data.
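
Here is a small sketch of the hold-and-resend idea, with stream data marked as preliminary. The Consumer class, the validation rule, and the field names are all hypothetical; a real system would have its own checks and its own alerting channel.

```python
class Consumer:
    """Integrates stream messages; bad ones are held for a human to review."""

    def __init__(self):
        self.records = {}   # successfully integrated data
        self.held = []      # messages put on hold after an error
        self.alerts = []    # stand-in for notifying a human

    def handle(self, message):
        try:
            self._integrate(message)
        except (KeyError, ValueError) as exc:
            # Do not lose the message: park it and raise an alert.
            self.held.append(message)
            self.alerts.append(f"held message: {exc!r}")

    def _integrate(self, message):
        pct = float(message["malignant_pct"])
        if not 0 <= pct <= 100:
            raise ValueError(f"malignant_pct out of range: {pct}")
        # Stream data is stored flagged as preliminary.
        self.records[message["specimen_id"]] = {
            "malignant_pct": pct,
            "preliminary": True,
        }

    def resend_held(self):
        """Retry everything on hold, e.g. after a human has fixed the cause."""
        pending, self.held = self.held, []
        for message in pending:
            self.handle(message)


consumer = Consumer()
consumer.handle({"specimen_id": "S-001", "malignant_pct": 140})  # invalid value
print(len(consumer.held), consumer.records)

consumer.held[0]["malignant_pct"] = 14   # the human corrects the entry
consumer.resend_held()
print(consumer.records)
```

A later, audited batch could then clear the preliminary flag, which keeps the internal control while still letting people act on the early data.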
 

21%

You have a choice!
Joined
May 15, 2009
Messages
3,224
MBTI Type
INFJ
Enneagram
4w5
I love databases. Love data. Love statistics. Love insights you can glean from data. Big data analyst is my dream job, because beyond all those numbers and text descriptions, sits an individual human -- a customer, a student, a someone behind a computer screen. The data links us. I want to see you. I want to understand you, and a thousand other yous with some variations. It's ultimately impossible, but it's fascinating.
 

Lark

Active member
Joined
Jun 21, 2009
Messages
29,569
I love databases. Love data. Love statistics. Love insights you can glean from data. Big data analyst is my dream job, because beyond all those numbers and text descriptions, sits an individual human -- a customer, a student, a someone behind a computer screen. The data links us. I want to see you. I want to understand you, and a thousand other yous with some variations. It's ultimately impossible, but it's fascinating.

I think/feel that too quite a lot. It's struck me, reading a lot of recent publishing, how much of it is just reportage of data analysis.

Although a lot of it really interests me in terms of how it is collected and what questions people are trying to answer with it etc. There's a lot of confirmation bias and a lot of research errors, and that's before you ever get remotely close to the average reader or street-level observer like myself.

It's something I never understood the different times I was at university. We did have statistics classes, but it was largely so that the degrees, diplomas and masters I did could claim to be sciences, or so I think. Maybe that's cynical; there was only one (the masters) that I thought was designed with rigor anyway.

Largely it was about using SPSS, and I was lucky that it worked properly when I filled in the fields etc. I was very aware at the time that I hadn't properly learned and mastered it, as I could not have used it privately to satisfy my curiosity. I did learn a lot of good things about the shortfalls of quantitative research though.

It's something that, if I were independently wealthy and could dedicate myself to learning, I'd dedicate a few modules in my personal curriculum to.
 

Julius_Van_Der_Beak

Up the Wolves
Joined
Jul 24, 2008
Messages
19,454
MBTI Type
INTP
Enneagram
5w6
Instinctual Variant
sp/so
Databases are fascinating in a way. I took a database class at community college - aced it. Some background - I've been indecisive between computer science, law, business, dream analysis (Jungian Analytical psychology). I went the business route but every year I reassess. So I took a database class (I know, I know - TMI).

Anyways...some big points (abstractions).

Tables are connected to other tables through keys: a foreign key in one table points at the primary key of another. Should databases ever be merged into a mega-database, theoretically your name and email address, together with some other personal data, could serve as a unique key. With this key, everything you ever did on the internet could be retrieved with a simple query.

Although that is quite impossible at the moment...who knows.

I don't really care though, got nothing to hide but embarrassing photos and random poems.

Do you study databases? SQL? Python?

Well, I know a bit about both. Right now I'm trying to learn a bit more about NoSQL databases. Specifically, Mongo.

What's interesting to me is how simple they are, and how long databases have actually been with us. It's like something from an earlier era of computing that worked well enough that people just continued to use it. I mean, I suppose NoSQL is shaking things up a bit, but I can't see every organization adopting it, and a lot of people are probably going to hang on to SQL.
 

Coriolis

Si vis pacem, para bellum
Staff member
Joined
Apr 18, 2010
Messages
27,230
MBTI Type
INTJ
Enneagram
5w6
Instinctual Variant
sp/sx
Although a lot of it really interests me in terms of how it is collected and what questions people are trying to answer with it etc. There's a lot of confirmation bias and a lot of research errors, and that's before you ever get remotely close to the average reader or street-level observer like myself.

It's something I never understood the different times I was at university. We did have statistics classes, but it was largely so that the degrees, diplomas and masters I did could claim to be sciences, or so I think. Maybe that's cynical; there was only one (the masters) that I thought was designed with rigor anyway.

Largely it was about using SPSS, and I was lucky that it worked properly when I filled in the fields etc. I was very aware at the time that I hadn't properly learned and mastered it, as I could not have used it privately to satisfy my curiosity. I did learn a lot of good things about the shortfalls of quantitative research though.
A main shortfall is that statistics, like any process, is only as good as its inputs. Garbage in, garbage out. Especially in the social sciences if you don't ask the right questions, or ask them in the right way, it can skew your results and you won't be measuring what you think you are measuring. There is significant room for bias here. Asking one question can even affect how people answer later questions. That doesn't make it a worthless pursuit, just means researchers must have a sound understanding of the limitations of their methodology, and the factors that can influence it.

I love databases. Love data. Love statistics. Love insights you can glean from data. Big data analyst is my dream job, because beyond all those numbers and text descriptions, sits an individual human -- a customer, a student, a someone behind a computer screen. The data links us. I want to see you. I want to understand you, and a thousand other yous with some variations. It's ultimately impossible, but it's fascinating.
I appreciate data and statistics more than most people, but I don't want to be understood by some data analyst I don't even know, and who has no business peering into my personal affairs.

Nice! Are you still studying it, or already using it fully and just improving?

I know Python isn't really a database language...but you can automate queries with it. You could also use PL/SQL if you're using an Oracle product, or T-SQL, the Microsoft equivalent.

I'm learning Python slowly as a hobby at the moment, hoping to find some use for it in what I do, but the gap is very large. Most sites sell Python on its use in finance or business, but the truth is the people who run these departments do not like employees automating things. They would rather pay consultants to design massive new software, or pair up with companies like SAP to automate it according to their own paradigms. That way they get the credit for it.
I am somewhere in between with respect to Python. I did some online training, and am working now to apply it to my work, specifically the task of scientific calculations and modelling. So, I use scipy, numpy, etc.
 

Mole

Permabanned
Joined
Mar 20, 2008
Messages
20,284
The Chinese Communist Party is putting up video cameras all over China and is building a database on every Chinese citizen. And the Chinese Communist Party has imprisoned one million Chinese Muslims in concentration camps. And considering that this same Communist Party has killed fifty million Chinese, I wouldn't feel too hopeful for the million imprisoned Chinese Muslims.

And the Communist Party of China is making sophisticated attempts to interfere with and control Australian democracy.
 

JAVO

.
Joined
Apr 24, 2007
Messages
9,178
MBTI Type
eNTP
A main shortfall is that statistics, like any process, is only as good as its inputs. Garbage in, garbage out. Especially in the social sciences if you don't ask the right questions, or ask them in the right way, it can skew your results and you won't be measuring what you think you are measuring. There is significant room for bias here. Asking one question can even affect how people answer later questions. That doesn't make it a worthless pursuit, just means researchers must have a sound understanding of the limitations of their methodology, and the factors that can influence it.
Well-said.

I am somewhere in between with respect to Python. I did some online training, and am working now to apply it to my work, specifically the task of scientific calculations and modelling. So, I use scipy, numpy, etc.
One of the biggest advantages of Python is that there's at least one library for almost everything from numeric and genetic analysis to automating a house or vehicle. That's one of the reasons it has become one of my favorite languages over the past year.


I'm learning python slowly as a hobby at the moment, hoping to find some use for it in what I do, but the gap is very large. Most sites sell on python's use in finance, or business, but the truth is the people who run these departments do not like employees automating things. They would rather pay consultants to design massive new software, or pair up with companies like SAP to automate it according to their own paradigms. That way they get the credit for it.

You can always automate your own work, doing it way faster and better than everyone else. I work in a group which does software development and data analysis, and I'm the only one who uses Python. It has allowed me to do things slightly faster and better even among people who know how to write their own code and use other tools to automate things. I'm bragging only about Python. :)
 

Jaq

Remember, Humanity.
Joined
Apr 14, 2011
Messages
3,032
MBTI Type
ENTP
Enneagram
379
Instinctual Variant
sp/sx
I just want to weigh in that I find python to be annoying and prefer Java.
 

Lark

Active member
Joined
Jun 21, 2009
Messages
29,569
The Chinese Communist Party is putting up video cameras all over China and is building a database on every Chinese citizen. And the Chinese Communist Party has imprisoned one million Chinese Muslims in concentration camps. And considering that this same Communist Party has killed fifty million Chinese, I wouldn't feel too hopeful for the million imprisoned Chinese Muslims.

And the Communist Party of China is making sophisticated attempts to interfere with and control Australian democracy.

Mole, the Chinese Communist Party Central Committee ARE the Australian democracy.

- - - Updated - - -


As coffees go, I like Java too; I don't know what this Python is like.
 

Jaq

Remember, Humanity.
Joined
Apr 14, 2011
Messages
3,032
MBTI Type
ENTP
Enneagram
379
Instinctual Variant
sp/sx
I'm self-taught, and I kept getting told that Python was an easy language to start with and a powerful one. I use it for an engine in my game development and writing ventures, but I just find it annoying.
 

Mole

Permabanned
Joined
Mar 20, 2008
Messages
20,284
Mole, the Chinese Communist Party Central Committee ARE the Australian democracy.

As coffees go, I like Java too; I don't know what this Python is like.
You have an unhealthy obsession with me so I am putting you on Ignore.
 