
Machine Learning and MBTI

highlander

Administrator
Staff member
Joined
Dec 23, 2009
Messages
26,396
MBTI Type
INTJ
Enneagram
6w5
Instinctual Variant
sx/sp
So it seems my team has been able to develop a machine learning classifier that can predict MBTI type based on text. The algorithm takes in all the text from a member's posts and then predicts their MBTI type from it. I used posts from this forum to test it out. It predicts a member's full type with 93% accuracy. For each of the MBTI letters, the accuracy is as follows: I/E: 98%, N/S: 99%, T/F: 98%, J/P: 98%. These results hold after removing all references to MBTI and Enneagram type within posts, which could otherwise cause so-called "data leakage" and allow the classifier to cheat. I have wanted to do this kind of project for years, and finally have been able to do it in my master's program at Berkeley in data science. I always thought it could be done but am actually astounded at the accuracy of the results. It was not without a lot of work to evaluate various classifiers, methods, and features.
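For readers wondering what stripping type references might look like in practice, here is a minimal sketch. The two regular expressions and the `[TYPE]` placeholder are assumptions made for illustration, not the team's actual preprocessing, and a real pipeline would likely also need to handle function names (Fi, Ne, and so on) and words such as "introvert".

```python
import re

# Hypothetical pattern: four-letter codes such as "INTJ", plus wildcard
# forms such as "xNxP" (the X stands in for an undecided letter).
TYPE_PATTERN = re.compile(r"\b[EIX][NSX][TFX][JPX]\b", re.IGNORECASE)

# Hypothetical pattern: Enneagram type-with-wing notation such as "6w5".
ENNEAGRAM_PATTERN = re.compile(r"\b[1-9]w[1-9]\b", re.IGNORECASE)


def mask_type_mentions(text: str, placeholder: str = "[TYPE]") -> str:
    """Replace explicit type mentions so a classifier cannot read the label."""
    text = TYPE_PATTERN.sub(placeholder, text)
    return ENNEAGRAM_PATTERN.sub(placeholder, text)


example = "As an INTJ 6w5 I disagree with that xNxP take."
print(mask_type_mentions(example))
# As an [TYPE] [TYPE] I disagree with that [TYPE] take.
```

Masking with a placeholder rather than deleting keeps the sentence structure intact, which is one possible design choice; deleting the token outright would work too.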
 

highlander

Administrator
Staff member
Joined
Dec 23, 2009
Messages
26,396
MBTI Type
INTJ
Enneagram
6w5
Instinctual Variant
sx/sp
The official MBTI test averages .815 accuracy on each of the dichotomies. If you compound that across all four dichotomies (0.815 to the fourth power, roughly 0.44), it isn't the best result. I have a research paper that describes all of this, which I can eventually publish.
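The compounding can be checked with a few lines of arithmetic. This sketch assumes that errors on the four dichotomies are independent, which is a simplification; the function name is made up for illustration.

```python
from math import prod


def all_four_correct(per_letter_accuracy):
    """Chance of getting every letter right, assuming independent errors."""
    return prod(per_letter_accuracy)


official = all_four_correct([0.815] * 4)
classifier = all_four_correct([0.98, 0.99, 0.98, 0.98])

print(f"official test, all four letters: {official:.3f}")   # 0.441
print(f"classifier, all four letters:    {classifier:.3f}")  # 0.932
```

Under that assumption, the per-letter figures quoted in the first post multiply out to about 93%, which lines up with the overall accuracy reported there.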
 

ygolo

My termites win
Joined
Aug 6, 2007
Messages
5,619
So it seems my team has been able to develop a machine learning classifier that can predict MBTI type based on text. The algorithm takes in all the text from a member's posts and then predicts their MBTI type from it. I used posts from this forum to test it out. It predicts a member's full type with 93% accuracy. For each of the MBTI letters, the accuracy is as follows: I/E: 98%, N/S: 99%, T/F: 98%, J/P: 98%. These results hold after removing all references to MBTI and Enneagram type within posts, which could otherwise cause so-called "data leakage" and allow the classifier to cheat. I have wanted to do this kind of project for years, and finally have been able to do it in my master's program at Berkeley in data science. I always thought it could be done but am actually astounded at the accuracy of the results. It was not without a lot of work to evaluate various classifiers, methods, and features.

I'm curious. How big are the classifier models?
What percent did you train on vs. test on?
 

Luminous

༻✧✧༺
Joined
Oct 25, 2017
Messages
10,079
MBTI Type
Iᑎᖴᑭ
Enneagram
952
Instinctual Variant
sx/sp
This is very exciting.

However, I urge you to test it somewhere where people's types have been verified by an outside professional source. Otherwise, you're including mistypes and purposeful entering of mistypes in your data, making it unreliable and potentially meaningless. I love the forum, but you know there are many mistypes here, along with disagreements about whether to use cognitive functions or dichotomies for typings, in addition to disagreements about how those are all defined. It just makes the test pool far too murky. It sounds like you're using dichotomies? Surely you must be able to get a test pool of people who've been professionally typed?
 

Coriolis

Odd man out
Staff member
Joined
Apr 18, 2010
Messages
27,050
MBTI Type
INTJ
Enneagram
5w6
Instinctual Variant
sp/sx
This is very exciting.

However, I urge you to test it somewhere where people's types have been verified by an outside professional source. Otherwise, you're including mistypes and purposeful entering of mistypes in your data, making it unreliable and potentially meaningless. I love the forum, but you know there are many mistypes here, along with disagreements about whether to use cognitive functions or dichotomies for typings, in addition to disagreements about how those are all defined. It just makes the test pool far too murky. It sounds like you're using dichotomies? Surely you must be able to get a test pool of people who've been professionally typed?
I had a similar thought. There are also people who will post joke types. @highlander has been around long enough that perhaps he used only those members who are reasonably confident about their types. Professional verification is one yardstick, but I'm sure I am not the only one who gained that confidence level through independent study and interaction with people who know both the system and me as a person.

If I recall correctly from Gifts Differing, that is how Briggs and Myers initially validated the MBTI system: by giving the questionnaire to people whose type had been verified independently, in that case by interviews.
 

ygolo

My termites win
Joined
Aug 6, 2007
Messages
5,619
I had a similar thought. There are also people who will post joke types. @highlander has been around long enough that perhaps he used only those members who are reasonably confident about their types. Professional verification is one yardstick, but I'm sure I am not the only one who gained that confidence level through independent study and interaction with people who know both the system and me as a person.

If I recall correctly from Gifts Differing, that is how Briggs and Myers initially validated the MBTI system: by giving the questionnaire to people whose type had been verified independently, in that case by interviews.

It is interesting. What is "ground truth" in general in typology, considering Myers-Briggs has lost credence relative to the Big Five?

I should say I have lost credence in personality theories' ability to tell us anything real, unless Dario Nardi's brain scan work has been repeated by many independent sources (I haven't kept up since I lost interest).
 

Vendrah

Well-known member
Joined
Mar 26, 2017
Messages
1,901
MBTI Type
NP
Enneagram
952
So it seems my team has been able to develop a machine learning classifier that can predict MBTI type based on text. The algorithm takes in all the text from a member's posts and then predicts their MBTI type from it. I used posts from this forum to test it out. It predicts a member's full type with 93% accuracy. For each of the MBTI letters, the accuracy is as follows: I/E: 98%, N/S: 99%, T/F: 98%, J/P: 98%. These results hold after removing all references to MBTI and Enneagram type within posts, which could otherwise cause so-called "data leakage" and allow the classifier to cheat. I have wanted to do this kind of project for years, and finally have been able to do it in my master's program at Berkeley in data science. I always thought it could be done but am actually astounded at the accuracy of the results. It was not without a lot of work to evaluate various classifiers, methods, and features.
It reminds me of a study that used handwritten signatures to identify MBTI types, and it at least achieved higher test-retest rates. Regardless, I think you seem to be doing a good job.
Two questions, though:
- What does it say about me? I am actually making it easy for the MBTI algorithm, since any xNxP will do.
- Did you ever consider moving to a better typology system or a more widely credited personality system, like HEXACO or Big Five? Are you aware of the types on Big Five, for example?
 

highlander

Administrator
Staff member
Joined
Dec 23, 2009
Messages
26,396
MBTI Type
INTJ
Enneagram
6w5
Instinctual Variant
sx/sp
I'm curious. How big are the classifier models?
What percent did you train on vs. test on?
What do you mean "how big"? The best model trained on 120,000 samples and tested on 93,000.
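To put those counts in the percentage terms the question asked for, here is a quick calculation. It assumes the two figures together make up the whole labeled dataset, which the post does not state.

```python
train_samples = 120_000
test_samples = 93_000
total_samples = train_samples + test_samples

train_pct = 100 * train_samples / total_samples
test_pct = 100 * test_samples / total_samples

print(f"train: {train_pct:.1f}%  test: {test_pct:.1f}%")
# train: 56.3%  test: 43.7%
```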
 

highlander

Administrator
Staff member
Joined
Dec 23, 2009
Messages
26,396
MBTI Type
INTJ
Enneagram
6w5
Instinctual Variant
sx/sp
It reminds me of a study that used handwritten signatures to identify MBTI types, and it at least achieved higher test-retest rates. Regardless, I think you seem to be doing a good job.
Two questions, though:
- What does it say about me? I am actually making it easy for the MBTI algorithm, since any xNxP will do.
- Did you ever consider moving to a better typology system or a more widely credited personality system, like HEXACO or Big Five? Are you aware of the types on Big Five, for example?
I have read many papers on Big 5 and machine learning. The original work on this topic was done on Big 5 because that is what academics use. What makes you think Big 5 is better? The fact that we got 98-99% accuracy on each letter says something about MBTI. I honestly have never heard of HEXACO. Who uses it, and why is it better? MBTI and Enneagram are much more practically useful because there is a lot more information on how to use those systems.
 

highlander

Administrator
Staff member
Joined
Dec 23, 2009
Messages
26,396
MBTI Type
INTJ
Enneagram
6w5
Instinctual Variant
sx/sp
This is very exciting.

However, I urge you to test it somewhere where people's types have been verified by an outside professional source. Otherwise, you're including mistypes and purposeful entering of mistypes in your data, making it unreliable and potentially meaningless. I love the forum, but you know there are many mistypes here, along with disagreements about whether to use cognitive functions or dichotomies for typings, in addition to disagreements about how those are all defined. It just makes the test pool far too murky. It sounds like you're using dichotomies? Surely you must be able to get a test pool of people who've been professionally typed?
The counterpoint to your argument and the point @Coriolis made is that the accuracy results speak for themselves. We filtered out posts with an invalid type. If the type data were not good or MBTI couldn't be predicted by analysis of text, the classifier wouldn't perform so well.
 

highlander

Administrator
Staff member
Joined
Dec 23, 2009
Messages
26,396
MBTI Type
INTJ
Enneagram
6w5
Instinctual Variant
sx/sp
It reminds me of a study that used handwritten signatures to identify MBTI types, and it at least achieved higher test-retest rates. Regardless, I think you seem to be doing a good job.
Two questions, though:
- What does it say about me? I am actually making it easy for the MBTI algorithm, since any xNxP will do.
- Did you ever consider moving to a better typology system or a more widely credited personality system, like HEXACO or Big Five? Are you aware of the types on Big Five, for example?
We did have a Big 5 classifier on the site, by the way, for a few years using IBM AI software. It performed OK on the forum posts and against Tumblr posts but not well against Twitter or Facebook posts. Generally, the graphics were beautiful but the type description info was pretty mediocre.
 

Luminous

༻✧✧༺
Joined
Oct 25, 2017
Messages
10,079
MBTI Type
Iᑎᖴᑭ
Enneagram
952
Instinctual Variant
sx/sp
The counterpoint to your argument and the point @Coriolis made is that the accuracy results speak for themselves. We filtered out posts with an invalid type. If the type data were not good or MBTI couldn't be predicted by analysis of text, the classifier wouldn't perform so well.
How did you determine what was invalid?
 

ygolo

My termites win
Joined
Aug 6, 2007
Messages
5,619
What do you mean "how big"? The best model trained on 120,000 samples and tested on 93,000.
What I meant: The number of weights + hyperparameters is "the size of the model".

When you compare against the number of samples for training and testing, ideally we would like the model size to be small. This desire is similar in logic to avoiding ad hoc hypothesizing, avoiding overfitting curves to the number of data points, using ANOVA instead of multiple t-tests, etc. (Just giving context for readers who may not know. I am sure you do.)
 

Vendrah

Well-known member
Joined
Mar 26, 2017
Messages
1,901
MBTI Type
NP
Enneagram
952
I have read many papers on Big 5 and machine learning. The original work on this topic was done on Big 5 because that is what academics use. What makes you think Big 5 is better? The fact that we got 98-99% accuracy on each letter says something about MBTI. I honestly have never heard of HEXACO. Who uses it, and why is it better? MBTI and Enneagram are much more practically useful because there is a lot more information on how to use those systems.


You forgot to answer one of my questions, the personal one: which type does the algorithm think I am? I am fine with any xNxP; for Enneagram I am quite open, but I would bet I am a 5 or a 9.

What makes me think Big Five is better? Well, first, you can do both; it's not necessarily a matter of one being better. But Big Five definitely has its advantages over the MBTI. First, as you said, academics basically use the Big Five rather than MBTI or Enneagram, and that's for a reason (you can ask them, perhaps). Second, the Big Five is falsifiable whereas the MBTI and Enneagram are not, and that makes the Big Five more objective in nature. Related to that, though not entirely overlapping, the Big Five involves less "guess-based" work than the MBTI, and being more empirical it is more in touch with reality. The method does matter, and that is what makes the Big Five more scientific than the MBTI. Third, it predicts life outcomes better than MBTI, and the Enneagram doesn't even have anything on that. Fourth, the reliability, stability, validity, etc. of the Big Five are normally slightly better than those of the official MBTI and the Enneagram.

The Big Five is not obscure. In fact, there is a lot of information about it on the internet, and that information is much more consistent than what you find on the MBTI. For example, you won't find Big Five people arguing over whether low Agreeableness is selfish, or whether selfishness is just a stereotype of low Agreeableness, yet you will find discussions about whether 'Fi' is selfish; arguing of that kind is almost everywhere in the typology community, not just Typology Central. Actually, people even disagree about what MBTI is: the 'letter' typing, the cognitive functions, my own more literal Jung interpretation, akhromant's and company stack, etc. Such discussion does not exist for the Big Five, nor was there ever a need for me to create a thread, or even a mini book, explaining what the original source says, because the Big Five's original source has not been distorted the way the MBTI's, and Jung's especially, has. The Enneagram is similar: there are Ichazo's followers, Naranjo's followers (like @mancino perhaps), Hudson & the Enneagram Institute, and there are conflicts, such as most of Naranjo's followers holding that an E9 cannot be intuitive whereas others hold that it is possible. This vagueness is great for bringing activity to the forum (people engage, discuss, and even fight, whereas the Big Five offers fewer things to discuss and Big Five boards and sub-forums are generally dry), but for exploration and as a system it is actually a bad thing.

You can still use both rather than using one and despising the other.
However, I get that on the forum there is a limitation. You didn't provide a proper place to fill in Big Five personality on profiles, but you did for MBTI and Enneagram.
 

highlander

Administrator
Staff member
Joined
Dec 23, 2009
Messages
26,396
MBTI Type
INTJ
Enneagram
6w5
Instinctual Variant
sx/sp
You forgot to answer one of my questions, the personal one: which type does the algorithm think I am? I am fine with any xNxP; for Enneagram I am quite open, but I would bet I am a 5 or a 9.

What makes me think Big Five is better? Well, first, you can do both; it's not necessarily a matter of one being better. But Big Five definitely has its advantages over the MBTI. First, as you said, academics basically use the Big Five rather than MBTI or Enneagram, and that's for a reason (you can ask them, perhaps). Second, the Big Five is falsifiable whereas the MBTI and Enneagram are not, and that makes the Big Five more objective in nature. Related to that, though not entirely overlapping, the Big Five involves less "guess-based" work than the MBTI, and being more empirical it is more in touch with reality. The method does matter, and that is what makes the Big Five more scientific than the MBTI. Third, it predicts life outcomes better than MBTI, and the Enneagram doesn't even have anything on that. Fourth, the reliability, stability, validity, etc. of the Big Five are normally slightly better than those of the official MBTI and the Enneagram.

The Big Five is not obscure. In fact, there is a lot of information about it on the internet, and that information is much more consistent than what you find on the MBTI. For example, you won't find Big Five people arguing over whether low Agreeableness is selfish, or whether selfishness is just a stereotype of low Agreeableness, yet you will find discussions about whether 'Fi' is selfish; arguing of that kind is almost everywhere in the typology community, not just Typology Central. Actually, people even disagree about what MBTI is: the 'letter' typing, the cognitive functions, my own more literal Jung interpretation, akhromant's and company stack, etc. Such discussion does not exist for the Big Five, nor was there ever a need for me to create a thread, or even a mini book, explaining what the original source says, because the Big Five's original source has not been distorted the way the MBTI's, and Jung's especially, has. The Enneagram is similar: there are Ichazo's followers, Naranjo's followers (like @mancino perhaps), Hudson & the Enneagram Institute, and there are conflicts, such as most of Naranjo's followers holding that an E9 cannot be intuitive whereas others hold that it is possible. This vagueness is great for bringing activity to the forum (people engage, discuss, and even fight, whereas the Big Five offers fewer things to discuss and Big Five boards and sub-forums are generally dry), but for exploration and as a system it is actually a bad thing.

You can still use both rather than using one and despising the other.
However, I get that on the forum there is a limitation. You didn't provide a proper place to fill in Big Five personality on profiles, but you did for MBTI and Enneagram.
Since your four-letter type isn't one of the valid 16 options, we would have pulled it out of the analysis, so I'm afraid I don't have that for you at the present time. I understand there is some value in having a personality type system that can be independently verified, but people don't use OCEAN in practice that much. I don't believe it will ever be adopted by the masses. One of its issues is that certain traits, such as neuroticism, carry a negative connotation, so you have a system that describes a person in terms of negative or positive attributes (like being conscientious or not). It is of limited value because people won't want to share their type information for fear of being labeled or discriminated against, unlike with MBTI, Enneagram, or StrengthsFinder. Those other systems focus on how people are different from each other, but not in negative or positive ways.
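A filter of that kind could look roughly like the sketch below. The function name and the strict matching rule are assumptions for illustration; the profile values are the ones members in this thread list, and the team's real filter may have been more forgiving.

```python
from itertools import product

# The 16 valid four-letter types, built from the four dichotomies.
VALID_TYPES = {"".join(letters) for letters in product("EI", "NS", "TF", "JP")}


def is_valid_type(label: str) -> bool:
    """Strict check: the profile field must be exactly one of the 16 types."""
    return label.strip().upper() in VALID_TYPES


# Profile fields as they appear in this thread.
profiles = {
    "highlander": "INTJ",
    "Coriolis": "INTJ",
    "Vendrah": "NP",
    "Luminous": "Iᑎᖴᑭ",  # stylized characters, not plain ASCII letters
}

kept = {user: label for user, label in profiles.items() if is_valid_type(label)}
print(sorted(kept))  # ['Coriolis', 'highlander']
```

Note that a strict check like this one would also drop a stylized profile entry, even though a human reader can tell which type is meant.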
 

highlander

Administrator
Staff member
Joined
Dec 23, 2009
Messages
26,396
MBTI Type
INTJ
Enneagram
6w5
Instinctual Variant
sx/sp
How did you determine what was invalid?
We trained the classifier on 120,000 posts and then tested it against type information for 93,000 posts. The label is the MBTI designation in a profile. That's what's being predicted.
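To make the two kinds of accuracy figure concrete, here is a small sketch that scores predictions against profile labels both per letter and for the full four-letter type. The labels and predictions are made-up toy data, and the function names are hypothetical.

```python
def per_letter_accuracy(true_types, predicted_types):
    """Accuracy for each of the four letter positions (I/E, N/S, T/F, J/P)."""
    n = len(true_types)
    return [
        sum(t[i] == p[i] for t, p in zip(true_types, predicted_types)) / n
        for i in range(4)
    ]


def exact_match_accuracy(true_types, predicted_types):
    """Share of samples where all four letters are predicted correctly."""
    matches = sum(t == p for t, p in zip(true_types, predicted_types))
    return matches / len(true_types)


# Toy data, invented for illustration only.
profile_labels = ["INTJ", "ENFP", "ISTP", "INFP"]
predictions = ["INTJ", "ENFP", "ISTJ", "ENFP"]

print(per_letter_accuracy(profile_labels, predictions))   # [0.75, 1.0, 1.0, 0.75]
print(exact_match_accuracy(profile_labels, predictions))  # 0.5
```

The full-type score can never exceed the weakest per-letter score, since one wrong letter makes the whole type wrong.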
 

highlander

Administrator
Staff member
Joined
Dec 23, 2009
Messages
26,396
MBTI Type
INTJ
Enneagram
6w5
Instinctual Variant
sx/sp
What I meant: The number of weights + hyperparameters is "the size of the model".

When you compare against the number of samples for training and testing, ideally we would like the model size to be small. This desire is similar in logic to avoiding ad hoc hypothesizing, avoiding overfitting curves to the number of data points, using ANOVA instead of multiple t-tests, etc. (Just giving context for readers who may not know. I am sure you do.)
I think Transformer-based neural networks may operate differently than you are thinking. We used a T5-small model, which provides eight-headed attention across the encoder and decoder, resulting in approximately 60 million parameters.
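For a rough sense of the ratio asked about, the parameter count can be set against the training set size using the figures quoted in this thread. This is only back-of-the-envelope arithmetic; if a pretrained T5 checkpoint was fine-tuned (the usual way T5 is used, though the post does not say), the raw ratio says less about overfitting risk than it would for a model trained from scratch.

```python
approx_parameters = 60_000_000  # T5-small, approximate figure from the post
training_samples = 120_000      # training set size from the post

params_per_sample = approx_parameters / training_samples
print(params_per_sample)  # 500.0
```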
 

Vendrah

Well-known member
Joined
Mar 26, 2017
Messages
1,901
MBTI Type
NP
Enneagram
952
Since your four-letter type isn't one of the valid 16 options, we would have pulled it out of the analysis, so I'm afraid I don't have that for you at the present time. I understand there is some value in having a personality type system that can be independently verified, but people don't use OCEAN in practice that much. I don't believe it will ever be adopted by the masses. One of its issues is that certain traits, such as neuroticism, carry a negative connotation, so you have a system that describes a person in terms of negative or positive attributes (like being conscientious or not). It is of limited value because people won't want to share their type information for fear of being labeled or discriminated against, unlike with MBTI, Enneagram, or StrengthsFinder. Those other systems focus on how people are different from each other, but not in negative or positive ways.
What about Enneagram? Are tri-types an invalid format too?

But Big Five is the most popular system in articles and papers regardless. Don't you plan to write one of those? Because of the lower credibility of the MBTI, you risk being "discredited," or at least less well regarded and less cited, even if the work has good results. As I said, I am NOT suggesting you throw your analysis of MBTI and Enneagram in the trash can; that is good work. I am suggesting you expand toward the Big Five at least. Expanding to the Big Five in the future will give you more credit, and the work will be more relevant if you publish it. The "MBTI" in the way we use it in the community is free, but formally MBTI belongs to a rights-reserved company that sells the tests; you can only do experiments with their permission, and it will be hard to get any of the official tests for free (they probably somehow even managed to take Sakinorva's 'unofficial' version down), while the Big Five IPIP-NEO at least is freely available, and for academics the NEO PI-R probably shouldn't be that hard to access.
 

Luminous

༻✧✧༺
Joined
Oct 25, 2017
Messages
10,079
MBTI Type
Iᑎᖴᑭ
Enneagram
952
Instinctual Variant
sx/sp
What about Enneagram? Are tri-types an invalid format too?

But Big Five is the most popular system in articles and papers regardless. Don't you plan to write one of those? Because of the lower credibility of the MBTI, you risk being "discredited," or at least less well regarded and less cited, even if the work has good results. As I said, I am NOT suggesting you throw your analysis of MBTI and Enneagram in the trash can; that is good work. I am suggesting you expand toward the Big Five at least. Expanding to the Big Five in the future will give you more credit, and the work will be more relevant if you publish it. The "MBTI" in the way we use it in the community is free, but formally MBTI belongs to a rights-reserved company that sells the tests; you can only do experiments with their permission, and it will be hard to get any of the official tests for free (they probably somehow even managed to take Sakinorva's 'unofficial' version down), while the Big Five IPIP-NEO at least is freely available, and for academics the NEO PI-R probably shouldn't be that hard to access.
I can't imagine a reputable psychology journal publishing something that was created purely from forum posts and tested purely on forum posts. It may well be a great test, it may be accurate, but there's no way to prove that if it's created and tested in a vacuum with no standardized objective definitions of what any of the terms mean or whether anyone actually is the type they have listed on their profile.

It would be a really cool test for users of the forum, though.
 

Vendrah

Well-known member
Joined
Mar 26, 2017
Messages
1,901
MBTI Type
NP
Enneagram
952
I can't imagine a reputable psychology journal publishing something that was created purely from forum posts and tested purely on forum posts. It may well be a great test, it may be accurate, but there's no way to prove that if it's created and tested in a vacuum with no standardized objective definitions of what any of the terms mean or whether anyone actually is the type they have listed on their profile.

It would be a really cool test for users of the forum, though.

What makes me believe that @highlander could have something in mind is this part:

have been able to do it in my master's program at Berkeley in data science.

This could be used for the master's, but for that you would need the Big Five (or HEXACO, but if @highlander doesn't know it, then Big Five is perhaps a better suggestion) rather than the MBTI. He could simply publish a paper on text prediction software for Big Five personality. The sad and scary part of it would be the corporate use of this.

Using text analysis bots to type isn't new, but almost all of the ones I know fail to match the person's actual type, in part because of context. For example, I tried throwing people's journals/diaries at a text analyzer and they mostly gravitated toward INFP even if the writers were ENTJ, because journaling is sort of an INFP thing by definition.
 