Take 5,000 people - nationally representative.
Get them all to complete their MBTI questionnaire on a specific day
(get them professionally assessed and categorised) - code the database.
This data can be evaluated for internal reliability.
6 months (maybe a year) later do it again - code to the database.
Check out which answers have changed - check out which MBTI classifications have changed (instantly you've provided SOME evidence of sustainability of classification).
Keep repeating and evaluating for, say, several waves....
Each wave would cost, in data collection... est. £300k.... (maybe more).
So very expensive to do, but far from impossible.
As I said in my post to SW, I'd guess it is reliable about 80-90% of the time... or it simply wouldn't work at a mass level.
It's the internal maths of a large sample, repeated over time and compared between waves....
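The wave-to-wave comparison described above can be sketched in a few lines. The respondent IDs, types and the resulting figure below are made-up illustrations, not real survey data; this is only a minimal sketch of "compare each person's classification between two waves and count how many held steady":

```python
def wave_agreement(wave1, wave2):
    """Fraction of respondents whose 4-letter type is unchanged between
    two waves. Each wave maps respondent id -> MBTI type string."""
    shared = wave1.keys() & wave2.keys()  # only people tested both times
    if not shared:
        return 0.0
    same = sum(1 for rid in shared if wave1[rid] == wave2[rid])
    return same / len(shared)

# Hypothetical mini-sample: respondent 3 drifts from INFP to INFJ.
wave1 = {1: "ISFJ", 2: "ENTP", 3: "INFP", 4: "ISTJ", 5: "ENFJ"}
wave2 = {1: "ISFJ", 2: "ENTP", 3: "INFJ", 4: "ISTJ", 5: "ENFJ"}

print(wave_agreement(wave1, wave2))  # 4 of 5 unchanged -> 0.8
```

Repeating this across successive waves is what builds the reliability estimate - though, as the reply below points out, it measures consistency of self-report, not accuracy.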
Probably a far less sexy answer than you want, but it will establish the degree of error of MBTI as a device
I'd prefer that nothing in my interactions with you involve the topic of sex at all, thanks.
Anyway even if people consistently test the same way, you still haven't solved the problem of self-report because there's no guarantee that the way people describe themselves is accurate.
So let's say we did this experiment and 16% of people tested ISFJ. Even if we repeated it later and even if
exactly the same 16% of people tested ISFJ, we'd still be nowhere because we have no idea how accurately any of those people described themselves during the testing process. All we know is "16% of people DESCRIBE themselves as ISFJ."
We'd still have no idea whether those people genuinely use the cognitive processes we've labeled I, S, F and J more often than any others. They could all be ENTPs who imagine themselves to be ISFJs, for all we know.
The "80-90%" reliability that you describe would only cover the consistency of results between repeated waves of testing--
it still does nothing to guarantee the descriptive accuracy of any of the tests in the first place.
you don't - you measure it... and evaluate it over time...
Unless people lie consistently over time - which on a questionnaire whose items aren't transparent is REALLY difficult to do... they would be within the ballpark of what their type is.
I'm not claiming you can definitively type people... which I think is where your head is at, but you can establish levels of error.
Also, take the first wave of the data set... you can establish how close people fall to the lines of categorisation... you have multiple questions evaluating the dichotomy of N/S.... if, out of the 5,000, 100 sit fairly tight to the mid point, then they would be grey classifications... this gets clocked up across the 4 dichotomies... which, as I said before, is likely to show less than 20% based on my gut feel of woolly classifications.... say at the end of wave 1 there are 500 people who fall into mid-way classifications...
The analyst would then look at those 500 people's answers the next time and see how they have changed etc... this is then measured etc..
so you build up the reliability case... It begins to measure reliability... it does work...
As I said no segmentation is 100% reliable....
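The "grey classification" idea above - flagging respondents whose dichotomy score sits close to the midpoint rather than forcing them into a letter - can be sketched as below. The scores, the 0.1 band width and the function name are all invented for illustration; real MBTI scoring works differently:

```python
def classify_dichotomy(score, letters=("N", "S"), grey_band=0.1):
    """score in [0, 1]: 0 = fully the first letter, 1 = fully the second.
    Scores within grey_band of the 0.5 midpoint are flagged grey ('?')
    instead of being forced into either letter."""
    if abs(score - 0.5) <= grey_band:
        return "?"
    return letters[0] if score < 0.5 else letters[1]

print(classify_dichotomy(0.12))  # clearly N
print(classify_dichotomy(0.55))  # inside the band -> "?" (grey)
print(classify_dichotomy(0.91))  # clearly S
```

Running this over all four dichotomies per respondent, then tracking how the grey group answers in the next wave, is the measurement loop being proposed.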
People don't have to consciously lie; they just have to describe themselves inaccurately without realising it. We have no hard evidence that any of MBTI's descriptors actually fit real neurochemical processes in the first place, and no idea how well the test correlates with people's genuine psychological types.