  1. #1
    Senior Member
    Join Date
    Aug 2013
    MBTI
    ISTP
    Posts
    353

    Testing cognitive function theories using document classification

    I have come up with a sketch of a design for testing cognitive function theories. The missing piece is a really big dataset of text written by each type. Online forums could conceivably work - the data can be noisy (i.e., written by mistyped people) as long as people are typed correctly on average and there is a lot of data. Collecting all the data is by far the hardest part of the project.

    The sketch goes like this - use LibShortText in logistic regression mode. Create eight classes, one for each cognitive function. The training set consists of all the text from every type that has that function in its dominant set (top 4, or even top 2 or top 1). Then feed new pieces of text through the system (i.e., forum or blog text written by someone we want to type) and get the probability estimates for each of the eight classes. If there is real signal there, this will give an ordering of the extent to which each function is present in the person who generated that text. This framework would support further experiments that directly compare differing theories on the ordering of cognitive functions.

    Let me know your ideas for datasets - I can do a simple version of this project in a weekend.
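
    A minimal sketch of how the eight function classes could be built from type labels. It assumes the commonly cited four-function stack (dominant, auxiliary, tertiary, inferior), which is only one of the competing orderings this project is meant to compare. `function_stack` and `types_by_function` are hypothetical helper names, not part of LibShortText.

```python
from itertools import product

def function_stack(mbti):
    """Return [dominant, auxiliary, tertiary, inferior] for a four-letter type.

    Assumption: the commonly cited stack model, used here for illustration only.
    """
    attitude, perceiving, judging, orientation = mbti
    opposite = {"S": "N", "N": "S", "T": "F", "F": "T", "e": "i", "i": "e"}
    # J types extravert their judging function, P types their perceiving one.
    if orientation == "J":
        extraverted, introverted = judging + "e", perceiving + "i"
    else:
        extraverted, introverted = perceiving + "e", judging + "i"
    # Extraverts lead with the extraverted function, introverts with the other.
    if attitude == "E":
        dominant, auxiliary = extraverted, introverted
    else:
        dominant, auxiliary = introverted, extraverted

    def flip(function):
        # Opposite function letter with the opposite attitude, e.g. Se -> Ni.
        return opposite[function[0]] + opposite[function[1]]

    return [dominant, auxiliary, flip(auxiliary), flip(dominant)]

def types_by_function(top_n=4):
    """Map each function (class) to the types whose top_n functions include it."""
    classes = {}
    for letters in product("EI", "SN", "TF", "JP"):
        mbti = "".join(letters)
        for function in function_stack(mbti)[:top_n]:
            classes.setdefault(function, set()).add(mbti)
    return classes
```

    With top_n=1 each class holds two types (Ti maps to ISTP and INTP); with top_n=4 each holds eight, so under this model a post by an ISTP would be a training example for Ti, Se, Ni, and Fe.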

  2. #2
    Member chaoticbrain
    Join Date
    Jul 2013
    MBTI
    NeTi
    Enneagram
    6 sx
    Posts
    82

    Awesome idea.

    Will be interesting to see what you find.

  3. #3
    Senior Member
    Join Date
    Aug 2013
    MBTI
    ISTP
    Posts
    353

    What! You changed your response! Haha.

    $ time wget -q -r --no-parent http://www.typologycentral.com/forums/archive/index.php

    real 1187m19.800s
    user 0m45.366s
    sys 2m8.574s

    $ du -sh www.typologycentral.com/
    1.3G www.typologycentral.com/


    Unfortunately this archive dataset didn't actually contain each user's type, so I'll have to do it again on the full forum.

  4. #4
    Senior Member
    Join Date
    Aug 2013
    MBTI
    ISTP
    Posts
    353

    Are we quick and dirty yet?

    Code:
    #!/usr/bin/env python3
    # Download and extract sweet, sweet data from Typology Central
    # This doesn't log in and only grabs the first page of a thread
    # It discards types that are not all upcased MBTI types
    # Ignores posts with "Originally Posted by" quotes

    import re
    from urllib.request import urlopen

    from bs4 import BeautifulSoup as bs

    base = 'http://www.typologycentral.com/forums'

    # Matches any of the 16 upper-case MBTI type codes
    mbti_re = re.compile("ISTJ|ISFJ|INFJ|INTJ|ISTP|ISFP|INFP|INTP|ESTP|ESFP|ENFP|ENTP|ESTJ|ESFJ|ENFJ|ENTJ")

    # get total number of threads
    main_page = bs(urlopen(base + '/forum.php').read(), "html.parser")
    n_threads = int(main_page.find(id="wgo_stats").dd.string.replace(",",""))

    data = {}  # MBTI type -> list of post texts
    for i in range(n_threads):
        url = base + "/showthread.php?pp=100&t=" + str(i)
        page = bs(urlopen(url).read(), "html.parser")

        # No Thread specified - skip
        error = page.find("div", {"class" : "standard_error"})
        if error is not None: continue

        posts = page.find_all("li", id=re.compile("^post_.*"))

        for post in posts:
            mbti = post.find("dd", string=mbti_re)
            if mbti is None: continue
            mbti = mbti.text

            # Skip posts without a body, then flatten the text to one line
            content = post.find("blockquote", {"class" : "postcontent"})
            if content is None: continue
            content = content.text.replace("\n", " ").strip()
            if "Originally Posted by" in content: continue

            data.setdefault(mbti, []).append(content)

        print("Thread:", url.split("=")[-1])

  5. #5
    Senior Member
    Join Date
    Aug 2013
    MBTI
    ISTP
    Posts
    353

    I killed it after the first 28220 threads (~half of all threads), figuring that was enough data for now. Here's how many posts I have per type:


    INFP 28404
    INFJ 26662
    INTP 24219
    ENTP 23095
    ENFP 22371
    INTJ 17188
    ENTJ 8443
    ISFP 7848
    ISTP 5432
    ENFJ 4917
    ISTJ 4518
    ESTP 3295
    ISFJ 2833
    ESFJ 1661
    ESFP 1233
    ESTJ 1067

  6. #6
    Senior Member
    Join Date
    Aug 2013
    MBTI
    ISTP
    Posts
    353

    I trained a model to predict the type of the person who wrote a piece of text. I trained on half of the data and tested on the other half. That's 91482 training posts and 91482 testing posts, with the same distribution of posts per type as listed above. This particular model isn't biased by unevenly distributed training data.
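
    One way the two files could be produced from the scraped `data` dictionary, sketched under assumptions: LibShortText is assumed to read one label and one text per line separated by a tab, and `split_in_half` and `write_labeled_text` are hypothetical helpers rather than the code actually used for this run.

```python
import random

def split_in_half(data, seed=0):
    """Split each type's posts 50/50 so both halves keep the same type distribution."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(data):
        posts = list(data[label])
        rng.shuffle(posts)
        half = len(posts) // 2  # an odd leftover post is dropped
        train += [(label, text) for text in posts[:half]]
        test += [(label, text) for text in posts[half:2 * half]]
    return train, test

def write_labeled_text(rows, path):
    """Write one 'label<TAB>text' line per post (assumed input format)."""
    with open(path, "w", encoding="utf-8") as handle:
        for label, text in rows:
            # Collapse tabs and newlines so each post stays on a single line.
            clean = " ".join(text.split())
            if clean:
                handle.write(f"{label}\t{clean}\n")
```

    Calling `write_labeled_text(train, "typoc_train.libshorttext")` and the same for the test half would give files with the names used in the commands below.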

    Training:

    time python ~/Downloads/libshorttext-1.1/text-train.py typoc_train.libshorttext
    **.***
    optimization finished, #iter = 19
    Objective value = -67916.618569
    nSV = 542992

    real 2m48.757s
    user 2m7.682s
    sys 0m3.177s


    Testing:

    time python ~/Downloads/libshorttext-1.1/text-predict.py typoc_test.libshorttext typoc_train.libshorttext.model predict_result
    Accuracy = 26.9649% (24668/91482)

    real 2m3.003s
    user 1m43.357s
    sys 0m2.029s


    Note that chance performance is 1/16 = 6.25%, whereas the classifier's performance was ~27%. Impressive!

    However, there are all manner of potential issues. What we really need is a completely out-of-set test set, i.e., one that is not from TypoC. I don't feel like putting this model up as a web service right now, but if you guys don't mind pasting pieces of example text from confirmed types that were not posted on this forum (or any forum, ideally), we can see how the performance stacks up.

  7. #7
    Senior Member
    Join Date
    Aug 2013
    MBTI
    ISTP
    Posts
    353

    I tested the model on my last 10 Facebook posts and it got them all wrong. The guesses suggest that the model is, in fact, strongly biased by the uneven distribution of types in the training data. This is easily fixed - I just have to cap the number of training examples per type at the number of posts by ESTJs (who have posted the least). This will also greatly weaken the classifier, however.

    For the record, it guessed: ENFP, INFP, ENFP, ENTP, INTJ, ENTP, ENTP, ENTP, INFP, INFP.

    These guesses, as you can see from the counts, are pulled right out of the types who post most frequently.

    INFP 28404
    INFJ 26662
    INTP 24219
    ENTP 23095
    ENFP 22371
    INTJ 17188
    ENTJ 8443
    ISFP 7848
    ISTP 5432
    ENFJ 4917
    ISTJ 4518
    ESTP 3295
    ISFJ 2833
    ESFJ 1661
    ESFP 1233
    ESTJ 1067


    Sadly, controlling for this factor will probably result in a classifier that does not perform better than chance.

    I'll need a lot of training data from each type in order to really test it out.
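
    A small worked calculation of why 6.25% is too generous a baseline for the unbalanced model. It uses only the per-type counts listed above; the variable names are made up for illustration.

```python
# Posts per type, copied from the table above.
posts_per_type = {
    "INFP": 28404, "INFJ": 26662, "INTP": 24219, "ENTP": 23095,
    "ENFP": 22371, "INTJ": 17188, "ENTJ": 8443, "ISFP": 7848,
    "ISTP": 5432, "ENFJ": 4917, "ISTJ": 4518, "ESTP": 3295,
    "ISFJ": 2833, "ESFJ": 1661, "ESFP": 1233, "ESTJ": 1067,
}

total = sum(posts_per_type.values())

# Chance level if every type were equally likely.
uniform_chance = 1 / len(posts_per_type)

# Accuracy of a "classifier" that always guesses the most common type.
majority_type = max(posts_per_type, key=posts_per_type.get)
majority_baseline = posts_per_type[majority_type] / total

print(f"uniform chance:    {uniform_chance:.2%}")
print(f"always {majority_type}:       {majority_baseline:.2%}")
```

    Always guessing INFP would score about 15.5% on data with this distribution, so the fairer comparison for the unbalanced model is against that number rather than against 6.25%.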

  8. #8
    Senior Member
    Join Date
    Aug 2013
    MBTI
    ISTP
    Posts
    353

    So I ran a quick experiment to see if the classifier is actually better than chance. I randomly selected about 1000 posts from each type, since there were only about 1000 posts for ESTJs, and I trained on 500 and tested on 500. The classifier got 15.57%, more than 2x better than chance (6.25%), which is impressive given how small the training set is.

    Not having done a detailed investigation of its performance, though, I can't rule out that it is cheating somehow. These classifiers are clever like that.
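
    A sketch of the balanced experiment, assuming `data` maps each type to its list of posts as in the scraper. `balanced_split` is a hypothetical helper; by default it caps every type at the size of the smallest one.

```python
import random

def balanced_split(data, per_class=None, seed=0):
    """Sample the same number of posts per type, then split each sample in half."""
    rng = random.Random(seed)
    if per_class is None:
        # Cap every type at the size of the least-represented one.
        per_class = min(len(posts) for posts in data.values())
    half = per_class // 2
    train, test = [], []
    for label in sorted(data):
        sample = rng.sample(data[label], per_class)
        train += [(label, text) for text in sample[:half]]
        test += [(label, text) for text in sample[half:2 * half]]
    return train, test
```

    With every type equally represented in both halves, 1/16 = 6.25% is the right chance level, because always guessing one type can no longer beat it.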

  9. #9
    Senior Member
    Join Date
    Aug 2013
    MBTI
    ISTP
    Posts
    353

    I ran 10-fold cross-validation, using 90% of the data for training and 10% for testing in each fold, and got a mean accuracy of 19.93% with a standard deviation of 0.71. That's pretty awesome: the classifier is > 3x better than chance on average. However, there is still a problem with it. As you can see here, it is systematically biased. For instance, it guesses ESFJ more than 2x as frequently as the types it guesses least. I am guessing that this is due to differences in mean post length per type, which I will now control for in the training set.

    Code:
     76 ENFP 84 ENTJ 87 ENFJ 87 INTJ 87 INTP 94 ISFP 95 INFP 97 INFJ 104 ISTP 107 ISFJ 115 ISTJ 117 ESTJ 119 ESTP 127 ESFP 131 ENTP 179 ESFJ
     77 INFP 78 ENTP 78 INFJ 80 INTJ 88 ENTJ 90 ISTP 92 INTP 93 ENFJ 93 ISFP 97 ENFP 109 ISFJ 115 ISTJ 130 ESTJ 144 ESTP 151 ESFP 192 ESFJ
     73 ENTJ 74 ENTP 83 INFJ 84 INFP 84 INTJ 89 INTP 92 ENFP 108 ISFP 113 ENFJ 113 ISTP 115 ISTJ 118 ISFJ 119 ESTJ 127 ESFP 131 ESTP 184 ESFJ
     73 INTJ 79 INFJ 87 ENFJ 89 INTP 91 ENTP 97 ISFJ 100 ISFP 101 ISTJ 104 ENFP 104 INFP 109 ENTJ 112 ESFP 114 ISTP 127 ESTJ 139 ESTP 180 ESFJ
     76 INFP 77 INTJ 83 ENFJ 83 ENFP 89 INTP 91 INFJ 95 ISFJ 98 ENTJ 99 ISFP 101 ENTP 105 ISTP 123 ESFP 126 ESTP 128 ISTJ 159 ESTJ 174 ESFJ
     69 INFP 77 ENTP 78 ENFP 89 ENFJ 90 INTP 100 INFJ 105 ISFJ 106 INTJ 108 ISTJ 108 ISTP 109 ISFP 116 ENTJ 128 ESFP 129 ESTP 135 ESTJ 159 ESFJ
     74 INTP 81 ENFP 83 ENTP 90 INFP 91 INTJ 94 ENFJ 94 ISFJ 96 ISFP 100 INFJ 102 ENTJ 106 ISTP 113 ISTJ 126 ESTJ 131 ESTP 139 ESFP 187 ESFJ
     70 INFP 74 INTP 82 INTJ 85 ENTP 89 ENFJ 89 ENFP 107 INFJ 108 ESTP 109 ISTJ 112 ISTP 115 ENTJ 118 ISFJ 118 ISFP 125 ESTJ 127 ESFP 178 ESFJ
     79 INTJ 81 ENFP 85 ENTP 87 INFP 92 INTP 94 ISFP 95 INFJ 96 ENFJ 96 ISTP 102 ENTJ 102 ISTJ 113 ESTP 125 ISFJ 139 ESFP 149 ESTJ 171 ESFJ
     72 INFP 76 ISFP 78 INTJ 89 ENFJ 94 ENTP 99 ENFP 103 INTP 104 INFJ 106 ENTJ 111 ESTP 115 ISTJ 115 ISTP 120 ESFP 120 ISFJ 137 ESTJ 167 ESFJ
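
    A minimal sketch of the cross-validation bookkeeping, assuming each row above is the tally of predicted labels for one fold. The helper names are hypothetical, and the classifier call itself is left out.

```python
import random
from collections import Counter
from statistics import mean, stdev

def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

def guess_ratios(predicted_labels, n_classes=16):
    """How often each label was guessed, relative to an even share of guesses."""
    expected = len(predicted_labels) / n_classes
    counts = Counter(predicted_labels)
    return {label: count / expected for label, count in counts.most_common()}

def summarize(fold_accuracies):
    """Mean and sample standard deviation of the per-fold accuracies."""
    return mean(fold_accuracies), stdev(fold_accuracies)
```

    In the first row above, ESFJ was guessed 179 times out of 1706, about 1.7 times the even share of roughly 107 guesses per type.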

  10. #10
    Member chaoticbrain
    Join Date
    Jul 2013
    MBTI
    NeTi
    Enneagram
    6 sx
    Posts
    82

    Woah, that's a lot of data.

    And ya I can still help you if you want me to try collecting data, but I'm not even sure how you collected that much.

    Out of curiosity, do you need programming knowledge to run something like this?
