In this analysis, I will investigate whether enabling the “skip” functionality in Minno.js is responsible for repetitions of questions and tasks in our data. To explore this, I modified the script at the beginning of the demo.genderscience study to randomly assign whether the “skip” functionality is activated for each participant. Specifically, I introduced a skipped variable, which is set to either ‘true’ or ‘false’ based on a random condition (50% chance).
Here’s the JavaScript code I used:
var skip = 'false';
if (Math.random() < 0.5) {
skip = 'true';
API.addSettings('skip', true);
}
API.save({skipped:skip});
The API.save function records the value of skipped in the dataset. This analysis will help us test whether enabling this “skip” functionality leads to duplicated questions and tasks in the data.
towork <- "C:/Users/Yoav/Documents/gdrive.mail.tau.ac.il/Other computers/My Laptop"
source(paste0(towork, "/work/resources/stasExamples/R/yba.funcs.R"))
# The data's folder (directory)
dir.to.data = "C:\\Users\\Yoav\\OneDrive - Tel-Aviv University\\Documents\\bigfiles\\"
dir.raw = paste0(dir.to.data, "demo.studies\\demo.genderscience.0003.skip\\raw")
dir.out <- paste0(dir.to.data, "demo.studies\\demo.genderscience.0003.skip\\processed")
explicit.raw <- read.table(paste(dir.raw, "explicit.txt", sep = "\\"), sep = "\t", header = TRUE, fill = TRUE, quote = NULL, comment = "")
exp.sids <- explicit.raw$session_id[which(explicit.raw$question_name == "skipped")]
exp.raw <- explicit.raw[which(explicit.raw$session_id %in% exp.sids), ]
cat("number of unique sessions in the explicit table:", length(unique(exp.raw$session_id)))
## number of unique sessions in the explicit table: 5691
skipped <- exp.raw[which(exp.raw$question_name == "skipped"), c("session_id", "question_response")]
names(skipped)[names(skipped) == "question_response"] <- "skipped"
If the ‘skip’ functionality is solely responsible for duplicated questions, we should not observe any session whose duplicate entries for the variable ‘skipped’ contain only the value ‘false’.
# Count duplicate session_id in rows that recorded the 'skipped' variable
dups <- setDF(setDT(skipped)[, if (.N > 1L) .SD, by = .(session_id)])
dup.sids <- unique(dups$session_id)
skipped.dups <- setDF(skipped)[which(skipped$session_id %in% dup.sids), ]
# Group by session_id and check if all values of skipped are 'false'
false_only_sessions <- skipped.dups %>%
group_by(session_id) %>%
dplyr::summarize(all_false = all(skipped == "false"), num_repetitions = n() # Count the number of repetitions for each session
)
# Compute the expected likelihood of having only 'false' by chance (0.5^n)
false_only_sessions <- false_only_sessions %>%
mutate(expected_all_false = 0.5^num_repetitions)
# Count how many sessions have only 'false' in all their repetitions
false_only_count <- sum(false_only_sessions$all_false)
# Get the number of unique sessions in skipped.dups
unique_session_count <- n_distinct(skipped.dups$session_id)
# Compute the expected number of sessions with only 'false'
expected_false_only_count <- sum(false_only_sessions$expected_all_false)
Total number of unique sessions with at least 2 recordings of the ‘skipped’ variable: 251
Number of sessions that included at least 2 recordings of the ‘skipped’ variable, but always with the value ‘false’: 48
Expected number of sessions with only ‘false’ (by chance): 55.890625
We found that there are duplicated sessions with only “skipped = false” values (48 of 251). Further, this count is close to the number expected by chance (about 55.9) if each recording is independently ‘true’ or ‘false’ with probability 0.5, that is, if skipping does not increase the likelihood of repetition.
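The chance expectation used above can be illustrated on toy data. This is only a sketch: the data frame `toy` and its values are hypothetical, and it assumes each recording of ‘skipped’ is an independent fair coin flip, so a session with n recordings is all-‘false’ with probability 0.5^n.

```r
# Toy data (hypothetical values, not taken from the study)
toy <- data.frame(
  session_id = c(1, 1, 2, 2, 2, 3, 3),
  skipped = c("false", "false", "true", "false", "false", "false", "true"),
  stringsAsFactors = FALSE
)

# Number of 'skipped' recordings per session
n_rec <- tapply(toy$skipped, toy$session_id, length)

# TRUE for sessions whose recordings are all 'false'
all_false <- tapply(toy$skipped == "false", toy$session_id, all)

# Observed count of all-'false' sessions
observed_all_false <- sum(all_false)

# Expected count under a fair coin per recording: sum over sessions of 0.5^n
expected_all_false <- sum(0.5^n_rec)

cat("observed:", observed_all_false, "expected:", expected_all_false, "\n")
```

An observed count far above the expected one would suggest that repetitions cluster in sessions without skipping; a count near or below it is compatible with repetitions being unrelated to a ‘false’ assignment.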
Still, hereafter, in the rest of the analyses, we will consider any session that had “skipped=true” even once as a session in the “skipped=true” condition.
# Group by session_id and summarize the skipped column.
# If any 'skipped' value is 'true', mark the whole session as 'true', otherwise 'false'.
skipped_summary <- skipped %>%
group_by(session_id) %>%
dplyr::summarize(skipped = ifelse(any(skipped == "true"), "true", "false")) %>%
ungroup()
# View the resulting data.frame head(skipped_summary)
# Count dups
for.dup <- exp.raw[, c("session_id", "attempt", "questionnaire_name", "question_name", "question_response")]
for.dup <- setDF(setDT(for.dup)[, nReps := 1:.N, by = c("session_id", "questionnaire_name", "question_name")])
rep.qst.sids <- unique(for.dup$session_id[which(for.dup$nReps > 1)])
for.dup <- setDF(setDT(for.dup)[, nResps := 1:.N, by = c("session_id", "questionnaire_name", "question_name", "question_response")])
rep.resp.sids <- unique(for.dup$session_id[which(for.dup$nResps > 1)])
multi.attempt.sids <- unique(for.dup$session_id[which(for.dup$attempt > 1)])
reps <- data.frame(session_id = unique(for.dup$session_id))
reps$rep.attempt <- reps$session_id %in% multi.attempt.sids
reps$rep.qst <- reps$session_id %in% rep.qst.sids
reps$rep.resp <- reps$session_id %in% rep.resp.sids
reps <- merge(reps, skipped_summary, by = "session_id")
attempts <- for.dup %>%
group_by(session_id) %>%
dplyr::summarise(m.attempt = mean(attempt, na.rm = TRUE), n.attempt.gt1 = sum(attempt > 1, na.rm = TRUE))
reps <- merge(reps, attempts, by = "session_id")
We will examine a few by-session variables:
mysumBy(rep.qst + rep.resp + rep.attempt + m.attempt + n.attempt.gt1 ~ skipped, dt = reps)
Any session that had a repeated question also had a repeated response. Although there were repetitions even without “skip” enabled, the results suggest that enabling skipping increased the likelihood of duplicate questions.
# Convert logical columns to numeric (TRUE -> 1, FALSE -> 0)
reps_numeric <- reps %>%
mutate(across(c(rep.qst, rep.resp, rep.attempt, m.attempt, n.attempt.gt1), as.numeric)) %>%
# Convert 'skipped' to numeric ('true' -> 1, 'false' -> 0)
mutate(skipped = ifelse(skipped == "true", 1, 0))
Let’s explore the relations between the different repetition variables (in particular, between having repeated questions and having attempt values higher than 1).
my.htmlTable(cornp(reps_numeric[, c("rep.qst", "rep.resp", "rep.attempt", "m.attempt", "n.attempt.gt1", "skipped")]))
Each cell shows the correlation (r), its p-value, and n.

| | varName | rep.qst | rep.resp | rep.attempt | m.attempt | n.attempt.gt1 |
|---|---|---|---|---|---|---|
| 1 | rep.resp | r = 1, p < .001, n = 5691 | | | | |
| 2 | rep.attempt | r = 0.014, p = 0.304, n = 5691 | r = 0.014, p = 0.304, n = 5691 | | | |
| 3 | m.attempt | r = 0.218, p < .001, n = 5691 | r = 0.218, p < .001, n = 5691 | r = 0.143, p < .001, n = 5691 | | |
| 4 | n.attempt.gt1 | r = 0.222, p < .001, n = 5691 | r = 0.222, p < .001, n = 5691 | r = 0.101, p < .001, n = 5691 | r = 0.772, p < .001, n = 5691 | |
| 5 | skipped | r = 0.13, p < .001, n = 5691 | r = 0.13, p < .001, n = 5691 | r = -0.01, p = 0.449, n = 5691 | r = 0.037, p = 0.005, n = 5691 | r = 0.035, p = 0.008, n = 5691 |
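Having a repeated question was more likely in sessions with more attempt values above 1 (r of about 0.22 with m.attempt and n.attempt.gt1), but the relation was quite small, and merely having any attempt value above 1 (rep.attempt) was unrelated to repetition (r = 0.014). So, the attempt variable is probably not useful for investigating or detecting duplicates.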
To verify these results, we will use different processing to compute similar variables.
for.dup2 <- exp.raw[, c("session_id", "attempt", "questionnaire_name", "question_name", "question_response")]
for.dup2 <- for.dup2 %>%
arrange(session_id, questionnaire_name, question_name) %>%
group_by(session_id, questionnaire_name, question_name) %>%
mutate(iRep = row_number() - 1, nReps = n() - 1, isRep = nReps > 1) %>%
ungroup()
for.dup2 <- for.dup2 %>%
arrange(session_id, questionnaire_name, question_name, question_response) %>%
group_by(session_id, questionnaire_name, question_name, question_response) %>%
mutate(iRepResp = row_number() - 1, nRepsResp = n() - 1, isRepResp = nRepsResp > 1) %>%
ungroup()
for.dup2 <- for.dup2[order(for.dup2$session_id, for.dup2$questionnaire_name, for.dup2$question_name, for.dup2$attempt, for.dup2$question_response), ]
How many of all the rows in the explicit table were a repetition?
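isRep = For each row in the explicit table, does its combination of [session_id, questionnaire_name, question_name] appear in other rows? If yes, that’s a repetition of the question. (As coded, isRep is TRUE only when nReps > 1, that is, when the combination appears in at least two other rows.)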
isRepResp = The same as isRep, but the combination also includes the exact response (i.e., [session_id, questionnaire_name, question_name, question_response])
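To make the two counters concrete, here is a minimal base-R sketch on hypothetical toy data (the object `toy` and its values are invented for illustration, and it uses `ave()` rather than the dplyr pipeline above). It also shows how the `> 1` threshold used for isRep differs from a `> 0` threshold.

```r
# Toy data (hypothetical): three sessions answering the same question
toy <- data.frame(
  session_id = c(1, 1, 1, 2, 2, 3),
  question_name = "q1",
  question_response = c("a", "a", "b", "a", "b", "a"),
  stringsAsFactors = FALSE
)

# Grouping keys: question level and question + response level
key_q <- paste(toy$session_id, toy$question_name, sep = "|")
key_r <- paste(key_q, toy$question_response, sep = "|")

# Number of OTHER rows sharing the same key (group size minus 1)
toy$nReps <- ave(seq_along(key_q), key_q, FUN = length) - 1
toy$nRepsResp <- ave(seq_along(key_r), key_r, FUN = length) - 1

# Threshold as coded in the report (at least two other rows)
toy$isRep <- toy$nReps > 1
# Alternative threshold (at least one other row), shown for comparison
toy$anyOtherRow <- toy$nReps > 0

print(toy)
```

In this toy example, session 2 answered the question twice, so its rows have nReps = 1: they count as repeated under the `> 0` threshold but not under `> 1`.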
repsB <- merge(for.dup2, skipped_summary, by = "session_id", all.x = T)
mysumBy(isRep + isRepResp ~ skipped, dt = repsB, round = 4)
Without skipping, repetition occurred in 0.001% of the rows. With skipping, repetition occurred in 0.02% of the rows.
Let’s see how common each attempt value was in the explicit table.
my.freq(for.dup2$attempt)
Next, we will summarize the repetition variables by session_id. The results should be very similar to what we saw previously.
# Summarize the attempt and repetition variables for each session
reps2 <- for.dup2 %>%
group_by(session_id) %>%
dplyr::summarise(
m.attempt = mean(attempt, na.rm = TRUE),
n.attempt.gt1 = sum(attempt > 1, na.rm = TRUE),
m.nReps = mean(nReps, na.rm = TRUE),
n.reps = sum(nReps > 0, na.rm = TRUE),
any.rep = m.nReps > 0,
m.nRepsResp = mean(nRepsResp, na.rm = TRUE),
n.repsResp = sum(nRepsResp > 0, na.rm = TRUE),
any.repResp = m.nRepsResp > 0
)
reps2 <- merge(reps2, skipped_summary, by='session_id')
knitr::kable(mysumBy(n.reps + any.rep + n.repsResp + any.repResp + m.attempt + n.attempt.gt1 ~ skipped, dt = reps2))
| var | skipped | n | M | SD | SE | med |
|---|---|---|---|---|---|---|
| n.reps | false | 2787 | 0.406 | 7.305 | 0.138 | 0.000 |
| n.reps | true | 2904 | 2.083 | 16.149 | 0.300 | 0.000 |
| any.rep | false | 2787 | 0.018 | 0.133 | 0.003 | 0.000 |
| any.rep | true | 2904 | 0.072 | 0.259 | 0.005 | 0.000 |
| n.repsResp | false | 2787 | 0.233 | 3.625 | 0.069 | 0.000 |
| n.repsResp | true | 2904 | 0.969 | 6.785 | 0.126 | 0.000 |
| any.repResp | false | 2787 | 0.018 | 0.133 | 0.003 | 0.000 |
| any.repResp | true | 2904 | 0.072 | 0.259 | 0.005 | 0.000 |
| m.attempt | false | 2787 | 1.673 | 0.291 | 0.006 | 1.699 |
| m.attempt | true | 2904 | 1.696 | 0.305 | 0.006 | 1.714 |
| n.attempt.gt1 | false | 2787 | 25.525 | 15.665 | 0.297 | 32.000 |
| n.attempt.gt1 | true | 2904 | 26.661 | 16.538 | 0.307 | 33.000 |
The results are the same as in the previous method: having the “skip” option increased the likelihood of a repeated question (an effect of about d = 0.25 on any.rep), but repeated questions (and responses) were still possible even without the “skip” option.
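The effect size quoted above can be checked from the summary table alone. The sketch below assumes the standard pooled-SD form of Cohen's d and plugs in the rounded n, M, and SD values reported for any.rep; the variable names are chosen here for illustration.

```r
# Values copied from the any.rep rows of the table above (rounded as reported)
n1 <- 2787; m1 <- 0.018; s1 <- 0.133  # skipped = false
n2 <- 2904; m2 <- 0.072; s2 <- 0.259  # skipped = true

# Pooled standard deviation, weighting each variance by its degrees of freedom
sd_pooled <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))

# Cohen's d: standardized difference between the two conditions
d <- (m2 - m1) / sd_pooled

cat("Cohen's d for any.rep:", round(d, 2), "\n")
```

With these rounded inputs, d comes out at roughly 0.26, in line with the value of about 0.25 mentioned in the text.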
qreps <- for.dup2 %>%
group_by(questionnaire_name) %>%
dplyr::summarise(
m.attempt = mean(attempt, na.rm = TRUE),
n.attempt.gt1 = sum(attempt > 1, na.rm = TRUE),
m.nReps = mean(nReps, na.rm = TRUE),
n.reps = sum(nReps > 1, na.rm = TRUE),
m.nRepsResp = mean(nRepsResp, na.rm = TRUE),
n.repsResp = sum(nRepsResp > 1, na.rm = TRUE)
)
For each questionnaire we computed:
knitr::kable(qreps[order(-qreps$m.nReps), ])
| questionnaire_name | m.attempt | n.attempt.gt1 | m.nReps | n.reps | m.nRepsResp | n.repsResp |
|---|---|---|---|---|---|---|
| mgr | 1.623097 | 6228 | 0.1326233 | 322 | 0.1001747 | 224 |
| ageCheck | 1.016317 | 320 | 0.0326340 | 48 | 0.0157343 | 24 |
| race | 2.136439 | 54886 | 0.0187882 | 54 | 0.0088210 | 24 |
| demographics | 2.734115 | 79644 | 0.0166533 | 0 | 0.0078833 | 0 |
| explicits | 1.007853 | 476 | 0.0161963 | 0 | 0.0054689 | 0 |
| iat | 1.010606 | 66 | 0.0141414 | 0 | 0.0033670 | 0 |
| debriefing | 1.336515 | 6940 | 0.0105012 | 0 | 0.0044869 | 0 |
| under18 | 1.000000 | 0 | 0.0000000 | 0 | 0.0000000 | 0 |
It makes sense that duplicates would be detected more often in earlier stages of the study (mgr and ageCheck).
qqreps <- for.dup2 %>%
group_by(question_name) %>%
dplyr::summarise(
m.attempt = mean(attempt, na.rm = TRUE),
n.attempt.gt1 = sum(attempt > 1, na.rm = TRUE),
m.nReps = mean(nReps, na.rm = TRUE),
n.reps = sum(nReps > 1, na.rm = TRUE),
m.nRepsResp = mean(nRepsResp, na.rm = TRUE),
n.repsResp = sum(nRepsResp > 1, na.rm = TRUE)
)
For each question we computed:
knitr::kable(qqreps[order(-qqreps$m.nReps), ])
| question_name | m.attempt | n.attempt.gt1 | m.nReps | n.reps | m.nRepsResp | n.repsResp |
|---|---|---|---|---|---|---|
| gaySet | 1.500000 | 1 | 1.0000000 | 0 | 1.0000000 | 0 |
| genderIdentity_0002otherrt | 2.545454 | 6 | 0.1818182 | 0 | 0.0000000 | 0 |
| isTouch | 1.675986 | 3419 | 0.1331336 | 161 | 0.1331336 | 161 |
| skipped | 1.570525 | 2808 | 0.1318901 | 161 | 0.0669442 | 63 |
| genderIdentity_0002other | 2.142857 | 10 | 0.0952381 | 0 | 0.0000000 | 0 |
| raceomb_003sub_blackrt | 2.292035 | 257 | 0.0412979 | 3 | 0.0000000 | 0 |
| raceomb_003sub_black | 2.316384 | 268 | 0.0395480 | 3 | 0.0395480 | 3 |
| raceomb_003sub_hispanicotherrt | 2.675214 | 96 | 0.0341880 | 0 | 0.0000000 | 0 |
| raceomb_003sub_hispanicother | 2.650000 | 98 | 0.0333333 | 0 | 0.0333333 | 0 |
| birthmonth | 1.016317 | 80 | 0.0326340 | 12 | 0.0310800 | 12 |
| birthmonthrt | 1.016317 | 80 | 0.0326340 | 12 | 0.0007770 | 0 |
| birthyear | 1.016317 | 80 | 0.0326340 | 12 | 0.0303030 | 12 |
| birthyearrt | 1.016317 | 80 | 0.0326340 | 12 | 0.0007770 | 0 |
| raceomb_003sub_middleeastrt | 2.357143 | 60 | 0.0238095 | 0 | 0.0000000 | 0 |
| raceomb_003sub_hispanicrt | 2.439759 | 384 | 0.0200803 | 0 | 0.0000000 | 0 |
| raceomb_003sub_middleeast | 2.460784 | 76 | 0.0196078 | 0 | 0.0196078 | 0 |
| raceomb_003sub_hispanic | 2.446602 | 396 | 0.0194175 | 0 | 0.0194175 | 0 |
| raceomb_003sub_whitert | 2.280000 | 2028 | 0.0190826 | 0 | 0.0000000 | 0 |
| raceomb_003_other | 2.251561 | 3223 | 0.0189595 | 3 | 0.0184971 | 3 |
| raceomb_003_otherrt | 2.251561 | 3223 | 0.0189595 | 3 | 0.0000000 | 0 |
| raceomb_003_white | 2.251561 | 3223 | 0.0189595 | 3 | 0.0180347 | 3 |
| raceomb_003_whitert | 2.251561 | 3223 | 0.0189595 | 3 | 0.0000000 | 0 |
| raceomb_003_asian | 1.961858 | 2589 | 0.0189552 | 3 | 0.0184928 | 3 |
| raceomb_003_asianrt | 1.961858 | 2589 | 0.0189552 | 3 | 0.0000000 | 0 |
| raceomb_003_black | 1.961858 | 2589 | 0.0189552 | 3 | 0.0184928 | 3 |
| raceomb_003_blackrt | 1.961858 | 2589 | 0.0189552 | 3 | 0.0000000 | 0 |
| raceomb_003_hispanic | 2.131993 | 3012 | 0.0189552 | 3 | 0.0180305 | 3 |
| raceomb_003_hispanicrt | 2.131993 | 3012 | 0.0189552 | 3 | 0.0000000 | 0 |
| raceomb_003_middleeast | 2.131993 | 3012 | 0.0189552 | 3 | 0.0180305 | 3 |
| raceomb_003_middleeastrt | 2.131993 | 3012 | 0.0189552 | 3 | 0.0000000 | 0 |
| raceomb_003_native | 1.961858 | 2589 | 0.0189552 | 3 | 0.0175682 | 0 |
| raceomb_003_nativert | 1.961858 | 2589 | 0.0189552 | 3 | 0.0000000 | 0 |
| raceomb_003_pacific | 2.131993 | 3012 | 0.0189552 | 3 | 0.0180305 | 3 |
| raceomb_003_pacificrt | 2.131993 | 3012 | 0.0189552 | 3 | 0.0000000 | 0 |
| raceomb_003sub_white | 2.282656 | 2051 | 0.0188679 | 0 | 0.0145138 | 0 |
| postcodelongrt | 3.035539 | 2697 | 0.0177696 | 0 | 0.0000000 | 0 |
| postcodenowrt | 2.929841 | 2643 | 0.0177696 | 0 | 0.0000000 | 0 |
| postcodelong | 3.035474 | 2701 | 0.0177370 | 0 | 0.0159021 | 0 |
| postcodenow | 2.929358 | 2648 | 0.0177370 | 0 | 0.0159021 | 0 |
| countrycit003 | 2.861353 | 3423 | 0.0165055 | 0 | 0.0165055 | 0 |
| countrycit003rt | 2.861353 | 3423 | 0.0165055 | 0 | 0.0004716 | 0 |
| countryres003 | 2.861353 | 3423 | 0.0165055 | 0 | 0.0165055 | 0 |
| countryres003rt | 2.861353 | 3423 | 0.0165055 | 0 | 0.0004716 | 0 |
| edu | 2.942466 | 3472 | 0.0165055 | 0 | 0.0165055 | 0 |
| edurt | 2.942466 | 3472 | 0.0165055 | 0 | 0.0004716 | 0 |
| occuSelf | 2.966282 | 3506 | 0.0165055 | 0 | 0.0160340 | 0 |
| occuSelfrt | 2.966282 | 3506 | 0.0165055 | 0 | 0.0004716 | 0 |
| politicalid7 | 2.637992 | 3248 | 0.0164978 | 0 | 0.0146123 | 0 |
| politicalid7rt | 2.637992 | 3248 | 0.0164978 | 0 | 0.0004714 | 0 |
| religion2014 | 2.637992 | 3248 | 0.0164978 | 0 | 0.0155550 | 0 |
| religion2014rt | 2.637992 | 3248 | 0.0164978 | 0 | 0.0004714 | 0 |
| religionid | 2.641999 | 3251 | 0.0164978 | 0 | 0.0136696 | 0 |
| religionidrt | 2.641999 | 3251 | 0.0164978 | 0 | 0.0004714 | 0 |
| genderIdentity_0002 | 2.379359 | 2820 | 0.0164939 | 0 | 0.0160226 | 0 |
| genderIdentity_0002rt | 2.379359 | 2820 | 0.0164939 | 0 | 0.0004713 | 0 |
| num002 | 2.381715 | 2828 | 0.0164939 | 0 | 0.0113101 | 0 |
| num002rt | 2.381715 | 2828 | 0.0164939 | 0 | 0.0004713 | 0 |
| transIdentity | 2.379359 | 2820 | 0.0164939 | 0 | 0.0160226 | 0 |
| transIdentityrt | 2.379359 | 2820 | 0.0164939 | 0 | 0.0004713 | 0 |
| Larts | 1.007853 | 17 | 0.0161963 | 0 | 0.0112883 | 0 |
| Lartsrt | 1.007853 | 17 | 0.0161963 | 0 | 0.0004908 | 0 |
| Lscience | 1.007853 | 17 | 0.0161963 | 0 | 0.0103067 | 0 |
| Lsciencert | 1.007853 | 17 | 0.0161963 | 0 | 0.0004908 | 0 |
| arts | 1.007853 | 17 | 0.0161963 | 0 | 0.0103067 | 0 |
| artsrt | 1.007853 | 17 | 0.0161963 | 0 | 0.0004908 | 0 |
| factorability | 1.007853 | 17 | 0.0161963 | 0 | 0.0093252 | 0 |
| factorabilityrt | 1.007853 | 17 | 0.0161963 | 0 | 0.0004908 | 0 |
| factordiscrimination | 1.007853 | 17 | 0.0161963 | 0 | 0.0098160 | 0 |
| factordiscriminationrt | 1.007853 | 17 | 0.0161963 | 0 | 0.0004908 | 0 |
| factorencouragement | 1.007853 | 17 | 0.0161963 | 0 | 0.0098160 | 0 |
| factorencouragementrt | 1.007853 | 17 | 0.0161963 | 0 | 0.0004908 | 0 |
| factorfamily | 1.007853 | 17 | 0.0161963 | 0 | 0.0098160 | 0 |
| factorfamilyrt | 1.007853 | 17 | 0.0161963 | 0 | 0.0004908 | 0 |
| factorhighpower | 1.007853 | 17 | 0.0161963 | 0 | 0.0112883 | 0 |
| factorhighpowerrt | 1.007853 | 17 | 0.0161963 | 0 | 0.0004908 | 0 |
| factorinterest | 1.007853 | 17 | 0.0161963 | 0 | 0.0098160 | 0 |
| factorinterestrt | 1.007853 | 17 | 0.0161963 | 0 | 0.0004908 | 0 |
| goal1 | 1.007853 | 17 | 0.0161963 | 0 | 0.0112883 | 0 |
| goal1rt | 1.007853 | 17 | 0.0161963 | 0 | 0.0004908 | 0 |
| goal2 | 1.007853 | 17 | 0.0161963 | 0 | 0.0083436 | 0 |
| goal2rt | 1.007853 | 17 | 0.0161963 | 0 | 0.0004908 | 0 |
| ran9thboys | 1.007853 | 17 | 0.0161963 | 0 | 0.0107975 | 0 |
| ran9thboysrt | 1.007853 | 17 | 0.0161963 | 0 | 0.0004908 | 0 |
| ran9thgirls | 1.007853 | 17 | 0.0161963 | 0 | 0.0122699 | 0 |
| ran9thgirlsrt | 1.007853 | 17 | 0.0161963 | 0 | 0.0004908 | 0 |
| science | 1.007853 | 17 | 0.0161963 | 0 | 0.0117791 | 0 |
| sciencert | 1.007853 | 17 | 0.0161963 | 0 | 0.0004908 | 0 |
| occuSelfDetailrt | 3.128169 | 2406 | 0.0159778 | 0 | 0.0006947 | 0 |
| occuSelfDetail | 3.131220 | 2455 | 0.0156783 | 0 | 0.0149966 | 0 |
| block3Cond | 1.010606 | 22 | 0.0141414 | 0 | 0.0050505 | 0 |
| d | 1.010606 | 22 | 0.0141414 | 0 | 0.0010101 | 0 |
| feedback | 1.010606 | 22 | 0.0141414 | 0 | 0.0040404 | 0 |
| raceomb_003sub_whiteotherrt | 2.651515 | 466 | 0.0134680 | 0 | 0.0000000 | 0 |
| raceomb_003sub_whiteother | 2.640364 | 517 | 0.0121396 | 0 | 0.0091047 | 0 |
| raceomb_003sub_asianrt | 2.298246 | 387 | 0.0116959 | 0 | 0.0000000 | 0 |
| raceomb_003sub_asian | 2.329787 | 428 | 0.0106383 | 0 | 0.0070922 | 0 |
| broughtwebsite | 1.336515 | 694 | 0.0105012 | 0 | 0.0076372 | 0 |
| broughtwebsitert | 1.336515 | 694 | 0.0105012 | 0 | 0.0009547 | 0 |
| iatevaluations | 1.336515 | 694 | 0.0105012 | 0 | 0.0066826 | 0 |
| iatevaluations001 | 1.336515 | 694 | 0.0105012 | 0 | 0.0095465 | 0 |
| iatevaluations001rt | 1.336515 | 694 | 0.0105012 | 0 | 0.0009547 | 0 |
| iatevaluations002 | 1.336515 | 694 | 0.0105012 | 0 | 0.0076372 | 0 |
| iatevaluations002rt | 1.336515 | 694 | 0.0105012 | 0 | 0.0009547 | 0 |
| iatevaluations003 | 1.336515 | 694 | 0.0105012 | 0 | 0.0085919 | 0 |
| iatevaluations003rt | 1.336515 | 694 | 0.0105012 | 0 | 0.0009547 | 0 |
| iatevaluationsrt | 1.336515 | 694 | 0.0105012 | 0 | 0.0009547 | 0 |
| raceomb_003sub_otherrt | 2.450451 | 177 | 0.0090090 | 0 | 0.0000000 | 0 |
| raceomb_003sub_other | 2.471572 | 233 | 0.0066890 | 0 | 0.0000000 | 0 |
| blackLabels | 1.000000 | 0 | 0.0000000 | 0 | 0.0000000 | 0 |
| raceSet | 1.000000 | 0 | 0.0000000 | 0 | 0.0000000 | 0 |
| raceomb_003sub_asianother | 2.412698 | 49 | 0.0000000 | 0 | 0.0000000 | 0 |
| raceomb_003sub_asianotherrt | 2.406780 | 46 | 0.0000000 | 0 | 0.0000000 | 0 |
| raceomb_003sub_blackother | 2.846154 | 42 | 0.0000000 | 0 | 0.0000000 | 0 |
| raceomb_003sub_blackotherrt | 2.833333 | 38 | 0.0000000 | 0 | 0.0000000 | 0 |
| raceomb_003sub_middleeastother | 2.782609 | 19 | 0.0000000 | 0 | 0.0000000 | 0 |
| raceomb_003sub_middleeastotherrt | 2.809524 | 17 | 0.0000000 | 0 | 0.0000000 | 0 |
| raceomb_003sub_native | 2.452514 | 141 | 0.0000000 | 0 | 0.0000000 | 0 |
| raceomb_003sub_nativert | 2.408451 | 55 | 0.0000000 | 0 | 0.0000000 | 0 |
| raceomb_003sub_pacific | 2.731707 | 31 | 0.0000000 | 0 | 0.0000000 | 0 |
| raceomb_003sub_pacificother | 3.500000 | 4 | 0.0000000 | 0 | 0.0000000 | 0 |
| raceomb_003sub_pacificotherrt | 3.333333 | 3 | 0.0000000 | 0 | 0.0000000 | 0 |
| raceomb_003sub_pacificrt | 2.730769 | 21 | 0.0000000 | 0 | 0.0000000 | 0 |
| under18 | 1.000000 | 0 | 0.0000000 | 0 | 0.0000000 | 0 |
| under18rt | 1.000000 | 0 | 0.0000000 | 0 | 0.0000000 | 0 |
| whiteLabels | 1.000000 | 0 | 0.0000000 | 0 | 0.0000000 | 0 |
Nothing to see here, probably (beyond: questions in earlier stages of the study were more likely to be repeated).
Let’s see whether we can learn anything about participants whose “skipped” variable was always “false”.
Number of relevant sessions:
length(unique(skipped_summary$session_id[which(skipped_summary$skipped == "false")]))
## [1] 2787
for.dup3 <- exp.raw[which(exp.raw$session_id %in% unique(skipped_summary$session_id[which(skipped_summary$skipped == "false")])), c("session_id", "questionnaire_name", "question_name", "question_response")]
for.dup3 <- for.dup3 %>%
arrange(session_id, questionnaire_name, question_name) %>%
group_by(session_id, questionnaire_name, question_name) %>%
mutate(iRep = row_number() - 1, nReps = n() - 1, isRep = nReps > 1) %>%
ungroup()
for.dup3 <- for.dup3 %>%
arrange(session_id, questionnaire_name, question_name, question_response) %>%
group_by(session_id, questionnaire_name, question_name, question_response) %>%
mutate(iRepResp = row_number() - 1, nRepsResp = n() - 1, isRepResp = nRepsResp > 1) %>%
ungroup()
for.dup3 <- for.dup3 %>%
mutate(repNoResp = iRepResp < iRep)
for.dup3 <- for.dup3[order(for.dup3$session_id, for.dup3$questionnaire_name, for.dup3$question_name, for.dup3$question_response), ]
How many repeated rows did this subset of participants have?
my.freq(for.dup3$nReps[which(for.dup3$iRep == 0)])
Very few.
In which questionnaires did they have repetitions?
my.freq(for.dup3$questionnaire_name[which(for.dup3$nReps > 0 & for.dup3$iRep == 1)])
All of the questionnaires.
Let’s see whether the responses were sometimes different. If not, then maybe it was somehow the same data sent twice by the browser.
my.freq(for.dup3$repNoResp[which(for.dup3$iRep > 0)])
Yes, the response was sometimes different. So, this is an actual repetition of the question, and we don’t know why. At least it occurred only in 0.6% of the rows in the explicit table.
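The logic behind repNoResp (iRepResp < iRep) may be easier to follow on a small example. The following base-R sketch uses hypothetical toy data and `ave()` in place of the dplyr pipeline; it assumes rows are initially in recorded order within each session and question.

```r
# Toy data (hypothetical): session 1 changes its answer, session 2 repeats it
toy <- data.frame(
  session_id = c(1, 1, 1, 2, 2),
  question_name = "q1",
  question_response = c("a", "b", "a", "x", "x"),
  stringsAsFactors = FALSE
)

# Index of each row within its (session, question) group, starting at 0
key_q <- paste(toy$session_id, toy$question_name, sep = "|")
toy$iRep <- ave(seq_len(nrow(toy)), key_q, FUN = seq_along) - 1

# Re-sort by response within each group, then index within
# (session, question, response), again starting at 0
toy <- toy[order(toy$session_id, toy$question_name, toy$question_response), ]
key_r <- paste(toy$session_id, toy$question_name, toy$question_response, sep = "|")
toy$iRepResp <- ave(seq_len(nrow(toy)), key_r, FUN = seq_along) - 1

# TRUE when a row was preceded by a row with a different response
toy$repNoResp <- toy$iRepResp < toy$iRep

print(toy)
```

Session 2, which sent the identical response twice, never gets repNoResp = TRUE, whereas session 1, which gave two different responses, does. This is why TRUE values indicate an actual repetition of the question rather than the same data being sent twice.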
Let’s test whether particular client setups are more likely to produce duplicate data.
ss <- read.table(paste(dir.raw, "sessions.txt", sep = "\\"), sep = "\t", header = TRUE, fill = TRUE)
library(uaparserjs)
parsed_ua <- ua_parse(ss$user_agent)
parsed_ua$session_id <- ss$session_id
reps3 <- merge(reps2, parsed_ua, by = "session_id")
Let’s examine repetitions by browser:
ttt <- mysumBy(n.reps + any.rep ~ ua.family, dt = reps3)
ttt[which(ttt$n > 100), ]
There are repeated questions in all of the browsers.
By the skipped feature:
ttt <- mysumBy(n.reps + any.rep ~ skipped + ua.family, dt = reps3)
knitr::kable(ttt[which(ttt$n > 100), ])
| | var | skipped | ua.family | n | M | SD | SE | med |
|---|---|---|---|---|---|---|---|---|
| 2 | n.reps | false | Chrome | 1724 | 0.447 | 7.761 | 0.187 | 0 |
| 6 | n.reps | false | Edge | 592 | 0.517 | 8.679 | 0.357 | 0 |
| 12 | n.reps | false | Safari | 330 | 0.097 | 0.863 | 0.047 | 0 |
| 14 | n.reps | true | Chrome | 1873 | 2.166 | 16.327 | 0.377 | 0 |
| 18 | n.reps | true | Edge | 506 | 1.036 | 10.696 | 0.475 | 0 |
| 20 | n.reps | true | Firefox | 118 | 1.831 | 16.783 | 1.545 | 0 |
| 24 | n.reps | true | Safari | 358 | 3.028 | 21.024 | 1.111 | 0 |
| 27 | any.rep | false | Chrome | 1724 | 0.019 | 0.135 | 0.003 | 0 |
| 31 | any.rep | false | Edge | 592 | 0.012 | 0.108 | 0.004 | 0 |
| 37 | any.rep | false | Safari | 330 | 0.015 | 0.122 | 0.007 | 0 |
| 39 | any.rep | true | Chrome | 1873 | 0.072 | 0.258 | 0.006 | 0 |
| 43 | any.rep | true | Edge | 506 | 0.045 | 0.209 | 0.009 | 0 |
| 45 | any.rep | true | Firefox | 118 | 0.068 | 0.252 | 0.023 | 0 |
| 49 | any.rep | true | Safari | 358 | 0.089 | 0.286 | 0.015 | 0 |
If some of the skipping was due to some kind of non-human behavior, perhaps we will see that in a failure to complete the IAT reasonably.
iat <- read.table(paste(dir.raw, "iat.txt", sep = "\\"), sep = "\t", header = TRUE, fill = TRUE)
iat.errs <- iat %>%
group_by(session_id) %>%
summarise(iat.rows = n(), iat.err = mean(trial_error, na.rm = TRUE))
reps4 <- merge(reps2, iat.errs, by = "session_id", all.x = T)
reps4$anyIAT <- !is.na(reps4$iat.rows) & reps4$iat.rows > 0
IAT performance and number of recorded rows, by the “skip” functionality, and by whether the session had any repetition (any.rep):
knitr::kable(mysumBy(iat.err + iat.rows + anyIAT ~ skipped + any.rep, dt = reps4))
| var | skipped | any.rep | n | M | SD | SE | med |
|---|---|---|---|---|---|---|---|
| iat.err | false | FALSE | 1987 | 0.079 | 0.067 | 0.001 | 0.061 |
| iat.err | false | TRUE | 48 | 0.097 | 0.101 | 0.014 | 0.066 |
| iat.err | true | FALSE | 1974 | 0.079 | 0.065 | 0.001 | 0.066 |
| iat.err | true | TRUE | 175 | 0.079 | 0.059 | 0.004 | 0.061 |
| iat.rows | false | FALSE | 1987 | 188.304 | 31.397 | 0.600 | 196.000 |
| iat.rows | false | TRUE | 48 | 210.750 | 59.308 | 8.387 | 196.000 |
| iat.rows | true | FALSE | 1974 | 188.227 | 31.828 | 0.613 | 196.000 |
| iat.rows | true | TRUE | 175 | 219.246 | 71.221 | 4.915 | 196.000 |
| anyIAT | false | FALSE | 2737 | 0.726 | 0.446 | 0.009 | 1.000 |
| anyIAT | false | TRUE | 50 | 0.960 | 0.198 | 0.028 | 1.000 |
| anyIAT | true | FALSE | 2694 | 0.733 | 0.443 | 0.009 | 1.000 |
| anyIAT | true | TRUE | 210 | 0.833 | 0.374 | 0.026 | 1.000 |
The IAT error rate was not much higher (if at all) in sessions with repetitions. The number of rows was higher in sessions with repetitions, probably due to repetitions in taking the IAT. anyIAT means that there were any IAT rows. When repetition occurred, there was a higher likelihood of having any IAT rows.
Repetition of questions occurred in about 7% of the sessions that had the “skip” feature enabled, and in almost 2% of the sessions that did not have the “skip” feature enabled.
I did not find any clue regarding the reason for the duplicates when “skip” is not enabled (no relation to system configuration, or to IAT performance).
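As a rough check on the 7% versus 2% comparison, the session counts can be read off the anyIAT rows of the table above (210 of 2904 sessions with repetitions when “skip” was enabled, 50 of 2787 when it was not). The sketch below is not part of the original analysis; it simply applies base R's `prop.test` to those counts, assuming sessions are independent.

```r
# Sessions with any repeated question (any.rep = TRUE) and session totals,
# taken from the n column of the anyIAT rows in the table above
x <- c(skip_true = 210, skip_false = 50)
n <- c(skip_true = 2904, skip_false = 2787)

# Proportion of sessions with a repetition in each condition
rates <- x / n

# How many times more likely a repetition was with "skip" enabled
risk_ratio <- rates[["skip_true"]] / rates[["skip_false"]]

# Two-sample test of equal proportions (chi-square with continuity correction)
test <- prop.test(x, n)

cat("rates:", round(rates, 4), "risk ratio:", round(risk_ratio, 2), "\n")
print(test)
```

The risk ratio comes out at about 4, which matches the summary: enabling “skip” makes repetitions considerably more likely but does not account for all of them.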