Interruption Lab Study
Abstract
Performance on C-TOC and similar applications will likely be affected by interruptions and distractions. This study examines the nature of disruptions caused by interruptions in the C-TOC test-taker's environment, and how these disruptions interact with the age of the test-taker. Previous research on interruptions in the field of human-computer interaction (HCI) focuses predominantly on younger adults in workplace settings. A body of HCI research on the design of information and communication technology for older adults is similarly well established, yet is disjoint from research pertaining to interruptions and distractions. This research project attempts to unify these research areas. A decline in higher cognitive functioning and prospective memory in old age has been well documented in the cognitive psychology literature. As such, it is hypothesized that the effect of interruptions may interact with the age of the test-taker. Our mixed experimental design involves 3 factors: a between-subjects factor of age, and within-subjects factors of interruption demand and main task. Our dependent measures include performance scores (i.e., completion time and accuracy) on a small subset of tests in the C-TOC battery. We will recruit an equal number of cognitively healthy participants from 3 age groups: 19-54, 55-69, and 70 years and older. Across the C-TOC tests used in this study, participants will experience three levels of interruption demand (no interruption, low demand, and high demand), with interruptions taking the form of 20-second distraction tasks. Our aim is to learn how interruption demand interacts with age. We will additionally rely on self-reports to understand the task resumption strategies employed by participants in the three age groups. This knowledge will hopefully guide the design of features in C-TOC pertaining to interruption mitigation and task resumption.
Such knowledge would also be an important contribution to the broader HCI community, especially valuable for those designing tools for older adults.
Related work
See ongoing notes added to the literature review page.
Experiment
- 3 levels of age (young vs. young-old vs. old-old, BS)
- 3 levels of interruption demand (no interruption vs. low-demand vs. high-demand, ordering counterbalanced WS)
- 2 main tasks: (verbal: Sentence Comprehension Task (SC) vs. non-verbal: Square Puzzles (SqP), ordering counterbalanced WS)
- 8 trials in SqP, 10 trials in SC, 3 banks of isomorphic trials, which increase in difficulty;
- difficulty is gauged by the number of actions required and the complexity of the instruction for SC trials, and by the number of lines moved for SqP puzzles;
- ordering of 3 task banks randomly assigned; subsets of trials in which interruptions occur are semi-randomly generated, with the constraint that interruptions are spread across medium and hard trials; no interruptions occur on easy trials (1st trial in each bank);
- our dependent measures are speed and accuracy scores from the subset of interruption trials for each participant (4 in SC, 3 in SqP); dependent measures from other trials are not considered in this analysis
- for both tasks, interruptions occur on the same subset of trials between interruption conditions;
- in SC trial banks, interruptions occur after a fixed onset of time as a function of task difficulty;
- in SqP trial banks, interruptions occur after a short delay (500ms) following the first move in 2-move trials; for 3-move puzzles, there is a 50% probability of the interruption being triggered by the first move, and a 50% probability of the interruption being triggered by the second move;
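The interruption scheduling rules listed above can be sketched in code. This is a minimal illustration, not the study software: the function names, the difficulty labels per trial, and the reading of "spread across medium and hard trials" as "at least one of each" are all assumptions made for the example. Because interruptions occur on the same subset of trials across interruption conditions, the subset would be drawn once per participant and task and then reused.

```python
import random

# Hypothetical difficulty labels for an 8-trial SqP bank. Only "trial 1 is
# easy" comes from the notes; the medium/hard split is an assumption.
EXAMPLE_SQP_DIFFICULTY = {
    1: "easy", 2: "medium", 3: "medium", 4: "medium",
    5: "hard", 6: "hard", 7: "hard", 8: "hard",
}


def pick_interrupted_trials(difficulty, n_interruptions, rng):
    """Choose interrupted trials: never easy, at least one medium and one hard."""
    medium = [t for t, d in difficulty.items() if d == "medium"]
    hard = [t for t, d in difficulty.items() if d == "hard"]
    if not 2 <= n_interruptions <= len(medium) + len(hard):
        raise ValueError("n_interruptions must cover both difficulty levels")
    # Guarantee coverage of both levels, then fill the rest at random.
    chosen = [rng.choice(medium), rng.choice(hard)]
    remaining = [t for t in medium + hard if t not in chosen]
    chosen += rng.sample(remaining, n_interruptions - 2)
    return sorted(chosen)


def sqp_trigger_move(n_moves, rng):
    """Return which move (1-based) triggers the interruption in an SqP puzzle."""
    if n_moves == 2:
        return 1  # always the first move in 2-move puzzles
    if n_moves == 3:
        return 1 if rng.random() < 0.5 else 2  # 50/50 between moves 1 and 2
    raise ValueError("the notes only describe 2- and 3-move puzzles")


rng = random.Random(42)  # seeded only to make the example reproducible
sqp_subset = pick_interrupted_trials(EXAMPLE_SQP_DIFFICULTY, 3, rng)
```

The 500 ms delay after the triggering move would be applied by the task UI and is not modelled here.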
Planned Analyses
Analysis 1: To determine the local effects of interruption disruption (specific to interrupted trials), we will conduct a 3 (age, BS) x 3 (interruption demand, WS) x [3,4] (trial, WS; 3 trials in SqP, 4 in SC) mixed-factor ANOVA for each of SC and SqP. The main effect of trial is not of interest, nor are interactions of trial with age or interruption demand.
- local performance measures - only on subset of trials that are interrupted compared with control condition; remainder of uninterrupted trial scores discarded;
- 3 x 4 = 12 data points per participant in SC, 3 x 3 = 9 data points per participant in SqP
- DV1: trial completion time;
- DV2: trial accuracy score, based on predefined scoring scheme;
- ?DV3: task resumption time vs. avg. interaction interval;?
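The tentative DV3 above compares task resumption time against the participant's typical interaction interval. A minimal sketch of how both could be derived from logged timestamps follows. The log format (a list of completed-action times in seconds) and the function names are assumptions; resumption lag is taken as the time to the first completed action after the interruption ends.

```python
def resumption_lag(action_times, interruption_end):
    """Time from the end of the interruption to the first completed action."""
    after = [t for t in action_times if t >= interruption_end]
    if not after:
        return None  # participant never resumed the trial
    return min(after) - interruption_end


def mean_interaction_interval(action_times):
    """Mean gap between consecutive completed actions."""
    times = sorted(action_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if not gaps:
        return None  # fewer than two actions, no interval to measure
    return sum(gaps) / len(gaps)


# Hypothetical trial: three actions, a 20-second interruption from t=9 to t=29,
# then two more actions.
before = [2.0, 5.0, 8.0]
after = [35.0, 38.0]
lag = resumption_lag(before + after, interruption_end=29.0)  # 6.0 seconds
baseline = mean_interaction_interval(before)                 # 3.0 seconds
ratio = lag / baseline                                       # 2.0
```

A ratio above 1 would indicate that resuming took longer than a typical step within the task; whether a ratio or a difference is the better measure is left open here.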
Analysis 2: To determine the global effects of interruption disruption, we will conduct a 3 (age, BS) x 3 (interruption demand, WS) x 2 (main task, WS) mixed-factor ANOVA. A main effect of task is expected and is not of interest; interactions of task with age and with interruption demand are of interest.
- global performance measures, collapsed across all trials in a bank of trials
- 6 data points per participant (3 from each task)
- DV1: total completion time for each bank of trials - normalized to compare total completion time between SC and SqP trial banks; (if normally distributed, normalize using z-scores; else, subtract mean from each total completion time)
- DV2: total accuracy score for each bank of trials, score out of 100%;
- ?DV3: avg task resumption time vs. avg. interaction interval;?
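The normalization rule described for DV1 above (z-scores if the totals are normally distributed, otherwise mean-centering) can be sketched as follows. The function name and the `assume_normal` flag are illustrative assumptions; the normality check itself is left to the analyst and is not implemented here. Normalization would be applied separately to the SC totals and the SqP totals so that the two tasks share a common scale.

```python
from statistics import mean, stdev


def normalize_totals(times, assume_normal):
    """Normalize total completion times for one task (SC or SqP).

    assume_normal=True  -> z-scores (subtract mean, divide by sample SD)
    assume_normal=False -> mean-centering only
    """
    m = mean(times)
    if assume_normal:
        s = stdev(times)  # sample standard deviation; needs >= 2 values
        return [(t - m) / s for t in times]
    return [t - m for t in times]


# Hypothetical total completion times (seconds) for three trial banks.
sc_totals = [100.0, 120.0, 140.0]
z_scores = normalize_totals(sc_totals, assume_normal=True)   # [-1.0, 0.0, 1.0]
centered = normalize_totals(sc_totals, assume_normal=False)  # [-20.0, 0.0, 20.0]
```

Note that mean-centering alone does not equalize variance between the two tasks, so the fallback only aligns their means.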
Potential covariates, other recorded DVs:
- other recorded measures
- # moves, # adjustments, # clicks, # aborted moves, # invalid moves, SC instruction read time (SC only), first-action time, interruption dismissal time, resumption lag time (time to first completed action after interruption), interaction interval before interruption, interaction interval after interruption, total interaction time;
- for high-demand n-back interruptions: # correct, # incorrect, # false positives, # false negatives, # true positives, # true negatives;
- NASA-TLX workload assessment survey scores, administered after each bank of trials (fatigue, cognitive demand, physical demand, annoyance)
- MoCA test score - participant is excluded if the score is less than 26
- English language test score - using the NAART (North American Adult Reading Test, [Uttl 02]); participant is excluded if fewer than 50% of words on page 1 are pronounced correctly;
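The n-back measures listed above (# correct, # incorrect, false/true positives and negatives) follow from comparing each response to whether the current stimulus matches the one n positions earlier. The sketch below assumes one yes/no response per stimulus and treats the first n stimuli as non-targets; the stimulus set, the value of n, and the function name are assumptions, since the notes do not specify them.

```python
def score_n_back(stimuli, responses, n):
    """Tally n-back responses; responses[i] is True if a match was indicated."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for i, responded in enumerate(responses):
        # A stimulus is a target when it equals the stimulus n positions back.
        is_target = i >= n and stimuli[i] == stimuli[i - n]
        if is_target:
            counts["TP" if responded else "FN"] += 1
        else:
            counts["FP" if responded else "TN"] += 1
    counts["correct"] = counts["TP"] + counts["TN"]
    counts["incorrect"] = counts["FP"] + counts["FN"]
    return counts


# Hypothetical 2-back sequence: targets occur at positions 2 and 6 (0-based).
stimuli = ["A", "B", "A", "C", "C", "B", "C"]
responses = [False, False, True, True, False, False, False]
result = score_n_back(stimuli, responses, n=2)
# result: TP=1, FP=1, TN=4, FN=1, correct=5, incorrect=2
```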
Notes:
- CJ: timing: advise against including instructions in time to complete - this tests reading comprehension rather than the task at hand (i.e. pattern construction); opportunity to separate criteria: item completion and task completion (including items, transitions, breaks in between, etc.)
Other Factors (unsupported or for future work)
- interruption type/modality (considered in addition to similarity/complexity): computerised (to simulate email / IM alert, antivirus notification, software update, browser crash (total screen change)) vs. uncomputerised (to simulate a telephone call, a caller at the door, a conversation with someone in the room, an alarm ringing, an errand or chore) - the latter forcing one away from the screen
- pop-up interruptions requiring simple vs. complex response (to simply dismiss interrupting pop-ups (easy) vs. pop-ups requiring responses to urgent forced-choice questions (difficult)).
- dimension: verbal (spoken) vs. written vs. interactive interruption
- unsupported: 2-3 levels of interruption frequency (no interruption/control, infrequent interruption, frequent interruption) (not sure if frequent interruption will be an issue for 1/2 hour cognitive test)
- partially supported: 2-3 levels of interruption length (short, med, long), (no support for length from [Gillie 89])
- task instruction (emphasis on speed vs. emphasis on accuracy)
- length of interruption lag (currently fixed at 2s)
Participants
- 3 age groups (19-55), (56-69), (70+) - rationale for 2-3 groups from [Moffatt 10], CJ
- CJ: young-old vs. old-old : age dif @ 70 in literature / 65 for AD/MCI research
Tasks
2 adapted and programmed C-TOC cognitive tests (one verbal, one nonverbal);
Computerized Interruption tasks: one low-demand (passive n-back); the other high-demand (active n-back); interruptions have a fixed onset for SC tasks, depending on the difficulty of the SC task (number of moves and complexity of instruction taken into consideration), and identical between the 3 isomorphic SC task sets; interruptions occur in SqP after a short random interval after moving the 1st line (or with some probability after the 2nd line in 3-line puzzles); the 3 SqP task banks are also isomorphic;
- interrupting listing task used in [Farrimond 06] (one which would conflict with verbal memory); puzzle task (sorting or arranging) for nonverbal interruption
- [Storch 92] shows on-screen interruptions more disruptive than phone/visitor interruptions with data entry tasks on GUI and CUI interfaces (GUI facilitating simultaneous execution of main task during interruption);
- phone interruption shows no difference from control group; screen interruption (similar modality) shows worst disruption, then walk-in interruption
- a simulated office - door, phone, desk, computer, etc.
- GUI allows users to work simultaneously on main task during an interruption
- [Speier 03] - did not vary simplicity/complexity of interrupting tasks, but main tasks, varied interruption condition BS; all interruptions were data acquisition tasks that could be carried out regardless of the interface/task complexity of the main task;
- models of interruptions identified; did not control social characteristics of interruption - will this factor be worse for older adults?
Materials
- 2 adapted C-TOC tests; verbal: sentence comprehension (SC); non-verbal: square puzzles (SqP)
Procedure
- Review and Sign Consent form
- Administer MoCA and NAART tests
- Demonstrate interruption practice tasks
- experimental tasks A & B, counterbalanced between task types, 3 banks of each counterbalanced on interruption demand
- NASA-TLX questionnaire administered between each bank
- Brief interview re: strategies and perceptions
- Compensation
Note:
- Regarding the state of the software, I have automated the main task ordering, the interruption sequence for the 3 task sets (for each of the 2 main tasks), and the subset of tasks within each task set on which interruptions occur; the experimenter sets these up using an initialization screen. This is followed by the main portal UI to the 12 possible interactive states (SC demo, SC training, SC_{A,B,C}, SqP demo, SqP training set, SqP_{A,B,C}, low-demand interrupt demo and high-demand interrupt demo). If the main task order is set to SC-SqP on the initialization screen, SqP options on the main portal UI are disabled until all SC states are completed, and vice versa; also, after a task set is completed, its portal link becomes disabled. The main 12-button portal is needed to allow users to experience the demo states more than once, to permit users a short break between task sets, and to serve as a visual indication to the user of what remains to be done. If you want I can show you this before our next meeting at your convenience.
Regarding the subject-task-interrupt sequence/ordering in my experiment, the following factors are involved:
- 2 main tasks (SC and SqP) - important to counterbalance within each age group
- 3 trial banks (ABC) orderings for both main tasks, randomly assigned at run-time
- 3 interruption conditions (none, low, high) with 6 possible permutations, also important to counterbalance within each age group
- subsets of trials within each task set on which interruptions occur (i.e. interruptions on trials #3,5,7,9 or interruptions on trials #2,4,6,8), are randomly assigned, with the restriction that interruptions are spread across difficulty levels in both tasks
There are 12 unique orderings for each subject group, see the attached .xls. It will be necessary for the experimenter to refer to this lookup table while completing the initialization screen before the participant begins the experiment.
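The count of 12 follows from crossing the 2 main-task orders with the 6 permutations of the 3 interruption conditions. A minimal sketch that enumerates them is given below. It assumes the same interruption-condition order is used for both main tasks, and the row numbering and round-robin assignment are illustrative; the attached .xls remains the authoritative lookup table.

```python
from itertools import permutations, product

TASK_ORDERS = [("SC", "SqP"), ("SqP", "SC")]
INTERRUPTION_ORDERS = list(permutations(("none", "low", "high")))  # 6 permutations


def build_orderings():
    """Cross the 2 task orders with the 6 interruption orders (12 rows)."""
    return [
        {"id": i + 1, "task_order": tasks, "interruption_order": interrupts}
        for i, (tasks, interrupts) in enumerate(
            product(TASK_ORDERS, INTERRUPTION_ORDERS)
        )
    ]


def ordering_for(participant_index, table):
    """Round-robin assignment within an age group (0-based participant index)."""
    return table[participant_index % len(table)]


table = build_orderings()
first = ordering_for(0, table)
# first["task_order"] == ("SC", "SqP")
# first["interruption_order"] == ("none", "low", "high")
```

With round-robin assignment, each age group is fully counterbalanced on both factors whenever its size is a multiple of 12. The trial-bank (ABC) ordering and the interrupted-trial subsets are randomized at run-time and so are not part of this table.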
Hypotheses
Age-interaction effects in H1-4 are explained by theories of cognitive ageing: a decrease in working memory [Craik 82], [Hasher 88] - inhibition theory, a loss of sensory acuity [Lindenberger 94], a drop in processing speed [Salthouse 96]; increased distractibility and interruptibility in old age, resulting from decreased ability to suppress some stimuli and enhance others;
- H1: Age interaction effects for speed and accuracy; Older adults performing proportionally worse on interruption conditions
- H2: High-demand interruptions will incur worse performance than low-demand interruptions, with a possible age interaction or age-related compensation (attention enhancement and suppression). Low-demand interruptions may cause the mind to wander and become distracted by internal thoughts; high-demand interrupting tasks will have a greater negative effect on the main task (in terms of total working time (TWT), time on task (TOT), time on interruption (TOI), and error rate) than low-demand interrupting tasks.
- following from [Gillie 89], interruptions that are similar in nature to the main task being performed will have a greater negative effect (again in terms of TWT, TOT, TOI, and error rate) than dissimilar interruptions. Once again a greater effect can be expected for older adults.
- From [Storch 92], interruptions using a similar modality (on screen vs. on the phone or in person) will have a greater detrimental effect on users.
- From [Speier 03], interruptions differ from distractions in that they exist in the same modality as the main task, leading to an overloading of cues. Distractions during complex tasks can still be detrimental to performance due to an overall cognitive overload.
- H3: Performance detriments greater for verbal task (SC) than non-verbal task (SqP); Age interaction possible; following from [Speier 03], detriments to performance following interruptions will be more severe for verbal tasks (SC) than non-verbal tasks (SqP). An interaction would exist in that older adults' performance on verbal tasks following interruptions is proportionally worse than younger adults' performance, compared to the performance of both groups on non-verbal tasks.
- H4: Self-reports regarding disruptive effects of interruptions to reinforce H1-H3
Former Hypotheses
- fH1: performance measures (speed): disruptions caused by interruptions will lead to greater total working time (TWT) for older adults than younger adults on time-intensive tasks. A similar difference should be found in increases in time on task (TOT). Time on interruption (TOI) differences should also be expected, as times to re-orient to and from the main task (change-over and resumption) are expected to be greater for older adults. Older adults will commit more errors and omissions than younger adults on the main task; both groups are expected to commit more errors/omissions upon task resumption than prior to the interruption, but a greater effect is expected for older adults.
- fH2: performance measures (accuracy): disruptions caused by interruptions will lead to a decrease in TWT and TOT for younger adults (following from [Zijlstra 99]), and an increase in TWT and TOT for older adults. Similar differences in TOI and change-over/resumption times are to be expected as in fH1. (Zeigarnik effect)
Implications of hypotheses
- types of prompts (how much info is too much info for C-TOC - worry about too much aid)
- [Parnin CHI 10] - relevant prompts (developer user)
- snapshots / instant replay of past actions
- task sketches - annotations of a task breakdown: steps, objectives, plans + goals (short animated clip or series of images)
- prospective cues - contextual reminders that are displayed when a condition is true;
- change summary - natural language summary of past actions in log format
- simple contextual cue of recent action: visual highlighting of last object moved/clicked (glow), display origin distinguished from current location of recently moved object
- repeat of selected instructions and speed/accuracy tradeoff - to avoid overcompensation on speed at expense of accuracy
- do older adults compensate to overcome deficiencies in attention suppression and enhancement; does interruption recovery lead to increased fatigue for older adults?
- use of interruption lag [Altmann 04], [Hodgetts 06] - do older adults make use of contextual cues from the interruption lag and resumption lag to resume a task, or does automatic encoding of task goals occur less in older age? what are the implications for prompting systems?
- from [Speier 03] spatial presentation formats able to mitigate effect of interruptions while symbolic formats were not for complex tasks;
- use of icons and graphics rather than text-based resumption prompts
- crossover point exists between perceptual and analytical processes - where is the crossover for older adults?
- graphical resumption prompts will be more effective than verbal resumption prompts, regardless of the type of main task. A proportionally greater benefit will be seen for older adults than younger adults.
- [Storch 92] - GUI better than CUI for concurrent execution of main task and interrupting task - likely not a factor with C-TOC