Import File Verification Tool
DataShop can import a tab-delimited text file of transaction data similar to the one generated by the DataShop transaction export. Only DataShop developers can perform the import itself, but you should first run this tool against your import file to verify that it is valid, and fix any errors it reports.
You may want to import data to:
- create a smaller dataset from an existing one
- rename problems or steps
- clean up an existing dataset
- add data to DataShop without creating XML
Tip: If you want to create a new domain KC model for an existing dataset in DataShop, use the KC Model export/import feature in DataShop. See the KC Model help page for more information.
Note: Unlike our XML format, the tab-delimited format is meant to represent a single dataset per file (it is based on the DataShop transaction export file format). When preparing data to send us, please provide one file per dataset.
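Because each import file is a single tab-delimited table, a program that reads one only needs the first line (the column headings) and one array of fields per following line. The sketch below is a hypothetical helper (`TabDelimitedReader` is not part of DataShop or the verification tool). It uses `split("\t", -1)` so that empty trailing cells, which are common when optional columns are left blank, are kept rather than silently dropped.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: reads a tab-delimited import file into a header and data rows. */
final class TabDelimitedReader {
    final List<String> header = new ArrayList<String>();
    final List<String[]> rows = new ArrayList<String[]>();

    static TabDelimitedReader read(Reader in) throws IOException {
        TabDelimitedReader result = new TabDelimitedReader();
        BufferedReader reader = new BufferedReader(in);
        String line = reader.readLine();
        if (line == null) {
            return result; // empty file: no header and no rows
        }
        // The first line holds the column headings.
        result.header.addAll(Arrays.asList(line.split("\t", -1)));
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // A limit of -1 keeps trailing empty fields, so each row keeps its full column count.
            result.rows.add(line.split("\t", -1));
        }
        return result;
    }
}
```

The helper takes a `Reader` so the caller decides the character encoding, for example `new InputStreamReader(new FileInputStream(path), "UTF-8")` if the file is UTF-8.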
Download the Import File Verification Tool
DS_verify_java1.5_2012_0113.zip (348 KB)
Requires the Java Runtime Environment (JRE) version 1.5 or greater
Note: Java must be installed and available from the command line. To check this, open a command prompt (Windows: Start > Run > cmd; Mac: Applications > Utilities > Terminal) and type java -version. If you see output similar to the following, continue with the steps below. If you don't, make sure you have Java installed (see "Do I have Java?" on Sun's website).
java version "1.6.0_11"
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Java HotSpot(TM) Client VM (build 11.0-b16, mixed mode, sharing)
To verify that your import file is valid:
- Download the ZIP file above and extract its contents to your hard disk.
- Open a command prompt and navigate to the Import File Verification tool directory, which should contain both "dist" and "extlib" directories.
- Enter the following on a single line:
On Windows:
java -classpath "dist\datashop-verify.jar;extlib\log4j-1.2.13.jar;." edu.cmu.pslc.importdata.DatasetVerificationTool -filename path\to\file.txt
On Mac:
java -classpath "dist/datashop-verify.jar:extlib/log4j-1.2.13.jar:." edu.cmu.pslc.importdata.DatasetVerificationTool -filename path/to/file.txt
where path/to/file.txt is the path to the file you'd like to verify. The tool checks your import file and reports whether it is valid. Results are printed to the console and also written to an output text file called datashop.log.
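The Windows and Mac commands differ only in their separators: `;` versus `:` between classpath entries, and `\` versus `/` inside paths. Java exposes these as `File.pathSeparator` and `File.separator`. The sketch below (the `VerifyCommand` class is hypothetical, not part of the tool) builds the same command for whichever platform it runs on and launches it with `ProcessBuilder`, which passes each argument directly without a shell, so no quoting is needed.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical launcher that assembles the verification command for the current platform. */
final class VerifyCommand {
    static final String MAIN_CLASS = "edu.cmu.pslc.importdata.DatasetVerificationTool";

    /** Joins the two JARs and "." with the given classpath and file separators. */
    static String classpath(String pathSeparator, String fileSeparator) {
        String verifyJar = "dist" + fileSeparator + "datashop-verify.jar";
        String log4jJar = "extlib" + fileSeparator + "log4j-1.2.13.jar";
        return verifyJar + pathSeparator + log4jJar + pathSeparator + ".";
    }

    /** The full command line, using this platform's separators. */
    static List<String> command(String importFile) {
        List<String> cmd = new ArrayList<String>();
        cmd.add("java");
        cmd.add("-classpath");
        cmd.add(classpath(File.pathSeparator, File.separator));
        cmd.add(MAIN_CLASS);
        cmd.add("-filename");
        cmd.add(importFile);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: VerifyCommand path/to/file.txt");
            System.exit(2);
        }
        // Must be started from the directory that contains "dist" and "extlib".
        ProcessBuilder pb = new ProcessBuilder(command(args[0]));
        pb.inheritIO(); // show the tool's console output here (Java 7 or later)
        System.exit(pb.start().waitFor());
    }
}
```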
Note: If you see an error such as Exception in thread "main" java.lang.NoClassDefFoundError: edu/cmu/pslc/importdata/DatasetVerificationTool, make sure your current working directory contains both the "dist" and "extlib" directories. If it does, also make sure that the classpath is set as shown in the commands above.
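A `NoClassDefFoundError` usually means the JARs named on the classpath were not found relative to the current directory. The hypothetical check below (not part of the tool) looks for the two directories and the two JAR files used in the commands above, starting from the directory in the `user.dir` system property, which is the working directory `java` was started from.

```java
import java.io.File;

/** Hypothetical pre-flight check for the directory layout the verification tool expects. */
final class VerifyToolLayout {
    /** Returns a description of the first missing item, or null if the layout looks right. */
    static String problem(File dir) {
        File dist = new File(dir, "dist");
        File extlib = new File(dir, "extlib");
        if (!dist.isDirectory()) {
            return "missing directory: " + dist.getPath();
        }
        if (!extlib.isDirectory()) {
            return "missing directory: " + extlib.getPath();
        }
        File verifyJar = new File(dist, "datashop-verify.jar");
        File log4jJar = new File(extlib, "log4j-1.2.13.jar");
        if (!verifyJar.isFile()) {
            return "missing file: " + verifyJar.getPath();
        }
        if (!log4jJar.isFile()) {
            return "missing file: " + log4jJar.getPath();
        }
        return null;
    }

    public static void main(String[] args) {
        // "user.dir" is the current working directory of this JVM.
        String problem = problem(new File(System.getProperty("user.dir")));
        System.out.println(problem == null ? "Layout OK" : problem);
    }
}
```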
Format Documentation
Import file column requirements are described in the table below, in the notes that follow it, and in more detail in the Guide to the Tutor Message Format (the XML format upon which this columnar format is based).
| Order [1] | Column | Required? | Additional Description | Size Limit (characters) |
|---|---|---|---|---|
| 1 | Anon Student Id | * | An anonymized student identifier. Multiple Anon Student Id columns are OK. If you specify multiple columns, at least one column must have a value. | ≤ 55 |
| 2 | Session Id | * | A dataset-unique string that identifies the user's session with the tutor. If you specify multiple Anon Student Id columns, different combinations cannot have the same Session Id. | ≤ 255 |
| 3 | Time | * | Local time. Must be given in one of the following standard time formats [2] | |
| 4 | Time Zone | | Local time zone ID as provided by the zoneinfo (or tz) database. Select a time zone name from the "TZ" column in the List of zoneinfo time zones. | ≤ 50 |
| 5 | Student Response Type | | A semantic description of the event. DataShop-expected values are ATTEMPT or HINT_REQUEST. See the corresponding "Tutor Response Type" below. | ≤ 30 |
| 6 | Student Response Subtype | | A further classification of student response type. | ≤ 30 |
| 7 | Tutor Response Type | | A semantic description of the tutor's response. DataShop-expected values are RESULT or HINT_MSG. See the corresponding "Student Response Type" above. | ≤ 30 |
| 8 | Tutor Response Subtype | | A further classification of tutor response type. | ≤ 30 |
| 9 | Level() | * | A dataset level. An example of the correct use of this column heading is Level(Unit), where "Unit" is the dataset level title and the value in the column is the level name (e.g., "Understanding Fractions"). The Level column heading should always be of the format Level(level_title). The level title must be ≤ 100 characters and consist of letters, numbers, dashes, underscores, and spaces. If a dataset level title is not included, it will become "Default". Multiple Level columns are OK. For additional description, see the level element in the Guide. In tutor-message format XML, the level "title" is referred to as "type". | ≤ 100 |
| 10 | Problem Name | * | The name of the problem or activity. | ≤ 255 |
| 11 | Problem View | | The number of times the student has encountered the problem so far. This counter increases with each instance of the same problem. Provide either this column or Problem Start Time, but not both. If both are provided, Problem Start Time is used to determine Problem View. A longer description of problem view, including how it is determined if it's not present in the imported data, is available here. | |
| 12 | Problem Start Time | | The time the problem is shown to the student. Must be given in one of the standard time formats [2]. Provide either this column or Problem View, but not both. If both are provided, Problem Start Time is used to determine Problem View. A longer description of problem start time, including how it is determined if it's not present in the imported data, is available here. | |
| 13 | Step Name | | The name of a discrete problem-solving step. Include a step name for a transaction if the transaction also has a Tutor Response Type and an Outcome; otherwise, Attempt At Step will not be calculated. | ≤ 255 |
| 14 | Attempt At Step | | DataShop ignores the values in this column when processing the import file. "Attempt At Step" is computed from the rest of the transaction data, but only if Step Name is provided. | |
| 15 | Outcome | | The tutor's evaluation of the action, if applicable. DataShop prefers the values CORRECT, INCORRECT, or HINT. | ≤ 30 |
| 16 | Selection | * | A description of the interface element that the student selected or interacted with. Multiple Selection columns are OK. Also see Selection in the Guide. | ≤ 255 |
| 17 | Action | * | A description of the manipulation applied to the selection. Multiple Action columns are OK. | ≤ 255 |
| 18 | Input | * | The input the student submitted. Multiple Input columns are OK. Also see Input in the Guide. | ≤ 255 |
| 19 | Feedback Text | | The body of a hint, success, or error message shown to the student. | ≤ 65,535 |
| 20 | Feedback Classification | | A further classification of the outcome. See action_evaluation / classification in the Guide. Note that if Feedback Classification has a value, Feedback Text must have a value as well. | ≤ 255 |
| 21 | Help Level | | Applicable only to hints, this is the current hint level/depth. If given, the value must be a number. | |
| 22 | Total Num Hints | | Total number of hints available to the student for this step. If given, the value must be a number. | |
| 23 | Condition Name | | A study/experimental condition. Must always be paired with Condition Type, even if a condition does not have a condition type. Multiple Condition Name columns are OK. See condition in the Guide, and see note [3] for column ordering. | ≤ 80 |
| 24 | Condition Type | | A condition classification. Must always be paired with Condition Name, even if a condition does not have a condition type. Multiple Condition Type columns are OK. If Condition Type is specified, Condition Name must have a value as well. | ≤ 255 |
| 25 | KC() | | A knowledge component. An example of the correct use of this column heading is KC(Area), where 'Area' is the KC model name for that knowledge component. The KC column heading should always be of the format KC(kc_model_name). The model name must be ≤ 50 characters and consist of letters, numbers, dashes, underscores, and spaces. If a KC model name is not included, the name will default to "Default". Multiple KC columns are OK. | ≤ 65,535 |
| 26 | KC Category() | | A knowledge component category. An example of the correct use of this column heading is KC Category(Area), where 'Area' is the KC model name for that knowledge component. The KC Category column heading should always be of the format KC Category(kc_model_name). The model name must be ≤ 30 characters and consist of letters, numbers, dashes, underscores, and spaces. If a KC model name is not included, the name will default to "Default". If including KC Category, be sure to pair it with a corresponding KC column by using the same KC model name (Condition Name and Condition Type must be paired together in the same way; see note [3]). If you specify a KC Category value, a KC value must be given as well. Multiple KC Category columns are OK. | ≤ 50 |
| 27 | School | | The school in which the data were collected, if applicable. | ≤ 100 |
| 28 | Class | | The class in which the data were collected, if applicable. | ≤ 75 |
| 29 | CF() | | A custom field. Use this column to describe other contextual information or a new variable not adequately captured by the other columns. An example of the correct use of this column heading is CF(Factor or add-m), where 'Factor or add-m' is the name of the custom field. The CF column heading should always be of the format CF(custom_field_name). The custom field name must be ≤ 255 characters and consist of letters, numbers, dashes, underscores, and spaces. If a custom field name is not included, the name will default to "Default". Multiple CF columns are OK. See also Custom Field in the Guide. | ≤ 255 |
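Four headings in the table carry a name in parentheses: Level(), KC(), KC Category(), and CF(). The hypothetical parser below (not the tool's own code) applies the rules stated for them: the allowed characters, the per-heading length limit taken from the table, and the fallback to "Default". Two assumptions are labeled in the code: a bare heading such as `KC` is treated the same as `KC()`, and "letters and numbers" is read as ASCII letters and digits only, which may be narrower than what the tool accepts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for Level(), KC(), KC Category(), and CF() column headings. */
final class ParamHeading {
    // A known prefix, then an optional "(name)". matches() anchors the whole heading, so
    // "KC Category(...)" is never mistaken for "KC(...)".
    private static final Pattern HEADING =
            Pattern.compile("(Level|KC Category|KC|CF)(?:\\((.*)\\))?");
    // Letters, numbers, underscores, spaces, and dashes (assumption: ASCII only).
    private static final Pattern NAME = Pattern.compile("[A-Za-z0-9_ -]+");

    final String kind;
    final String name;

    private ParamHeading(String kind, String name) {
        this.kind = kind;
        this.name = name;
    }

    /** Maximum name length for each heading kind, as stated in the table. */
    static int maxLength(String kind) {
        if (kind.equals("Level")) return 100;
        if (kind.equals("KC")) return 50;
        if (kind.equals("KC Category")) return 30;
        return 255; // CF
    }

    /**
     * Returns null if the heading is not one of the parameterized columns.
     * Throws IllegalArgumentException if the name inside the parentheses breaks a rule.
     */
    static ParamHeading parse(String heading) {
        Matcher m = HEADING.matcher(heading.trim());
        if (!m.matches()) {
            return null;
        }
        String kind = m.group(1);
        String name = m.group(2);
        if (name == null || name.isEmpty()) {
            name = "Default"; // assumption: "KC" and "KC()" both fall back to "Default"
        }
        if (name.length() > maxLength(kind)) {
            throw new IllegalArgumentException(
                    kind + " name is longer than " + maxLength(kind) + " characters: " + name);
        }
        if (!NAME.matcher(name).matches()) {
            throw new IllegalArgumentException(kind + " name has disallowed characters: " + name);
        }
        return new ParamHeading(kind, name);
    }
}
```

For example, `ParamHeading.parse("CF(Factor or add-m)")` yields kind `CF` and name `Factor or add-m`, while `ParamHeading.parse("Level(Unit #1)")` throws because `#` is not an allowed character.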
[1] The Import Tool expects the column headings to be in the order indicated in the table above. Placing columns in other orders can cause the import tool to fail during processing.
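One way to check the ordering rule in [1] is to map each heading to its position in the table and require the positions never to decrease from left to right. The sketch below (the `ColumnOrder` class is hypothetical, not the tool's own check) does that. It assumes repeated columns such as multiple Selection columns are grouped together; the only interleaving it allows is the documented pairs from note [3], which is why Condition Name and Condition Type share one position, as do KC and KC Category. Unrecognized headings are reported as problems.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical check that column headings follow the order given in the table. */
final class ColumnOrder {
    // Fixed headings and their table positions. Paired columns share a position because
    // they alternate (Condition Name, Condition Type, Condition Name, ...).
    private static final String[] NAMES = {
        "Anon Student Id", "Session Id", "Time", "Time Zone",
        "Student Response Type", "Student Response Subtype",
        "Tutor Response Type", "Tutor Response Subtype",
        "Problem Name", "Problem View", "Problem Start Time", "Step Name",
        "Attempt At Step", "Outcome", "Selection", "Action", "Input",
        "Feedback Text", "Feedback Classification", "Help Level", "Total Num Hints",
        "Condition Name", "Condition Type", "School", "Class"
    };
    private static final int[] POSITIONS = {
        1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
        23, 23, 27, 28
    };
    private static final Map<String, Integer> ORDER = new HashMap<String, Integer>();

    static {
        for (int i = 0; i < NAMES.length; i++) {
            ORDER.put(NAMES[i], POSITIONS[i]);
        }
    }

    /** Table position of a heading, or -1 if the heading is not recognized. */
    static int orderOf(String heading) {
        String h = heading.trim();
        if (h.startsWith("Level(")) return 9;
        if (h.startsWith("KC(") || h.startsWith("KC Category(")) return 25; // KC pairs alternate
        if (h.startsWith("CF(")) return 29;
        Integer pos = ORDER.get(h);
        return pos == null ? -1 : pos.intValue();
    }

    /** Index of the first unrecognized or out-of-order heading, or -1 if the order looks right. */
    static int firstProblem(List<String> headings) {
        int last = 0;
        for (int i = 0; i < headings.size(); i++) {
            int pos = orderOf(headings.get(i));
            if (pos < 0 || pos < last) {
                return i;
            }
            last = pos;
        }
        return -1;
    }
}
```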
[2] Time must be given in one of the following formats:
| Format | Example and Notes |
|---|---|
| yyyy-MM-dd HH:mm:ss | 2001-07-04 12:08:56 |
| yyyy-MM-dd HH:mm:ss z | 2001-07-04 12:08:56 Pacific Standard Time |
| yyyy-MM-dd HH:mm z | 2001-07-04 12:08 PST |
| MMMMM dd, yyyy hh:mm:ss a z | July 04, 2001 12:08:56 AM PST (WPI-Assistments format) |
| MM/dd/yy HH:mm:ss:SSS z | 07/04/01 12:08:56:322 PST |
| MM/dd/yy HH:mm:ss z | 07/04/01 12:08:56 GMT-08:00 |
| yyyy-MM-dd HH:mm:ss:SSS | 2001-07-04 12:08:39:110 (Carnegie Learning format) |
| mm:ss.0 z | 08:56.0 PST (not recommended; this is the result of Excel applying a date format) |
| mm:ss.0 | 12:08:0 (not recommended; this is the result of Excel applying a date format) |
| yyyy-MM-dd HH:mm:ss.SSSSS | 2001-07-04 12:08:39.11000 (OLI format, 5-digit milliseconds) |
| MM/dd/yy HH:mm:ss | 07/04/01 12:08:56 |
| MM/dd/yy HH:mm | 07/04/01 12:08 |
| long | 1239939193 |
| double | 01239939193.31 |
| yyyy-MM-dd HH:mm:ss.SSS | 2010-05-11 16:06:11.908 (CTAT format) |
| yyyy/MM/dd HH:mm:ss.SS | 2010/05/11 16:06:28.65 |
| MM/dd/yyyy HH:mm:ss | 2/24/2007 17:18:02 |
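The formats above are written as `java.text.SimpleDateFormat` patterns, so you can check a Time value locally by trying each pattern in turn. The hypothetical helper below (not the tool's implementation) covers a subset of the patterns; it leaves out the Excel-style `mm:ss.0` forms, the 5-digit-millisecond form, and the `long`/`double` forms, whose units the table does not state. A pattern counts as a match only if it consumes the entire value, and parsing is non-lenient so impossible dates such as month 13 are rejected.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

/** Hypothetical helper that reports which documented time pattern a value matches. */
final class TimeFormats {
    // A subset of the documented patterns. Because a match must consume the whole value,
    // order only matters when two patterns both fit: "yy" also accepts a four-digit year
    // literally, so "MM/dd/yy HH:mm:ss" also covers values like "2/24/2007 17:18:02".
    static final String[] PATTERNS = {
        "yyyy-MM-dd HH:mm:ss z",
        "yyyy-MM-dd HH:mm z",
        "yyyy-MM-dd HH:mm:ss:SSS",
        "yyyy-MM-dd HH:mm:ss.SSS",
        "yyyy-MM-dd HH:mm:ss",
        "MMMMM dd, yyyy hh:mm:ss a z",
        "MM/dd/yy HH:mm:ss:SSS z",
        "MM/dd/yy HH:mm:ss z",
        "MM/dd/yy HH:mm:ss",
        "MM/dd/yy HH:mm",
        "yyyy/MM/dd HH:mm:ss.SS",
    };

    /** Returns the first pattern that parses the whole value, or null if none does. */
    static String matchingPattern(String value) {
        String text = value.trim();
        for (String pattern : PATTERNS) {
            // Locale.US so month names and AM/PM markers are parsed in English.
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
            format.setLenient(false); // reject impossible dates such as month 13
            ParsePosition pos = new ParsePosition(0);
            Date parsed = format.parse(text, pos);
            if (parsed != null && pos.getIndex() == text.length()) {
                return pattern;
            }
        }
        return null;
    }
}
```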
[3] Multiple similarly named columns that are required in pairs: Columns that are required as pairs (Condition Name and Condition Type, or KC and KC Category) must be listed in the order in which they are paired. For example, if a dataset file has two conditions, the column order would be Condition Name, Condition Type, Condition Name, Condition Type.
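The pairing rule can be checked on the header line alone. The hypothetical sketch below (not the tool's own check) requires every Condition Name to be immediately followed by Condition Type, every Condition Type to immediately follow Condition Name, and every KC Category(model) to immediately follow KC(model) with the same model name. The KC part is an interpretation of "listed in the order that they are paired"; a KC column without a KC Category column is accepted, since KC Category is optional.

```java
import java.util.List;

/** Hypothetical check of the paired-column ordering rule. */
final class PairedColumns {
    /** Returns a description of the first pairing problem, or null if the pairs look right. */
    static String firstProblem(List<String> headings) {
        for (int i = 0; i < headings.size(); i++) {
            String h = headings.get(i).trim();
            String prev = i > 0 ? headings.get(i - 1).trim() : null;
            String next = i + 1 < headings.size() ? headings.get(i + 1).trim() : null;
            if (h.equals("Condition Name") && !"Condition Type".equals(next)) {
                return "Condition Name at column " + (i + 1) + " is not followed by Condition Type";
            }
            if (h.equals("Condition Type") && !"Condition Name".equals(prev)) {
                return "Condition Type at column " + (i + 1) + " does not follow Condition Name";
            }
            if (h.startsWith("KC Category(")) {
                // "KC Category(Area)" must come directly after "KC(Area)".
                String expected = "KC(" + h.substring("KC Category(".length());
                if (!expected.equals(prev)) {
                    return h + " at column " + (i + 1) + " does not follow " + expected;
                }
            }
        }
        return null;
    }
}
```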
Version Information
To get version information for the Import File Verification Tool, run the following command on a single line:
On Windows:
java -classpath "dist\datashop-verify.jar;extlib\log4j-1.2.13.jar;." edu.cmu.pslc.datashop.util.VersionInformation
On Mac:
java -classpath "dist/datashop-verify.jar:extlib/log4j-1.2.13.jar:." edu.cmu.pslc.datashop.util.VersionInformation