The Research Computing Resource

KEYSEARCH

KEYSEARCH is a cgi program that has been designed to be a study aid. Using this program a teacher can create web pages containing study questions. The student's answer is analysed and a response to the student is provided based on an analysis of that specific answer. Such responses can go well beyond the usual "correct/incorrect" that an examination program might provide. Hopefully such nuanced responses will focus the student's on-going study.

  Overview of Operation

KEYSEARCH has three components. The web page is where the question is posed: the student provides the answer to the question in this page and submits it. The Response Logic File (RLF) contains the data KEYSEARCH needs to analyse the student's answer and to feed the response back to the student. Finally, the response log contains a log of the student answers, plus the responses that KEYSEARCH produces. Using the log file, the instructor can modify the RLF to respond better to the answers that real students are providing (as distinct from faculty pretending to be students).

KEYSEARCH works by searching for "keywords" which can be words, word fragments or phrases in the answer that the student provides. Since the web cgi interface is very general the instructor can create questions that use combinations of checkboxes, radio buttons, textareas or pull-downs as tools to build the study questions and elicit answers.

To make the program work, the instructor must create both the web page and prepare the RLF. The balance of this document describes how to do that. Skip to the bottom of the document if you need to read about maintaining KEYSEARCH materials at this point.

  Management

Before you can start, there are some management issues to deal with. First, if you are working from a distribution with source code or executables on a CD, follow this link to the instructions below. The webmaster/system manager of the web server to be used must install the program and then provide the instructor with read/write access to two areas: the location of the web pages and to a directory where the response logic files can be created and edited.

The webmaster also needs to create a directory tree in the web area where the log files can be placed. For "neatness" each course directory should be password protected to allow selective web access to the student responses. Finally, the instructor needs to be added to the mail alias file that allows e-mail feedback from the questions to the instructor.

  Creating the Web Page

Web pages, such as the one you've used many times to buy books from Amazon.com or to order theater tickets, use things called "forms" to take data that gets filled in on the web page back to the web server for processing. The instructor's job is to create a form that prepares data for KEYSEARCH. The easiest way to do this is to take the example (shown below) and to edit it for your needs. The following information will help you to edit the right things and not break it!

First off, the form needs three lines in it that provide the information needed to connect the page to the RLF needed to process the answer the student provides. These lines look like this:


<input type="hidden" name="course" value="histo">
<input type="hidden" name="subject" value="muscle">
<input type="hidden" name="qunn" value="4a">

Their function is to collect the course name, subject within the course and the question number. In the example, the course is "histo" (the tag named "course"), the subject is "muscle" (the tag named "subject") and the question number is "4a" (the tag named "qunn"). This set of information defines where KEYSEARCH looks to find the RLF: in this case it will look for a file called something like "/path/courses/histo/muscle-4a.txt" for processing instructions. We discuss the more complex options below.
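As a rough illustration of the lookup just described, the following Python sketch builds an RLF path from the three hidden tags. The function name rlf_path, the root argument and the exact "subject-qunn.txt" layout are assumptions taken from the "something like" example above; the real program may assemble the path differently.

```python
from pathlib import PurePosixPath

def rlf_path(root: str, course: str, subject: str, qunn: str) -> str:
    """Build a path shaped like the example in the text (layout is assumed)."""
    return str(PurePosixPath(root) / course / f"{subject}-{qunn}.txt")

# The root "/path/courses" is the placeholder used in the text, not a real location.
print(rlf_path("/path/courses", "histo", "muscle", "4a"))
```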

Next, the form needs to collect the answer the student makes as a response to the question. KEYSEARCH expects, by default, that a tag named "response" will contain these data. So an example of a way in which a response could be captured would be:


<textarea name="response" rows=5 cols=60 wrap="off">
Replace this text with your response to the study question.
Please make your response brief, but precise.
</textarea>

The information returned to KEYSEARCH would be typed into the box on the web page.

The names of the tags used to return information are not limited to the default, however. The options are described more fully below.

  Coding the Response Logic File

The RLF contains six types of statements: Logic lines, which define tests of the student response, 'Rithmetic lines, which calculate whether a program answer is printed or logged, Answer lines, which contain answers to be printed to the student, Value lines which can perform arithmetic operations and Management lines, which define optional input to the program. Finally, Edit lines can be used to edit the information returned from the web page to produce better output to the student as shown in the cases below. The formats of these lines are all similar and the format is fairly inflexible: stick to the examples.

We'll begin with a simple overview. But if you already feel that this is heading in a direction that is way, way too complicated for you, then skip to the description of Keysearch "Lite" at the bottom of this page and see if this is better suited to your immediate needs.

Logic Lines. An example of a logic line is:

L1: 2; cat; dog; mouse; rat; cockroach

The effect of this line is to test the student answer by searching for the words cat, dog, mouse, etc., and counting the test as "true" (correct) if at least 2 of the words are found. In the example, the variable L1 is set to "true" or "false" depending on this result. The colon is required. The second term in the line is the word count and method (described later): the logic line is "true" if this is the minimum number of matches between words in the list and words in the student response. Note that L0 has a special significance: a line beginning L0: will be ignored (see below).
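To make the counting rule concrete, here is a minimal Python sketch of how a logic line of this simple form might be evaluated. It is not KEYSEARCH's own code: the function names are hypothetical, matching is assumed to be a case-insensitive substring search (since keywords may be word fragments), and the "method" part of the second term is not handled.

```python
def parse_logic_line(line: str):
    """Split 'L1: 2; cat; dog' into (name, minimum, keywords)."""
    name, _, body = line.partition(":")
    fields = [f.strip() for f in body.split(";") if f.strip()]
    return name.strip(), int(fields[0]), fields[1:]

def logic_is_true(line: str, answer: str) -> bool:
    """True when at least `minimum` of the keywords occur in the answer."""
    _, minimum, keywords = parse_logic_line(line)
    text = answer.lower()
    hits = sum(1 for word in keywords if word.lower() in text)
    return hits >= minimum
```

With the line shown above, an answer mentioning both a cat and a dog would make L1 true, while an answer mentioning only one listed animal would leave it false.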

'Rithmetic Lines. An example of such a line is:

R1: L1,L2,L3,S

These lines evaluate logical expressions (to the right of the colon). Generally, "R" lines are paired with answer ("A") lines which contain the text to be returned to the student if the logical expression is true. In the example above, R1 is "true" if L1, L2 and L3 are all true. We'll describe how to write R lines a bit later. In the default operation, KEYSEARCH evaluates each of the logical expressions in turn, prints the answer line of the first one it evaluates as "true" (if there is one) and then stops. Note that R0 has a special significance: a line beginning R0: will be ignored (see below).

Value Lines. An example of such a line is:

V3: V1,2,/,$value,*

"V-lines" contain arithmetic expressions (in RPN) which are evaluated when required. In this example the variable V3 is set to half the value of V1, multiplied by the number the student entered in the field tagged "value": we'll go into this in more detail later. The V-lines are used in three ways: to calculate other V-lines, as logical values in R-lines (Vx≥0 is true, Vx<0 is false) and to insert calculated numbers into fields in A-lines. Note that V0 has a special significance: a line beginning V0: will be ignored (see below).

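For readers unfamiliar with RPN, the V3 example above can be traced with a small Python sketch. This is only an illustration under assumptions: the function name eval_rpn is hypothetical, only the four basic operators are handled, and tag values are assumed to arrive as text that can be converted to a number.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_rpn(expr: str, values: dict, tags: dict) -> float:
    """Evaluate a comma-separated RPN expression such as 'V1,2,/,$value,*'."""
    stack = []
    for token in (t.strip() for t in expr.split(",")):
        if token in OPS:
            right = stack.pop()          # operands come off in reverse order
            left = stack.pop()
            stack.append(OPS[token](left, right))
        elif token.startswith("$"):      # a field from the web form
            stack.append(float(tags[token[1:]]))
        elif token in values:            # another V-line
            stack.append(float(values[token]))
        else:                            # a literal number
            stack.append(float(token))
    return stack.pop()

# With V1 = 10 and the student entering 3: (10 / 2) * 3
print(eval_rpn("V1,2,/,$value,*", {"V1": 10}, {"value": "3"}))
```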
Answer Lines. The "A-lines" contain the text that the student gets back in exchange for a "correct" answer, as defined by the processing the instructor defines for that question. For the most part, each A line is paired with an R-line which contains the logical expression to be evaluated to make the decision. The A-line content starts with the first character after the colon ":".

Answer "Continuation". If multiple answers are printed (by coding "M" [more] in the R-line paired with the A-line) they are separated by a paragraph marker (<p>). In some circumstances you may wish to suppress the new paragraph and have the answers run together on the output screen. To achieve this, the last two characters in the answer line must be "--" (two hyphens).

Substitutions into "A-lines"
A-line inclusion: If the control line contains the "SUBSTITUTE" directive, A-lines can be included into the body of other A-lines as follows:

A2: Some text {A4}; Optionally, some more text..

The term like {Ax}, if it is used, can be placed anywhere in the text. The curly brackets are required. Text preceding the leading "{" and trailing the semicolon is wrapped around the contents of the "A" line, so you may want to put in a paragraph separator (<p>) to keep the output neat.

There is an additional feature: instead of a static pointer to an A-line, a V-line can be used such as {V6}. In this case, the A-line pointed to by the V-line is inserted (e.g. if V6=3, then A3 would be inserted).

Note that if the A-line pointed to does not exist, nothing is inserted. Also the default is "NOSUBSTITUTE" so if replacement strings exist in an A-line they are ignored and are printed out.
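The inclusion rules above could be sketched in Python as follows. The function name include_a_lines and the dictionaries are hypothetical, and treating the semicolon after the closing bracket as part of the term is an assumption based on the example; the program's real parsing may differ.

```python
import re

def include_a_lines(text: str, a_lines: dict, v_values: dict) -> str:
    """Replace {An} or {Vn} terms with the A-line they point to."""
    def replace(match):
        kind, number = match.group(1), int(match.group(2))
        if kind == "V":
            # A V-line holds the number of the A-line to insert.
            number = int(v_values.get(number, -1))
        # A missing A-line inserts nothing.
        return a_lines.get(number, "")
    return re.sub(r"\{([AV])(\d+)\};?", replace, text)
```

For example, with A4 defined, "Some text {A4}; more text" becomes "Some text", then the contents of A4, then " more text".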

Numeric substitutions: The results of arithmetic calculations performed in a V-line can be inserted into an A-line using a term like <<Vn|X>>. The number to be inserted into the A-line is stored in "Vn" and the format to be used is "X", where X can be one of "I" for integer, "F" for floating point, "L" for logical, "T" for time, "D" for date or "Z" for hexadecimal. If the format indicator is omitted, floating point is assumed.
NOTES: Date format: is dd-mmm-yyyy derived from a decimal number like YYMMDD (e.g. the number 81112 would be rendered 12-Nov-2008).
Time format: renders an internal time in seconds as "hh:mm:ss", or "mm:ss" if the time is less than an hour.
Logical format: results in either a T or an F being written out depending on the logical value of the V-line (Vn≥0 is true, Vn<0 is false).

For example, an A-line written like this:

A3: This is miserable! only <<V4|F>> marks out of <<V7|I>>?\
And it took you <<V8|T>>?

where V4=2.5, V7=10 and V8=435, would be written out on the web page like this:

This is miserable! only 2.5 marks out of 10? And it took you 07:15?

NOTES: A-lines are scanned for numeric substitution strings if any V-lines are defined. If there are none, no scanning is done and the substitution strings appear in the printed output.
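The formats can be illustrated with a short Python sketch that reproduces the A3 example and the date note above. It is a sketch under assumptions, not the program's code: the names render and substitute_numbers are hypothetical, floating point output is assumed to drop trailing zeros, two-digit years are assumed to fall in the 2000s (as in the 81112 example), and the "Z" format is left out.

```python
import re

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def render(value: float, fmt: str) -> str:
    """Format one V-line value according to its format letter."""
    if fmt == "I":
        return str(int(value))
    if fmt == "L":
        return "T" if value >= 0 else "F"
    if fmt == "T":
        hours, rest = divmod(int(value), 3600)
        minutes, seconds = divmod(rest, 60)
        if hours:
            return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
        return f"{minutes:02d}:{seconds:02d}"
    if fmt == "D":
        yy, rest = divmod(int(value), 10000)   # decimal number read as YYMMDD
        mm, dd = divmod(rest, 100)
        return f"{dd:02d}-{MONTHS[mm - 1]}-{2000 + yy}"  # century is assumed
    return f"{value:g}"                         # "F", also the default

def substitute_numbers(text: str, v_values: dict) -> str:
    """Replace <<Vn|X>> terms; a missing format letter means floating point."""
    def replace(match):
        number, fmt = int(match.group(1)), match.group(2) or "F"
        return render(v_values[number], fmt)
    return re.sub(r"<<V(\d+)(?:\|([IFLTD]))?>>", replace, text)
```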

Tag substitutions: The content of a tag can be written into an "A-line" using a term like <<$tagname>>. This can be useful to directly quote what a student may have written in the response to that student. It can also be used to handle administrative content to facilitate generating chains of questions. For example, if the student had entered "small blue cell" into the field tagged with cell3, an A-line written like this:

A6: A "<<$cell3>>" does not adequately describe a PMN.

would be written to the web page like this:

A "small blue cell" does not adequately describe a PMN.

Special Tags: There are two special tags that can be used if an authenticated user is detected. These are $USER and $EMAIL. The $USER tag is recognized only if the name has been hashed (see below). This allows the insertion of the hashed name into an A-line. $EMAIL is similar and can be used to insert an e-mail address constructed from the username. This can be used to allow the user to e-mail a web page, or other material, back to him or herself.

Numeric and tag substitutions can be mixed in an "A-line". Note that there are no spaces in these terms.
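Tag substitution is simple enough to sketch in a few lines of Python. The function name substitute_tags is hypothetical, an unknown tag is assumed to insert nothing, and escaping the student's text before it is echoed into the page is an added precaution of this sketch rather than something the program is documented to do.

```python
import html
import re

def substitute_tags(text: str, tags: dict) -> str:
    """Replace <<$tagname>> terms with the (HTML-escaped) tag contents."""
    return re.sub(r"<<\$(\w+)>>",
                  lambda m: html.escape(tags.get(m.group(1), "")),
                  text)

a6 = 'A "<<$cell3>>" does not adequately describe a PMN.'
print(substitute_tags(a6, {"cell3": "small blue cell"}))
```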

Text Line Continuation in the RLF. The backslash character "\" is used as a continuation character at the end of a line to be continued. In the case of an A-line, the first character of the following line replaces the "\" character: i.e. there is no white-space inserted. A-lines which, obviously, can be quite long, generally need continuation but any line can be continued in this way. An example of an A line, with continuation would be:

A1: Cats, dogs and cockroaches are common animals living in NYC\
apartments. Rats are not that common, except as pets.

To facilitate the continuation of "A-lines", and ONLY "A-lines", the continued material can be indented 4 spaces.
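The continuation rules can be sketched in Python as below. This is an illustration only: the function name join_continuations is hypothetical, and an A-line is assumed to be any line beginning with "A", a number and a colon.

```python
import re

def join_continuations(lines):
    """Join lines ending in a backslash with the line that follows them."""
    joined, pending = [], None
    for raw in lines:
        line = raw.rstrip("\n")
        if pending is not None:
            # Only A-lines may have their continuation indented 4 spaces.
            if re.match(r"A\d+:", pending) and line.startswith("    "):
                line = line[4:]
            line = pending + line        # no white-space is inserted
            pending = None
        if line.endswith("\\"):
            pending = line[:-1]          # drop the backslash, keep reading
        else:
            joined.append(line)
    if pending is not None:
        joined.append(pending)
    return joined
```

Because nothing is inserted at the join, any space wanted between the last word of one line and the first word of the next has to be typed before the backslash.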

The "Give Up" Line. If the student fails to elicit any of the answers, the instructor has the option to provide him/her with the chance to get the answer anyway. To do this the instructor codes a "give up" line like this:

GU: Put the text in here....

If a G-line exists, and the student fails to get any of the answers, a button will appear on the screen that the student can press to get the answer. Since the "G" lines will, in all probability, duplicate text already in one of the "A" lines, the contents of that "A" line can be included in the "G" line. The scheme is identical to that described above for A-lines, except that you don't need to code the "SUBSTITUTE" directive for it to work:

GU: Some text {A4}; Optionally, some additional comments..

The term like {Ax}, if it is used, can be placed anywhere in the text and the curly brackets are required, as described for A-line substitution above. Note that V-lines are not applicable if the student has given up so terms like {Vx} are ignored: references to A-lines must be static.

There is a final wrinkle: if just a pair of curly brackets are used like this: {}, the "default" A-line is inserted. This is the line pointed to by the "Fail" command "Fn" in the R-line that set the student up to choose to give up on the question. These A-lines may be lines that are not paired with R-lines. Unless an un-paired A-line is accessed like this, it is never used.

Edit Lines. The edit line allows the instructor to code the web form with symbols that get expanded by KEYSEARCH. These are not of use when the student is entering free text, rather they are used when the instructor has a form with radio buttons or check boxes and the data sent back to the web server is a code like "c2", for example, which won't mean too much to the student who clicked that box. The Edit lines allow the instructor to generate a more "user-friendly" result. For example:

E1: c2|This is the cat of my dreams

The contents of the line consist of two fields, separated by the vertical bar character "|". Wherever the characters to the left of the bar are found in the text sent to KEYSEARCH they are replaced by the characters to the right of the bar: a simple edit operation. Important! The editing ONLY applies to the output to the screen. The logic lines should be coded to look for the code returned by the web page, not for the contents of the edited line.
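A minimal Python sketch of the edit operation follows. The function names are hypothetical and the replacement is assumed to be a plain text substitution applied in the order the E-lines are given.

```python
def parse_edit_line(line: str):
    """Split 'E1: c2|Some text' into the code and its replacement text."""
    _, _, body = line.partition(":")
    code, _, text = body.strip().partition("|")
    return code, text

def apply_edits(returned: str, edit_lines) -> str:
    """Expand codes for display only; logic lines still see the raw code."""
    for line in edit_lines:
        code, text = parse_edit_line(line)
        returned = returned.replace(code, text)
    return returned

print(apply_edits("c2", ["E1: c2|This is the cat of my dreams"]))
```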

Management Lines

For the most part, these are optional (excepting the TagSet line when appropriate). You decide.

Question Line. You can enter the text of the question here and it gets placed at the top of the output returned by the program. This can help the student understand the programmed response, since they've probably forgotten the question at this point already... An example:

QN: List some of the animals likely to be found in the better homes in NYC.

Question Emphasis Line. You can enter a list of terms that the student is meant to have included in the response. If any of these terms are found, they are bolded in the response that is echoed. An example:
QW: epithelium hair gland nail
If the QW line is present but empty then KS looks for a tag called questionwd and uses its contents as the list of emphasis terms.

In addition, if any of the terms in the student response match terms in the emphasis line the logic term L0 is set to .true.. If an emphasis line is used, therefore, L0 provides a broad indication as to whether the student is on the right track by identifying at least one of the expected terms correctly. As a way of providing more information V0 is set to the number of keywords that are found in the student response; save this value before executing any "work" lines since it will be overwritten in that case.
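The emphasis behaviour might be sketched in Python as follows. The function name emphasise is hypothetical, matching is assumed to be case-insensitive, bolding is assumed to use the html "b" tag, and V0 is assumed to count distinct emphasis terms rather than every occurrence.

```python
import re

def emphasise(response: str, qw_line: str):
    """Return (marked-up response, L0, V0) for a QW line."""
    terms = qw_line.partition(":")[2].split()
    found = 0
    marked = response
    for term in terms:
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        if pattern.search(response):
            found += 1
            marked = pattern.sub(lambda m: f"<b>{m.group(0)}</b>", marked)
    return marked, found > 0, found
```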

Failure Lines. If all the logical expressions evaluate as false, the program will, by default, print an encouraging but bland message to the student highlighted by a yellow banner. You can override this message with one of your own: e.g.

FT: Wake up bozo!! Get yer brain in gear!

On second thoughts, maybe you should stick to the default.

Control Line. The control line allows the instructor to manage the output from the program. The list of directives is listed below:

Control Commands (Shorten to the first 4 characters)
Command (Default action is bolded)    Action
ALTQLOG           NOALTQLOG           Use the alternate Q-Log file for Q-logging.
ANSHEADER         NOANSHEADER         Print the header preceding the answer KEYSEARCH generates.
AUTHENTICATE      NOAUTHENTICATE      Test a username/password pair to see if they authenticate the user: see below for more information.
BODY              NOBODY              Put a standard <body> line into the output web page (see below).
COMMENT           NOCOMMENT           Special processing appropriate to the comment tag.
DEBUG             NODEBUG             Insert debug information as a comment into the html output.
FAILTEXT          NOFAILTEXT          Print the text that accompanies a failure message.
HASH              NOHASH              Hash the authenticated username before writing it to the Q-Log.
HEADER            NOHEADER            Print the header line before printing the student response.
KEYSEARCH         NOKEYSEARCH         Suppress the KEYSEARCH data for JSON output.
LOG               NOLOG               Log the transaction into the log file.
ONLY              NOONLY              Print the student response to the question, then immediately exit.
PACK              NOPACK              Strip leading and trailing spaces from tags before substitution.
QLOG              NOQLOG              Enable logging with the "Q" directive.
QUESTIONHEADER    NOQUESTIONHEADER    Suppress the header that announces the question.
RESPONSE          NORESPONSE          Print the student response to the question.
STRICT            NOSTRICT            Force checking of the ALEX session on the verification server.
SUBSTITUTE        NOSUBSTITUTE        Allow substitution of A-lines into other A-lines.
WORK              NOWORK              Enable the W option in R-lines: see below for a discussion of security issues.

For example:-

CN: NORE; NOHE; NOAN; NOQU; NOFA; HASH; \
NOLOG; NOQL; WORK; AUTH; SUBS; DEBU;

Note that in the example here, NOHE[ADER] is redundant because if you don't get a response printed there is no header to that response. Also, if you choose NOFA[ILTEXT] and you specify an "FT" line, that "FT" line will never get printed.

Why have these options? Basically, to permit a more flexible use of the output of the program, particularly in conjunction with the Edit lines described above. Remember to use only the first four characters of the directives.
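The four-character rule can be illustrated with a Python sketch that reads a control line into a set of on/off settings. It is a sketch under assumptions: the function name parse_control_line is hypothetical, only the first four characters of each directive are compared, and unrecognised directives are silently skipped.

```python
DIRECTIVES = ["ALTQLOG", "ANSHEADER", "AUTHENTICATE", "BODY", "COMMENT",
              "DEBUG", "FAILTEXT", "HASH", "HEADER", "KEYSEARCH", "LOG",
              "ONLY", "PACK", "QLOG", "QUESTIONHEADER", "RESPONSE",
              "STRICT", "SUBSTITUTE", "WORK"]

# Map each 4-character prefix to (full name, switched on?).
PREFIXES = {}
for name in DIRECTIVES:
    PREFIXES[name[:4]] = (name, True)
    PREFIXES[("NO" + name)[:4]] = (name, False)

def parse_control_line(line: str) -> dict:
    """Turn 'CN: NORE; HASH; ...' into {'RESPONSE': False, 'HASH': True, ...}."""
    body = line.partition(":")[2]
    settings = {}
    for token in body.split(";"):
        key = token.strip().upper()[:4]
        if key in PREFIXES:
            name, enabled = PREFIXES[key]
            settings[name] = enabled
    return settings
```

Comparing only the prefix means that "NOLOG" and "NOLO" are read the same way, and none of the listed directives collide in their first four characters.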

Note that the NOBODY directive suppresses the automatic generation of the </head><body> line in the web page. This allows you the opportunity to output new material as part of the header of the web page (which you may need to do in some applications) but it also requires you to provide the </head><body> tags since these are needed for a properly formed html document. If you don't, some web browsers won't display your page, or, worse, will display the page but then fail to work correctly.

TagSet Lines. We'll discuss these in detail later. An example would be:

TS: ans; bop|n;

QLog Name. It is sometimes very useful to be able to alter the name of the file to which logging is done. This is done as follows:

QL: FileName
or
QL: $tagstring

In the first case, "FileName" is a simple string and the file is named accordingly: you can do tag substitution within this text string if you wish. In the second case, "tagstring" is the name of a tag and the file name used is the string pointed to by this tag. This allows you to pass the file name into the form without having to have multiple hard-coded versions.

Alternate RLF Input. This feature allows you to insert variable RLF input into a file. This means that you can have a block of standard RLF to deal with many different types of forms, but with the option of inserting a few additional lines needed in a specific case. This is done as follows:

>>: FileName
or
>>: $tagstring

In the first case, "FileName" is a simple string and the file to be inserted is named accordingly: you can do tag substitution within this text string if you wish. In the second case, "tagstring" is the name of a tag and the file name used is the string pointed to by this tag. Once again this allows you to pass the file name into the form without having to have multiple hard-coded versions. Remember when you are writing the main RLF file to leave a gap in the numbering of the R- and V- lines sufficient to accommodate the variable blocks you plan to have inserted. Note that you can't insert variable RLF inside a block of variable RLF.

"Base." If you are using images it is sometimes useful to define a "base" for the web page so that coding a full path to the images is not necessary: its use is entirely optional, of course. However, it can simplify the creation of the question items in some cases.

BS: path

In this case "path" is the full path to the location of any images (e.g. http://machine/location/). The "base" line is inserted into the header of the web page. Note that tag substitution may be used.

"Script" files. The "NOBODY" control directive (above) switches off the <body> tag supplied by KEYSEARCH and allows the programmer to place scripts (e.g. Javascript) into the header and then to provide the <body> tag. However, if standard script or scripts are to be used in many RLFs, an easier alternative is to load the script file(s) directly into the header of the web page. "SC-lines" provide this function and simplify RLF programming.

SC: file1,file2,file3

The line consists of a comma-separated list of file names and each file name must be rooted (e.g. /full/path/to/script1.js) and readable by the web server. Multiple "SC-lines" can be coded and tag substitution may be used. The contents of these files should be complete and syntactically correct, of course.

Restart file stem. A RS-line allows the stem of the restart filename to be modified, usually to distinguish related threads in a question set.

RS: stem

The text string is added into the filename to distinguish it from others. Tag substitution may be used. More on Restart management can be found below.

ButtonBar. You can code a button bar at the bottom of the page. If you must. It has to be error-free html that does not break the framing of the page, which isn't easy to do. Example:

BB: <Some-html-string>

Comment Lines. Yes, you can add comments to the RLF. You mark them as follows:

XX: This is a comment

Note that you cannot have comments with continuation lines, each line must begin with "XX:".

Unexpected Behavior

Badly constructed RLF code: You were warned above - you must stick to the syntax of the RLF or something unexpected may happen. A specific "error" might be that part of the file might seem to be missing. This may be due to skipping a continuation character or having unexpected white-space preceding a valid line definition string, like having " TS: ..." rather than "TS: ...". In cases like this KEYSEARCH stops reading the RLF and continues processing with what is, in effect, truncated input.

Badly edited RLF code: Writing an RLF isn't that difficult and if you plan out the program getting it running should be easy. But problems can arise when you try to edit existing code, or to try to merge code from different sources. In this case it is easy to duplicate L-, R- and V-lines which have been picked up from different places. If a line is duplicated the last one to be read from the input file is the one used. A warning of line re-definition might be placed in the "debug" output to tell you that this has happened, but you need to know to look at the source of the "broken" response to see it. Be aware and when you see odd or inconsistent results, look first for line duplications.

Bad html: Another source of unexpected behaviour is bad html. Incorrect nesting of tags, for example </center>, </table> and </form>, can make some pages non-functional. It is very worthwhile, as a proactive measure, to test all pages (particularly those that Keysearch produces as a response) with the html validator at http://validator.w3.org/file-upload.html. If nothing else it will reassure you that weird stuff on the screen isn't your fault.

  The Response Log File

The log file maintains a record of the student answers. It is formatted as HTML and is designed to be viewed on the web so that the instructors can review answers and modify the RLFs as needed to improve the operation of each question. However, for detailed reporting a plain text file won't work. To accommodate this need the log file is structured by adding semicolons (";") to separate the data elements. This degrades the readability of the log on the web somewhat, but does not do too much mischief. More importantly, it allows the log file to be read by a spreadsheet program such as Excel, which will place each of the data elements in known places in the sheet using the semicolons (";") as column delimiters. The columns in the regular log file are:-
Padding ; UserName ; KeyTag ; Date-Time ; Padding ; browserID ; AuthLog ; Other data
The columns in the Q-log file are:-
Padding ; UserName ; KeyTag ; Date-Time ; ytime ; browserID ; AuthLog ; Other data
Note that not all fields will necessarily be filled. The "padding" fields will have HTML tags in them: these columns can be deleted from the sheet to clean it up.

As noted below, the "Q" symbol causes an authenticated entry to be written to the log. To ensure that the authenticated userIDs can be kept private if necessary, these entries are written to a separate log if a writable location where they can be created has been hard-coded into KEYSEARCH (in module PARAMS). If such a location exists then it will be used for these records: if it does not, then they will be placed in the regular log. If this feature is important to you, please test that it works before putting userIDs into your log. Of course, if you code "NOQL" (No Q-log) then there will be no logging of this type. This can be important if you want to speed things up: if you code "NOLOG; NOQLOG" then no logging will be done at all and KEYSEARCH will not waste time opening a log file - which should remind you to code these if you do not plan to use the respective logs. Please also note that by default the log files remain open until KEYSEARCH exits and can have several records written to them (although this is not recommended). If you need to spawn tasks that read these files, to generate a report for the student as an example, those tasks may find that the open file is locked and unreadable. Coding QC (Q-log/Close) causes the Q-log file to be closed immediately after writing the first record (which will be the last, therefore) freeing the file to be read for analysis.

A final point. The log is written sequentially and so some lines need to be concatenated to obtain the spreadsheet-readable file that will be of most use. The padding field that precedes each new record contains the string "++;" in positions 7-9 and this string can be used to identify it.
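The concatenation step could be done with a short Python sketch like the one below. It is only a sketch: the function name read_log_records is hypothetical, positions 7-9 are assumed to be counted from 1, and continuation lines are assumed to join directly onto the record without any separator.

```python
def read_log_records(lines):
    """Group log lines into records and split each record on semicolons."""
    records, current = [], None
    for raw in lines:
        line = raw.rstrip("\n")
        if line[6:9] == "++;":           # positions 7-9 mark a new record
            if current is not None:
                records.append(current)
            current = line
        elif current is not None:
            current += line              # continuation of the same record
    if current is not None:
        records.append(current)
    return [[field.strip() for field in record.split(";")]
            for record in records]
```

Each inner list then holds one row for the spreadsheet, with the padding fields still present so that they can be deleted afterwards.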

  What Happens During Processing?

In case you are interested, this is the way KEYSEARCH works. When the student response is posted by his/her computer KEYSEARCH is started by the server's CGI interface. KEYSEARCH begins by reading all the information posted by the client computer and then opens the path back to the client (this is when a cookie would be written to the client, but right now KEYSEARCH does not use cookies.) KEYSEARCH parses the posted data, opens the RLF and reads it all into storage. It then processes the emphasis keywords (if present) setting L0 and V0, and writes out the student response (if requested). Then the main processing starts, where the program cycles through all the R-lines and evaluates them in ascending numerical order (R1, R2, ...Rn). If items, such as L-lines, are needed for the evaluation of an R-line they are computed only at the time they are first needed. In this way nothing is evaluated that is not needed to get the answer returned to the student.

  A Simple Example of Text Searching

Let's look at a few simple examples of RLFs that could be constructed to deal with a question like this: "List some of the animals likely to be found in the better homes in NYC." The first job is to write the html for the web page. It might look like this:


<form method="POST" action="http://www.example.edu/cgi-bin/keysearch">
List some of the animals likely to be found in better NYC homes.<p>
<input type="hidden" name="course" value="NYC-Life">
<input type="hidden" name="subject" value="Apartments">
<input type="hidden" name="qunn" value="2">
<textarea name="response" rows=5 cols=60 wrap="off">
Replace this text with your response to the study question.
Please make your response brief, but precise.
</textarea><p>
<input type="submit" value=" Submit Response ">    
<input type="reset" value=" Clear Input ">
</form>
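When the student presses the submit button, the browser sends the named fields of this form to the server as a single encoded string, which is what KEYSEARCH then parses. The Python sketch below shows roughly what that string looks like for the form above; the function name form_body is hypothetical, and the encoding shown is the standard default for an html form that does not specify another one.

```python
from urllib.parse import urlencode

def form_body(response: str) -> str:
    """Encode the form's fields the way a browser would for a POST."""
    fields = {
        "course": "NYC-Life",
        "subject": "Apartments",
        "qunn": "2",
        "response": response,
    }
    return urlencode(fields)

print(form_body("cats and dogs"))
# course=NYC-Life&subject=Apartments&qunn=2&response=cats+and+dogs
```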

Now we have to do the hard bit: deciding what kind of test should be applied to the "typical" student answer.

Option 1
One option, which is not necessarily the worst, is to create an RLF which contains only an "A" line.

A1: Cats, dogs and cockroaches are common animals living in NYC\
apartments. Rats are not that common, except as pets.

This is a special case: If the RLF contains only an "A" line then that answer gets printed out irrespective of the response the student enters. Cheap, but maybe appropriate in some cases.

Option 2
The next step in complexity is to add a logic line or two:

L1: 2; dog; cat; mouse; cockroach
L2: 1; parrot; rabbit; hamster
A1: Cats, dogs and cockroaches are common animals living in NYC\
apartments. Rats are not that common, except as pets.

We don't have any R-lines. This is another special case which assumes that what the author wants is to have all the logic lines be "true": if this is so the A line is printed. Note that the general rule that A-lines be paired with R-lines takes precedence: if there are any R/A pairs, un-paired A-lines are ignored. This is the only case where a single A has an effect.

Option 3
The next level is to dissect the answer the student provides and to tailor the programmed response to that answer.

QN: List some of the animals likely to be found in better NYC homes.
L1: 2; dog; cat; hamster; parrot
L2: 1; rat; mouse; mice
R1: L1,M
A1: Cats, dogs, hamsters and parrots are common animals living in NYC\
    apartments.
R2: L1,L2,A
A2: Rats and mice are not that common, except as pets.
R3: L2
A3: Really! in the better homes we have "nice" animals, not vermin.\
Think cats and dogs for heavens sake!

The effect here is probably closer to the one you'd like. If L1 is true (the student coded 2 of the keywords) then A1 is printed. The "M" in R1 tells KEYSEARCH not to quit but to continue to the next R line, R2. In R2, if L1 and L2 are true A2 is printed: if this happens KEYSEARCH quits. If R2 was false, however, R3 is tested: and if L2 is true, A3 is printed.
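The flow just described can be traced with a small Python sketch of Option 3. It is not the program's code: the function names are hypothetical, keyword matching is assumed to be a case-insensitive substring search, and only the tokens used in this example are handled, with "A" read as a logical AND of the two preceding terms and "M" as "carry on to the next R-line". Each L-line is computed only when it is first needed, as the processing description above states.

```python
def count_hits(keywords, answer):
    """Count how many keywords occur somewhere in the answer."""
    text = answer.lower()
    return sum(1 for word in keywords if word in text)

def run_option_3(answer: str):
    """Return the names of the A-lines that would be printed."""
    logic = {"L1": (2, ["dog", "cat", "hamster", "parrot"]),
             "L2": (1, ["rat", "mouse", "mice"])}
    rules = [("R1", ["L1", "M"]),
             ("R2", ["L1", "L2", "A"]),
             ("R3", ["L2"])]
    cache = {}

    def value_of(name):
        if name not in cache:            # evaluate each L-line at most once
            minimum, keywords = logic[name]
            cache[name] = count_hits(keywords, answer) >= minimum
        return cache[name]

    printed = []
    for rule, tokens in rules:
        stack, more = [], False
        for token in tokens:
            if token == "M":
                more = True
            elif token == "A":
                right, left = stack.pop(), stack.pop()
                stack.append(left and right)
            else:
                stack.append(value_of(token))
        if stack.pop():
            printed.append("A" + rule[1:])
            if not more:
                break                    # default: stop at the first true R-line
    return printed
```

An answer naming a dog and a cat yields only A1, adding a rat yields A1 and A2, an answer naming only rats or mice yields A3, and an answer with none of the keywords yields nothing, which is when the failure message would appear.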

At this point all the lines are exhausted, and if L1 and L2 were both false, the general "failure" message is printed and the program quits. Quite likely, however, some student typed in "Turtles, cockatoos and pot-belly pigs are common pet animals in the better NYC apartment". This answer would have been evaluated as "false" and this would get logged by KEYSEARCH. The instructor, reading the log, would say to itself, "'Tis true, my pot-belly pig is indeed a mark of sophisticated living", and add it to the list in L1. Next time the question was accessed, the new term would be searched for and the programmed result might be different.

  A Simple Example of a Non-Text Question

Text analysis is tough and many valuable review questions can be coded without the heavy machinery. Let's look at a simple example of a multiple-choice question. First the web page....


<form method="POST" action="http://www.example.edu/cgi-bin/keysearch">
List some of the animals likely to be found in better NYC homes.<p>
<input type="hidden" name="course" value="NYC-Life">
<input type="hidden" name="subject" value="Apartments">
<input type="hidden" name="qunn" value="2">
<input type="radio" name="apt_1" value="dog"> Dog <br>
<input type="radio" name="apt_1" value="cat"> Cat <br>
<input type="radio" name="apt_1" value="mongoose"> Mongoose <p>
<input type="submit" value=" Submit Response ">    
<input type="reset" value=" Clear Input ">
</form>

By using radio buttons, the student can only select one of the options so the information coming back to KEYSEARCH is constrained in this case to just three possibilities: dog, cat and mongoose: so no guessing on the part of the instructor as to whether "chicken" will turn up... So a simple RLF can be written, maybe like this:-

L1: 1; dog; cat
L2: 1; mongoose
R1: L1
A1: Cats and dogs are common animals living in NYC apartments.
R2: L2
A2: Nope. Mongooses are really rare on account of them being ILLEGAL\
PETS even though they are very cute as this picture shows\
<img src="mon.jpg">

You'll notice that A2 contains explicit html code, which is quite legal and maybe valuable for the response generated. Feel free to make the responses rich html documents with links to other material, images, multimedia, whatever. Anything to keep the student from getting bored and quitting.

  More Details

The examples above show the basics of KEYSEARCH, but at this level of sophistication the program wouldn't be flexible enough to deal with all the real questions you may need to pose and all the situations where you need to set up the program. We need to explain TagSets, which are needed for multi-part questions; we need more powerful searching options in the L-lines; and the construction of the R-lines needs full explanation.

Named Tags

First, let's talk about the named tags that KEYSEARCH uses to carry "hidden" values to the program. These are never to be used for other purposes. The list is:

  • course - This is the name of the course, e.g. histo
  • subject - This is the subject within the course, e.g. muscle
  • qunn - This is the question number, e.g. 5

The values above must be coded since they define the file names of the RLF and the student response log, and the e-mail alias for students to contact the instructor (see below). There are situations where the defaults coded internally in "KEYSEARCH" for the paths to these files are not suitable. In this case other named tags are available:

  • filepath - This is the path to the RLF file, and
  • htmlpath - This is the path to the html response log

For example, if "filepath" is (optionally) defined as the path to the RLF it overrides the internal default. On a UNIX system "filepath" might be set to "/usr/local/questions/" so the path that will be created is: "/usr/local/questions/subject-qunn.txt", where "subject" and "qunn" will be replaced by the values given to them in the web page. Similarly, the path to the html will be: "htmlpath"subject-qunn-r.html: "htmlpath" will be replaced by the string you code. VERY IMPORTANT! "filepath" and "htmlpath" must contain the delimiter: "/" in the case of UNIX, "\" for Windows, etc. The example above shows this.
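The naming rule just described can be written out as a small Python sketch. The two function names are hypothetical and the code only restates the pattern given above; it shows why the trailing delimiter matters, since the path and the file name are simply joined.

```python
def rlf_path(filepath, subject, qunn):
    # "filepath" must already end in the delimiter ("/" on UNIX).
    return f"{filepath}{subject}-{qunn}.txt"


def response_log_path(htmlpath, subject, qunn):
    # Same rule for the html response log, with the "-r.html" ending.
    return f"{htmlpath}{subject}-{qunn}-r.html"


print(rlf_path("/usr/local/questions/", "muscle", 5))
print(rlf_path("/usr/local/questions", "muscle", 5))  # delimiter forgotten
```

The second call shows what goes wrong without the delimiter: the directory name and the subject run together into a single (wrong) file name.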

The last named tags are:

  • author - The text associated with this tag is included as the "DC.Author" metatag identifying the question author.
  • banner - This is the banner control tag. If it is omitted, you get the standard RCR banner. If you code it with a number >0, then the banner is omitted. Yes, we'll fix that soon...
  • comment - The comment tag, with the COMMENT directive, turns on special processing for VirtualMicroscope markers. The tag contains a description of the tissue pointed to by the marker. If used in this context comment should be added to the TagSet line so that it is kept separated from any other tags.
  • contenttype - Set the "contenttype" tag to json to get KS to format its output in JSON (JavaScript Object Notation) data-interchange format rather than html.
  • debug - The debug tag set to "debug" will switch on DEBUG output from KEYSEARCH. The RLF can override this by setting "NODEBUG" in the control line. If debugging is "off", debugging cannot be transmitted to down-stream scripts.
  • keytag - This is a tag created internally for answer tracking. See below under Authentication.
  • mailpath - This is the server address that will understand the aliases that are used for student feedback, see next. If this is not coded, the server is "syllabus.med.nyu.edu", a hard-coded parameter you can edit in the source for your site.
  • questionwd - If the QW line is present but empty then KEYSEARCH looks for this tag and uses its contents as the list of emphasis terms. questionwd should always be added to the TagSet line.
  • title - The title placed on the web page that is generated.
  • ytime - The time in minutes since 1-Jan-2007 GMT (for tracking sequential events in a log).
  • username - The user's authenticated ID (PIMS or Apache).
  • password - The user's password for authentication (PIMS only).
  • alexuser - The user's ID passed from ALEX to Keysearch.
  • pubkey - The key for the ALEX to Keysearch handoff.
  • lockey - The "local key" to be used for a Keysearch-to-Keysearch handoff (with "alexuser").
  • sessionid - The ALEX sessionID. A reserved tag, normally unused.

E-mail Feedback

KEYSEARCH uses the "course" and "subject" tag names to define aliases for the course director and the individual subject instructors. Mail to the course director is sent to: <course@mailpath>. Mail to the instructor is sent to: <course-subject+qunn@mailpath>. Note that the question number is included as a sub-address ("+qunn"), which your mail server will either ignore, or process intelligently.
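As a quick illustration of the alias pattern, here is a Python sketch that builds the two addresses from the tag values. The function names and the mail server "mail.example.edu" are invented for the example.

```python
def director_address(course, mailpath):
    return f"{course}@{mailpath}"


def instructor_address(course, subject, qunn, mailpath):
    # The question number travels as a "+qunn" sub-address.
    return f"{course}-{subject}+{qunn}@{mailpath}"


print(director_address("histo", "mail.example.edu"))
print(instructor_address("histo", "muscle", 5, "mail.example.edu"))
```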

'Rithmetic Lines

R-lines are used to compute logical expressions built from the results of searches on the student answers, L-lines, the results of earlier computations either logical (R-lines) or arithmetic (V-lines), and logical operations, such as .and., .or., etc... Simple examples are shown above.

R-lines are computed in order, starting with R1 then R2, ..., Rn, etc. Note that R-lines do not have to be paired with A-lines - if they are not paired they get computed and the answer is available to use in subsequent calculations. Also, an unpaired R-line that evaluates as .true. does not halt processing (and an "M" is not needed to continue): after evaluation processing continues with the next R-line in sequence (or the line pointed to by a "Jx".)

The syntax of the R-line is RPN (reverse Polish notation), which will be familiar to users of Hewlett-Packard handheld calculators. With this calculator, the user "pushes" values onto a "stack" and then performs operations on them: this method allows very simple notation and eliminates the need for brackets for complex computations. For example, on the HP if you enter the numbers and operations 2,3,4,-,+ you get the answer 1 (conventionally, this is the equivalent of (3-4)+2). Once you get the hang of it, it is easy.

R-lines evaluate logical expressions, where the input is the logical values of the L-, V- and R-lines previously calculated. Additionally, tags and logicals can be interpreted to have logical values. These are coded in the R-line as $temp (a tag called "temp") and %temp (a logical called "temp"). The string pointed to by the tag or logical will have a logical value as follows: if that string starts with t,T,1,+,.t or .T then the value will be .true., anything else will be interpreted as false. Note that if you are using a tag or logical, such as $temp, then if that tag or logical is missing or empty its logical value is .false..
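The rule for turning a tag or logical string into a logical value is short enough to state as code. The Python below is a sketch of the rule as described, with a made-up function name; a missing tag is represented by None.

```python
def logical_value(text):
    # Missing or empty tags and logicals are .false.
    if not text:
        return False
    # .true. if the string starts with t, T, 1, +, .t or .T
    return text.startswith(("t", "T", "1", "+", ".t", ".T"))


for sample in ("true", ".TRUE.", "+1", "yes", "0", ""):
    print(repr(sample), logical_value(sample))
```

Note that "yes" comes out false under this rule, which may surprise you if a form returns it.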

For example, the line

R1: L1, R2, L2, $temp, L4

causes the logical values L1,R2,L2,$temp,L4 to get pushed onto the stack which will be 5 values deep: L4 is lowest on the stack (last pushed on) and L1 is the highest. If I were now to push the operation "N" (.not.) onto the stack, I would reverse the logical value of the lowest member, L4 in this case. If I were to push "A" (.and.) onto the stack I would compute the logical value (L4.and.$temp) and reduce the stack depth by one leaving the new value on the bottom and the other three above it. Remember that V-lines have a logical value of .true. if they are ≥0.0 and .false. if they are <0.0.

String comparisons can also be used when evaluating R-lines. See the paragraph below.

The way processing works, generally, is to calculate R-lines in ascending numerical order. As each R-line is evaluated each of the terms in the expression is evaluated: if an L-line is referenced it is evaluated if it has not been evaluated previously. Similarly, if a V-line is referenced it is evaluated if it has not been evaluated previously. If an R-line references L or V lines that do not exist in the RLF, those values are assumed to be .true.. By default, the logical value assigned to R-lines is .true., so if you use an R-value that you have not computed (e.g. use R6 when you're calculating R2) the value that will be used is .true. (It does not jump forward to compute R6, then come back to finish the current calculation).

The logical expression evaluates to the logical value of the lowest stack member when all the terms in the line have been processed. (HINT: If you're finding odd results, maybe you're not calculating with all the data in the manner you think you are.)
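If the stack is new to you, a toy evaluator may help. The Python sketch below handles only the T, N, A, O, =, X and S operators and takes the already-known logical values from a dictionary; the function name is hypothetical and this is not how KEYSEARCH itself is written. It follows the defaults described above: unknown L-, V- and R-values count as .true., missing tags and logicals as .false..

```python
def eval_r_line(expr, values):
    stack = []  # the last element is the "lowest" (most recently pushed) member
    for term in (t.strip() for t in expr.split(",")):
        if term == "T":
            stack.append(True)
        elif term == "N":
            stack.append(not stack.pop())
        elif term in ("A", "O", "=", "X"):
            s1, s2 = stack.pop(), stack.pop()
            if term == "A":
                result = s2 and s1
            elif term == "O":
                result = s2 or s1
            elif term == "=":
                result = s2 == s1
            else:
                result = s2 != s1
            stack.append(result)
        elif term == "S":
            stack = [all(stack)]  # .and. of the entire stack
        elif term.startswith(("$", "%")):
            stack.append(values.get(term, False))  # missing tag or logical
        else:
            stack.append(values.get(term, True))   # L-, V- or R-value
    return stack[-1]  # the line takes the value of the lowest member


known = {"L1": True, "R2": True, "L2": True, "$temp": True, "L4": False}
print(eval_r_line("L1, R2, L2, $temp, L4", known))
print(eval_r_line("L1, R2, L2, $temp, L4, S", known))
```

The first call illustrates the HINT: without an operator the line simply takes the value of L4, and the other four values play no part.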

The following table sets out the logical operations that can be used:

Action on Stack                   Stack Depth
  Operation     Symbol            Before  After   Description

Add a Member
  True          T                 n       n+1     Push "True" onto the stack
  Duplicate     D                 n       n+1     Duplicate last stack member
  "Ident"       I / IA / IX / IP  n       n+1     Push authentication status onto the stack (1)
  Random        Cm                n       n+1     Push "True" onto the stack 1/m (m≥2) of the time.
  Restart       $R                n       n+1     Push "True" onto the stack if a restart file is available.

Lowest Member
  .not.         N                 n       n       Reverse logical sense of lowest stack member

Lowest Two Members
  .and.         A                 n       n-1     Logical .and. of last two members
  .or.          O                 n       n-1     Logical .or. of last two members
  .eqv.         =                 n       n-1     Logical .eqv. of last two members
  .neqv.        X                 n       n-1     Logical .neqv. of last two members (eXclusive .or.)

Several Members
  Count         Kx|y              n       n-y+1   "True" if x or more members of the lower y stack members are true; "False" otherwise. (2)

Entire Stack
  "Sum"         S                 n       1       Logical .and. of all stack members
  "Purge"       P                 n       0       Empty the stack.

Output & Process Management
  "Branch"      Bn                n       n       Evaluate Rm next where m=Vn i.e. the value stored in Vn
  "Fail"        F[n]              n       n       Tell KEYSEARCH that if the expression evaluates as true, the program should nevertheless treat it as false and print the "Failure" text to the screen when the R/A pairs are exhausted. If a number "n" is present this is the number of the A-line to be used as the "default" in a "Give up" line.
  "GiveUp"      G                 n       n       If the line evaluates as true toggles the "Give Up" option off and on so that not all "Fail" results need offer the answer to the student. (3)
  "JSON"        JS                n       n       Output a JSON field. (4)
  "Jump"        Jn                n       n       Evaluate Rn next (5)
  "More"        M                 n       n       Tell KEYSEARCH not to quit processing if the expression evaluates as true. It should print the answer and move on to the next R/A pair.
  "QnLog"       Q / QC / QE       n       n       Write authenticated question line to the log; "C" causes the log to be closed immediately after the write; E encrypts it. (6)
  "Restart"     R                 n       n       Write a "Restart" file
  "ResetAuth"   RA                n       n       Set the authentication status to the result of the R-line
  "Work"        W / WH            n       n       Submit the corresponding "A-line" to the shell; "H" causes any output to be hidden. (7)
  "copY"        Y / YH            n       n       Open and copy the file(s) referenced by the corresponding "A-line" to the browser (8)
  "AltPrmt"     Z                 n       n       Use alternate "Give up" prompt.

(1) "I" is .true. if the session is authenticated. The other operators are true if the authentication came from Apache (IA), ALEX (IX), or PIMS (IP), allowing you to distinguish the authentication source.

(2) The "K" operator counts "trues". If "y" is missing (or zero) the entire stack is counted. If "y>0" then the lower "y" stack members are counted and removed from the stack, the result is pushed onto the bottom of the stack and the upper members that were not counted are available for further computations. Note that "K1|y" is the same as an .or. on the bottom y stack members and "Ky|y" is the same as an .and. . And of course if x>y, the result can only be .false. This is a very powerful operator. Enjoy!

(3) The "G" operator toggles the "Give Up" option. If a "GU" line is coded then the option is on and all failures will cause the "Give Up" button to be posted into the output. The "G" operator will toggle the option off, and on again, if required. It can't toggle on if there is no "GU" line provided, of course.

(4) The "JS" operator causes the "A" line to be output into JSON. The A-line needs to be formatted as "field-name":"data". Substitutions can be made into the line as per normal (see below).

(5) The "Jn" operator causes the "nth" R-line to be evaluated next (rather than whatever was next in the sequence). After evaluating "n" it evaluates "n+1", and so on. Jumps can be forward or backwards, but the rule is that no line can be evaluated twice: if a line has been evaluated already, it is skipped. This stops endless loops (and sloppy programming).

(6) Normally the log file remains open until KEYSEARCH exits but this may mean that it cannot be accessed by utility programs. "C" allows the log file to be closed and therefore made available for other uses. "E" causes the output line to be encrypted: on VMS the algorithm will be AES128 CBC; on UNIX it is pending.

(7) With this command, a sub-process can be spawned to the shell. This is a potentially dangerous option since it might be subverted to execute malicious code on the server. Unless there is a need for it, disabling this option is recommended (see below).

(8) The A-line contains a comma-separated list of file names to be copied. Each file name must be rooted (e.g. /full/path/to/my-file.txt) and readable by the web server. The "Y" directive allows you to avoid using "W" for one common requirement. The "H" option (hide) allows you to test that all the files can be opened with the result being returned in R0.

One additional resource available is the R-line accessed as "R0". At the beginning of processing "R0" is initialized to .true. if there is an authenticated user (the same value as "I"). Subsequently, if a "Work" operation is executed, "R0" will contain the completion status of this operation: .true. if it executed without error, .false. if it did not. Additionally, "R0" will be true or false depending on whether a file has been copied to output without error (the "Y" operator: see above). Examples include testing whether a log entry has been created that indicates that a user has already accessed some material, such as an exam or test, or whether a copy has completed correctly: if the file to be copied is missing "R0" will be .false..
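Since the "K" operator of note (2) is the least familiar of the stack operations, here is a Python sketch of what it does to the stack. The function name is hypothetical, the stack is a plain list whose last element is the lowest member, and a "y" larger than the stack is assumed to mean "count everything", which the text above does not actually specify.

```python
def count_op(stack, x, y=0):
    n = len(stack)
    # y missing or zero: count the entire stack.
    # Assumption: a y larger than the stack also counts everything.
    y = n if y <= 0 else min(y, n)
    kept, counted = stack[:n - y], stack[n - y:]
    # The counted members are removed and the result is pushed in their place.
    return kept + [sum(counted) >= x]


stack = [True, False, True, True]
print(count_op(stack, 2, 3))  # K2|3 on the lower three members
print(count_op(stack, 1, 4))  # K1|4 behaves like .or.
print(count_op(stack, 4, 4))  # K4|4 behaves like .and.
```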

Value Lines

V-lines are used to compute real-number arithmetic expressions (in RPN) built from numerical values the student enters as part of an answer, the results of earlier computations of either logical (L-lines and R-lines) or arithmetic (V-lines), of constants and of arithmetic operators, such as +,-,*, etc... Simple examples are shown above.

V-lines are computed "on demand", triggered by being referenced in an R-line or by being needed for insertion into an A-line about to be sent to the student. L-lines used in a V-line will be computed if they are needed. But V- and R-lines must be evaluated before they are used in an expression calculating a V-line: e.g. if V2 is being calculated and V6 and R23 are terms in the expression that have not yet been evaluated then they will be assumed to be zero (even though they might come to be evaluated to something else later). Once a V-line is evaluated it will not be re-evaluated subsequently (e.g. if one of its terms changes its value from zero).

An example of a V-line is as follows:
V2: V1,2,*,$temp,+,R3,*,%logout,+
In this example the "+" and "*" are arithmetic operators, "2" is a constant (obviously) and "V1", "R3", "$temp" and "%logout" are values from various sources. Specifically, "V1" is the value of an expression computed earlier (presumably) and "R3" is the result of logical evaluation of an R-line: R-lines (and L-lines) have a numerical value of "+1.0" if they are .true. and "-1.0" if they are .false..

The "$" indicates the individual name of the tag of a text field that the user will have typed a number into. The value of that number is indicated by, for example, "$temp". Note that these are individual tag names, not TagSets that the L-lines work with.

The "%" indicates the name of a symbol (UNIX) or logical (in the LNM$JOB table; VMS). Typically, the logical will have been defined in a spawned task in order to transmit a numerical value that KEYSEARCH will decode and use. Note that symbols/logicals are not limited to numbers, but KEYSEARCH assumes that they are numbers in this context: a pure character string will be evaluated as zero.
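To see how the V2 example unwinds, here is a small Python model of a V-line restricted to the four basic operators. It is a sketch with invented names, not the program's own code. Values that have not been evaluated count as zero, as described above, and a tag that does not hold a number is assumed to count as zero in the same way as a symbol.

```python
def to_number(text):
    try:
        return float(text)
    except (TypeError, ValueError):
        return 0.0  # missing or non-numeric strings evaluate as zero


def eval_v_line(expr, v_lines=None, r_lines=None, tags=None, symbols=None):
    v_lines, r_lines = v_lines or {}, r_lines or {}
    tags, symbols = tags or {}, symbols or {}
    ops = {
        "+": lambda s2, s1: s2 + s1,
        "-": lambda s2, s1: s2 - s1,  # s(2)-s(1)
        "*": lambda s2, s1: s2 * s1,
        "/": lambda s2, s1: s2 / s1,  # s(2)/s(1)
    }
    stack = []
    for term in (t.strip() for t in expr.split(",")):
        if term in ops:
            s1, s2 = stack.pop(), stack.pop()
            stack.append(ops[term](s2, s1))
        elif term.startswith("$"):
            stack.append(to_number(tags.get(term[1:])))
        elif term.startswith("%"):
            stack.append(to_number(symbols.get(term[1:])))
        elif term.startswith("V"):
            stack.append(v_lines.get(term, 0.0))
        elif term.startswith("R"):
            # .true. is +1.0, .false. is -1.0, not yet evaluated is 0.0
            stack.append({True: 1.0, False: -1.0}.get(r_lines.get(term), 0.0))
        else:
            stack.append(float(term))
    return stack[-1]


print(eval_v_line("V1,2,*,$temp,+,R3,*,%logout,+",
                  v_lines={"V1": 3.0}, r_lines={"R3": True},
                  tags={"temp": "4"}, symbols={"logout": "10"}))
```

With these sample inputs the expression works out as ((3*2)+4)*1+10. If R3 were .false. the product would change sign, which is a handy way to let a logical result steer an arithmetic one.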

There are two additional features provided. First, hexadecimal numbers can be used, indicated by the "#" sign: for example, "#15abff" is a hex constant. By extension in "#$fred" the content of the tag "fred" will be evaluated as a hexadecimal number. Second, hashing can be very valuable as a means of hiding things in plain sight. ^string causes the evaluation of Keysearch's internal hash of the string "string". Once again, by extension, ^$fred is the hash of the string pointed to by the tag "fred". Using this you can check for a "correct" input answer with the anticipated correct result:

V3: ^$answer,#$result,=

There are many operators that can be part of a V-line. The table presents them.

Action on Stack         Stack Depth
  Operation     Symbol  Before  After   Description

Add a Member
  Date          D       n       n+1     Push the date ("YYMMDD") onto the stack. e.g. "060913"
  Duplicate     DP      n       n+1     Duplicate last stack member.
  KeyTag        I       n       n+1     Push the KeyTag onto the stack.
  Random        C       n       n+1     Push a random number 0.0<x<1.0 onto the stack.
  Time          T       n       n+1     Push the time in seconds from midnight onto the stack.
  YTime         TY      n       n+1     Push the time in minutes since 1-Jan-2007 onto the stack.

Lowest Member
  Negate        N       n       n       Reverse sign of lowest stack member
  Fix           F       n       n       Discard fractional part of the number

Lowest Two Members
  Swap          W       n       n       Swap last two members
  Add           +       n       n-1     Add the last two members
  Subtract      -       n       n-1     Subtract last two members: s(2)-s(1)
  Multiply      *       n       n-1     Multiply last two members
  Divide        /       n       n-1     Divide last two members: s(2)/s(1)
  Maximum       X       n       n-1     The maximum of the last two members
  Minimum       M       n       n-1     The minimum of the last two members
  Exponentiate  ^       n       n-1     s(2)**s(1). ("**" does the same thing)
  Equals        =       n       n-1     s(1)=+1. if s(1)=s(2), -1. otherwise: drop s(2)
  Modulus       U       n       n-1     s(1)=amod(s(1),s(2)): drop s(2)
  Greater than  G       n       n-1     s(1)=+1. if s(1)>s(2), -1. otherwise: drop s(2)

Lowest Three Members
  Range         Y       n       n-2     s(1)=+1. if s(2)<s(3)<s(1), -1. otherwise: drop s(2) & s(3)
  Roll          O       n       n       Roll bottom three stack members so that s(1) becomes s(3), s(2)→s(1), s(3)→s(2)
  RanNum        Q       n       n-2     For a given seed, s(3), provide the s(1)th member of s(2) random integers in the range [1,s(2)]

Entire Stack
  "Branch"      B       n       1       Set s(1) equal to the values (s(n)), such that 1≤s(n)≤n-1
  "Count"       K       n       1       Count stack members >0.0
  "Sum"         S       n       1       Sum of all stack members

One thing you may have noticed is that the operators "=", "G", "Y" and "K" are (in a way) "logical" operators: they produce values that will be seen as .true. or .false. when used in an R-line, or, in the case of "K", they count "trues". This is a very useful feature and allows the use of R- and V-lines to be tightly integrated.

Note that V-lines lack the output control operators. These operators are the exclusive province of R-lines because R-lines are the only things that cause output to be produced.

Finally, V0 is also available and is used to store the return status from system calls. This feature allows you to get more detailed information about the outcome of a program that has been called, provided that the program actually offers something useful. Remember that the return status for "success" from a VMS program (one) is different from that from UNIX (zero). So scripts testing V0 need to be different for different systems.

"Dereferencing": V-lines can contain computed pointers to other V- and R-lines to be used in a calculation. The symbols used to indicate this are "[]" for V-lines and "{}" for R-lines. For example, the line..
V7: [5],{3},*
.. means: Set V7 to the result of the product of the value of the V-line pointed to by V5 with the value of the R-line pointed to by V3. Note that the V- and R-lines pointed to must have been computed already: they are not evaluated as a result of being the targets of a dereferencing operation.

Tag Substitutions

Tag substitution, as described above for A-lines, can also be used in a number of other RLF lines, as a way to customize parts of an RLF by allowing the contents of a tag to be embedded into a text string. As KEYSEARCH is extended, more of the lines of the RLF are being made available for tag substitution.

String Operations

Case-sensitive string search operations can be included in an R-line. In some sense this duplicates functionality provided by the L-lines, but it permits some more specific tests. String comparisons are introduced by the "=" or "-" symbols, and look like this:
=stringA|stringB

If "=" (equals) is used to introduce the expression it will be evaluated as .true. only if "stringA" is identical to "stringB": the vertical bar "|" separates the two strings.

If "-" (minus) is used then the expression will be evaluated as .true. if "stringA" is a substring of "stringB".

If the second character is the circumflex "^", then the strings are compared in a case-insensitive manner. For example....
=^dogbreath|DogBreath
In this example the result of the comparison is .true.: if the "^" were omitted, the result would be .false..

The strings to be compared are either strings pointed to by named tags, indicated by $tagname, or simple strings. For example, the expression
-dog|$animal
will be true if the string the user inputs into the field tagged as "animal" contains the word "dog".

There is a special tag called "$USER" which contains the authenticated user name which you can use to probe for a specific user name. An example of its use would be =$$USER|Jones. This would be .true. if the authenticated username was "Jones".

A second special tag is "$USER_AGENT" which contains the string that identifies the browser to the server. These strings are long and complicated and not useful elsewhere, but sometimes you need to have tests to discriminate between working and broken browsers so that you can stop people using them. For example, the test: -MSIE|$$USER_AGENT will be .true. if the browser being used is Microsoft Internet Explorer. Knowing this you can react appropriately :-)

A helpful feature is the ability to test for empty (null) strings. If $alexuser and $empty are both empty, then the expression =$alexuser|$empty will evaluate as .true..

These string comparisons can be used in both R- and V-lines. In the case of a V-line, the .true./.false. values are pushed onto the stack as +/- 1.0.

An example of an R-line using a string operation might be ....
R5: R1,L2,=$x20|$y20,S
In this example, R5 is true if R1 and L2 are true and the content of tag "x20" is identical to the content of tag "y20".
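The string tests can be modelled in Python as follows. This is a sketch only: the function name is invented, tags are looked up in a dictionary by the name that follows the first "$", and the special tags are assumed to be stored under the keys "$USER" and "$USER_AGENT" so that "$$USER" finds them. A missing tag is treated as an empty string.

```python
def string_test(expr, tags):
    op, rest = expr[0], expr[1:]
    if op not in ("=", "-"):
        raise ValueError("a string test starts with '=' or '-'")
    fold_case = rest.startswith("^")  # "^" asks for a case-insensitive test
    if fold_case:
        rest = rest[1:]
    left, _, right = rest.partition("|")

    def resolve(operand):
        # "$name" is replaced by the content of the tag; missing tags are empty.
        return tags.get(operand[1:], "") if operand.startswith("$") else operand

    a, b = resolve(left), resolve(right)
    if fold_case:
        a, b = a.lower(), b.lower()
    return a == b if op == "=" else a in b


tags = {"animal": "my dog Rex", "$USER": "Jones"}
print(string_test("=^dogbreath|DogBreath", tags))
print(string_test("-dog|$animal", tags))
print(string_test("=$$USER|Jones", tags))
```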

IP-Masking

An important extension to the software allows access decisions to be made on the basis of the IP of the remote user's browser. Masking comparisons are introduced by the "=" and can be one of the following types:

=maskedIP/bitmask
or =maskedIP||mask

The "maskedIP" is the required value of the IP after masking with the mask. If the user's IP number is masked and the result is equal to "maskedIP" then the mask operation evaluates as .true.. "bitmask" is the count of most significant bits in the mask, so 16 is 16 bits (two bytes). For example, if the mask operation requested is =128.122.0.0/16 then if the user's IP is 128.122.135.4 the mask operation would be .true. because the top two bytes in the user's IP address are 128.122. "mask" allows an explicit description of a mask such as 255.255.0.0 which is equivalent to a bitmask of 16. Similarly, 255.255.255.255 is equivalent to a bitmask of 32. But a mask of 255.255.192.128 does not have a corresponding bitmask.

The two options for mask operations require the "=" and a dotted string separating 4 or 6 byte values followed by either "/number" or "||dotted string".

You'd use a mask operation to permit more sophisticated access rules to be defined. For example, if you wanted to allow users at your site free access to a quiz but require people to authenticate who were coming in from outside 128.122, you could write a string like this:-

R5: =128.122.0.0/16,I,O,N
A5: Sorry, you cannot access this resource remotely without authentication.
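For readers who have not met address masks before, the Python sketch below reproduces the arithmetic for 4-byte addresses. The function names are hypothetical and nothing here is taken from the KEYSEARCH source; it simply applies the mask to the user's IP and compares the result with "maskedIP".

```python
def ip_to_int(dotted):
    value = 0
    for part in dotted.split("."):
        value = (value << 8) | int(part)
    return value


def mask_test(expr, user_ip):
    body = expr[1:]  # drop the leading "="
    if "||" in body:
        masked_ip, mask_text = body.split("||")
        mask = ip_to_int(mask_text)       # explicit mask, e.g. 255.255.0.0
    else:
        masked_ip, bits = body.split("/")
        # "bitmask" counts the most significant bits of a 32-bit mask.
        mask = (0xFFFFFFFF << (32 - int(bits))) & 0xFFFFFFFF
    return (ip_to_int(user_ip) & mask) == ip_to_int(masked_ip)


on_site = mask_test("=128.122.0.0/16", "128.122.135.4")
print(on_site)

# R5: =128.122.0.0/16,I,O,N is true only when the user is neither
# on-site nor authenticated, which is when A5 gets printed.
authenticated = False
print(not (mask_test("=128.122.0.0/16", "10.1.2.3") or authenticated))
```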

TagSets

It is sometimes convenient to write questions where the student enters an answer via several different means, e.g. check-boxes and textareas, or radio buttons and pull-downs, etc. Each of these form elements must be tagged with a different name and KEYSEARCH needs a way to test each of these separately (if that is what you want to do). This is done by defining tagsets like this.

Let's assume we collect two blocks of text input from the web page
<textarea name="one"></textarea>
<textarea name="two"></textarea>
then KEYSEARCH will have these two available via tags called "one" and "two". We need to tell the program that these are to be kept separate by defining a tagset line in the RLF as follows:

TS: one; two;

It can be made more flexible as follows. If there are 5 text blocks input like this:
<textarea name="one_1"></textarea>
<textarea name="two_1"></textarea>
<textarea name="one_2"></textarea>
<textarea name="two_3"></textarea>
<textarea name="two_2"></textarea>
the effect of the tagset line will be to cause the two blocks named "one_1" and "one_2" to be concatenated and marked as "one", and the three blocks named "two_1", "two_2" and "two_3" to be concatenated and marked as "two" for processing in the logic lines.
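The grouping can be pictured with the following Python sketch. The function name is made up, and two details are assumptions because the text above does not state them: the blocks are joined in the order they arrive from the form, and a single space is used as the separator.

```python
def group_tagsets(form_fields, tagset_names):
    grouped = {name: [] for name in tagset_names}
    for tag, value in form_fields:
        for name in tagset_names:
            # "one", "one_1", "one_2", ... all belong to the tagset "one".
            if tag == name or tag.startswith(name + "_"):
                grouped[name].append(value)
                break
    # Assumption: form order is kept and a space separates the blocks.
    return {name: " ".join(parts) for name, parts in grouped.items()}


fields = [("one_1", "alpha"), ("two_1", "bravo"), ("one_2", "charlie"),
          ("two_3", "delta"), ("two_2", "echo")]
print(group_tagsets(fields, ["one", "two"]))
```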

By default KEYSEARCH prints out all the data collected from the web form, excepting that from the named tags. But some of these data, for example information about a checkbox being checked, might be thought just to clutter the output. So the printing of data collected from a TagSet can be switched off by appending "|N" to the name. e.g.

TS: one; two|N; three;

Of course, if you select "NORE" then all such information is suppressed.

In some cases you might not want the content of a TagSet to be written to the log file. TagSets named "nologxxxx" (e.g. "nolog_1", "nolog_2", etc.) will not be logged. A date and time stamp is placed in the log, however. Note that the TagSet must be declared in the "TS" statement or the tag content will get pushed into the "Miscellaneous" set and get logged.

Before leaving TagSets you need one last piece of information. KEYSEARCH defines two tagsets automatically, these are the "response" and "miscellaneous" sets. Any data with named tags that aren't called "response" and aren't named in the tagset line are concatenated into the miscellaneous set in the form tag_name_1=tag_value_1; tag_name_2=tag_value_2; ... A tag_name gets written only if the value associated with it is non-blank. This is useful if you are processing data from a single set of checkboxes which may have various names (it won't matter) so long as the data returned allows the detection of whether that box got checked. Let's move on...

Restart

The R operator in an R-line causes the corresponding A-line to be written into a "Restart" file which can be used to re-establish the context of a series of linked questions in an exam (for example). You should only write one restart file in each call to Keysearch. The content of the A-line needs to have in it all the hidden variables that will be needed to restart. An example might be similar to this:-

R12: T,M,R
A12: <input type="hidden" name="course" value="<<$course>>">
     <input type="hidden" name="title" value="<<$title>>">
...etc...

If you wish to restart a session you need to recreate the context by reading in the restart file. You can test whether the file is available using the $R symbol in an "R-line". $R will be .true. if the restart file exists and .false. otherwise. For example:-

R13: $R,N
A13: Woe and lamentation! No RESTART file is available.

If the file exists you can copy it to the web page using the "Y" operator (copy) using the symbol $RESTART to point to the location of the restart file.

R14: T,M,Y
A14: $RESTART

Remember, you can restart only if you are carrying context along through a series of linked pages. If you are running a quiz with all the questions on a single page then there is no context you can preserve and if the connection is lost while a student is in the middle of it, then they'll have to start over again.

If you create restart files you should be sure to clean up and not leave them littering the disk. Copying the restart data to the output causes the $RESTART file to be deleted. You can use this to remove the file. But it also constitutes a risk because if you are in the middle of a "test" and something happens after the file has been deleted, you can't restart.

To help manage restarts better there are some alternate "versions" of $RESTART. If you use $RESTART-KEEP, then the restart file will be copied to output but not deleted. So if you are running a quiz, students could see the question, break the connection, research the answer and go back to re-read it. You choose whether you want to use this feature. You can also use $RESTART-REMOVE, which has only one function, that of deleting the restart file: nothing is output. This is the best way to clean up at the end of a test.

Finally, remember that $R has special significance. Avoid using tags called "R" since you can't use $R as a variable carrying the logical value of the tag R. If you must use a tag called R its logical value can be coded as $$R.

"Work"

The W operator in an R-line causes the corresponding A-line to be spawned to the shell if the line evaluates as .true.: if the spawned task produces output, that will be passed to the browser. This is potentially very useful since it allows other software to be invoked to do useful work such as creating images, graphs, charts or other complex computations that can be incorporated into the response sent to the student using parameters taken from V-lines. It also has the potential to be very dangerous by providing a "back-door" to allow malicious code to be run on the server. The risk of inadvertent use of "W" is reduced by requiring that the WORK directive be included in a control line. Note that the completion status of the shell operation is returned as "R0" which will be .true. to indicate "success", .false. otherwise. More detailed information can sometimes be obtained from the actual return status value which is placed in V0. Usually V0 is set to zero. Additional signalling between spawned tasks can be obtained using logicals/symbols set by the task and read in V-lines using the "%LOGICAL_NAME" symbols: this is the preferred method in VMS where the return status is filtered by the task handler and may not be practically usable as a way to transmit information.

Finally, the WH operator can be used to hide the output of the spawned task. This is useful if that task produces status messages that obtrude into the resulting web page. It isn't useful if the purpose of the task is to contribute to the page the user views.

Authentication

Authentication using PIMS:
KEYSEARCH allows users to authenticate with their KID and e-mail password at NYUSoM. If the user authenticates then information can be written to the log file with the results of a question or quiz with the user's KID attached. This allows a record to be kept of these results. KEYSEARCH isn't an exam system, so there is no big effort here to be secure. However, if the "POST" method is used with a secure server there should be little chance that a password escapes. Note that the directive "AUTHENTICATE" is needed to cause a username/password pair to be tested: this is a slow step so don't authenticate if you don't need to. "I" causes the authentication status to be pushed to the stack for an R-line: .true. if authentication was successful and .false. otherwise. "Q" causes the corresponding A-line to be written to the log file with the authenticated user's KID attached: nothing is sent to the output web-page if "Q" is used. The output is formatted with fields separated by semicolons (";") so that the log file can be uploaded into Excel and sorted to provide a table with student results for, say, a pop quiz.

A word about the authenticator: at NYU we use a database called PIMS and the authentication step consists of submitting a command like:

/path/pimsauth username password

to the shell. KEYSEARCH looks at the exit status of this command. An exit status of "success" indicates that the username/password pair authenticated, and "failure" indicates that the pair did not. Any program available locally that can do this can be used as the authenticator; you don't need PIMS. Just code the path to the authenticator program in the "setup" module and re-compile.
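
To show how little is required of a replacement authenticator, here is a minimal stand-in written in Python. Only the calling convention (username and password as arguments, exit status as the answer) comes from the description above; the user table, the hashing and the function name main are hypothetical.

```python
import hashlib
import hmac
import sys

# Hypothetical user table: username -> SHA-256 hex digest of the password.
# A real authenticator would consult a directory or database instead.
USERS = {"alice": hashlib.sha256(b"wonderland").hexdigest()}

def main(argv):
    """Return 0 ("success") if the pair authenticates, 1 ("failure") otherwise."""
    if len(argv) != 3:
        return 1
    _, username, password = argv
    expected = USERS.get(username)
    supplied = hashlib.sha256(password.encode("utf-8")).hexdigest()
    if expected is not None and hmac.compare_digest(expected, supplied):
        return 0
    return 1

# When saved as a stand-alone script, finish with:
#     sys.exit(main(sys.argv))
```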

Finally, "IP" specifically tests PIMS authentication and will be true if this was successful. So, for example, if ALEX authenticated the session, "I" would be .true. but "IP" would be .false..

Authentication using ALEX:
If authentication is requested (with the "AUTHENTICATE" directive) and PIMS authentication isn't available (or fails), then KEYSEARCH will look for the "user" that ALEX has authenticated. ALEX (NYU's instance of Sakai) is used as the student portal and it needs to hand off student sessions to Keysearch. ALEX does this by passing the user's name in the tag "alexuser" plus a key called "pubkey": if these "match" then the user is authenticated. Note that when additional care is needed to ensure that the ALEX-to-Keysearch handoff is authentic, the control directive STRI[CT] can be coded, which causes the "pubkey" to be verified by a call back to the ALEX verification server - which is hopefully not spoofable.

This "alexuser/pubkey" pair is used exclusively at the ALEX-to-Keysearch handoff. When requiring authentication when passing Keysearch-to-Keysearch you use instead the tag "lockey", which is created and regenerated internally: the programmer uses an "alexuser/lockey" pair in this circumstance. There is no call back to ensure the validity of the "lockey", but the internal generation and validation checking ensures that the key is only valid for a single user within a 20 minute window after its generation: you can't steal an "alexuser/lockey" pair and expect it to work for a different user or outside the time window. Note that the "alexuser/lockey" pair is valid for calls to Keysearch between different machines, provided those machines are time-synchronized.
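
This manual does not describe how the "lockey" is actually computed. Purely to illustrate how a key can be tied to one user and to a 20 minute window without a call back, the sketch below uses a common technique (an HMAC over the username and a timestamp); the shared secret, the key layout and both function names are assumptions, not KEYSEARCH internals.

```python
import hashlib
import hmac

WINDOW_SECONDS = 20 * 60  # the 20 minute validity window described above

def make_lockey(user, secret, now):
    """Build a key bound to `user` and to the generation time `now` (seconds)."""
    stamp = str(int(now))
    digest = hmac.new(secret, f"{user}:{stamp}".encode(), hashlib.sha256).hexdigest()
    return f"{stamp}-{digest}"

def check_lockey(user, key, secret, now):
    """True only for the same user, inside the window after generation."""
    stamp, _, digest = key.partition("-")
    if not stamp.isdigit():
        return False
    expected = hmac.new(secret, f"{user}:{stamp}".encode(), hashlib.sha256).hexdigest()
    age = int(now) - int(stamp)
    return hmac.compare_digest(digest, expected) and 0 <= age <= WINDOW_SECONDS

if __name__ == "__main__":
    secret = b"shared-between-servers"   # hypothetical shared secret
    key = make_lockey("jones", secret, now=1000)
    print(check_lockey("jones", key, secret, now=1600))   # same user, in window
    print(check_lockey("smith", key, secret, now=1600))   # different user
    print(check_lockey("jones", key, secret, now=3000))   # window expired
```

Because validity depends on comparing clocks, a scheme of this kind only works across machines that are time-synchronized, which matches the caveat above.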

Finally, "IX" specifically tests ALEX authentication and will be true if this was successful.

Authentication using Apache:
If authentication with ALEX fails, then KEYSEARCH will look for the "user" that Apache has authenticated via a login to access the page. This name is saved in the environment variable REMOTE_USER and is taken as the authenticated user. Whether it is really a user or just an access keyword can't be established by KEYSEARCH, so you should be aware that this is seen as less secure than using authentication by the PIMS (or equivalent) method. Note that if Apache authentication is being used the programmer does not need to pass either "username" or "alexuser".

Finally, "IA" specifically tests Apache authentication and will be true if this was successful.

Using the KeyTag
When authentication succeeds KEYSEARCH creates a KeyTag, which is a random, one-time 8-character number that is associated with the username of the user that has authenticated for this KEYSEARCH session. The KeyTag is written out to the log file with the username if "Q" is used to create such a log record. The KeyTag can be pushed onto the Math stack and used to create hidden input to the next form in a series, which essentially knits the series of forms together. In fact you, the programmer, must do this if you are to be able to track a single user as s/he navigates a chain of forms (see the examples just below). If a valid KeyTag is available (a positive number) this means that this string of forms was authenticated at some point. If no KeyTag is available the default is a negative number, equivalent to a .false., and this can be used to drive the output. For example:

V1: I
R4: V1,M
A4: <input type="hidden" name="keytag" value="<<1|I>>">

will print the keytag line only if there is a valid tag. If you know you are going to have a valid KeyTag you could alternatively do this:

R4: T,M
A4: <input type="hidden" name="keytag" value="<<$keytag>>">

Use of logging with "Q" in subsequent forms allows a chain of forms to be followed since the same KeyTag will appear in all.
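
For readers who want to build a "private" tag or to post-process forms outside KEYSEARCH, the idea can be sketched in Python as follows. The sketch reads "8-character number" as eight decimal digits, which is an assumption, and both function names are hypothetical.

```python
import html
import secrets

def make_keytag():
    """Return a random 8-digit positive number (10000000..99999999)."""
    return 10_000_000 + secrets.randbelow(90_000_000)

def keytag_input(keytag):
    """Emit the hidden form field only when the tag is valid (positive)."""
    if keytag <= 0:
        return ""
    return f'<input type="hidden" name="keytag" value="{html.escape(str(keytag))}">'

if __name__ == "__main__":
    print(keytag_input(make_keytag()))
    print(repr(keytag_input(-1)))   # no valid tag, so nothing is emitted
```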

A final point: KEYSEARCH creates a KeyTag only if it authenticates a user. But if you can't or don't want to authenticate users you can program your own "private" KeyTag, log it with a "Q" log line (it won't appear in the KeyTag spot, but who really cares..), then pass this KeyTag on to subsequent forms where it will behave just like the real thing.

Logic Lines

The general format of a logic line is:

Lx: "CountMethod"; ("Pattern-1"; "Pattern-2"); etc....

Each of the terms is separated by a semicolon. Let's look at each term in the line:

Lx - "L" means the logic line and "x" is the reference number

"CountMethod" - This term is built from optional components:
[^][C,U,O,S]count[|tagset]
^ - If the "^" character is included in the count method it means that all the pattern matches in that line are to be done via SOUNDEX, a method to defeat bad spelling: sort-of. We'll discuss it later.
O - If the "O" character is included it means that the patterns are to be found in order in the student response.
S - If the "S" character is found it means that the patterns are all to be found within a single sentence, defined by periods.
C - If the "C" character is found it means that there must be at least "count" characters in the student response. The patterns are ignored - this is a simple length measurement of the response.
U - If the "U" character is found it means that there must be no more than "count" characters in the student response. The patterns are ignored - this is a simple length measurement of the response.
"count" - This is a number defining how many pattern matches there must be (except for the "C" and "U" options above). There must be a "count" value.
"|tagset" - If this component is present it means that the search for pattern matching will be done on the set "tagset". If the term is missing, then the search is done on the default set, which is "response" (i.e. it is equivalent to "|response"). If a single "|" is coded, then the search is done on the miscellaneous set. (Take a deep breath..) If the tag called "response" is empty then the default set becomes the miscellaneous set, and the "|" term can be omitted (i.e. then searches are done on the miscellaneous set even though you don't have a "|" coded). Which is why the "non-text" example above worked - you were probably wondering. If both "response" and the miscellaneous set are empty, the default set becomes the first set listed in the tagset line.

Some valid CountMethods are:
2 - find 2 matches;
^5 - find 5 matches using SOUNDEX;
O3|one - find 3 matches, in order, inside the response tagged "one"
C40|two - true if the response tagged "two" contains at least 40 characters.
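
The CountMethod layout can be captured in a small parser, sketched here in Python as a reading aid. It is not the KEYSEARCH parser: it assumes the option letters may be combined, it returns an empty string to stand for the miscellaneous set, and it does not model the fall-back rules that apply when "response" is empty.

```python
import re

# [^][C,U,O,S]count[|tagset]
COUNT_METHOD = re.compile(r"^(\^?)([CUOS]*)(\d+)(?:\|(.*))?$")

def parse_count_method(text):
    """Break a CountMethod such as "O3|one" into its components."""
    m = COUNT_METHOD.match(text.strip())
    if m is None:
        raise ValueError(f"not a valid CountMethod: {text!r}")
    soundex, flags, count, tagset = m.groups()
    return {
        "soundex": soundex == "^",
        "flags": flags,                    # e.g. "O", "S", "C", "U" or ""
        "count": int(count),
        # No "|" means the default set; a bare "|" gives "" (miscellaneous set).
        "tagset": "response" if tagset is None else tagset,
    }

if __name__ == "__main__":
    for example in ("2", "^5", "O3|one", "C40|two"):
        print(example, parse_count_method(example))
```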

"Pattern" - These are the words or strings that must be found inside the response the student provides. There are four ways to designate these:

  1. Simple string - e.g. fred - search for the string "fred"
  2. SOUNDEX search - e.g. ^fred - search for the word "fred" using SOUNDEX: this allows you to mix SOUNDEX and non-SOUNDEX searches in the same L-line. Note also that if you put the "^" in the CountMethod, then a "^" in the pattern is redundant because all searches are via SOUNDEX in that L-line.
  3. Weights - In another twist you can add a weight to the word that you are searching for. The weight is designated by an asterisk, such as "fred*5": in this case, if "fred" matches in the answer the student provides, that match counts for "5" towards the total required for the logic line to evaluate as true. The instructor therefore has the ability to give greater weight to some answers over other similar answers and therefore to be more discriminating. Note that if a weight is added to a pattern it must be the last item, outside any other delimiters: e.g. fred*5, "tiny dogs"*2, 'big green chicken'*2.
  4. Strings containing wild-cards - The number sign ("#") is the wild-card character, so "fred#jones" will match "Frederick Alexander Leyland Jones" and "cat#dog#chicken" will match "Cats, dogs and chickens" and also "catdogchicken" but not "dog chicken and cat". Wild-card strings don't work with SOUNDEX (obviously).
  5. Double-quoted string - e.g. "fred jones" - search for the phrase including embedded spaces.
  6. Single-quotes - e.g. 'fred, fred' or 'fred' - search for the word fred with a leading and/or trailing space (each quote is turned into a space).
The point to bear in mind is that pattern searches are for strings anywhere in the text. So "fred" matches "fred" and "Frederick" and "damnfoolfredthenumbskull". This is just terrific, provided that's what you want. Use the different options to get the search you need.
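
The non-SOUNDEX pattern forms can be mimicked in a few lines of Python, which may help when predicting what a pattern will and won't match. This is a sketch of the rules as described here, not the KEYSEARCH matcher: the helper names are hypothetical, and padding the response with a space at each end (so that a single-quoted word can match at the start or end of the answer) is an assumption.

```python
import re

def parse_pattern(term):
    """Split a pattern term into (search_text, weight)."""
    term = term.strip()
    weight = 1
    m = re.fullmatch(r"(.+)\*(\d+)", term)
    if m:                                   # trailing "*n" is the weight
        term, weight = m.group(1), int(m.group(2))
    if len(term) >= 2 and term[0] == term[-1] == '"':
        term = term[1:-1]                   # double quotes: keep inner spaces
    else:
        if term.startswith("'"):
            term = " " + term[1:]           # leading quote becomes a space
        if term.endswith("'"):
            term = term[:-1] + " "          # trailing quote becomes a space
    return term.lower(), weight

def pattern_score(term, response):
    """Return the term's weight if it occurs anywhere in the response, else 0."""
    text, weight = parse_pattern(term)
    # "#" is the wild-card: it may stand for any run of characters.
    regex = ".*".join(re.escape(part) for part in text.split("#"))
    padded = f" {response.lower()} "
    return weight if re.search(regex, padded) else 0

if __name__ == "__main__":
    print(pattern_score("fred*5", "damnfoolfredthenumbskull"))
    print(pattern_score("cat#dog#chicken", "dog chicken and cat"))
```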

Grouping - You can use brackets to group similar patterns. For example:
L1: 2; (artery; arteriole); (vein; venule);
In this case you want two hits for L1 to be true, but you want one hit for a vessel on the venous side chosen and one from the arterial side, not both from the same. If a match is found for one member of the group the rest of the group is skipped and the other patterns are tested. You can mix SOUNDEX and regular pattern matching in groups.
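
The effect of grouping on the hit count can be sketched as follows. The Python below handles only plain substring patterns (no SOUNDEX, weights, quotes or wild-cards) and takes the text after the "Lx:" label as input; the function names are hypothetical.

```python
import re

def parse_logic_line(body):
    """Turn "2; (artery; arteriole); (vein; venule);" into (2, [[...], [...]])."""
    count_text, _, rest = body.partition(";")
    groups = []
    # A bracketed group counts as one unit; a bare pattern is a group of one.
    for grouped, single in re.findall(r"\(([^)]*)\)|([^;()]+)", rest):
        if grouped:
            members = [p.strip().lower() for p in grouped.split(";") if p.strip()]
        else:
            members = [single.strip().lower()] if single.strip() else []
        if members:
            groups.append(members)
    return int(count_text.strip()), groups

def evaluate(body, response):
    """True if enough groups have at least one member present in the response."""
    required, groups = parse_logic_line(body)
    text = response.lower()
    hits = sum(1 for members in groups if any(p in text for p in members))
    return hits >= required

if __name__ == "__main__":
    line = "2; (artery; arteriole); (vein; venule);"
    print(evaluate(line, "An artery feeds an arteriole"))   # one group only
    print(evaluate(line, "An artery and a vein"))           # both groups
```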

Finally, a word about SOUNDEX. This is a method for encoding text to increase the chance of a search match when there is a high likelihood that the student will mis-spell the word(s): it was originally designed to help with finding names in a phone book. The algorithm converts all words to 4-character tokens, which was probably fine for searches on surnames, but isn't ideal for technical and medical terminology. Nevertheless, it does improve the chances that you "match" and that is really the goal here. Use it, but be aware of the limitations. Don't use it everywhere - students are expected to spell something! Note that you can't do SOUNDEX searches on word fragments since SOUNDEX obliterates the word when it builds the token. So single-quoted strings and SOUNDEX don't mean anything. Finally, you can't mix SOUNDEX and non-SOUNDEX searches and have them ordered or partitioned by sentences: it can't work (and shouldn't) so if you request a sentence search and specify SOUNDEX anywhere in the search, SOUNDEX will be used throughout.
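
For reference, the widely used "American Soundex" rules can be written in Python as below. Whether KEYSEARCH implements exactly this variant is not stated in this manual, so treat the sketch as an illustration of the 4-character tokens rather than as a copy of the engine's routine.

```python
# Letter groups of the standard American Soundex code.
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Return the 4-character Soundex token for a word ("" if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]   # digits etc. are dropped
    if not letters:
        return ""
    token = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue              # h and w do not separate equal codes
        code = CODES.get(ch, "")  # vowels have no code
        if code and code != prev:
            token += code
        prev = code               # a vowel resets prev, allowing a repeat
    return (token + "000")[:4]

if __name__ == "__main__":
    for name in ("Robert", "Rupert", "artery", "arterey"):
        print(name, soundex(name))
```

Two spellings "match" when their tokens are equal, which also shows the limitation noted above: long technical terms that differ only after the first few consonants collapse to the same token.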

Text Searching

A final point on text searching. Before searching, the text entered by the student is processed to strip extra white-space, odd characters and punctuation, and is converted to lower case. So a search for "3'carboxy..." will find the "'" removed. The only exception is the "%" character, so "100%" will work, but other stuff, like "#", will go. Note that SOUNDEX simply drops numbers (so "100%" just would not be used in a SOUNDEX search). Don't design searches that rely on discriminating these things.
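
A rough Python equivalent of this clean-up step is shown below. It assumes that removed characters are simply deleted (not replaced by a space) and that only unaccented letters, digits and "%" survive; the engine's exact rules may differ.

```python
import re

def normalize(text):
    """Lower-case, drop punctuation other than "%", and squeeze white-space."""
    lowered = text.lower()
    cleaned = re.sub(r"[^a-z0-9%\s]", "", lowered)
    return " ".join(cleaned.split())

if __name__ == "__main__":
    print(normalize("  The 3'Carboxy  group: 100% #1! "))
```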

Response Logs

KEYSEARCH logs student responses to a file that can be viewed from the web. Student identities are not logged, only the date-time, the response and the answer that the RLF designated for that response. Instructors should review the logs regularly and use this feedback to tune the RLF to be sure that students get the feedback they need: the success or failure of KEYSEARCH depends on the manner in which it dispenses the instructor's wisdom. If it is too difficult to get it to feed back information, then students won't use the tools.

JSON and Web Services

KEYSEARCH is able to generate its output in JSON (JavaScript Object Notation), which is a lightweight data-interchange format. This format is most efficient when interacting with web services that need to use KEYSEARCH as an engine for student answer manipulation and scoring. The JSON code essentially places the body of what would have been an html response into a JSON package for return to the service requester. The JSON package looks like this: -
{"keysearch" : "<html-payload>"}
Any html comments bracketed by <!-- and --> are stripped out of the payload unless DEBUG is turned on.

KEYSEARCH can also write other lines to the data package using the "JS" keyword in an R-line. These lines are output before the "keysearch" line in the package. The author is responsible for constructing a valid JSON line, which should look something like this: -
A5: "score" : <<V3|I>>,  or
A7: "result" : "Answer is CORRECT", 
In the first case the value of V3 is substituted into the text for A5 and is output. The trailing comma is required unless NOKE is coded and the A-line is the last to be written out. Also, while numbers may be un-quoted, text strings must always be quoted and embedded quotes must be escaped with the backslash "\" character. If "JS" is used for html output rather than JSON it has no effect.

Two markers, <!--datastart--> and <!--dataend-->, are included in html output to delimit the data payload; they can be used to facilitate parsing of the output in the case that html, rather than JSON, needs to be output to a web service.
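
On the receiving side, a web service might unpack either form of output along these lines. The sketch is in Python, the function names are hypothetical, and the sample replies in the demo are invented; only the "keysearch" key and the two markers come from the description above.

```python
import json
import re

def parse_json_reply(text):
    """Return (html_payload, extra_fields) from a KEYSEARCH JSON package."""
    reply = json.loads(text)
    extras = {k: v for k, v in reply.items() if k != "keysearch"}
    return reply["keysearch"], extras

def extract_html_payload(page):
    """Return the text between the data markers, or None if they are absent."""
    m = re.search(r"<!--datastart-->(.*?)<!--dataend-->", page, flags=re.DOTALL)
    return m.group(1).strip() if m else None

if __name__ == "__main__":
    package = '{"score" : 3, "keysearch" : "<p>Well done</p>"}'
    print(parse_json_reply(package))
    page = "<html><!--datastart--> <p>Well done</p> <!--dataend--></html>"
    print(extract_html_payload(page))
```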

Size Limits

The program needs to pre-allocate storage and one of the byproducts is that variables have to be coded with fixed lengths. These should be sufficiently generous for any reasonable needs. (Most can be changed by editing the "params" module and recompiling the source.) The limits are:-

Variable                      Max Length
tag name                      24 chars
filepath                      128 chars
htmlpath                      128 chars
mailpath                      128 chars
banner                        128 chars
title                         80 chars
username                      24 chars
password                      24 chars
Question Line                 4096 chars
A-Lines                       32000 chars
E-Lines                       256 chars
L-Lines                       256 chars
R-Lines                       256 chars
V-Lines                       128 chars
SC-Lines                      256 chars
Base                          1024 chars
"Give Up" Line                32000 chars
Control Line                  256 chars
Failure Text                  1024 chars
number of Tags                40
number of TagSets             20
number of Edit Lines          20
number of Logic Lines         100
number of R/A-Line pairs      150
number of V-Lines             60
logic stack depth             20 levels
math stack depth              20 levels
A-line substitution depth     10 levels
internal scratch memory       6144 bytes
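
A quick way to catch over-long lines before uploading an RLF is a small checker such as the Python sketch below. It covers only the numbered A-, E-, L-, R- and V-lines, measures the text after the label, and does not join continuation lines ending in "\"; whether the engine counts the label or joined continuations towards the limit is not stated here, so those choices are assumptions.

```python
# Limits copied from the table above (characters per line type).
LIMITS = {"A": 32000, "E": 256, "L": 256, "R": 256, "V": 128}

def check_line_lengths(lines):
    """Return (line_number, label, length) for each line over its limit."""
    problems = []
    for number, line in enumerate(lines, start=1):
        label, sep, body = line.partition(":")
        label = label.strip()
        kind = label[:1].upper()
        if sep and label[1:].isdigit() and kind in LIMITS:
            length = len(body.strip())
            if length > LIMITS[kind]:
                problems.append((number, label, length))
    return problems

if __name__ == "__main__":
    rlf = ["L1: 2; dog; cat; hamster; parrot", "V1: " + "x" * 200]
    print(check_line_lengths(rlf))
```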


Development History

The KEYSEARCH project was started in January 2001 as the new Histology course was beginning. It was realized that we needed a way to extend the faculty's ability to support student study by providing better resources on the web. The first working version of KEYSEARCH was completed in May, 2001 and was tested in the 2002 Histology course. The results were very encouraging and Virginia Black and Ross Smith published a short paper in the proceedings of ELEARN, 2002 which described those results and the basics of KEYSEARCH. This was followed by a paper providing a technical description of KEYSEARCH itself:

Smith, P. Software Implementing Web-Based Study Questions for Medical Students. International Journal on E-Learning 2(2), 29-34 (2003)   [Abstract online at:   http://dl.aace.org/12680 ].

The version track for KEYSEARCH follows:-

Version tracking
v0.0 May-2001 Creation date: basic functionality
v1.0 Aug-2001 Started creating questions
v1.1 Mar-2002 Virtual memory added
v1.2 Apr-2002 Numerous functional enhancements + bug-fixes
v2.0 May-2002 Math lines added
v2.1 May-2003 Work lines added
v2.2 Jun-2003 Authentication added, plus services to support it
v2.3 Aug-2003 Addition of Keysearch "Lite"
v2.4a Sep-2003 Adaption to multiple platforms
v2.5 Dec-2003 A-line substitutions, better authentication
v2.6 Jan-2004 Wild-card searches, better logging
v2.7 Feb-2004 Better privacy, debug control, author tag
v2.8 May-2005 Status checks, string scans, HASH, bugs fixed
v2.9 Jul-2006 Alt RLF, hashing, new operators, robust logging
v2.10 Feb-2007 Cross-platform fixes, logicals, quiz examples
v3.0 Aug-2007 ALEX integration, new logical operators
v3.1 Oct-2007 Log encryption, "packed" substitutions.
v3.2 Jan-2008 Script insertion, multi-file copies.
v3.3 Jul-2008 IP-masking, alternate authentication, null strings.
v3.4 Dec-2012 Emphasis strings, output using "json" code.

License

KEYSEARCH is distributed according to the GNU General Public License, a copy of which should be included with the software distribution.


KEYSEARCH "Lite"

KEYSEARCH is complex, so it seemed wise to try to produce a version that would make it easy for individuals to program very simple questions. The result was "KEYSEARCH Lite", which provides very basic searching and a way to indicate the right and wrong answers. "Lite" requires a web-page set up that is essentially the same as regular KEYSEARCH: that is a requirement of using the web, not a complication we have introduced. However, "Lite" builds an RLF using only four types of lines: "Question", "Right", "Wrong" and "Token". Provided there is either a "Right" or "Wrong" line, all the others are optional. It is that simple ...

Question: The question line contains the question asked (optionally).

Token: The token line contains the characters or phrase that is to be searched for in the student answer. If "Token" is omitted the letter "C" is searched for and the result will be seen as correct if it is found. Token can be more complex, such as a list of keywords separated by semicolons (e.g. "red; green; blue;").

Right: The response to be printed if the answer was right, which means that one of the strings defined in "Token" was found.

Wrong: The response to be printed if the answer was wrong, which means that none of the strings defined in "Token" were found.

"Lite" works by creating L-,R- and A-lines internally to produce the result expected given the description here. Since these lines are created internally, if you are using "Lite" you cannot use other L-,R- or A-lines in the RLF; you are strictly limited to the set described above. You can, however, use a "Control:" line with the single word "Response" in it to allow the student answer to be printed.

Example

Here is an example of an RLF for a simple question coded for KEYSEARCH "Lite".


Question: Who is the mayor of New York City?
Token: Bloomberg
Right: Yes, Mr. Bloomberg is indeed our mayor.
Wrong: No, that wasn't correct.  Mr. Bloomberg is the mayor. Prior \
   to him there was mayor Giuliani and before him mayor Dinkins.
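
The behaviour of "Lite" can be summarised in a few lines of Python. This is only a model of the rules described above (any one token found means "Right", otherwise "Wrong", with "C" as the default token); the function name and its arguments are hypothetical, and the case-insensitive comparison follows the text clean-up described under Text Searching.

```python
def lite_response(answer, right="", wrong="", token="C"):
    """Return the Right text if any token occurs in the answer, else the Wrong text."""
    keywords = [t.strip().lower() for t in token.split(";") if t.strip()]
    found = any(k in answer.lower() for k in keywords)
    return right if found else wrong

if __name__ == "__main__":
    print(lite_response("I think it is Bloomberg",
                        right="Yes, Mr. Bloomberg is indeed our mayor.",
                        wrong="No, that wasn't correct.",
                        token="Bloomberg"))
```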


If you skipped here from the top of the document, then go back now. Even if you've decided that "Lite" is what you need at this moment, skimming the document so that you know a bit more about what the full version of KEYSEARCH can do will be a big help. Because eventually you'll discover that "Lite" will limit how questions get constructed and then you'll need to use the full version of KEYSEARCH and deal with a bit more complexity.


KEYSEARCH Installation

KEYSEARCH is available on several platforms. So far Red Hat Linux is in active use, and MacOSX and OpenVMS have been tested and appear to work well.

The software is most conveniently obtained from its SourceForge repository using "git"; going forward this will be the only source for up-to-date code. Once the local "git" repository is created the "git pull" command causes the latest updates to the software, including the new executables, to be copied to the local repository from SourceForge. This process saves the time involved in accessing the web page, downloading a ".gtz" archive and decompressing it.

Once you have downloaded the software, the easiest way to use it is to copy the pre-built executable to the cgi-bin area on the target machine. Read the platform-specific instructions in the folder for any additional instructions. If you wish to compile the FORTRAN and re-link, read the instructions at the beginning of the source plus the platform-specific notes. The main motivation for a re-compilation would be to change the size parameters for key variables (in the "params" module) or, more importantly, to change the default paths to the RLF, logs, server name, etc. (in the "setup" module). These two modules are found at the beginning of the source code: nothing else should need to be edited. The package should be self-contained. The code has been compiled successfully and runs on OpenVMS (AXP and I64), Tru64-UNIX and Solaris, and with the Absoft and Intel compilers on OSX and Red Hat Linux; I presume it will also compile with the Absoft and Intel compilers for Windows.

The main source of potential problems is that of getting the permissions correct for the executable and the directories and paths to where the RLFs are read and the reviews are written. An experienced system manager for your platform can be invaluable for getting these issues resolved. Note also, that if JSON output is required KEYSEARCH uses an intermediate scratch file that it puts into /tmp/Keysearch/: the location of the scratch area can be fixed in the "params" module if necessary.

On UNIX the web server, usually Apache, generally runs under the account "nobody". So the log entries can only be written if "nobody" has owner or group write permission for the directories where logs are to be written: it isn't a good idea to allow "world" write to the log areas (although this may be what you have to have set to get it running initially.)
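
When logs fail to appear, a quick check of the log directory can save some head-scratching. The Python sketch below reports whether a directory is writable by the account running the script and whether it has been left "world" writable; to be meaningful it has to be run as the account the web server uses. The function name is hypothetical and the demo uses a temporary directory rather than a real log path.

```python
import os
import stat
import tempfile

def check_log_dir(path):
    """Report basic permission facts about a log directory."""
    mode = os.stat(path).st_mode
    return {
        "is_directory": stat.S_ISDIR(mode),
        "writable_by_this_user": os.access(path, os.W_OK),
        "world_writable": bool(mode & stat.S_IWOTH),
    }

if __name__ == "__main__":
    # Demonstration on a scratch directory; substitute the real log path.
    with tempfile.TemporaryDirectory() as scratch:
        print(check_log_dir(scratch))
```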

On OpenVMS with CSWS 2.1-1 and above (Apache v2.0-x) proper ACLs to provide (read+write+execute) access as appropriate are needed for ALL the files that KEYSEARCH uses, including the utility programs and scripts. This can be a bit challenging to achieve but HP's installation web page for CSWS is a big help. The result is a very secure installation of the software which should meet any server-side security requirements that might be needed for web-based student testing (securing desktops is another issue of course). Good luck!


KEYSEARCH Maintenance

Before starting out you need to understand the issues associated with maintaining material written to use "KEYSEARCH". There are, essentially, three levels, with a different degree of "maintainer" sophistication needed for each level. We'll take as an example a quiz written for a course that needs to be kept fresh and useful.

The base level is straightforward. To keep the quiz up-to-date the base-level maintainer needs to be able to create and refresh questions. In the examples provided we have a simple text file containing the quiz items drawn from an item bank and formatted so that a script can "explode" the file into the quiz component files that will be seen by the quiz scripts. The maintainer's tasks are, therefore, (1) to extract questions from the item bank; (2) to verify that the format of the items is "html-ready"; (3) to run the reformatting script provided by the original designer and (4) to adjust the introductory web pages to reflect any change in the number of quiz items, test duration and quiz start and stop dates. Maintainer skills (for these tasks) include the ability to run a script at the command line and to perform basic editing of a web page. While reading this document would be interesting and informative, the base-level maintainer doesn't really need to know anything about "Keysearch" per se and need not ever read this manual.

The designer level maintainer is an individual who creates scripts for new applications: a designer will have written the quiz that the base level maintainer works with. As such, the designer needs to understand the workings of "Keysearch": this is the individual for whom this document has been written.

The engineer level maintainer is someone whose goal is to extend or repair the "KEYSEARCH" engine. An engineer would need to be a competent programmer who could read the code for the engine and understand how to modify, compile and link a new version. This will require time and expertise, but should be required very infrequently. The help of an engineer would be needed in the rare circumstance that the base maintainer came up with a new question type that could not be coded by the designer without a new language element being inserted into the engine. An engineer needs to read the code for guidance on its modification. For the most part, the code is well commented and is logically set out, so no real issues should present themselves.


An Example

Let's look at the most complex of the examples above, which is sketched here. First the web page:-


List some of the animals likely to be found in NYC's better homes.<p>
<form method="POST"
      action="http://example-server.med.nyu.edu/cgi-bin/keysearch"
      target=test>
<input type="hidden" name="course" value="NYC-Life">
<input type="hidden" name="subject" value="Apartments">
<input type="hidden" name="qunn" value="2">
<textarea name="response" rows="5" cols="60" wrap="on">
Replace this text with your response to the study question.
Please make your response brief, but precise.
</textarea><br/>
<input type="submit" value=" Submit Response ">    
<input type="reset" value=" Clear Input ">
</form>

This code, when entered into a web page produces something that looks like this:-


List some of the animals likely to be found in the better homes in NYC.



The RLF that we decided to write is here:-

QN: List some of the animals likely to be found in NYC's better homes.
L1: 2; dog; cat; hamster; parrot
L2: 2; rat; (mouse; mice); squirrel
L3: 1; child; children;
R1: L1,M
A1: Cats, dogs, hamsters and parrots are common animals living in \
NYC apartments.
R2: L1,L2,A,M
A2: Rats, mice and squirrels are not that common, except as pets.
R3: L2,L1,N,A,M
A3: Really! in the better homes we have "nice" animals, not rodents. \
Think cats and dogs for heavens sake!
R4: L3
A4: We generally DON'T consider humans to be "animals" in the \
context of apartment dwellers.

This file is stored in the default location.
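
To predict which A-line a given answer will trigger, it can help to evaluate the R-lines by hand or with a small script. The Python sketch below treats an R-line as a reverse-Polish expression on a logic stack and assumes that "A" is a logical AND and "N" a logical NOT, which is consistent with the responses A2 and A3 above; "M" is skipped here because its effect is described elsewhere in this manual. The function name is hypothetical.

```python
def eval_rline(expression, logic):
    """Evaluate an R-line such as "L2,L1,N,A,M" given the L-line results."""
    stack = []
    for token in (t.strip() for t in expression.split(",")):
        if token == "A":                     # assumed: AND of the top two items
            right, left = stack.pop(), stack.pop()
            stack.append(left and right)
        elif token == "N":                   # assumed: NOT of the top item
            stack.append(not stack.pop())
        elif token == "M":
            continue                         # not modelled in this sketch
        else:
            stack.append(logic[token])       # push the result of an L-line
    return stack[-1]

if __name__ == "__main__":
    # An answer mentioning only rats and mice: L2 is true, L1 and L3 are false.
    logic = {"L1": False, "L2": True, "L3": False}
    for name, rline in (("R1", "L1,M"), ("R2", "L1,L2,A,M"),
                        ("R3", "L2,L1,N,A,M"), ("R4", "L3")):
        print(name, eval_rline(rline, logic))
```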

The page above is live, so you can enter information into it, as described above, to test how it works. Try constructing responses using the logic lines as a guide to elicit each answer.

© Phillip Ross Smith, 2001 - 2013