PICK, a program for analyzing contra dances -- Stan Swanson, January 2013; License: GPL

PICK helps in the process of picking dances with specific moves or with a desired set of moves. It transforms dances from a dance card format to a spreadsheet format, one line per dance. Information available in columns of the spreadsheet includes title, author, type (improper, proper, etc.), a summary, and statistics columns definable by the user. Possible statistics columns include lists of who swings, who allemandes, direction of circles or stars (R, L), a count of wave figures, type of heys, out-of-minor-set moves, etc. The order of occurrence of a figure in a dance can be tabulated. An estimate of the piece count has been attempted. The dance cards are transformed into shortened symbols which are user definable.

Usage and Output
(default input dance file: "dances.txt", output: "out.csv")
(example input dance file: "select.dances", example output: "select.csv")

PICK is a command line program. If the default file names are used for dances, dictionary, and analysis, only the program name need be typed: "> pick" (">" is the system command prompt). An alternative dance file can be specified as a single argument: "> pick mydances.in". Alternatively, it can be copied to the default file "dances.txt" or referenced by a link. Other options can be specified as "> pick [options]". These options include "-h" to get help, and are discussed below. Sample input files for dances.txt, dance-dict.txt, and dance-stat.txt are furnished with the distribution, and their formats are discussed in greater detail below.

The most important output file is "out.csv", which is in the portable comma-separated values (CSV) format that can be read into most spreadsheets (e.g. Open Office, Excel). The optional information output file "info.out" contains five-line listings of the dances along with two summaries and debug and diagnostic messages (helpful if the interpretation seems weird). "info.out" reproduces the counts of output symbols seen on the terminal screen as the program finishes. These counts give you an idea of the frequency of figures and qualifiers (who, direction, etc.) in a collection of dances.

We now discuss the presentation of the analysis as seen in the example spreadsheet, "select.csv", generated from "select.dances" using "select-dict.txt" and "select-stat.txt". We comment on the columns seen in "select.csv" under the column headers following "title" and "author".

"type" gives a single character code for the dance type (I P B ...) and an indication of double or triple progression (2 3).
"summary" is the dance reduced to an abbreviated code (usually between 40 and 100 characters). Figures are separated by commas within a 16 beat section. Semicolons separate A1: from A2: and so on. The persons doing the figure are given first in caps (P = partner, N = neighbor, 1 = actives, W = ladies, etc.), then a lower case abbreviation for the figure (sw = swing, a = allemande, o = circle, * = star, etc.), followed by the direction (-L = left, -R = right, -x = across, etc.), and finally a digit giving the amount of turning in quarters (3 = 3/4, 6 = 6/4 (1 1/2), etc.). These abbreviations are given in the file dance-dict.txt and are user definable.

The seventeen columns between the explicitly blank columns headed by "_" and before the columns headed by "pieces" contain the position(s) in the dance of the figures indicated by the column headings ("sw a o oR ... out rare"). These columns were generated with the statistic "order". Note that sometimes a number is absent from the sequence in a given row. This may happen if a pseudo phrase does not contain a figure, or if the figure has not been specified in any of the "order" statements.

"sw" says where swings occur. For example, in "The Baby Rose", "' 1 4" tells us that there are two swings in the dance, one at the beginning.
"a" says whether and where allemandes occur.
"o" and "*" are for circles and stars. "oR" is a more specific indication for "circle right". See "select-stat.txt" for how this was done.
... ... ...
"out" are moves out of the minor set, like "contra corners", shadow interactions, a chain or "R&L thru" on a diagonal.
"rare" attempts to catalog infrequent moves like "contra corners", "petronella", "mad robin", or "orbit".
"pieces" Two columns, the first of which reproduces any explicit estimate of the piece count from the title line ("#..."). The second is pick's calculation according to the scheme discussed in the section "STATISTICS and ANALYSIS".

Following the "pieces" columns are other statistics on the dance figures, illustrating the statistics "qualify count first-char text short group".

"swing" tells which persons swing (P = partners, N = neighbors, S = shadow). Uses statistic "qualify".
"allem" tabulates who does the allemande (M = men, W = ladies, ...).
"circle" and "star" indicate the direction (L = left, R = right).
"ch_rl_pr" counts the number of chains (ladies or otherwise), R&L's, and promenades. Uses statistic "count".
"lines" counts LLFB, or just LL.
"D/R" counts the traditional "down 4 in line", "return", etc. These can be hard to determine from dance card descriptions. More discussion of this problem later.
"dosido" gives dosido (d), gypsy (g), and seesaw (z).
"hey" indicates the type of hey (h = full, 2 = half, g = gypsy, ...).
"wave" counts the appearance of the string "wav" in the dance description.
"rare" indicates the appearance of specific (unusual) figures (e.g. bfy = butterfly whirl, pet = petronella, cc = contra corners).
"out**" attempts to flag out of minor set interactions by looking for shadow (S), diagonal (\), contra corners (c), grand (g) as in grand chain or grand right and left. We have found this sort of figure particularly troubling for beginners.
"hwro" uses the "group" statistic, which reports whether specified columns are non-blank (1 = hey, 2 = wave, 3 = D/R, 4 = out**, 0 = none of these, 12 = both wave and hey, etc.). Sorting on this column results in relatively large groupings of "similar" dances. Simple, more or less glossary, dances are indicated by "0".
"*cD" is another "group" reporting stars (1 = *), (2 = chain, r&l, or promenade), and (3 = D/R [down and return]).
"synopsis" is another single character representation of figures specified with the "short" statistic. Unspecified (presumably low frequency) figures appear as "X".
"title" repeats the title for reference with the statistics columns.
"AA BB AABB" [appears optionally with the command line flag "-e"] An attempt to give sorting keys for the unique figures in A1+A2, B1+B2, and in the whole dance, coded as the first character of the figure name. Some ambiguity occurs. It was hoped that these sorts would give groups of dances with similar moves and structures, but the reality is that they are more diverse than that. Groupings are better found with the statistic "group", to be discussed below.
"0", the last column, is the original order, so that a sort on this column will restore the spreadsheet to its original dance card sequence.

INPUT FORMAT FOR DANCES
(sample file: dances.txt)

The format of dances in the input file "dances.txt" is easily convertible from files specifying figures for A1:, A2:, B1:, and B2:. The program works fairly well with existing dance cards (tested on several collections of 130, 200, and 440 dances). In some cases the syntax can be sharpened to help with the analysis. Of particular difficulty are waves and balances, and the traditional "down 4 in line" with a variety of return possibilities. If the cards are in MS Word format, they can be exported in .txt format and massaged slightly with a text editor. A comparison of the summaries with the original input helps to proofread dance cards, as will an examination of the alphabetized file "occ.out" generated by the "-w" option.
Title, author, and dance type information is expected by the program to be on a single line starting with the character "=". Title and author are separated by a space-delimited dash, " - ". Two words are taken from the author field, then the rest of the line is scanned for words like "proper, improper, mixer, circle, triplet, Sicilian, longways, x face x, square", and for double or triple progression. Capital letters are preserved in the title and author, but ignored for type and progression. Some abbreviations are recognized (imp prop prog). We have added an option for "#nn" on this line, where "nn" is the piece count, since estimation of "nn" is hard.

Other lines in the file either describe dance figures or are ignored as comments. Comment lines start with the character "%" in the first column. Lines between the title line and A1:, and lines after the B2: sequence, are also treated as comments, even without the "%". The sequence of lines containing A1:, A2:, B1:, and B2: is taken as dance information, until broken by a blank line or a comment line following B2:. The section markers A1:, A2:, B1:, and B2: must have three characters, the first being "A" or "B" (in either case), the second being a digit ("1" or "2"), and the last being a punctuation character.

Within the dance description, instructions or hints enclosed within parentheses "(...)" are ignored. The description is broken into pseudo phrases delineated by commas, semicolons, or ends of lines. Words are separated by white space or by one of the separators " , ; end-of-line ( ) ". Other punctuation is left as written, with the possibility of being incorporated into strings used in the analysis (e.g. "3/4" "R&L").
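The title-line and pseudo-phrase rules above can be pictured with a minimal sketch in Python. This is an illustration only, not PICK's actual C code. The function names "parse_title_line" and "pseudo_phrases" and the returned dictionary are hypothetical, and the sketch leaves out the abbreviations (imp prop prog), the "x face x" pattern, and progression detection.

```python
import re

# Type keywords listed in the text; exact-word matching is an assumption.
TYPE_WORDS = ("improper", "proper", "mixer", "circle", "triplet",
              "sicilian", "longways", "square")

def parse_title_line(line):
    """Split a '=' title line into title, author, type words, and piece count."""
    if not line.startswith("="):
        raise ValueError("title lines start with '='")
    # Title and author are separated by a space-delimited dash.
    title, _, rest = line[1:].partition(" - ")
    words = rest.split()
    author = " ".join(words[:2])            # two words are taken as the author
    tail = [w.lower() for w in words[2:]]   # case is ignored for type words
    types = [w for w in tail if w in TYPE_WORDS]
    pieces = None
    for w in tail:
        m = re.fullmatch(r"#(\d{1,2})[a-z]*", w)   # optional "#nn" piece count
        if m:
            pieces = int(m.group(1))
    return {"title": title.strip(), "author": author,
            "types": types, "pieces": pieces}

def pseudo_phrases(text):
    """Drop parenthesized hints, then split on commas, semicolons, line ends."""
    text = re.sub(r"\([^)]*\)", " ", text)
    parts = re.split(r"[,;\n]", text)
    return [" ".join(p.split()) for p in parts if p.strip()]
```

Under these assumptions, parse_title_line("=Sample Dance - anony mous improper #12") yields the title "Sample Dance", the author "anony mous", the type list ["improper"], and a piece count of 12.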
For example, the dance card

    =Sample Dance - anony mous improper
    A1: circle left 1x; ladies chain (to partner)
    A2: hey (ladies start, R shoulder)
    B1: partner balance and swing
    B2: circle left 3/4 balance the ring, pass thru (to next neighbor)

is summarized as

    o-L4, W ch; hey; P bal_sw; o-L3, ring_b, pass

Note that the amount of a circle, star, or allemande is given by the number of fourths: "4" means 4/4 or once around, "3" means 3/4.

THE DANCE DICTIONARY
(default file: dance-dict.txt)
(example file: select-dict.txt)

The words or word fragments which are extracted from the input text are specified in the file "dance-dict.txt". A sample dictionary is furnished, but it can be customized by the user. To get a count of the unique "words" in a dance file, the program has the option "-w", which counts separate words and fragments, putting the information in "word.out" and "occ.out". Separating characters are space, tab, end-of-line, comma, semicolon, and parentheses. Punctuation other than separating characters and the comment flag "%" is left as written and can be part of a word (or string). For example: colon (e.g. A1:), slash (e.g. 3/4), ampersand (e.g. R&L).

The general structure of a dictionary line is

    outsymbol # category = insymbol[s]

e.g.

    P  # who    = partner ptr p    % multiple inputs all go to "P"
    sw # action = swing sw

[...]" appearing in the dance is put in the spreadsheet cell for that dance.

    out** # first-char = S \ cc gr-ch gr-rl gr-RL

We take the occurrence of "shadow", "diagonal", "contra corners" or a "grand chain" or "grand right and left" as indicating an out of the minor set interaction (there may be other markers). We find these figures hard for beginners.

    hwro # group = 31 32 33 34

Reports whether the columns designated on the right are non-blank. Column 31 is non-blank if there is a hey, column 32 is for waves, column 33 is for "rare" movements, and column 34 is for the out-of-minor-set figures.
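The behavior of the "group" statistic can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not PICK's C code: the function name "group_code" is hypothetical, and the digit encoding is inferred from the "hwro" column description (0 = none, 12 = first and second columns both non-blank).

```python
def group_code(cells):
    """Return the 1-based positions of the non-blank cells as a digit string.

    `cells` holds the text of the designated columns for one dance, in the
    order they are listed on the right of "=". An all-blank row gives "0".
    Assumes at most nine designated columns, so each position is one digit.
    """
    digits = [str(i) for i, cell in enumerate(cells, start=1) if cell.strip()]
    return "".join(digits) or "0"
```

With four designated columns, a dance with entries only in the first two would get "12", and a dance with none would get "0", which is what makes the column useful as a sort key.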
A new statistic "order" has been added, which records the position in the dance where a figure occurs. The syntax is similar to that for "qualify":

    swing # order = sw > sw      % records when any swing occurs
    o*    # order = o * > o *    % occurrences of circles and stars
    oR    # order = o > R        % only circle rights

The first part of "select-stat.txt" gives the position or sequence of various moves in the dances, using the "order" statistic. The "position" in the dance is only approximate, since the descriptions do not adhere to a rigid format from which 8 beat phrases can be deduced. The second part illustrates various counts and other statistics.

Strain boundaries (16 beats) are defined by the sections A1:, etc. But the number of pseudo moves or figures within a strain is not restricted (e.g. wave balances and allemandes, hey hints, down and return (with turns)). We have taken the divisions indicated by commas, semicolons, and line ends to define pseudo phrases. We count these from the beginning of the dance. The first move will always be "1", and others will be in sequence. Occasionally a pseudo phrase will not contain an "action" and will not appear in the sequence. Some moves like "balance and swing" will be counted as one move, even though they take 16 beats.

Another requested statistic is piece count, similar to that given in Zesty Contras by Larry Jennings. He states "piece count is subjective and dependent on the locale". Moreover, we have the problems discussed above with the order position, and problems of combining identical or similar moves (such as a Ladies chain; Ladies chain back). A preliminary implementation is given with the statistic "pieces".

    pc # pieces = ret    % plus other flags

This produces two columns. The first contains any title line strings in the format "#nnxxx" as in Zesty Contras. The "#" is mandatory, "nn" is a one or two digit number, and "xxx" are any additional letter codes, to a limit of ten characters.
The second contains our naive estimate of the piece count of the dance. Any strains with a single figure (e.g. balance and swing, hey) have one piece. Any strains with only one type of figure (e.g. circle L, circle R) are also one additional piece. Finally, any strain containing any of the flags to the right of "=" counts as one piece. In this example, "ret" signifies "return" so that "down 4-in-line, turn alone; return, bend the line" counts as one piece, even though it nominally has four.

RUNNING and COMPILING the PROGRAM

In addition to "> pick" with all default file names and "> pick dances", other options can be specified. All output goes to the default files: out.csv, info.out, words.out, and occ.out. These default output file names cannot be changed from the command line. The resulting files must be renamed to preserve them. I still STRONGLY recommend making backup copies of your dance archive.

Command line: "> pick [options]". These options include

"-h" to get help and exit.
"-s your_statistics_file" reads an alternative statistics file.
"-d your_dictionary_file" reads an alternative dictionary.
"-i" prints additional parsing and debug information to "info.out".
"-w" tabulates words/fragments, and writes sorted output to the default files "words.out" and "occ.out". Useful for proofreading the dance file and for modifying your dictionary.
"-e" enables experimental features. At this writing, this is the "AA BB AABB" single character figure lists as discussed above.
"-b [nn]" prints detailed parsing information for the first "nn" lines of input. Output goes to the info file.

The command line "> pick [options] dances" also works and should pose no danger to the file "dances", since one cannot redefine output files. Alternatively, one can copy or rename input files to the default names, or use soft links, "> ln -s mydict.txt dance-dict.txt", in Linux or Mac OS. [What is the equivalent in Windows???]
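Since the output file names are fixed, the results of one run have to be copied or renamed before the next run overwrites them. A minimal Python sketch of that housekeeping follows. It is not part of the PICK distribution: the function name "preserve_outputs" and the "<tag>-<name>" naming scheme are hypothetical, and the list of file names is a parameter so it can be matched to whatever files a run actually produced.

```python
import shutil
from pathlib import Path

def preserve_outputs(tag, names=("out.csv", "info.out"), directory="."):
    """Copy each existing output file to '<tag>-<name>' in the same directory.

    Files that do not exist are skipped. Returns the list of copies made.
    The originals are left in place, so nothing is lost if the copy is
    interrupted.
    """
    saved = []
    for name in names:
        src = Path(directory) / name
        if src.is_file():
            dst = src.with_name(f"{tag}-{name}")
            shutil.copy2(src, dst)   # copy2 also keeps the modification time
            saved.append(dst)
    return saved
```

For example, calling preserve_outputs("select") after a run would leave a copy named "select-out.csv" next to "out.csv".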
--------------------

The distribution can be downloaded as a single ZIP file, or as individually downloadable files. It is more thoroughly described in the README file. This includes executables for Linux (Fedora), Mac OS 10.6 (Snow Leopard), and Windows (compiled with gcc in MinGW, which run on 2000, XP, Vista Home, and Windows 7 Starter), along with the compilable code in C. Compilation is straightforward: "> gcc -g -o pick pick.c".

There are a number of spreadsheet tricks involving sorting of the entire sheet (or parts thereof), cutting and pasting, which I am still learning. Suggestions are solicited, along with comments on the program and on the dictionary and statistics files.
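The sorting described for the spreadsheet can also be tried outside a spreadsheet program. The following Python sketch sorts CSV text on one named column and can restore the original order by sorting on the last column, "0". It rests on assumptions not confirmed by the text: that the first row of the CSV file holds the column headers, and that the "0" column holds integers. The function name "sort_rows" is hypothetical.

```python
import csv
import io

def sort_rows(csv_text, column, numeric=False):
    """Sort the data rows of CSV text on one named column.

    The header row stays first. With numeric=True the cells are compared
    as integers (needed for the original-order column, where "10" must
    sort after "2").
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    idx = header.index(column)
    data.sort(key=lambda row: int(row[idx]) if numeric else row[idx])
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows([header] + data)
    return out.getvalue()
```

Sorting on a "group" column such as "hwro" brings similar dances together, and a second call with column "0" and numeric=True returns the rows to dance card order.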