
NEWS 
====

Versioning
----------

Releases will be numbered with the following semantic versioning format:

<major>.<minor>.<patch>

And constructed with the following guidelines:

* Breaking backward compatibility bumps the major (and resets the minor 
  and patch)
* New additions without breaking backward compatibility bump the minor 
  (and reset the patch)
* Bug fixes and misc changes bump the patch
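
As a minimal sketch of the rules above, the bump logic could be written in base R as follows.  `bump_version` is a hypothetical helper used only for illustration; it is not a qdap function.

```r
# Hypothetical helper (not part of qdap): bump a "<major>.<minor>.<patch>" string.
bump_version <- function(version, type = c("patch", "minor", "major")) {
  type <- match.arg(type)
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 3, !anyNA(parts))
  names(parts) <- c("major", "minor", "patch")
  if (type == "major") {
    # Breaking change: bump major, reset minor and patch
    parts <- c(parts[["major"]] + 1L, 0L, 0L)
  } else if (type == "minor") {
    # Backward compatible addition: bump minor, reset patch
    parts <- c(parts[["major"]], parts[["minor"]] + 1L, 0L)
  } else {
    # Bug fix or misc change: bump patch only
    parts <- c(parts[["major"]], parts[["minor"]], parts[["patch"]] + 1L)
  }
  paste(parts, collapse = ".")
}

bump_version("0.2.5", "major")  # "1.0.0"
bump_version("1.0.0", "minor")  # "1.1.0"
bump_version("0.2.2", "patch")  # "0.2.3"
```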


CHANGES IN qdap VERSION 1.1.0
----------------------------------------------------------------

A version bump necessary for resubmission to CRAN.  

CHANGES

* Downgraded the version requirement for the reports package to 
  reports (>= 0.1.2) in order to upload to CRAN.  reports (>= 0.2.0) is not yet
  available on CRAN.

CHANGES IN qdap VERSION 1.0.0
----------------------------------------------------------------

The word lists and dictionaries in `qdap` have been moved to `qdapDictionaries`. 
Additionally, many functions have been renamed with underscores instead of the 
former period separators.  These changes break backward compatibility.  Thus 
this is a **major** release (ver. 1.0.0).

It is general practice to deprecate functions within a package before 
removing them.  However, the number of necessary changes, together with qdap 
being relatively new to CRAN, made it sensible to make these changes at this 
point.


BUG FIXES

* `qheat`'s  argument `by.column = FALSE` resulted in an error.  This behavior 
  has been fixed.

* `question_type` did not work because of changes to `lookup` that did not 
  accept a two column matrix for `key.match`.  See GitHub issue #127 for more.

* `combo_syllable.sum` threw an error if `text.var` contained a cell made up 
  entirely of non-alphabetic characters (no [a-z]).  This behavior has been 
  fixed.

* `todo` function created by `new_project` would not report completed tasks if 
  `report.completed = TRUE`.

* `termco` and `termco.d` threw an error if more than one consecutive regex 
  special character was passed to `match.list` or `match.string`.  See GitHub 
  issue #128 for more. 

* `trans.cloud` threw an error if a single list with a named vector was passed 
  to `target.words`.  This behavior has been fixed.

* `sentSplit` now returns the "tot" column when `text.place = "original"`.  

* `all_words` output dataframe FREQ column class has been changed from factor to 
  numeric.  Additionally, the WORDS column prints using `left.just` but retains
  traditional character properties (print class added).  `all_words` also picks
  up apostrophe.remove and ldots (for `strip`) arguments.

* `gantt_plot` did not handle `fill.vars`, particularly if the fill was nested 
  within the `grouping.vars`.  This behavior has been fixed with corresponding 
  examples added.

* `url_dl` downloaded an empty file when not using a dropbox key.  This 
  behavior has been fixed.

* The `cm_code.` family of functions had a bug in the output due to 
  `cm_long2dummy` and `cm_dummy2long`'s handling of stretching spans.  This has 
  been corrected.

* `cm_code.exclude` did not output the correct excluded spans.  This behavior
  has been corrected.

* The use of `comment` to convey object characteristics has been replaced with 
  the use of `class`.

* `question_type` did not include question words ending in 'd as part of the 
  category.  For instance "How'd you like it?" was not classified as a how 
  question.

* `beg2char` would not include the `char` if `include = TRUE` and `noc = 1`.

* `cm_range2long` returned `NA`s for vectors containing multiple single values.  
  See GitHub issue #144 for more.

* `termco` family of functions did not handle `NA` values.  This has been fixed. 
  (Matt Williamson) See GitHub issue #147 for details.

* `pos` threw an error for vectors of length 1.  This has been fixed. (Kurt 
  Hornik) See GitHub issue #150 for details.

* `formality` threw an error for vectors of length 1.  This has been fixed. (Kurt 
  Hornik) See GitHub issue #151 for details.

NEW FEATURES

*  The `cm_xxx2long` family of functions (`cm_df2long`, `cm_range2long` and 
  `cm_time2long`) now have a generic wrapper, `cm_2long`, to generate the long
  formats.

* `hash_look` (and `%ha%`) a counterpart to `hash` added to allow quick access 
  to a hash table.  Intended for use within functions or multiple uses of the 
  same hash table, whereas `lookup` is intended for a single external (non 
  function) use which is more convenient though could be slower.

* `boolean_search`, a Boolean term search function, added to allow for indexed 
  searches of Boolean terms.

* `trans_context` is a printing function designed to grab the context (n rows 
  before and after) of an event (an index from a vector of indices).  The 
  function prints the indices around the episode from a transcript to the 
  console or a .csv, .xlsx, .txt, or .doc file. 

* `colpaste2df` is a wrapper for `paste2` that pastes dataframe columns together 
  and outputs a dataframe.

* `colcomb2class` quickly combines columns for a number of qdap classes, 
  including output from `termco`, `question_type`, `pos_by`, and 
  `character_table`.

* `lview` a function to unclass a list output that has a special print method 
  that returns only a portion of the output.  `lview` reclasses to "list".

* `word_cor` added to find words within grouping variables that are associated
  based on correlation.

* `tm2wfm` a function to convert `"TermDocumentMatrix"` and 
  `"DocumentTermMatrix"` to a `wfm` added to allow easier integration with the 
  `tm` package.

* `apply_as_tm` a function to allow functions intended to be used on the `tm` 
  package's `TermDocumentMatrix` to be applied to a `wfm` object.

* `tm_corpus2df` and `df2tm_corpus` added to convert a tm package corpus to a 
  dataframe for use in qdap or vice versa.

* `tdm` and `dtm` are now truly compatible with the `tm` package.  `tdm` and 
  `dtm` produce outputs of the class `"TermDocumentMatrix"` and 
  `"DocumentTermMatrix"` respectively.  This change (coupled with the renaming 
  of `stopwords` to `rm_stopwords`) should make the two packages logical 
  companions and further extend the qdap package to integrate with the many 
  packages that already handle `"TermDocumentMatrix"` and 
  `"DocumentTermMatrix"`.

* `cm_distance` now uses resampling of data from the null model to generate
  p-values for the mean code distances.  Useful for determining if an association 
  (small distance) between codes is likely to happen if the null is true.

* `word_proximity` added to complement `dispersion_plot` and `word_cor` 
  functions.  `word_proximity` gives the average distance between words in 
  the unit of sentences.

MINOR FEATURES

* `url_dl` now takes quoted string urls supplied to ... (no url argument is 
  supplied)

* `condense` is a function that condenses dataframe columns that are a list of 
  vectors to a single vector of strings.  This outputs a dataframe with 
  condensed columns that can be written to csv/xlsx.

* `mcsv_w` now uses `condense` to attempt to condense columns that are 
  lists of vectors to a single vector of strings.  This adds flexibility to 
  `mcsv_w` with more data sets.  `mcsv_w` now writes lists of dataframes to 
  multiple csvs (e.g., the output from `termco` or `polarity`).  `mcsv_w` picks
  up a `dataframes` argument, an optional character vector supplied in lieu of 
  `...` that grabs the dataframes from an environment (default is the global
  environment).

* `ngrams` now has an ellipsis (`...`) argument that passes further arguments 
  to `strip`.

* `dtm` added to complement `tdm`, allowing for easier integration with other R 
  packages that utilize tdm/dtm.

* `dir_map` picks up a `use.path` argument that allows the user to specify a 
  more flexible path to the created pre-formed `read.transcript` scripts based 
  on something like `file.path(getwd(), )`.  This means portability of code on 
  different machines.

* `polarity_frame` a function to make a hash environment lookup for use with the 
  `polarity` function.

* `DATA.SPLIT` a `sentSplit` version of the `DATA` dataset has been added to 
  qdap.

* `gantt_plot` accepts NULL for `grouping.var` and figures for "all" rows as a 
  single grouping var.

* `replace_number` now handles 10^47 digits compared to 10^14 previously.

* The `new_project` function gains a `github` argument that optionally sends the 
  repo to a public GitHub account upon creation.

* `qheat`, `polarity.plot` and `formality.plot` pick up the argument `plot` 
  which optionally suppresses the plotting.  This is useful if the user is 
  operating in knitr, Sweave, etc. and wishes to alter/add onto the plot.

* `lookup` now takes `missing = NULL`.  This results in the original values in
  `terms` corresponding to the missing elements being retained.

* `cm_time.temp` picks up a `grouping.var` argument that works similarly to 
  `cm_range.temp`'s `grouping.var`.  `cm_time.temp` also takes hour values for
  `start` and `end` as in `end = "01:22:03"`.

* `gantt_rep` picks up a generic `plot` method.

* Functions in the `cm_code.xxx` and `cm_xxx2long` pick up a generic plot method
  that utilizes `gantt_wrap` to plot a Gantt plot of the span data.

* Functions in the `cm_code.xxx` and `cm_xxx2long` pick up a generic summary 
  method.  This summary method has its own plot method that utilizes `qheat` to 
  plot a heatmap of the summary statistics.  The generic print method 
  (`print.sum_cmspans`) is useful for output intended for publication.

* `qheat` picks up a `facet.vars` argument that allows a character vector of 
  length 1 or 2 to facet by.

* `question_type` gives the indices of questions via `$inds`.

* `colsplit2df` now splits multiple columns to match the capabilities of 
  `colpaste2df`.

* `sentSplit` now handles repeated measures and picks up a turn of talk plot 
  method.

* `tot_plot` now handles repeated measures and `grouping.var` to be nested 
  within the turn of talk.

* `wfm` now uses `mtabulate` and is ~10x faster.

* `plot.polarity` gains arguments for optional error bars using the standard 
  error of the mean polarity.

* `exclude` now works with `wfm` and the `tm` package's `DocumentTermMatrix` and
  `TermDocumentMatrix` classes.

* `rm_url` removes/replaces URLs in a string(s).

CHANGES

* The dictionaries and word lists for qdap have been moved to their own package, 
  `qdapDictionaries`.  This will allow easier access to these resources beyond 
  the qdap package as well as reducing the overall size of the qdap package.  
  Because this is a major change that may break the code of some users the 
  major release number has been upped to 1.  The following name changes have 
  occurred:

    - increase.amplification.words -> became -> amplification.words

    - The deamplification.words and env.pol wordlist and dictionary were added as 
        well.

* qdap gains an HTML package vignette to better explain the intended work flow 
  and function use for the package.  This is not currently a part of the build 
  but can be accessed via:

  http://htmlpreview.github.io/?https://github.com/trinker/qdap/blob/master/vignettes/qdap_vignette.html

  *Note* that the vignette may include development version functions not yet 
  available in the current CRAN version

* `polarity` utilizes a new, unbounded algorithm based on weighting to determine 
  polarity.

* `gantt_wrap` no longer accepts unquoted strings to the `plot.var` argument.

* `cm_df.temp` loses the logical `csv` argument.  `file.name` has been replaced 
  with `file` to fit conventional R naming schemes.

* The plotting feature of `gantt` has been removed and a `plot` method has been 
  added.  The user can plot the output from `gantt` in `base` or `ggplot2` 
  graphics.

* `cm_time2long` loses the argument `start.end` to ensure that the `cmspans` 
  class produced would operate as expected.

* Most exported functions utilizing a period separator have been replaced with 
  underscore named versions.

* `wf_combine` renamed `wfm_combine` to be consistent.

* `question_type` algorithm improvements including implied do/does/did handling.

* `list2df` and `mtabulate` now exported.

* `stopwords` has been renamed to `rm_stopwords` (`rm_stop` shorthand) to better 
  fit the action the function performs and to avoid conflicts with the 
  `tm` package.

* `replace_number`'s `num.paste` becomes logical rather than character input.
  This makes use easier as the user doesn't need to remember arguments.

* `matrix2df` added (under `list2df`) to convert rownames of a matrix to a 
  dataframe column.


CHANGES IN qdap VERSION 0.2.5
----------------------------------------------------------------

Patch release.  This version deals with the changes in the `openNLP` package 
that affect qdap.  Next major release scheduled after `slidify` package is 
pushed to CRAN.

CHANGES IN qdap VERSION 0.2.3
----------------------------------------------------------------

BUG FIXES

* `new_project` placed a report in the CORRESPONDENCE directory rather than 
  CONTACT_INFO

* `strip` would not allow the characters "/" and "-" to be passed to 
  `char.keep`.  This has been fixed. (Jens Engelmann)

* `beg2char` would only grab the first character of a string after n - 1 
  occurrences of the character.  For example: 
  `beg2char(c("abc-edw-www", "nmn-ggg", "rer-qqq-fdf"), "-", 2)` resulted in
  "abc-e" "nmn-g" "rer-q" rather than "abc-edw" "nmn-ggg" "rer-qqq"
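
The corrected result for the example above can be reproduced with a small base R stand-in.  `up_to_nth` is a hypothetical name used for illustration and is not qdap's implementation; it assumes that a string with fewer than `n` occurrences of the character is returned whole, which matches the "nmn-ggg" case in the entry.

```r
# Hypothetical base R stand-in for the corrected behavior; not qdap's code.
up_to_nth <- function(x, char = "-", n = 1) {
  vapply(x, function(s) {
    # Positions of every literal occurrence of `char` in the string
    pos <- gregexpr(char, s, fixed = TRUE)[[1]]
    # Fewer than n occurrences (or none): keep the whole string
    if (pos[1] == -1 || length(pos) < n) return(s)
    # Otherwise keep everything before the nth occurrence
    substr(s, 1, pos[n] - 1)
  }, character(1), USE.NAMES = FALSE)
}

up_to_nth(c("abc-edw-www", "nmn-ggg", "rer-qqq-fdf"), "-", 2)
# "abc-edw" "nmn-ggg" "rer-qqq"
```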

NEW FEATURES

* `names2sex` a function for predicting gender from name.

* Added `NAMES` and `NAMES_SEX` datasets, based on 1990 U.S. census data.

* `tdm` added as an equivalent to TermDocumentMatrix from the tm package.  This 
  allows for portability across text analysis packages.

MINOR FEATURES

* `mgsub` now gets a `trim` argument that optionally removes trailing and 
  leading white space.

* `lookup` now takes a list of named vectors for the key.match argument.

CHANGES

* `new_project` directory can now be transferred without breaking paths (i.e.,
  `file.path(getwd(), "DIR/file.ext")` is used rather than the full file path).
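
A brief illustration of the idea in base R: the path is assembled from the current working directory at run time instead of being hard coded.  "DIR/file.ext" is the placeholder from the entry above, not a real file.

```r
# Build the path from the working directory at run time; nothing machine
# specific is stored in the script itself.
portable <- file.path(getwd(), "DIR/file.ext")

# The result always begins with whatever directory the project was moved to.
startsWith(portable, getwd())
basename(portable)
```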


CHANGES IN qdap VERSION 0.2.2
----------------------------------------------------------------

BUG FIXES

* `genXtract` labels returned the word "right" rather than the right edge string.
  See http://stackoverflow.com/a/15423439/1000343 for an example of the old 
  behavior.  This behavior has been fixed.

* `gradient_cloud`'s `min.freq` was locked at 1.  This has been fixed. (Manuel 
  Fdez-Moya)

* `termco` would produce an error if single length named vectors were passed to 
  match.list and no multi-length vectors were supplied.  Also an error was thrown 
  if an unnamed multi-length vector was passed to `match.list`.  This behavior has 
  been fixed.

NEW FEATURES

* `tot_plot` a visualizing function that uses a bar graph to visualize patterns 
  in sentence length and grouping variables by turn of talk.

* `beg2char` and `char2end` functions to grab text from beginning of string to a
  character or from a character to the end of a string.

* `ngrams` function to calculate ngrams by grouping variable.

MINOR FEATURES

* `genX` and `bracketX` gain an extra argument `space.fix` to remove extra 
  spaces left over from bracket removal.

* Updated out of date dropbox url download in `url_dl`.  `url_dl` also takes the 
  dropbox key as well.

CHANGES

* qdap is now compiled for mac users (as openNLP now passes CRAN checks with no
  Errors on Mac).

CHANGES IN qdap VERSION 0.2.1
----------------------------------------------------------------

BUG FIXES

* `word_associate` colors the word cloud appropriately and deals with the error 
  caused by a grouping variable not containing any words from 1 or more of the 
  vectors of a list supplied to `match.string`.

* `trans.cloud` produced an error when expand.target was TRUE.  This error has 
  been eliminated.

* `termco` would eliminate > 1 columns matching an identical search.term found 
  in a second vector of match.list.  termco now counts repeated terms multiple 
  times.

* `cm_df.transcript` did not give the correct speaker labels (fixed).

NEW FEATURES

* `gradient_cloud`: Binary gradient Word Cloud - A new plotting function 
  that plots and colors words for a binary variable based on which group of 
  the binary variable uses the term more frequently.

* `new_project`: A project template generating function designed to increase 
  efficiency and standardize work flow.  The project comes with a .Rproj file 
  for easy use with RStudio as well as a .Rprofile that simplifies loading and 
  sourcing of packages, data and project functions.  This function uses the 
  reports package to generate an extensive reports folder.


MINOR FEATURES

* `stemmer`, `stem2df` and `stem.words` now explicitly have the argument 
  `char.keep` set to "~~" to enable retaining special characters formerly 
  stripped away.

* `hms2sec`: A function to convert from h:m:s format to seconds.

* `mcsv_w` now takes a list of data.frames.

* `cm_range.temp` now takes the arguments text.var and grouping.var that will 
  automatically output these (grouping.var) columns as range coded indices.

* `wfm` gets a speed boost as the code has been re-written to be faster.

* `read.transcript` now reads .txt files as well as text similar to read.table.
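
The h:m:s to seconds conversion that `hms2sec` is described as performing can be sketched in base R.  `hms_to_seconds` is a hypothetical name for illustration only; qdap's `hms2sec` may differ in its arguments and handling of edge cases.

```r
# Hypothetical base R sketch of an h:m:s to seconds conversion; not qdap's code.
hms_to_seconds <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE), function(p) {
    p <- as.numeric(p)
    # Expect exactly hours, minutes and seconds
    stopifnot(length(p) == 3)
    sum(p * c(3600, 60, 1))
  }, numeric(1))
}

hms_to_seconds(c("01:22:03", "00:00:45"))  # 4923 45
```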

CHANGES

* `sec2hms` is the new name for `convert` 

* `folder` and `delete` have been moved to the reports package which is imported 
  by qdap.  Previously `folder` would not generate a directory with the 
  time/date stamp if no directory name was given; this has been fixed, though 
  the function now resides in the reports package.

CHANGES IN qdap VERSION 0.2.0
----------------------------------------------------------------

* The first installation of the qdap package

* Package designed to bridge the gap between qualitative data and quantitative 
  analysis
