Num top ten roadmap goals:
Create more functionality -- Such as rankings, histograms, outliers, modes, jarque, dpo, and support for larger numbers.
Create input-scrubbing capabilities -- Such as extracting numbers from mixed-type data, parsing numbers that look like currencies or percentages, handling missing data or malformed data, alerting if non-numbers are disrupting results, etc.
Upgrade options for input and output -- Add columns and rows, headers and labels, output formats for HTML/JSON/JSONB, Unicode symbols, etc.
Improve help -- Continue enhancing the Num website, tutorials, and examples. Add an IRC channel for people who want help or want to contribute. Collect documentation as a PDF book focused on command-line statistics, because our experience shows this is an important way to drive adoption in larger organizations, including academia and enterprises.
Optimize speed -- Use caching, memoization, heuristics, and input hinting.
Implement on more systems -- Build pure Mac OS X compatibility, pure POSIX compatibility, and Cygwin compatibility for Windows. Package Num for various package managers, including apt, brew, yum, etc.
Create appendable statistics -- Such as taking an existing count and mean as input, and appending new numbers to those statistics. We believe this is a killer feature for combining batch-oriented processing with stream-oriented processing. Our testing so far shows that additive statistics can give speed increases of 2x-5x for the real-world data we're using in real-world projects.
Implement in a fast compiled language -- We expect this to make Num run 2x-5x faster, and also to open up long-term possibilities for advanced data structures. We hope to jumpstart this by working with existing open source statistics programmers and code bases, such as datamash and qsort.
Encourage use of Num -- Such as working with teaching groups (e.g. edX, Khan Academy, Coursera, Udacity), coding groups (e.g. Red Hat, Canonical, Apple, Google), and publishing groups (e.g. Amazon, O'Reilly, Pragmatic). These organizations can help the project succeed, and can also help us reach the most people.
Long term we want to advocate for Num to become a Unix command that is automatically installed on all Unix systems, much like common command line tools such as grep and sed. We want this to include working with system vendors and also with programmers who can make the code faster and better for cross-platform uses.
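The appendable-statistics goal above can be illustrated with a small Python sketch. This is not Num's implementation; the `RunningMean` class and its `append` and `merge` methods are hypothetical names chosen for illustration. It only shows why an existing count and mean are enough to absorb new numbers without rereading the old data.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningMean:
    """A count and mean that can absorb new numbers without the old data."""
    count: int = 0
    mean: float = 0.0

    def append(self, values: Iterable[float]) -> "RunningMean":
        """Fold new values into the existing count and mean, in place."""
        for x in values:
            self.count += 1
            # Incremental update: move the mean toward x by 1/count of the gap.
            self.mean += (x - self.mean) / self.count
        return self

    def merge(self, other: "RunningMean") -> "RunningMean":
        """Combine two summaries, e.g. an earlier batch and a later stream."""
        total = self.count + other.count
        if total == 0:
            return RunningMean()
        mean = (self.count * self.mean + other.count * other.mean) / total
        return RunningMean(total, mean)


# An earlier batch run reported count=4 and mean=2.5 (the numbers 1, 2, 3, 4).
stats = RunningMean(count=4, mean=2.5)
stats.append([5.0, 6.0])
print(stats.count, stats.mean)  # prints "6 3.5"
```

The same idea extends to sums, minimums, and maximums. Statistics such as medians need more state than a count and a mean, so they would need a different approach.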
reject-* for positive, negative, zero, even, odd, unique.
possibly filters based on quantile, such as
input-header - tell Num to ignore the first line of input because it has header labels.
output-header - tell Num to print a first line of output that has header labels.
printf - enable output string substitutions.
zero-termination - enable null-terminated data.
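As a rough illustration of how the input-header and zero-termination options might behave, here is a minimal Python sketch. The function `read_numbers` and its parameter names are assumptions made for this example, not part of Num. NUL-terminated records follow the same convention as `find -print0` and `xargs -0`.

```python
from typing import List


def read_numbers(data: str, input_header: bool = False,
                 zero_terminated: bool = False) -> List[float]:
    """Split raw input into numbers, honoring the two hypothetical options."""
    # With zero termination, each record ends with NUL instead of a newline.
    separator = "\x00" if zero_terminated else "\n"
    # Drop empty records, such as the one after a trailing separator.
    records = [r for r in data.split(separator) if r.strip()]
    if input_header and records:
        records = records[1:]  # Skip the first record: it holds labels.
    return [float(r) for r in records]


print(read_numbers("price\n1.5\n2.5\n", input_header=True))     # [1.5, 2.5]
print(read_numbers("1\x002\x003\x00", zero_terminated=True))    # [1.0, 2.0, 3.0]
```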
Other functions such as those in other math stats tools.
kurtosis. See http://www.johndcook.com/blog/skewness_kurtosis/
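For reference, kurtosis can be computed in a single pass by keeping running sums of central moments, which suits streaming input. The Python sketch below is an illustration under stated assumptions, not Num code: the function name `excess_kurtosis` is hypothetical, and it returns the population (biased) excess kurtosis, so a normal distribution scores about 0.

```python
from typing import Iterable


def excess_kurtosis(values: Iterable[float]) -> float:
    """One-pass population excess kurtosis from running central moment sums."""
    n = 0
    mean = m2 = m3 = m4 = 0.0
    for x in values:
        n1 = n
        n += 1
        delta = x - mean
        delta_n = delta / n
        delta_n2 = delta_n * delta_n
        term1 = delta * delta_n * n1
        mean += delta_n
        # Update the highest moment first: each update needs the old
        # values of the lower moment sums.
        m4 += (term1 * delta_n2 * (n * n - 3 * n + 3)
               + 6 * delta_n2 * m2 - 4 * delta_n * m3)
        m3 += term1 * delta_n * (n - 2) - 3 * delta_n * m2
        m2 += term1
    if n == 0 or m2 == 0.0:
        raise ValueError("kurtosis is undefined for empty or constant input")
    return n * m4 / (m2 * m2) - 3.0


print(round(excess_kurtosis([1, 2, 3, 4, 5]), 6))  # prints -1.3
```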
Research implementing the pivot by using Tukey's ninther, which is a median of medians (the median of three medians of three), and may be a faster heuristic. See http://www.johndcook.com/blog/2009/06/23/tukey-median-ninther/
Research implementing the small size sort using Shell sort, which is similar to insertion sort yet better for typical data. See https://en.wikipedia.org/wiki/Shellsort
Research upgrading from single pivot to dual pivot, with the main benefit being faster speed because of fewer long scans.
Research sorting code on comp.lang.awk: https://groups.google.com/forum/#!searchin/comp.lang.awk/quicksort/comp.lang.awk/c6IFVx3nxgA/jJjZrKYh7aoJ
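To make two of the research items above concrete, here is a Python sketch of a quicksort that picks its pivot with Tukey's ninther and hands small partitions to Shell sort. It is a starting point for experiments, not Num's sort. The cutoff `SMALL = 16`, the gap-halving sequence, and all function names are assumptions. Dual-pivot partitioning is not shown.

```python
from typing import List, Optional

SMALL = 16  # Assumed cutoff: partitions this small go to Shell sort.


def shell_sort(a: List[float], lo: int, hi: int) -> None:
    """Sort a[lo..hi] in place with Shell sort, halving the gap each round."""
    gap = (hi - lo + 1) // 2
    while gap > 0:
        for i in range(lo + gap, hi + 1):
            item, j = a[i], i
            while j - gap >= lo and a[j - gap] > item:
                a[j] = a[j - gap]
                j -= gap
            a[j] = item
        gap //= 2


def median3(x: float, y: float, z: float) -> float:
    """Return the middle value of three."""
    return sorted((x, y, z))[1]


def ninther(a: List[float], lo: int, hi: int) -> float:
    """Tukey's ninther: the median of three medians of three samples."""
    step = (hi - lo) // 8
    s = [a[lo + k * step] for k in range(9)]  # Nine evenly spaced samples.
    return median3(median3(*s[0:3]), median3(*s[3:6]), median3(*s[6:9]))


def quicksort(a: List[float], lo: int = 0, hi: Optional[int] = None) -> None:
    """Sort a[lo..hi] in place; single pivot chosen by the ninther."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= SMALL:
        shell_sort(a, lo, hi)
        return
    pivot = ninther(a, lo, hi)
    i, j = lo, hi
    # Hoare-style partition around the pivot value.
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    quicksort(a, lo, j)
    quicksort(a, i, hi)


numbers = [9, 3, 7, 1, 8, 2, 5, 4, 6, 0, 15, 11, 13, 10, 14, 12, 17, 16]
quicksort(numbers)
print(numbers == list(range(18)))  # prints True
```

Whether the ninther or Shell sort actually helps depends on the data and the implementation language, so any change should be measured against real inputs.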
Real world examples to try:
Consider similar projects:
Heap sort: https://groups.google.com/forum/#!searchin/comp.lang.awk/quicksort/comp.lang.awk/mJ0EiUZTb-o/8-kR6wVe-F8J
Data Science at the Command Line: http://datascienceatthecommandline.com/