Basic Usage
The goal for this package is to make most tables extremely easy to assemble on the fly. In the next few sections, I'll demonstrate some of the basic usage, primarily using several convenience functions that make it easy to construct common tables. However, these functions are a small subset of what TexTables
is designed for: it should be easy to programmatically make any type of hierarchical table and and print it to LaTeX. For more details on how to easily roll-your-own tables (or integrate LaTeX tabular output into your own package) using TexTables
, see the Advanced Usage section below.
Making A Table of Summary Statistics
Let's download the iris
dataset from RDatasets
, and quickly compute some summary statistics.
julia> using RDatasets, TexTables, DataStructures, DataFrames
julia> df = dataset("datasets", "iris");
julia> summarize(df)
| Obs | Mean | Std. Dev. | Min | Max
------------------------------------------------------
SepalLength | 150 | 5.843 | 0.828 | 4.300 | 7.900
SepalWidth | 150 | 3.057 | 0.436 | 2.000 | 4.400
PetalLength | 150 | 3.758 | 1.765 | 1.000 | 6.900
PetalWidth | 150 | 1.199 | 0.762 | 0.100 | 2.500
Species | | | | |
If we want more detail, we can pass the detail=true
keyword argument:
julia> summarize(df,detail=true)
| Obs | Mean | Std. Dev. | Min | p10 | p25 | p50 | p75 | p90 | Max
----------------------------------------------------------------------------------------------
SepalLength | 150 | 5.843 | 0.828 | 4.300 | 4.800 | 5.100 | 5.800 | 6.400 | 6.900 | 7.900
SepalWidth | 150 | 3.057 | 0.436 | 2.000 | 2.500 | 2.800 | 3.000 | 3.300 | 3.610 | 4.400
PetalLength | 150 | 3.758 | 1.765 | 1.000 | 1.400 | 1.600 | 4.350 | 5.100 | 5.800 | 6.900
PetalWidth | 150 | 1.199 | 0.762 | 0.100 | 0.200 | 0.300 | 1.300 | 1.800 | 2.200 | 2.500
Species | | | | | | | | | |
We can restrict to only some variables by passing a second positional argument, which can be either a Symbol
or an iterable collection of symbols.
The summarize function is similar to the Stata command summarize
: it reports string variables all entries missing, and skips all missing values when computing statistics.
To customize what statistics are calculated, you can pass summarize
a stats::Tuple{Union{Symbol,String},Function}
(or just a single pair will work too) keyword argument:
# Quantiles of nonmissing values (need to collect to pass to quantile)
julia> nomiss(x) = skipmissing(x) |> collect;
julia> new_stats = ("p25" => x-> quantile(nomiss(x), .25),
"p50" => x-> quantile(nomiss(x), .5),
"p75" => x-> quantile(nomiss(x), .75));
julia> summarize(df, stats=new_stats)
| p25 | p50 | p75
------------------------------------
SepalLength | 5.100 | 5.800 | 6.400
SepalWidth | 2.800 | 3.000 | 3.300
PetalLength | 1.600 | 4.350 | 5.100
PetalWidth | 0.300 | 1.300 | 1.800
Species | | |
Stacking Tables
It's easy to stack two tables that you created at different parts of your code using calls to hcat
or vcat
:
julia> t11 = summarize(df, :SepalLength)
| Obs | Mean | Std. Dev. | Min | Max
------------------------------------------------------
SepalLength | 150 | 5.843 | 0.828 | 4.300 | 7.900
julia> t21= summarize(df, :SepalWidth)
| Obs | Mean | Std. Dev. | Min | Max
-----------------------------------------------------
SepalWidth | 150 | 3.057 | 0.436 | 2.000 | 4.400
julia> t12 = summarize(df, :SepalLength, stats=new_stats)
| p25 | p50 | p75
------------------------------------
SepalLength | 5.100 | 5.800 | 6.400
julia> t22 = summarize(df, :SepalWidth, stats=new_stats)
| p25 | p50 | p75
-----------------------------------
SepalWidth | 2.800 | 3.000 | 3.300
julia> tab = [t11 t12
t21 t22]
| Obs | Mean | Std. Dev. | Min | Max | p25 | p50 | p75
------------------------------------------------------------------------------
SepalLength | 150 | 5.843 | 0.828 | 4.300 | 7.900 | 5.100 | 5.800 | 6.400
SepalWidth | 150 | 3.057 | 0.436 | 2.000 | 4.400 | 2.800 | 3.000 | 3.300
You can also group statistics together with a call to the function join_table
. This constructs a new table with a column multi-index that groups your data into two column blocks.
julia> join_table( "Regular Summarize" =>vcat(t11, t21),
"My Detail" =>vcat(t12, t22))
| Regular Summarize | My Detail
| Obs | Mean | Std. Dev. | Min | Max | p25 | p50 | p75
------------------------------------------------------------------------------
SepalLength | 150 | 5.843 | 0.828 | 4.300 | 7.900 | 5.100 | 5.800 | 6.400
SepalWidth | 150 | 3.057 | 0.436 | 2.000 | 4.400 | 2.800 | 3.000 | 3.300
There is an analogous function for creating multi-indexed row tables append_table
. You can see it in action with a call to the function summarize_by
, which calculates summary statistics by grouping on a variable.
julia> c1 = summarize_by(df, :Species, [:SepalLength, :SepalWidth])
| | Obs | Mean | Std. Dev. | Min | Max
-------------------------------------------------------------------
setosa | SepalLength | 50 | 5.006 | 0.352 | 4.300 | 5.800
| SepalWidth | 50 | 3.428 | 0.379 | 2.300 | 4.400
-------------------------------------------------------------------
versicolor | SepalLength | 50 | 5.936 | 0.516 | 4.900 | 7.000
| SepalWidth | 50 | 2.770 | 0.314 | 2.000 | 3.400
-------------------------------------------------------------------
virginica | SepalLength | 50 | 6.588 | 0.636 | 4.900 | 7.900
| SepalWidth | 50 | 2.974 | 0.322 | 2.200 | 3.800
julia> c2 = summarize_by(df, :Species, [:SepalLength, :SepalWidth],
stats=new_stats)
| | p25 | p50 | p75
-------------------------------------------------
setosa | SepalLength | 4.800 | 5.000 | 5.200
| SepalWidth | 3.200 | 3.400 | 3.675
-------------------------------------------------
versicolor | SepalLength | 5.600 | 5.900 | 6.300
| SepalWidth | 2.525 | 2.800 | 3.000
-------------------------------------------------
virginica | SepalLength | 6.225 | 6.500 | 6.900
| SepalWidth | 2.800 | 3.000 | 3.175
Now, when we horizontally concatenate c1
and c2
, they will automatically maintain the block-ordering in the rows:
julia> final_table = join_table("Regular Summarize"=>c1, "My Detail"=>c2)
| | Regular Summarize | My Detail
| | Obs | Mean | Std. Dev. | Min | Max | p25 | p50 | p75
-------------------------------------------------------------------------------------------
setosa | SepalLength | 50 | 5.006 | 0.352 | 4.300 | 5.800 | 4.800 | 5.000 | 5.200
| SepalWidth | 50 | 3.428 | 0.379 | 2.300 | 4.400 | 3.200 | 3.400 | 3.675
-------------------------------------------------------------------------------------------
versicolor | SepalLength | 50 | 5.936 | 0.516 | 4.900 | 7.000 | 5.600 | 5.900 | 6.300
| SepalWidth | 50 | 2.770 | 0.314 | 2.000 | 3.400 | 2.525 | 2.800 | 3.000
-------------------------------------------------------------------------------------------
virginica | SepalLength | 50 | 6.588 | 0.636 | 4.900 | 7.900 | 6.225 | 6.500 | 6.900
| SepalWidth | 50 | 2.974 | 0.322 | 2.200 | 3.800 | 2.800 | 3.000 | 3.175
Tabulate Function
TexTables
also provides a convenience tabulate
function:
julia> tabulate(df, :Species)
| Freq. | Percent | Cum.
---------------------------------------
setosa | 50 | 33.333 | 33.333
versicolor | 50 | 33.333 | 66.667
virginica | 50 | 33.333 | 100.000
---------------------------------------
Total | 150 | 100.000 |
In the future, I may add support for two way tables (it's a very easy extension).
StatsModels Integrations
Let's say that we want to run a few regressions on some data that we happened to come by:
using StatsModels, GLM
df = dataset("datasets", "attitude")
m1 = lm(@formula( Rating ~ 1 + Raises ), df)
m2 = lm(@formula( Rating ~ 1 + Raises + Learning), df)
m3 = lm(@formula( Rating ~ 1 + Raises + Learning + Privileges), df)
m4 = lm(@formula( Rating ~ 1 + Raises + Learning + Privileges
+ Complaints), df)
m5 = lm(@formula( Rating ~ 1 + Raises + Learning + Privileges
+ Complaints + Critical), df)
We can construct a single column for any one of these with the TableCol
constructor:
julia> t1 = TableCol("(1)", m1)
| (1)
-----------------------
(Intercept) | 19.978*
| (11.688)
Raises | 0.691***
| (0.179)
-----------------------
N | 30
$R^2$ | 0.348
But in general, it is easier to just use the regtable
function when combining several different models:
julia> reg_table = regtable(m1, m2, m3, m4, m5)
| (1) | (2) | (3) | (4) | (5)
-------------------------------------------------------------------
(Intercept) | 19.978* | 15.809 | 14.167 | 11.834 | 11.011
| (11.688) | (11.084) | (11.519) | (8.535) | (11.704)
Raises | 0.691*** | 0.379* | 0.352 | -0.026 | -0.033
| (0.179) | (0.217) | (0.224) | (0.184) | (0.202)
Learning | | 0.432** | 0.394* | 0.246 | 0.249
| | (0.193) | (0.204) | (0.154) | (0.160)
Privileges | | | 0.105 | -0.103 | -0.104
| | | (0.168) | (0.132) | (0.135)
Complaints | | | | 0.691*** | 0.692***
| | | | (0.146) | (0.149)
Critical | | | | | 0.015
| | | | | (0.147)
-------------------------------------------------------------------
N | 30 | 30 | 30 | 30 | 30
$R^2$ | 0.348 | 0.451 | 0.459 | 0.715 | 0.715
Currently, TexTables
works with several standard regression packages in the StatsModels
family to construct custom coefficient tables. I've mostly implemented these as proof of concept, since I'm not sure how best to proceed on extending it to more model types. By default, TexTables
will display significance stars using p-value thresholds of 0.1 for 1 star, 0.05 for 2 stars, and 0.01 for 3 stars (as is standard).
I think that I may spin these off into a "formulas" package at some point in the future.
If you are interested in integrating TexTables
into your regression package, please see the topic below under "Advanced Usage."
Row and Column Blocks
As you can see, the summary statistics are kept in a separate row-block while the columns are being merged together. We can do this either with unnamed groups (like in the previous example), or with named groups that will be visible in the table itself.
Suppose that our first 3 regressions needed to be visually grouped together under a single heading, and the last two were separate. We could instead construct each group separately and then combine them together with the join_table
function:
group1 = regtable(m1, m2, m3)
group2 = regtable(m4, m5)
grouped_table = join_table( "Group 1"=>group1,
"Group 2"=>group2)
This will display as:
julia> grouped_table = join_table( "Group 1"=>group1,
"Group 2"=>group2)
| Group 1 | Group 2
| (1) | (2) | (3) | (1) | (2)
------------------------------------------------------------------
(Intercept) | 19.978 | 15.809 | 14.167 | 11.834 | 11.011
| (11.688) | (11.084) | (11.519) | (8.535) | (11.704)
Raises | 0.691 | 0.379 | 0.352 | -0.026 | -0.033
| (0.179) | (0.217) | (0.224) | (0.184) | (0.202)
Learning | | 0.432 | 0.394 | 0.246 | 0.249
| | (0.193) | (0.204) | (0.154) | (0.160)
Privileges | | | 0.105 | -0.103 | -0.104
| | | (0.168) | (0.132) | (0.135)
Complaints | | | | 0.691 | 0.692
| | | | (0.146) | (0.149)
Critical | | | | | 0.015
| | | | | (0.147)
------------------------------------------------------------------
N | 30 | 30 | 30 | 30 | 30
$R^2$ | 0.348 | 0.451 | 0.459 | 0.715 | 0.715
If instead, we wanted to maintain a consistent numbering from (1)-(5), we could do it using the regtable
function:
julia> regtable("Group 1"=>(m1, m2, m3), "Group 2"=>(m4, m5))
| Group 1 | Group 2
| (1) | (2) | (3) | (4) | (5)
-------------------------------------------------------------------
(Intercept) | 19.978* | 15.809 | 14.167 | 11.834 | 11.011
| (11.688) | (11.084) | (11.519) | (8.535) | (11.704)
Raises | 0.691*** | 0.379* | 0.352 | -0.026 | -0.033
| (0.179) | (0.217) | (0.224) | (0.184) | (0.202)
Learning | | 0.432** | 0.394* | 0.246 | 0.249
| | (0.193) | (0.204) | (0.154) | (0.160)
Privileges | | | 0.105 | -0.103 | -0.104
| | | (0.168) | (0.132) | (0.135)
Complaints | | | | 0.691*** | 0.692***
| | | | (0.146) | (0.149)
Critical | | | | | 0.015
| | | | | (0.147)
-------------------------------------------------------------------
N | 30 | 30 | 30 | 30 | 30
$R^2$ | 0.348 | 0.451 | 0.459 | 0.715 | 0.715
And in latex, the group labels will be displayed with \multicolumn
commands:
\begin{tabular}{r|ccc|cc}
\toprule
& \multicolumn{3}{c}{Group 1} & \multicolumn{2}{c}{Group 2}\\
& (1) & (2) & (3) & (4) & (5) \\ \hline
(Intercept) & 19.978 & 15.809 & 14.167 & 11.834 & 11.011 \\
& (11.688) & (11.084) & (11.519) & (8.535) & (11.704) \\
Raises & 0.691 & 0.379 & 0.352 & -0.026 & -0.033 \\
& (0.179) & (0.217) & (0.224) & (0.184) & (0.202) \\
Learning & & 0.432 & 0.394 & 0.246 & 0.249 \\
& & (0.193) & (0.204) & (0.154) & (0.160) \\
Privileges & & & 0.105 & -0.103 & -0.104 \\
& & & (0.168) & (0.132) & (0.135) \\
Complaints & & & & 0.691 & 0.692 \\
& & & & (0.146) & (0.149) \\
Critical & & & & & 0.015 \\
& & & & & (0.147) \\ \hline
N & 30 & 30 & 30 & 30 & 30 \\
$R^2$ & 0.348 & 0.451 & 0.459 & 0.715 & 0.715 \\
\bottomrule
\end{tabular}
The vertical analogue of join_table
is the function append_table
. Both will also accept the table objects as arguments instead of pairs if you want to construct the row/column groups without adding a visible multi-index.
Display Options
You can recover the string output using the functions to_latex
and to_ascii
. But, it is also possible to tweak the layout of the tables by passing keyword arguments to the print
, show
, to_tex
, or to_ascii
functions. For instance, if you would like to display your standard errors on the same row as the coefficients, you can do so with the se_pos
argument:
julia> print(to_ascii(hcat( TableCol("(1)", m1), TableCol("(2)", m2)),
se_pos=:inline))
| (1) | (2)
-------------------------------------------------
(Intercept) | 19.978* (11.688) | 15.809 (11.084)
Raises | 0.691*** (0.179) | 0.379* (0.217)
Learning | | 0.432** (0.193)
-------------------------------------------------
N | 30 | 30
$R^2$ | 0.348 | 0.451
Similarly, if you want to print a table without showing the significance stars, then simply pass the keyword argument star=false
:
julia> print(to_ascii(hcat( TableCol("(1)", m1), TableCol("(2)", m2)),
star=false))
| (1) | (2)
----------------------------------
(Intercept) | 19.978 | 15.809
| (11.688) | (11.084)
Raises | 0.691 | 0.379
| (0.179) | (0.217)
Learning | | 0.432
| | (0.193)
----------------------------------
N | 30 | 30
$R^2$ | 0.348 | 0.451
Currently, TexTables
supports the following display options:
pad::Int
(default1
) The number of spaces to pad the separator characters on each side.se_pos::Symbol
(default:below
)- :below – Prints standard errors in parentheses on a second line below the coefficients
- :inline – Prints standard errors in parentheses on the same line as the coefficients
- :none – Suppresses standard errors. (I don't know why you would want to do this... you probably shouldn't ever use it.)
star::Bool
(defaulttrue
) If true, then prints any table entries that have been decorated with significance stars with the appropriate number of stars.
Changing the Default Formatting
TexTables
stores all of the table entries using special formatting aware container types types that are subtypes of the abstract type FormattedNumber
. By default, TexTables
displays floating points with three decimal precision (and auto-converts to scientific notation for values less than 1e-3 and greater than 1e5). Formatting is done using Python-like formatting strings (Implemented by the excellent Formatting.jl package) If you would like to change the default formatting values, you can do so using the macro @fmt
:
@fmt Real = "{:.3f}" # Sets the default for reals to .3 fixed precision
@fmt Real = "{:.2f}" # Sets the default for reals to .2 fixed precision
@fmt Real = "{:.2e}" # Sets the default for reals to .2 scientific
@fmt Int = "{:,n}" # Sets the default for integers to use commas
@fmt Bool = "{:}" # No extra formatting for Bools
@fmt AbstractString= "{:}" # No extra formatting for Strings
Note that this controls the defaults used when constructing a FormattedNumber
. If you want to change the formatting in a table that has already been constructed, you need to manually change the format
field of each entry in the table:
julia> x = FormattedNumber(5.0)
5.000
julia> x.format
"{:.3f"}
julia> x.format = "{:.3e}";
julia> x
5.000e+00