/usr/share/doc/python-patsy-doc/html/_sources/categorical-coding.txt is in python-patsy-doc 0.4.1-2.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 | .. _categorical-coding:
Coding categorical data
=======================
.. currentmodule:: patsy
Patsy allows great flexibility in how categorical data is coded,
via the function :func:`C`. :func:`C` marks some data as being
categorical (including data which would not automatically be treated
as categorical, such as a column of integers), while also optionally
setting the preferred coding scheme and level ordering.
Let's get some categorical data to work with:
.. ipython:: python
from patsy import dmatrix, demo_data, ContrastMatrix, Poly
data = demo_data("a", nlevels=3)
data
As you know, simply giving Patsy a categorical variable causes it
to be coded using the default :class:`Treatment` coding
scheme. (Strings and booleans are treated as categorical by default.)
.. ipython:: python
dmatrix("a", data)
We can also alter the level ordering, which is useful for, e.g.,
:class:`Diff` coding:
.. ipython:: python
l = ["a3", "a2", "a1"]
dmatrix("C(a, levels=l)", data)
But the default coding is just that -- a default. The easiest
alternative is to use one of the other built-in coding schemes, like
orthogonal polynomial coding:
.. ipython:: python
dmatrix("C(a, Poly)", data)
There are a number of built-in coding schemes; for details you can
check the :ref:`API reference <categorical-coding-ref>`. But we aren't
restricted to those. We can also provide a custom contrast matrix,
which allows us to produce all kinds of strange designs:
.. ipython:: python
contrast = [[1, 2], [3, 4], [5, 6]]
dmatrix("C(a, contrast)", data)
dmatrix("C(a, [[1], [2], [-4]])", data)
Hmm, those ``[custom0]``, ``[custom1]`` names that Patsy
auto-generated for us are a bit ugly looking. We can attach names to
our contrast matrix by creating a :class:`ContrastMatrix` object, and
make things prettier:
.. ipython:: python
contrast_mat = ContrastMatrix(contrast, ["[pretty0]", "[pretty1]"])
dmatrix("C(a, contrast_mat)", data)
And, finally, if we want to get really fancy, we can also define our
own "smart" coding schemes like :class:`Poly`. Just define a class
that has two methods, :meth:`code_with_intercept` and
:meth:`code_without_intercept`. They have identical signatures, taking
a list of levels as their argument and returning a
:class:`ContrastMatrix`. Patsy will automatically choose the
appropriate method to call to produce a full-rank design matrix
without redundancy; see :ref:`redundancy` for the full details on how
Patsy makes this decision.
As an example, here's a simplified version of the built-in
:class:`Treatment` coding object:
.. literalinclude:: _examples/example_treatment.py
.. ipython:: python
:suppress:
with open("_examples/example_treatment.py") as f:
exec(f.read())
And it can now be used just like the built-in methods:
.. ipython:: python
# Full rank:
dmatrix("0 + C(a, MyTreat)", data)
# Reduced rank:
dmatrix("C(a, MyTreat)", data)
# With argument:
dmatrix("C(a, MyTreat(2))", data)
|