.. _adding_metadata:

====================================================
Adding sample and observation metadata to biom files
====================================================
Frequently you'll have an existing BIOM file and want to add sample and/or observation metadata to it. Sample metadata usually describes environmental or technical details of each sample: the subject it was collected from, its pH, the PCR primers used to amplify its DNA, and so on. Observation metadata usually categorizes each observation: the taxonomy of an OTU, or the EC hierarchy of a gene. The ``biom add-metadata`` command adds this information to an existing BIOM file.

To get help with ``add-metadata``, run::

    biom add-metadata -h

This command takes a BIOM file and a corresponding sample and/or observation metadata mapping file. The commands below use the example files ``min_sparse_otu_table.biom``, ``sam_md.txt``, and ``obs_md.txt``, which you can find in the ``biom-format/examples`` directory.

Your BIOM file might look like the following::

    {
      "id":null,
      "format": "1.0.0",
      "format_url": "http://biom-format.org",
      "type": "OTU table",
      "generated_by": "some software package",
      "date": "2011-12-19T19:00:00",
      "rows":[
        {"id":"GG_OTU_1", "metadata":null},
        {"id":"GG_OTU_2", "metadata":null},
        {"id":"GG_OTU_3", "metadata":null},
        {"id":"GG_OTU_4", "metadata":null},
        {"id":"GG_OTU_5", "metadata":null}
      ],
      "columns": [
        {"id":"Sample1", "metadata":null},
        {"id":"Sample2", "metadata":null},
        {"id":"Sample3", "metadata":null},
        {"id":"Sample4", "metadata":null},
        {"id":"Sample5", "metadata":null},
        {"id":"Sample6", "metadata":null}
      ],
      "matrix_type": "sparse",
      "matrix_element_type": "int",
      "shape": [5, 6],
      "data":[[0,2,1],
              [1,0,5],
              [1,1,1],
              [1,3,2],
              [1,4,3],
              [1,5,1],
              [2,2,1],
              [2,3,4],
              [2,5,2],
              [3,0,2],
              [3,1,1],
              [3,2,1],
              [3,5,1],
              [4,1,1],
              [4,2,1]
             ]
    }

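In the sparse layout above, each entry of ``data`` is a ``[row, column, value]`` triple: ``row`` indexes the ``rows`` list (observations), ``column`` indexes the ``columns`` list (samples), and any cell that is not listed is zero. As a minimal illustration in plain Python (not part of biom, and using a trimmed copy of the table), the triples can be expanded into a dense matrix of the declared ``shape``:

```python
import json

# The example table from above, trimmed to the fields this sketch needs.
TABLE_JSON = """
{
  "rows": [{"id": "GG_OTU_1"}, {"id": "GG_OTU_2"}, {"id": "GG_OTU_3"},
           {"id": "GG_OTU_4"}, {"id": "GG_OTU_5"}],
  "columns": [{"id": "Sample1"}, {"id": "Sample2"}, {"id": "Sample3"},
              {"id": "Sample4"}, {"id": "Sample5"}, {"id": "Sample6"}],
  "shape": [5, 6],
  "data": [[0,2,1],[1,0,5],[1,1,1],[1,3,2],[1,4,3],[1,5,1],[2,2,1],[2,3,4],
           [2,5,2],[3,0,2],[3,1,1],[3,2,1],[3,5,1],[4,1,1],[4,2,1]]
}
"""


def to_dense(table):
    """Expand sparse [row, col, value] triples into a list of row lists."""
    n_rows, n_cols = table["shape"]
    dense = [[0] * n_cols for _ in range(n_rows)]
    for row, col, value in table["data"]:
        dense[row][col] = value
    return dense


table = json.loads(TABLE_JSON)
dense = to_dense(table)
for row_entry, counts in zip(table["rows"], dense):
    # Print each observation ID next to its counts across the six samples.
    print(row_entry["id"], counts)
```

For example, the second printed line is ``GG_OTU_2 [5, 1, 0, 2, 3, 1]``, built from the five triples whose row index is ``1``.
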
A sample metadata mapping file could then look like the following. Its columns are separated by tabs: the first line, which starts with ``#``, gives the column headers (the first column holds the sample IDs), and any further lines starting with ``#`` are comments. Here every sample in the BIOM table has a row; any samples in the mapping file that are not in the BIOM file are ignored.

::

    #SampleID BarcodeSequence DOB
    # Some optional
    # comment lines...
    Sample1 AGCACGAGCCTA 20060805
    Sample2 AACTCGTCGATG 20060216
    Sample3 ACAGACCACTCA 20060109
    Sample4 ACCAGCGACTAG 20070530
    Sample5 AGCAGCACTTGT 20070101
    Sample6 AGCAGCACAACT 20070716

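A minimal sketch of reading such a mapping file into a ``{sample_id: {field: value}}`` dictionary, assuming tab-separated columns, a first ``#`` line holding the headers, and later ``#`` lines being comments. The ``parse_mapping`` function is hypothetical and only illustrates the layout; it is not biom's parser. Every value is kept as a string, which matches the ``add-metadata`` output shown later when no type processing is requested.

```python
# Hypothetical helper illustrating the mapping-file layout; not biom's parser.
def parse_mapping(lines):
    """Return (headers, {id: {header: value}}) from tab-separated lines."""
    headers = None
    records = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            if headers is None:
                # The first '#' line names the columns, e.g. '#SampleID'.
                headers = line[1:].split("\t")
            continue  # any later '#' lines are comments
        fields = line.split("\t")
        # The first column is the ID; the rest are metadata values (strings).
        records[fields[0]] = dict(zip(headers[1:], fields[1:]))
    return headers, records


SAM_MD = """#SampleID\tBarcodeSequence\tDOB
# Some optional
# comment lines...
Sample1\tAGCACGAGCCTA\t20060805
Sample2\tAACTCGTCGATG\t20060216
Sample3\tACAGACCACTCA\t20060109
Sample4\tACCAGCGACTAG\t20070530
Sample5\tAGCAGCACTTGT\t20070101
Sample6\tAGCAGCACAACT\t20070716
"""

headers, sample_md = parse_mapping(SAM_MD.splitlines())
print(headers)               # ['SampleID', 'BarcodeSequence', 'DOB']
print(sample_md["Sample1"])  # {'BarcodeSequence': 'AGCACGAGCCTA', 'DOB': '20060805'}
```
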
An observation metadata mapping file might look like the following. Notice that it contains an extra observation, ``GG_OTU_0``, that is not in the above BIOM table. Any observations in the mapping file that are not in the BIOM file are ignored.

::

    #OTUID taxonomy confidence
    # Some optional
    # comment lines
    GG_OTU_0 Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__ 0.980
    GG_OTU_1 Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae 0.665
    GG_OTU_2 Root;k__Bacteria 0.980
    GG_OTU_3 Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae 1.000
    GG_OTU_4 Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae 0.842
    GG_OTU_5 Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae 1.000

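The matching rule can be sketched as a lookup by ID for each entry of the table's ``rows`` list, so the extra ``GG_OTU_0`` record is simply never used. This is a hypothetical illustration with inline data (no file I/O, and only the ``confidence`` column kept), not biom's implementation; in particular, leaving ``None`` on a row that has no mapping entry is a choice of this sketch.

```python
# Observation metadata as it might be parsed from the mapping file above
# (only the 'confidence' column is kept to keep the example short).
obs_md = {
    "GG_OTU_0": {"confidence": "0.980"},  # not in the table, so ignored
    "GG_OTU_1": {"confidence": "0.665"},
    "GG_OTU_2": {"confidence": "0.980"},
    "GG_OTU_3": {"confidence": "1.000"},
    "GG_OTU_4": {"confidence": "0.842"},
    "GG_OTU_5": {"confidence": "1.000"},
}

# The "rows" entries of the example BIOM table, before metadata is added.
rows = [{"id": f"GG_OTU_{i}", "metadata": None} for i in range(1, 6)]


def attach_metadata(entries, md):
    """Return copies of the entries with metadata looked up by ID.

    IDs present only in `md` are ignored; entries without a match keep
    their existing metadata (None here).
    """
    return [{"id": e["id"], "metadata": md.get(e["id"], e["metadata"])}
            for e in entries]


annotated = attach_metadata(rows, obs_md)
print([e["id"] for e in annotated])  # GG_OTU_0 does not appear
print(annotated[0]["metadata"])      # {'confidence': '0.665'}
```
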
Adding metadata
===============
To add sample metadata to a BIOM file, you can run the following::

    biom add-metadata -i min_sparse_otu_table.biom -o table.w_smd.biom --sample-metadata-fp sam_md.txt

To add observation metadata to a BIOM file, you can run the following::

    biom add-metadata -i min_sparse_otu_table.biom -o table.w_omd.biom --observation-metadata-fp obs_md.txt

You can also combine these in a single command to add both observation and sample metadata::

    biom add-metadata -i min_sparse_otu_table.biom -o table.w_md.biom --observation-metadata-fp obs_md.txt --sample-metadata-fp sam_md.txt

In the last case, the resulting BIOM file will look like the following::

    {
      "columns": [
        {
          "id": "Sample1",
          "metadata": {
            "BarcodeSequence": "AGCACGAGCCTA",
            "DOB": "20060805"
          }
        },
        {
          "id": "Sample2",
          "metadata": {
            "BarcodeSequence": "AACTCGTCGATG",
            "DOB": "20060216"
          }
        },
        {
          "id": "Sample3",
          "metadata": {
            "BarcodeSequence": "ACAGACCACTCA",
            "DOB": "20060109"
          }
        },
        {
          "id": "Sample4",
          "metadata": {
            "BarcodeSequence": "ACCAGCGACTAG",
            "DOB": "20070530"
          }
        },
        {
          "id": "Sample5",
          "metadata": {
            "BarcodeSequence": "AGCAGCACTTGT",
            "DOB": "20070101"
          }
        },
        {
          "id": "Sample6",
          "metadata": {
            "BarcodeSequence": "AGCAGCACAACT",
            "DOB": "20070716"
          }
        }
      ],
      "data": [
        [0, 2, 1.0],
        [1, 0, 5.0],
        [1, 1, 1.0],
        [1, 3, 2.0],
        [1, 4, 3.0],
        [1, 5, 1.0],
        [2, 2, 1.0],
        [2, 3, 4.0],
        [2, 5, 2.0],
        [3, 0, 2.0],
        [3, 1, 1.0],
        [3, 2, 1.0],
        [3, 5, 1.0],
        [4, 1, 1.0],
        [4, 2, 1.0]
      ],
      "date": "2012-12-11T07:36:15.467843",
      "format": "Biological Observation Matrix 1.0.0",
      "format_url": "http://biom-format.org",
      "generated_by": "some software package",
      "id": null,
      "matrix_element_type": "float",
      "matrix_type": "sparse",
      "rows": [
        {
          "id": "GG_OTU_1",
          "metadata": {
            "confidence": "0.665",
            "taxonomy": "Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae"
          }
        },
        {
          "id": "GG_OTU_2",
          "metadata": {
            "confidence": "0.980",
            "taxonomy": "Root;k__Bacteria"
          }
        },
        {
          "id": "GG_OTU_3",
          "metadata": {
            "confidence": "1.000",
            "taxonomy": "Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae"
          }
        },
        {
          "id": "GG_OTU_4",
          "metadata": {
            "confidence": "0.842",
            "taxonomy": "Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae"
          }
        },
        {
          "id": "GG_OTU_5",
          "metadata": {
            "confidence": "1.000",
            "taxonomy": "Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae"
          }
        }
      ],
      "shape": [5, 6],
      "type": "OTU table"
    }

Processing metadata while adding
================================
There are some additional parameters you can pass to this command for more complex processing.

You can tell the command to process certain metadata column values as integers (``--int-fields``), floating point (i.e., decimal or real) numbers (``--float-fields``), or hierarchical semicolon-delimited data (``--sc-separated``).

::

    biom add-metadata -i min_sparse_otu_table.biom -o table.w_md.biom --observation-metadata-fp obs_md.txt --sample-metadata-fp sam_md.txt --int-fields DOB --sc-separated taxonomy --float-fields confidence

Here the resulting BIOM file will look like the following. ``DOB`` values are now integers and ``confidence`` values are now floating point numbers (compare to the output above: neither is quoted any more), and each ``taxonomy`` value is now a list with one entry per taxonomy level, instead of a single semicolon-separated string.

::

    {
      "columns": [
        {
          "id": "Sample1",
          "metadata": {
            "BarcodeSequence": "AGCACGAGCCTA",
            "DOB": 20060805
          }
        },
        {
          "id": "Sample2",
          "metadata": {
            "BarcodeSequence": "AACTCGTCGATG",
            "DOB": 20060216
          }
        },
        {
          "id": "Sample3",
          "metadata": {
            "BarcodeSequence": "ACAGACCACTCA",
            "DOB": 20060109
          }
        },
        {
          "id": "Sample4",
          "metadata": {
            "BarcodeSequence": "ACCAGCGACTAG",
            "DOB": 20070530
          }
        },
        {
          "id": "Sample5",
          "metadata": {
            "BarcodeSequence": "AGCAGCACTTGT",
            "DOB": 20070101
          }
        },
        {
          "id": "Sample6",
          "metadata": {
            "BarcodeSequence": "AGCAGCACAACT",
            "DOB": 20070716
          }
        }
      ],
      "data": [
        [0, 2, 1.0],
        [1, 0, 5.0],
        [1, 1, 1.0],
        [1, 3, 2.0],
        [1, 4, 3.0],
        [1, 5, 1.0],
        [2, 2, 1.0],
        [2, 3, 4.0],
        [2, 5, 2.0],
        [3, 0, 2.0],
        [3, 1, 1.0],
        [3, 2, 1.0],
        [3, 5, 1.0],
        [4, 1, 1.0],
        [4, 2, 1.0]
      ],
      "date": "2012-12-11T07:30:29.870689",
      "format": "Biological Observation Matrix 1.0.0",
      "format_url": "http://biom-format.org",
      "generated_by": "some software package",
      "id": null,
      "matrix_element_type": "float",
      "matrix_type": "sparse",
      "rows": [
        {
          "id": "GG_OTU_1",
          "metadata": {
            "confidence": 0.665,
            "taxonomy": ["Root", "k__Bacteria", "p__Firmicutes", "c__Clostridia", "o__Clostridiales", "f__Lachnospiraceae"]
          }
        },
        {
          "id": "GG_OTU_2",
          "metadata": {
            "confidence": 0.98,
            "taxonomy": ["Root", "k__Bacteria"]
          }
        },
        {
          "id": "GG_OTU_3",
          "metadata": {
            "confidence": 1.0,
            "taxonomy": ["Root", "k__Bacteria", "p__Firmicutes", "c__Clostridia", "o__Clostridiales", "f__Lachnospiraceae"]
          }
        },
        {
          "id": "GG_OTU_4",
          "metadata": {
            "confidence": 0.842,
            "taxonomy": ["Root", "k__Bacteria", "p__Firmicutes", "c__Clostridia", "o__Clostridiales", "f__Lachnospiraceae"]
          }
        },
        {
          "id": "GG_OTU_5",
          "metadata": {
            "confidence": 1.0,
            "taxonomy": ["Root", "k__Bacteria", "p__Firmicutes", "c__Clostridia", "o__Clostridiales", "f__Lachnospiraceae"]
          }
        }
      ],
      "shape": [5, 6],
      "type": "OTU table"
    }

If you have multiple fields that you'd like processed in one of these ways, you can pass a comma-separated list of field names (e.g., ``--float-fields confidence,pH``).

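The three options can be pictured as per-field conversion functions applied to the string values read from the mapping file, with each option taking a comma-separated list of field names as on the command line. The sketch below is a hypothetical illustration of that idea, not biom's code; in particular, stripping whitespace around each taxonomy level is an assumption of the sketch.

```python
def build_converters(int_fields="", float_fields="", sc_separated=""):
    """Map field names to converters, from comma-separated field lists."""
    converters = {}
    for name in filter(None, int_fields.split(",")):
        converters[name] = int
    for name in filter(None, float_fields.split(",")):
        converters[name] = float
    for name in filter(None, sc_separated.split(",")):
        # Hierarchical value -> list of levels (whitespace stripping assumed).
        converters[name] = lambda v: [level.strip() for level in v.split(";")]
    return converters


def process(metadata, converters):
    """Apply converters to matching fields; other fields stay strings."""
    return {key: converters.get(key, str)(value)
            for key, value in metadata.items()}


conv = build_converters(int_fields="DOB", float_fields="confidence",
                        sc_separated="taxonomy")
print(process({"BarcodeSequence": "AGCACGAGCCTA", "DOB": "20060805"}, conv))
# {'BarcodeSequence': 'AGCACGAGCCTA', 'DOB': 20060805}
print(process({"taxonomy": "Root;k__Bacteria", "confidence": "0.980"}, conv))
# {'taxonomy': ['Root', 'k__Bacteria'], 'confidence': 0.98}
```

As in the output above, ``"0.980"`` becomes ``0.98`` once parsed as a float, because the trailing zero is not part of the numeric value.
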
Renaming (or naming) metadata columns while adding
==================================================
You can also override the names of the metadata fields provided in the mapping files with the ``--observation-header`` and ``--sample-header`` parameters. This is useful if you want to rename metadata columns, or if your metadata mapping file has no column headers. The names you pass are applied to the columns in order, starting with the ID column, so you need to list a name for every column you want to keep. If the mapping file has more columns than there are names, the extra columns are ignored, which also makes this a convenient way to select only the first *n* columns of a mapping file. For example, to rename the ``DOB`` column in the sample metadata mapping file, you could run::

    biom add-metadata -i min_sparse_otu_table.biom -o table.w_smd.biom --sample-metadata-fp sam_md.txt --sample-header SampleID,BarcodeSequence,DateOfBirth

If you have a mapping file without headers such as the following::

    GG_OTU_0 Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__ 0.980
    GG_OTU_1 Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae 0.665
    GG_OTU_2 Root;k__Bacteria 0.980
    GG_OTU_3 Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae 1.000
    GG_OTU_4 Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae 0.842
    GG_OTU_5 Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae 1.000

you could name these while adding them as follows::

    biom add-metadata -i min_sparse_otu_table.biom -o table.w_omd.biom --observation-metadata-fp obs_md.txt --observation-header OTUID,taxonomy,confidence

As a variation on the last command, if you only want to include the ``taxonomy`` column and exclude the ``confidence`` column, you could run::

    biom add-metadata -i min_sparse_otu_table.biom -o table.w_omd.biom --observation-metadata-fp obs_md.txt --observation-header OTUID,taxonomy

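One way to picture the header override: the names you pass are paired with columns from left to right, the first name labels the ID column, and columns beyond the last name are dropped. The ``apply_header`` helper below is hypothetical, works on rows that are already split into fields, and only illustrates the rule described above; it is not biom's code.

```python
def apply_header(header_arg, rows):
    """Pair comma-separated names with columns; drop unnamed extra columns.

    The first name labels the ID column, as in OTUID,taxonomy,confidence.
    """
    names = header_arg.split(",")
    records = {}
    for fields in rows:
        # zip() stops at the shorter sequence, so extra columns are dropped.
        named = dict(zip(names, fields))
        record_id = named.pop(names[0])
        records[record_id] = named
    return records


rows = [
    ["GG_OTU_1",
     "Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae",
     "0.665"],
    ["GG_OTU_2", "Root;k__Bacteria", "0.980"],
]
print(apply_header("OTUID,taxonomy,confidence", rows)["GG_OTU_2"])
# {'taxonomy': 'Root;k__Bacteria', 'confidence': '0.980'}
print(apply_header("OTUID,taxonomy", rows)["GG_OTU_2"])
# {'taxonomy': 'Root;k__Bacteria'}
```
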