/usr/share/perl5/OpenOffice/OODoc/XPath.pod is in libopenoffice-oodoc-perl 2.125-3.
This file is owned by root:root, with mode 0o644.
=head1 NAME
OpenOffice::OODoc::XPath - Low-level navigation in the documents
=head1 DESCRIPTION
This module is a low-level class which uses OODoc::File (without
inheriting anything from it) along with the classes defined in the
XML::Twig module. It's a common basis for the other, more
user-friendly, document-oriented modules. It uses XPath expressions in
order to retrieve any document element (but it doesn't provide a
full implementation of the XPath standard). In addition, while most
of the provided methods are OpenDocument-aware, this module can be
used with any other kind of XML document, simply because it benefits
from all the features of XML::Twig. This possibility may prove useful
for applications that process OpenDocument and non-OpenDocument XML
files at the same time.
The OpenOffice::OODoc::XPath class should not be used explicitly by
applications, because all its features are available in more user-friendly
classes such as OODoc::Text, OODoc::Styles, OODoc::Image, OODoc::Document
and OODoc::Meta. The present manual page is provided to describe the
common methods and properties that are available with all these classes.
This chapter can be skipped by programmers who are only interested
in upper level methods provided by the OODoc::Text, ::Styles, ::Image and
::Meta modules. Understanding these modules is easier and using them
requires less Perl and XML expertise. However, calling OODoc::XPath methods
remains a good rescue option as it allows all kinds of operations on all types
of XML elements contained in any OpenDocument-compliant file.
OODoc::XPath is the common foundation of OODoc::Meta, OODoc::Text,
OODoc::Styles and OODoc::Image. It contains the lowest layer of
navigation services for XML documents and handles the link with
OODoc::File for file access. Its primary role is to act as an interface
to the XML::Twig API.
In this chapter, the word "element" is used often. When the text
says that a method expects or returns an element (either singly or
as a list), it is referring to an XML element.
It is important to distinguish elements from their content
(elements being simply references to XML data structures). To read
or modify the content of an element such as its text or XML
attributes, use the accessors also available within OODoc::XPath.
In most cases where XPath methods require a reference to an element
as an argument, there are two ways of proceeding:
- reference the element directly (obtained previously)
- or give an XPath expression and a position, being a string and an
integer respectively; for example, the pair ('//office:body/text:p', 12)
or ('//text:p', 12) represents the thirteenth occurrence of the 'text:p'
element, i.e. the 13th paragraph (occurrences are numbered starting from 0).
The second way requires knowledge of an appropriate XPath
expression (according to the OpenDocument XML format specification).
Moreover, a given XPath expression is not necessarily the same in an
OpenDocument file as in an OpenOffice.org 1.0 document. So you should
preferably use high-level accessors (provided by derived classes
such as OODoc::Document) and avoid hardcoding XPath. However,
remember that you can reach any element with XPath at any time.
Of course, you will never need to use XPath expressions in order to
reach the most common text elements (such as paragraphs), because the
OODoc::Text module provides more friendly accessors (for example, you
will probably use the getParagraph() method and forget "//text:p").
Some methods accept both forms which means that if the first
parameter is recognised as an element reference, the position does
not need to be given. Therefore the number of arguments for certain
OODoc::XPath methods can vary.
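The dual calling convention can be pictured with a small stand-alone
sketch. The helper resolve_target() below is hypothetical (it is not
part of OODoc): it assumes that an "element" is any blessed reference
and replaces the real XPath lookup with a plain hash of node lists, so
it only illustrates how a variable number of arguments can be
disambiguated, including the zero-based position.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-in for an XPath lookup: path => list of nodes.
my %nodes_by_path = (
    '//text:p' => [ map { bless { id => $_ }, 'FakeElement' } 0 .. 19 ],
);

# Accepts either (element, @rest) or (path, position, @rest) and
# returns the target element followed by the remaining arguments.
sub resolve_target {
    my @args = @_;
    if (blessed($args[0])) {
        # First argument is already an element: no position expected.
        my $element = shift @args;
        return ($element, @args);
    }
    my ($path, $position) = splice @args, 0, 2;
    my $list = $nodes_by_path{$path} or return (undef, @args);
    # Positions are zero-based: ('//text:p', 12) is the 13th node.
    return ($list->[$position], @args);
}
```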
For those who really want to access all areas there are also
OODoc::XPath methods which allow unrestricted access to every
element or XML attribute via an access path in XPath syntax. If you
are into this kind of thing, we recommend you obtain good syntax
reference manuals for XPath and OpenDocument and a supply of
aspirin.
Methods which may return several lines of text (e.g. getTextList) do
so either as a single character string containing "\n"
separators or as a list.
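In Perl, a single function can return either form depending on how it
is called. The sketch below assumes, without confirming it for every
OODoc method, that the choice follows the calling context
(wantarray); text_list() is a hypothetical helper working on plain
strings.

```perl
use strict;
use warnings;

# Returns the lines as a list in list context, or as a single
# "\n"-separated string in scalar context.
sub text_list {
    my @lines = @_;
    return wantarray ? @lines : join("\n", @lines);
}

my @as_list   = text_list('First paragraph', 'Second paragraph');
my $as_string = text_list('First paragraph', 'Second paragraph');
print scalar(@as_list), "\n";   # 2
print "$as_string\n";           # the two lines, one per line
```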
Unless otherwise stated, the word 'document' in this chapter only
refers to XML documents contained within OODoc::XPath objects and
not, say, OpenDocument files (as an end user would understand the term).
Amongst the different methods which return elements, attributes or
text, some are called getXxx, others selectXxx or findXxx. Read
methods whose names start with "get" generally refer to an
unfiltered object or list, whereas others return an object or list
filtered according to a parameter value. In this latter case the
search parameter is treated as a standard expression and not an
exact value. This means that if the search criteria is "xyz", all
text containing "xyz" will be considered a match. To restrict the
search to text exactly equal to "xyz", use "^xyz$" as the search
criteria (following Perl regular expression syntax).
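The difference between a partial and an exact match can be checked
with plain Perl regular expressions. The matching() helper below is
hypothetical and only mimics the filtering rule described above on a
list of strings.

```perl
use strict;
use warnings;

# Keeps the strings matching the search criterion, which is treated
# as a Perl regular expression (not as an exact value).
sub matching {
    my ($criterion, @texts) = @_;
    return grep { /$criterion/ } @texts;
}

my @texts   = ('xyz', 'abc xyz def', 'XYZ', 'xy z');
my @partial = matching('xyz',   @texts);   # 'xyz', 'abc xyz def'
my @exact   = matching('^xyz$', @texts);   # 'xyz' only
```

Note that a literal search string containing regex metacharacters
(such as "." or "(") should be escaped, for example with quotemeta().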
Several methods allow you to place copies of or references to
elements (from other documents or from other positions in the same
document) in any position in the current document. This offers
powerful manoeuvrability but only if these placements conform with
the destination position's context.
For example, you can easily copy a paragraph from one document
to another, but if the paragraph's style is not defined in the
destination document, you must knowingly change its style attribute.
You can also copy the style itself, but only if you are sure that
no other style with the same name is already defined in the
destination document (and so on).
For advanced users familiar with the XML::Twig API, it might be
interesting to know that all the objects called "elements" in the
following chapters are objects of the OpenOffice::OODoc::Element
class, which is an XML::Twig::Elt derivative. So all methods associated
with this class are directly applicable to these elements, on top of the
functionality described in this manual. However, the knowledge of XML::Twig
is not mandatory.
Important note: The applications should not explicitly work with this
class. We recommend using OODoc::Meta and OODoc::Document (which are both
OODoc::XPath derivatives). These two objects provide highest-level methods
which are neater and more productive. Explicit use of OODoc::XPath methods
(which sometimes require large numbers of parameters) should only be
considered as a last resort in unexpected circumstances for access to any
element or XML attribute not handled by more friendly methods. However,
the present manual chapter could prove helpful because all the common
features of OODoc::Meta and OODoc::Document are described here.
=head2 Methods
=head3 Constructor : OpenOffice::OODoc::XPath->new(<parameters>);
Short Form: odfXPath(<parameters>)
Returns a new OpenDocument connector, i.e. an interface which
can be used for subsequent operations on a well-formed document.
This constructor should not be called directly; it's implicitly
triggered each time a Meta or Document object is created. So the
following description applies to odfMeta() and odfDocument().
The document is loaded and parsed according to various options.
The most used option is 'file'; it simply allows the application
to process an OpenDocument file selected by its path/name in the
file system.
Example:
my $doc = odfXPath
(
file => "myfile.ods",
part => "content"
);
# ... lot of processing ...
$doc->save;
In the example above, the object is loaded from a regular
OpenDocument file, which is the most common option, but there are
other possibilities. It's possible to use flat XML (available as a
string in memory, or loaded from a file). In addition, this
constructor is able to create a new document from scratch.
The value of the 'file' option may be an open IO::File object,
that allows the application to use an application-provided file
handle. However, you should prefer file paths/names when possible,
and read the explanations about the constructor and the save() method
in the OpenOffice::OODoc::File manual page before using open file
handles. Remember that, as soon as the given file or handle is
an ODF container, OODoc::XPath uses OODoc::File.
Parameters are named (hash key => value). The constructor must get
at least one parameter giving a means of obtaining the XML document
that it will represent. Several options are available; each one is
represented through the following examples:
# option 1 (using an existing flat XML document)
my $doc = odfXPath(xml => $xml_string);
# option 2 (using a previously created ODF file interface)
my $oofile = odfContainer('source.odt');
my $doc = odfXPath(container => $oofile, part => 'meta');
# option 3 (using a regular ODF file directly)
my $doc = odfXPath(file => 'source.odt', part => 'content');
# option 4 (multiple instances against a single file)
my $content = odfXPath(file => 'source.odt', part => 'content');
my $meta = odfXPath(file => $content, part => 'meta');
my $styles = odfXPath(file => $content, part => 'styles');
Remember "odfXPath()" represents "OpenOffice::OODoc::XPath->new()"
in the instructions above, and you can (and should) use this shortcut
provided that you have loaded the main OpenOffice::OODoc module, and
not only and explicitly the OpenOffice::OODoc::XPath module.
The first form uses an XML string directly (previously loaded or
created by the program). It is intended for very specific applications
working with flat XML exports rather than standard
OOo/OpenDocument files.
The second method links OODoc::XPath to an existing OODoc::File
object (through the "container" option) and indicates which XML part it
is to extract (metadata, content, styles, etc). The OODoc::File is an
abstraction of an already open ODF container. It can be shared, i.e.
several OODoc::XPath objects can be instantiated with the same
OODoc::File object, and this possibility must be used when
several OODoc::XPath objects have to make consistent changes to
a single file (see option 4 above). In order to create the
required OODoc::File object, simply use odfFile() with a filename
as argument (for advanced use, see OpenOffice::OODoc::File).
The third method is the easiest, because the user just provides
a filename and a member, and the whole file interface runs silently
(i.e. an invisible OODoc::File object is automatically created and
used to get the content). It's probably the most used approach; it's
recommended when the user doesn't need to get more than one member
from the same file.
The 'part' option is a selector that tells what component is needed
(content, styles, metadata, ...) knowing that an OODoc::XPath object
can handle only one component. Its default value is 'content'.
Note that the 'part' option replaces the deprecated 'member' option.
However, for compatibility reasons, 'member' is still supported (if
both 'member' and 'part' are erroneously provided, 'member' prevails).
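The part selection rules fit in a few lines. The sketch below is a
hypothetical helper, not OODoc code: it only applies the documented
default ('content') and the documented precedence of the deprecated
'member' option over 'part'.

```perl
use strict;
use warnings;

# Hypothetical helper: picks the ODF part from constructor options.
sub selected_part {
    my %opt = @_;
    return $opt{member} if defined $opt{member};   # deprecated, but prevails
    return $opt{part}   if defined $opt{part};
    return 'content';                               # documented default
}
```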
If the application needs to process, say, the content and the styles
in the same session, it must create two, or more, OODoc::XPath objects
possibly associated with the same file interface. The appropriate way
is shown in our last example above. The first instance is associated
with a filename. Then the other instances are created with the first
one, provided as the value of the 'file' option instead of a filename.
The constructor tries to be user-friendly: if the 'file' value is
a character string, it's regarded as a filename, but if this value
is an existing OpenOffice::OODoc::XPath object, the new object is
automatically connected to the same file interface as the other one.
The file interface is transparently provided by a common shared
OpenOffice::OODoc::File object (you can safely ignore the features
of this object, but a corresponding manual chapter is available for
more details).
Be careful: creating more than one OpenOffice::OODoc::XPath object
linked by their 'file' parameters to the same explicit filename (and
not linked with each other) produces useless extra I/O operations and
possible conflicts.
Caution: because they are associated with a common interface via
OODoc::File, none of these OODoc::XPath objects should be deleted
before the final save() call for this archive. When save() is called,
the File object "calls up" all the XPath objects which were
"connected" to it in order to "ask" each of them for the changes
made to the XML (content, styles, meta, etc.). The results are
unpredictable if any of them is absent when called.
If the provided filename has a ".xml" or ".XML" suffix, or, whatever
the name, if the 'flat_xml' option is set to 1, the file is processed
as flat XML and not as a regular OOo file. No OODoc::File object is
created, and the result of a subsequent call of the save() method
produces a flat XML export (and not a regular OOo/OpenDocument file).
You can pass the optional parameter 'element' in any case where the
constructor is called without the 'xml' parameter. Bearing in mind
that an OODoc::XPath object will not necessarily handle an entire
XML document, this extra parameter indicates the name of the XML
element to be loaded and handled. If the 'element' parameter is not
given for an OpenDocument file, a default element will be chosen
according to the following table:
'meta' => 'office:document-meta'
'content' => 'office:document-content'
'styles' => 'office:document-styles'
'settings' => 'office:document-settings'
'manifest' => 'manifest:manifest'
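This table translates directly into a Perl hash. The %default_element
hash and the root_element() function below are hypothetical
illustrations of the documented defaults; the lookup dies when no
default exists, reflecting the rule that 'element' is mandatory for
any other part.

```perl
use strict;
use warnings;

# Default root element for each standard ODF part.
my %default_element = (
    meta     => 'office:document-meta',
    content  => 'office:document-content',
    styles   => 'office:document-styles',
    settings => 'office:document-settings',
    manifest => 'manifest:manifest',
);

# An explicit 'element' option wins; otherwise the part's default is
# used; otherwise die, since 'element' is then mandatory.
sub root_element {
    my %opt = @_;
    return $opt{element} if defined $opt{element};
    my $part = $opt{part} // 'content';
    return $default_element{$part}
        // die "'element' is required for part '$part'\n";
}
```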
Conversely, the 'element' parameter becomes mandatory if the chosen
XML element is not listed above. Through OODoc::File, OODoc::XPath
can actually access archives which are not necessarily in
OpenDocument format and may be, for example, "databases" of
presentation and content templates.
If the application needs to create a new document, and not process
an existing one, an additional option must be passed:
create => "<class>"
where "class" must be one of the following: "text",
"spreadsheet", "presentation" or "drawing", according to the needed
content class. And, for very special needs, the user can pass an
additional "template_path" to select an ad hoc directory of XML
templates instead of the default one. This user-provided directory
must have the same kind of structure and content as the "templates"
subdirectory of the OpenOffice::OODoc installation.
An additional 'opendocument' option can be provided and set to 'true'
or 'false'. If this option is 'false', the new document is created
according to the OpenOffice.org 1.0 format instead of the OASIS
OpenDocument format. The default format is OpenDocument. The
'opendocument' option works for new documents only and is ignored
unless the 'create' option is provided. This module can create and
process either OpenOffice.org 1.0 documents or ODF documents but can't
directly convert a document from one format to the other.
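The 'create' and 'opendocument' rules can be expressed as an option
check. The validator below is hypothetical and written for
illustration only; it encodes the constraints stated above (allowed
class names, OpenDocument as the default format, 'opendocument'
ignored without 'create'). Treating "0" like "false" is an additional
assumption.

```perl
use strict;
use warnings;

my %allowed_class = map { $_ => 1 } qw(text spreadsheet presentation drawing);

# Returns the format of a document to be created ('odf' or 'ooo1'),
# or undef when no document is created. Dies on an unknown class.
sub new_document_format {
    my %opt = @_;
    return undef unless defined $opt{create};   # 'opendocument' ignored
    die "unknown content class '$opt{create}'\n"
        unless $allowed_class{ $opt{create} };
    my $flag = $opt{opendocument} // 'true';    # OpenDocument by default
    return ($flag =~ /\A(?:false|0)\z/i) ? 'ooo1' : 'odf';
}
```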
OODoc::XPath can process ODF documents provided as flat XML
files as well as in the compressed (zip) format. The given file is
automatically processed as flat XML if either its name ends with ".xml"
or the 'flat_xml' option is set to '1'. When processing a flat XML
file, OODoc::XPath doesn't load the OODoc::File zip interface. So,
a subsequent call of the save() method can only export the document
as flat XML.
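The flat XML detection rule can be sketched as a simple predicate.
The name is_flat_xml() is hypothetical; the sketch assumes a
case-insensitive ".xml" suffix test (the text mentions both ".xml"
and ".XML") and a true 'flat_xml' option, and it ignores the case
where the 'file' value is a file handle.

```perl
use strict;
use warnings;

# True when a file would be handled as flat XML (no OODoc::File zip
# interface), false for a regular ODF container.
sub is_flat_xml {
    my ($filename, %opt) = @_;
    return 1 if $opt{flat_xml};
    return $filename =~ /\.xml\z/i ? 1 : 0;
}
```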
An optional 'readable_XML' option can be passed. If this option is provided
and set to 'on' or 'true', the resulting XML will be smartly indented
(and, of course, more space-consuming). This feature is intended for
debugging purposes and should not be used in production.
The 'local_encoding' option can be set with the appropriate value
when a particular character set (and not the default one) must be
used for a document.
A 'read_only' option can be provided and set to 'true' in order to prevent
the current member from being written back to the physical ODF file
when the save() method is called.
Other optional parameters can also be passed to the constructor (see
Properties below).
=head3 appendElement(path, position, name/xml, [options]);
=head3 appendElement(element, name/xml, [options]);
Adds a new element or existing element to the list of child elements
of an existing parent element given first (by [path, position] or by
reference).
The argument after the position argument can be an XML element name.
Example:
$content->appendElement
(
'//office:body', 0, 'text:p',
text => "New text"
);
adds a paragraph containing the phrase "New text" to the end of the
document body. (Remember that in the case of an OpenDocument text
file (Writer), it would be better to use the appendParagraph method of
OpenOffice::OODoc::Text as this requires fewer parameters.)
If the 'text' option is omitted, an empty element is created (in the
above example it would be an empty paragraph or line feed).
You can pass the 'attributes' option, which is a reference to a hash whose
keys are the XML attribute names and whose values are the XML attribute values. Use
of these options depends on the type of document and the type of element
and requires knowledge of OpenDocument conventions.
Example:
my $my_style =
{
'style:name' => 'P1',
'style:family' => 'paragraph'
};
$content->appendElement
(
'//office:automatic-styles', 0, 'style:style',
attributes => $my_style
);
creates a new paragraph style called 'P1' in the list of "automatic
styles" ("automatic styles" are styles which are not explicitly
indicated in the styles list as it appears to the end user).
This method lets you add any kind of element into a document, even
exotic ones. With the most common OpenDocument objects (e.g.
paragraphs), though, it is easier to use the specialist methods
contained in other modules.
The 'name' argument can be replaced by an existing element in the
same OODoc::XPath object or in another. In which case no element is
created but the existing element is simply referenced with a new
position even though it remains in its old position. Caution: any
modification of an element which is referenced several times in one
or more documents is made to all references. If you want to add a
similar but separate element, you must use replicateElement which
produces a new element from the content of an existing one.
The 'name' argument can also be replaced by an XML string. This
string must correspond to the correct XML description of a UTF-8
encoded OpenDocument element. For example, it could be a
string which had been previously exported using the exportXMLElement
method of OODoc::XPath, or extracted from an OpenDocument file by
some other application. If for any reason you absolutely have to
use a non-UTF8 XML string which contains 8-bit characters (accented
letters, etc.), you can always convert the string using the
encode_text method before passing it to appendElement. Of course,
the problem will not arise if you are absolutely sure that the string
only contains ASCII (7 bit) characters. XML syntax is checked, but it
is up to the user to verify that the element import conforms to
OpenDocument XML grammar.
The following piece of code produces the same result as the first
example:
my $xml = '<text:p text:style-name="Standard">' .
'New text' .
'</text:p>';
$content->appendElement
(
'//office:body', 0, $xml
);
After creating one or more elements by importing XML strings this
way, it might be useful (but not absolutely necessary) to call the
reorganize method.
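To see how a (name, text, attributes) call corresponds to an XML
string, the hypothetical element_xml() helper below builds the markup
for a given element name, optional text, and optional attribute hash
reference. It is a plain-Perl sketch with minimal escaping, not the
way OODoc builds elements internally (OODoc works on XML::Twig
objects).

```perl
use strict;
use warnings;

# Minimal XML escaping for text content and attribute values.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Builds an element string from a name plus 'text' and 'attributes'
# options, mirroring the arguments of appendElement.
sub element_xml {
    my ($name, %opt) = @_;
    my $attrs = $opt{attributes} || {};
    my $xml = "<$name";
    # Sorted keys give a stable attribute order.
    $xml .= sprintf ' %s="%s"', $_, xml_escape($attrs->{$_})
        for sort keys %$attrs;
    return "$xml/>" unless defined $opt{text};   # empty element
    return "$xml>" . xml_escape($opt{text}) . "</$name>";
}
```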
=head3 appendBodyElement(element [, options])
Copies an existing element of any type and appends it to the end of
the document body. No new element is created.
=head3 appendLineBreak(element)
Appends a line break to a text element. This method allows the user
to create a single text element (e.g. a paragraph) including one or
more breaks, instead of separate elements.
The example below appends a new text in a new line to the end of
an existing paragraph:
my $p = $doc->getElement('//text:p', 5);
$doc->appendLineBreak($p);
$doc->extendText($p, 'A new line in the same paragraph');
=head3 appendSpaces(element, length)
Appends a sequence of multiple spaces to a text element. A string
containing repeated spaces shouldn't be stored as is in a document
(see setText() and spaces() for details about repeated spaces).
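To make the repeated-space rule concrete: in ODF text, a run of N consecutive spaces is stored as a `<text:s text:c="N"/>` element, because consecutive blanks in paragraph text are otherwise collapsed into one. The plain-Perl sketch below only shows the markup such a string corresponds to; odf_space_markup is a hypothetical helper, not an OODoc method (OODoc builds real elements through spaces() and related methods).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of OODoc): turns a plain string into
# ODF paragraph markup, escaping XML special characters and replacing
# every run of two or more spaces with a <text:s text:c="N"/> element.
sub odf_space_markup {
    my ($text) = @_;
    # Escape XML special characters first, so that the markup added
    # below is not escaped in turn.
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    # A run of N >= 2 spaces becomes one space element counting N.
    $text =~ s{( {2,})}{'<text:s text:c="' . length($1) . '"/>'}ge;
    return $text;
}

print odf_space_markup('Begin' . (' ' x 6) . 'End'), "\n";
# prints: Begin<text:s text:c="6"/>End
```

A single space is left as is, since it survives whitespace collapsing unchanged.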
=head3 appendTabStop(element)
Appends a tab stop ("\t") to a text element.
=head3 blankSpaces(length)
See spaces().
=head3 cloneContent(oodoc_xpath_object)
Discards the entire document content of the current instance and
replaces it with a reference to the content of another OODoc::XPath
object.
Example:
$doc1 = OpenOffice::OODoc::XPath->new
(
file => 'template.ods',
member => 'styles'
);
$doc2 = OpenOffice::OODoc::XPath->new
(
file => 'sheet.ods',
member => 'styles'
);
$doc2->cloneContent($doc1);
$doc2->save;
This sequence replaces the styles and page layout of 'sheet.ods'
with those of 'template.ods'.
The above example could easily have been written without even using
OODoc::XPath by acting directly on the files. For example, extract
the 'styles.xml' member from 'template.ods' and insert it into
'sheet.ods'. The use of OODoc::XPath and the cloneContent method
guarantees that the transferred content corresponds to an
OpenDocument document and allows reads/writes to it on the fly.
Caution: the "cloned" content is not physically copied. Calling this
method makes two documents reference one single physical content. Any
modification made to the content of either of these two documents
applies equally to the other.
=head3 contentClass([class name])
Accessor to get or set the class of the document content. If the
current member is a document content, returns its class according
to the OpenDocument terminology, i.e. one of the following values:
"text", "spreadsheet", "presentation", or "drawing".
Returns an empty string if the current member is not a document
content (if it's, for example, the "meta" or "styles" member).
This accessor is read-only.
=head3 createSpaces(length)
See spaces().
=head3 createElement(name, text)
=head3 createElement(xml)
Creates a new element, without attributes and not inserted in any
document.
Example:
my $element =
$doc->createElement
('my_element', 'its content');
creates a new XML element without attributes and returns its
reference.
Instead of a name, the first argument can be the full XML
description of the element. Example:
my $element = $doc->createElement
('<text:p>My text</text:p>');
This new element is temporary: it is not linked to any document. It
is meant to be used later by another method.
The name can contain a namespace prefix which would look like this:
'namespace:name'.
In its second form, a well-formed XML string can be supplied as a
single argument. The recognition criterion is the presence of the "<"
character at the beginning of the argument. See appendElement for
comments on the direct insertion of XML.
Explicit calls to createElement() should be rare. This method is
normally called silently by higher-level methods which are capable
of creating an element, inserting it in a document's XML tree and
giving it attributes (see appendElement and insertElement).
=head3 createFrame(name => frame_name [, options])
Creates an empty frame. A frame is an OpenDocument object which
controls a rectangular area where a visible content is displayed.
Possible contents for a frame are text boxes or images.
This method is not focused on a particular document class (for
example, it works on text documents as well as on presentations),
but the visible effects of some options are not always exactly the
same.
Possible options are:
'name' => unique name
The 'name' is an identifier; if provided, it should be unique for
the document.
'attachment' => existing container
The value of this option, if provided, must be an existing element
which can contain a text box according to the OpenDocument rules.
Such an object may be, for example, a draw page if the current
document class is 'presentation' or 'drawing', or a paragraph if
this class is 'text'.
'page' => page number or name
The effects of the 'page' option depend on the content class of the
current document. If this option is used, it indicates that the frame
will be anchored to a page, and the given value is a page number.
It does not matter whether, when createFrame() is called, this number
is beyond the end of the document. If the content class of the
document is "presentation" (Impress) or "drawing" (Draw), then the
page option must be either the visible name or the object reference
of an existing draw page. Caution: the 'page' option is ignored if
'attachment' is provided; on the other hand, either 'page' or
'attachment' must be provided in order to actually include the new
frame in the document.
'position' => coordinates
The coordinates are provided as a string of the form "x,y", where x
goes from left to right and y from top to bottom. The default unit is
the centimeter; you can choose any other OpenDocument-supported unit
instead by attaching the corresponding usual abbreviation, so
"12.5cm, 35mm" is the same as "125mm, 3.5cm" or "12.5,3.5", etc. The
point ("pt") unit is allowed as well. The default coordinates are
"0, 0". By default, the coordinates are relative to the anchor point,
so they are directly page-related only if a valid 'page' option is
provided; if the box is attached to, say, a paragraph, the origin of
the coordinates is the beginning of the paragraph. However, the real
interpretation of the coordinates depends on the style. With some
style definitions, the coordinates may just be ignored (e.g. if the
style says "the frame is centered", OpenOffice.org will center the
frame whatever its stored coordinates). With other possible style
definitions, the coordinates could be counted from the right and/or
from the bottom rather than from the left/top.
'size' => the size of the box
Provided as a string, using the same syntax and units as the
position, the 'size' option is strongly recommended, since a
sizeless frame can't be properly displayed. The width comes
first in the string. The height is sometimes ignored, according to
the style of the frame: by default, the display height of a text box
(which is a particular frame) is automatically adjusted to the
content.
'style' => style name
The 'style' option allows the application to set the frame style.
Caution: a text style can't be used as a frame style. A frame
style controls the box properties only (border, background, shadow,
and so on), and not the content properties. Reusing an existing frame
style through this option is generally a good idea.
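The unit conventions of the 'position' and 'size' options can be checked with a little plain Perl. The sketch below is not part of OODoc: to_cm and pair_to_cm are hypothetical helpers, and the unit table is deliberately limited to a few common units (cm, mm, in, pt), with cm assumed when no unit is given, as stated above.

```perl
use strict;
use warnings;

# Hypothetical unit table (an assumption, not OODoc's own list):
# number of centimeters per unit.
my %CM_PER_UNIT = (
    cm => 1,
    mm => 0.1,
    in => 2.54,
    pt => 2.54 / 72,    # 1 pt = 1/72 inch
);

# Hypothetical helper: converts one length such as "12.5cm", "35mm",
# "18pt" or a bare number (centimeters by default) into centimeters.
sub to_cm {
    my ($length) = @_;
    my ($value, $unit) = $length =~ /^\s*(-?[\d.]+)\s*([a-z]*)\s*$/
        or die "unrecognized length: '$length'\n";
    $unit = 'cm' if $unit eq '';
    my $factor = $CM_PER_UNIT{$unit}
        // die "unsupported unit: '$unit'\n";
    return $value * $factor;
}

# Hypothetical helper: splits an "x,y" (or "width,height") string and
# converts both parts.
sub pair_to_cm {
    my ($pair) = @_;
    return map { to_cm($_) } split /,/, $pair, 2;
}

printf "%s => %.2f, %.2f\n", $_, pair_to_cm($_)
    for '12.5cm, 35mm', '125mm, 3.5cm', '12.5,3.5';
```

All three strings from the 'position' description resolve to the same pair, 12.50 and 3.50 centimeters.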
=head3 currentContext([context])
Accessor allowing the application to change the context for some
search methods (including getElement()).
The default context is the root of the document. By setting the
current context to a lower level object, the application can restrict
the search to the descendants of this object.
In the example below, the getElement() method retrieves a paragraph
by order number in a previously selected section, and not in the whole
document.
my $section = $doc->getElement("//text:section", $s_number);
$doc->currentContext($section);
my $paragraph = $doc->getElement("//text:p", $p_number);
Without argument, simply returns the previous current context.
See also resetCurrentContext().
=head3 decode_text(utf8_string)
Caution: this method is a non-exported class method. It must be used
like this:
OpenOffice::OODoc::XPath::decode_text($utf8_string);
and not from an OODoc::XPath instance.
Decodes a UTF-8 string and returns its translation into the user's
8-bit character set, as defined by the following
variable:
$OpenOffice::OODoc::XPath::LOCAL_CHARSET
for which the default value is 'iso-8859-1'. See the Perl/Encode
manual for the list of supported character sets.
OpenDocument uses UTF-8 XML encoding.
Explicit calls to this method should be rare. It is used internally
by methods which return text extracted from document content (e.g.
getText).
Warning to contributors: any method which returns text extracted
from ODF documents is based on decode_text; so any modification or
improvement of the decoding logic should be made there.
=head3 encode_text(editable_string)
Class method.
Encodes "local" character strings (for writing to ODF documents).
Example:
$string = OpenOffice::OODoc::XPath::encode_text($local_string);
The local character set is defined by the following global
variable:
$OpenOffice::OODoc::XPath::LOCAL_CHARSET
for which the default value is 'iso-8859-1'.
Explicit calls to this method should generally be avoided. It is
used internally by methods which insert text or attribute values
into documents (e.g. setText).
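The principle behind encode_text and decode_text can be sketched with the core Encode module. This is an assumption about the principle, not the OODoc source: the sketch works on raw byte strings, the helper names local_to_utf8 and utf8_to_local are hypothetical, and $LOCAL_CHARSET merely stands in for $OpenOffice::OODoc::XPath::LOCAL_CHARSET.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for $OpenOffice::OODoc::XPath::LOCAL_CHARSET.
my $LOCAL_CHARSET = 'iso-8859-1';

# In the spirit of encode_text: local 8-bit bytes -> UTF-8 bytes.
sub local_to_utf8 {
    my ($bytes) = @_;
    return encode('UTF-8', decode($LOCAL_CHARSET, $bytes));
}

# In the spirit of decode_text: UTF-8 bytes -> local 8-bit bytes.
# Characters missing from the local charset are replaced by a
# substitution character.
sub utf8_to_local {
    my ($bytes) = @_;
    return encode($LOCAL_CHARSET, decode('UTF-8', $bytes));
}

my $latin1 = "caf\xE9";                 # "café" in ISO-8859-1
my $utf8   = local_to_utf8($latin1);    # "caf\xC3\xA9"
printf "%vX\n", $utf8;                  # byte values in hex
```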
=head3 dispose()
Deletes the calling document object. Recommended as soon as the
object is no longer needed by the application, and sometimes
mandatory to avoid memory leaks, especially in long-running processes.
=head3 exportXMLBody()
Returns the XML string representing the body of a document, for use
by another application, without UTF8 decoding.
=head3 exportXMLContent()
See getXMLContent()
=head3 exportXMLElement(path, position)
=head3 exportXMLElement(element)
Returns the XML string which represents a particular document
element (style definition, paragraph, table cell, object, etc.) for
use by another application without UTF8 decoding.
This method is principally designed to allow remote exchanges of
elements between programs using any XML storage or transfer method.
It acts as "sender" whilst the "receiver" can use appendElement or
insertElement (for example) to insert any exported elements into a
document. Example:
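# sender program
# ...
open(my $export, '>', 'transfer.xml') or die "transfer.xml: $!";
print $export $doc->exportXMLElement('//text:p', 15);
close $export;
# receiver program
# ...
open(my $import, '<', 'transfer.xml') or die "transfer.xml: $!";
my $xml = do { local $/; <$import> };   # read the whole file as one string
close $import;
$doc->appendElement('//office:body', 0, $xml);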
In this example, a paragraph is transferred but it could just as
easily be any content, presentation or metadata element.
Conversely, this method is not needed when transferring an element
from one document to another in the same program (or from one
document position to another). An element can be copied directly
from within the same program by reference or replication without
going via its XML (see appendElement(), insertElement() and
replicateElement()).
=head3 extendText(path, position, text [, offset])
=head3 extendText(element, text [, offset])
Appends the given text to the previous content of the given
element. If the optional 'offset' argument is provided, the
new text is inserted at the given position instead.
Example:
$doc->setText($p, "Initial content");
$doc->extendText($p, " extended");
Assuming $p is a regular text element (ex: a paragraph), its
content becomes "Initial content extended".
If the second argument is an element itself, it's appended
as is to the first element. This feature can be used, for
example, in order to append sequences of repeated spaces:
$doc->setText($p, "Begin");
$spaces = $doc->spaces(6);
$doc->extendText($p, $spaces);
$doc->extendText($p, "End");
After the code sequence above, the $p element contains:
"Begin      End"
knowing that a single string containing repeated spaces could
not be properly processed by extendText(), even if the
'multiple_spaces' property is set (this property affects the
setText() method only).
(See also setText()).
=head3 findElementList(element, filter [, replacement])
Returns all the children of the given element whose content matches
the given filter (regexp).
If the third argument ('replacement') is given, every string which
matches the filter in each child element will be replaced by this
'replacement' value. This 'replacement' argument can be a character
string or a function reference. (See replaceText() method below.)
Filtering and possible replacement only affects an element's content
and not its attributes.
This method is mostly for internal use. We recommend using other
methods for the selective extraction of elements.
=head3 flatten(element)
Converts in place the content of the given element to a flat string,
removing any structure. Same as $element->flatten() (see flatten()
in the "Element methods" section below). If no element is provided,
"flattens" the current context element, which is, by default, the
root of the document (be careful!).
=head3 getAttribute(path, position, name)
=head3 getAttribute(element, attribute_name)
Returns the value of a given attribute in a given element.
The element is the first argument, the name of the attribute the second
one. The return value is undef if the attribute is not defined in the
given element.
Example:
my $element = $doc->getElement('//text:p', 15);
my $style = $doc->getAttribute($element, 'text:style-name');
returns the style for paragraph 15.
If the given attribute name doesn't include a namespace prefix, the
namespace of the attribute is automatically supposed to be the same as
the namespace of the element. In addition, any blank space within the
attribute name is regarded as a '-'. So, the same example could
be written more concisely as shown below:
my $element = $doc->getElement('//text:p', 15);
my $style = $doc->getAttribute($element, 'style name');
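The attribute-name shortcut amounts to a small string transformation. This plain-Perl sketch applies the two rules stated above (a missing prefix is taken from the element name, and each blank becomes a '-'); normalize_attribute_name is a hypothetical helper, and OODoc's own code may handle more cases.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the two documented shortcuts to an
# attribute name, given the (prefixed) name of its element.
sub normalize_attribute_name {
    my ($element_name, $attribute) = @_;
    $attribute =~ tr/ /-/;                  # "style name" -> "style-name"
    if ($attribute !~ /:/) {                # no namespace prefix given
        my ($prefix) = $element_name =~ /^([^:]+):/;
        $attribute = "$prefix:$attribute" if defined $prefix;
    }
    return $attribute;
}

print normalize_attribute_name('text:p', 'style name'), "\n";
# prints: text:style-name
```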
=head3 getAttributes(path, position)
=head3 getAttributes(element)
Returns a list of the element's attributes in the form of a hash
whose keys are the attributes' XML names.
=head3 getBody()
Returns the root of the document body. The document body is the
main container of all the displayable content not including page
headers, page footers, and page backgrounds.
=head3 getDescendants(tag [, context])
Returns the list of the descendants of the given context element
matching the given tag. Example:
my $section = $doc->getSection("SectionName");
my @paragraphs = $doc->getDescendants('text:p', $section);
Here, @paragraphs is the list of all the paragraphs which are the
descendants (at every level) of a given section (the getSection()
method is described in the OpenOffice::OODoc::Text chapter).
If the second argument is not provided, the current context of the
document is used (see currentContext()).
=head3 getElement(path [, position [, context]])
This method is provided in order to allow the user to retrieve any
element in any kind of XML document (ODF-compliant or not) using an
application-provided XPath expression. It should be used with elements
whose type is not explicitly supported by the more focused (and more
user-friendly) methods, described in other manual chapters (::Text,
::Styles, ::Meta, and ::Document).
This method returns an element's reference.
The position argument is used to select a particular element, in the
order of the document, knowing that the given XPath expression could
select a set of elements. Without it, getElement() returns the first
element matching the given XPath.
The XPath expression applies in the current context, and not always
in the whole document (see currentContext()). However, if the
reference of a previously selected element is provided as a third
argument, the given element is used as the context.
Position indicators start at 0, just like Perl array indexes (and
indexes in some other programming languages).
Example:
my $table = $doc->getElement('//table:table', 0);
returns the first table of a text document, or the first sheet of a
spreadsheet.
Positions can also be counted backwards from the end by giving
negative values, i.e. position -1 being the last element. Thus:
my $h = $doc->getElement('//text:h', -2);
returns the second-to-last heading of a text document.
Note: neither of the two examples above should be used in a real
application, since the ::Text module provides getTable() and
getHeading(), which do the job without XPath coding.
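The position convention (0 for the first match, negative values counted from the end, first match by default) is the same as Perl's own array indexing. A minimal sketch, where pick_position is a hypothetical helper and a plain list of strings stands in for the elements matched by an XPath expression:

```perl
use strict;
use warnings;

# Hypothetical helper: picks one item from the ordered list of matches,
# following the getElement() position convention.
sub pick_position {
    my ($matches, $position) = @_;
    $position //= 0;                 # no position: first match
    return $matches->[$position];    # negative: counted from the end;
                                     # out of range: undef
}

my @headings = ('Intro', 'Method', 'Results', 'Conclusion');
print pick_position(\@headings), "\n";        # Intro
print pick_position(\@headings, -2), "\n";    # Results
```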
When successful, this method ensures that the returned object is
indeed an element and not another type of node (e.g. attribute,
text, comment, etc.). Such an object is never a printable text; it's
either a text container (whose content may be extracted using
getText() or getFlatText()) or a non-text element (such as a style,
a font declaration, a variable field, a document properties container,
etc).
Limit: getElement() doesn't implement the full XPath specification,
while it supports a large subset (see the XML::Twig documentation for
details about the current XPath coverage).
=head3 getElementByIdentifier(id [, options])
Returns an element according to the given identifier, if any, or undef
otherwise.
Note that, according to the ODF 1.1 standard, some elements have
identifiers (i.e. text:id attributes), while most don't, so
this method can't work with every object.
Allowed options are:
tag => restricts the search to a given element tag
context => restricts the search to a given context
Example:
$section = $doc->getElement('//text:section', 0);
$note = $doc->getElementByIdentifier(
"id004",
tag => 'text:note',
context => $section
);
This sequence selects the note (i.e. footnote or endnote) identified by
"id004" if such a note appears in the first section of the document.
Without the 'context' option, the search space would be the current
context (that is the whole document by default). Without the 'tag'
option, the first object that owns the given identifier is selected,
whatever its tag.
See also getIdentifier(), setIdentifier(), identifier().
=head3 getElementList(path [, context])
Returns a list of all elements at a specified path.
Example:
my @ref_summary = $doc->getElementList('//text:h');
The above example returns a list containing all the heading elements
of a text document.
The path can of course be a more complex XPath expression
stipulating, for example, a selection of attribute values. In most
cases, you should avoid complicating things unnecessarily
(especially in Text, Image and Styles modules), as there are methods
for searching by element type, attribute and content which are much
easier to use and avoid the need to supply XPath expressions.
An optional context argument may be provided in order to restrict the
search space.
Note: the returned list contains elements in the sense of getElement()
and not a list of element contents.
=head3 getFirstTextRun(path, position)
=head3 getFirstTextRun(element)
Returns the first text segment of an element whose text content is
segmented due to one or more child elements. In other words, returns
the beginning of the text content up to the first child element, if
any. If the given element just contains flat text, without any child
element, returns the whole text, just like getText() introduced below.
=head3 getFlatText(path, position)
=head3 getFlatText(element)
Like getText() below, but without rendering of possible tab stops,
line breaks, repeated spaces, or any other markup. The returned text
is just a decoded flat string.
=head3 getFrameElement(name/number)
Selects the frame identified by the given name, or by the given order
number in the document context.
=head3 getIdentifier(path, pos)
=head3 getIdentifier(element)
Returns the identifier (text:id) of the given element, if any.
See also identifier(), setIdentifier(), getElementByIdentifier().
=head3 getNodeByXPath(xpath_expression)
=head3 getNodeByXPath(xpath_expression, context)
=head3 getNodeByXPath(context, xpath_expression)
A low-level method which returns the node corresponding to the given
XPath expression, if it exists in the document. This method (which
gives unrestricted access to the entire content of a document) is
designed for unforeseen cases. You will obviously need to be
familiar with XPath syntax (not documented here) as well as
OpenDocument structure. See also selectNodesByXPath().
=head3 getObjectCoordinates(object)
Returns the coordinates (X, Y) of the target object, if any. This
method makes sense with "positioned" objects, i.e. with frames and
frame-like objects (images, text boxes).
In an array context, the coordinates are returned as two distinct
strings (horizontal, then vertical position). In a scalar context,
the values are returned in a single string, and separated by a comma.
See createFrame() for details about the coordinates and size
units and notation.
=head3 getObjectDescription(object)
Returns the literal description of a visible object. This method
makes sense for frames or frame-like objects (such as images or
text boxes).
=head3 getObjectName(element)
Returns the name of the given element, if any.
=head3 getObjectSize(object)
Returns the size of the given object, if any. This method works with
frames and other frame-based objects, such as images and text boxes.
In the returned data, the width comes first, followed by the height.
The size is returned in the same way as the coordinates with
getObjectCoordinates().
=head3 getPartName()
Returns the name of the document part, i.e. 'content', 'styles', 'meta',
and so on.
=head3 getRoot()
Returns the absolute root element of the document. The root element
contains any other visible or non-visible object, including the
document body (see getBody) and style definitions.
=head3 getText(path, position)
=head3 getText(element)
Returns text in the local character set, possibly UTF-8 decoded,
contained in the element given as an argument (by path/position or
by reference). See also getFlatText().
Two equivalent examples:
# version 1
my $element = $doc->getElement('//text:p', 4);
my $text = $doc->getText($element);
# version 2
my $text = $doc->getText('//text:p', 4);
Version 2 is better if the only aim is to get the text from
paragraph 4. Version 1 is better, however, if during the course of
the program you want to perform other operations on the same
paragraph: passing an element's reference spares the element
handling methods from recalculating it from the XPath expression.
=head3 getTextList(path)
Returns text from all elements in the specified path.
Example:
my $summary = $doc->getTextList('//text:h');
my $report = $doc->getTextList('//text:span');
The $summary variable contains a concatenation of all headers.
$report contains all the text spans, i.e. the words or character
strings that "stand out" from their context through specific
formatting, e.g. words in italics in a non-italic paragraph.
In a list context, the returned data is a list, each of whose
elements contains the text of an XML element. In a scalar context
(as in our two examples), the returned value is a unique piece of
editable text and each element's content is separated from that of
the following element by a line feed.
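This context-dependent return value follows a common Perl idiom based on wantarray. A minimal sketch, where text_list is a hypothetical stand-in that works on plain strings instead of document elements:

```perl
use strict;
use warnings;

# Hypothetical stand-in for getTextList(): returns a list in list
# context, or one string with a line feed between items in scalar context.
sub text_list {
    my @texts = @_;
    return wantarray ? @texts : join("\n", @texts);
}

my @list    = text_list('Chapter 1', 'Chapter 2');   # two items
my $summary = text_list('Chapter 1', 'Chapter 2');   # "Chapter 1\nChapter 2"
print scalar(@list), "\n", $summary, "\n";
```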
=head3 getTextNodes(context [, filter])
Returns the text nodes belonging (at any level) to the given context
element. So-called text nodes are low-level text runs, without
attributes, that populate text containers such as paragraphs, knowing
that a paragraph may contain one or more text nodes. For example,
as soon as a bookmark is put within a paragraph, there is (at least) one
text node before the bookmark and another one after the bookmark.
The text nodes are returned as a list, in document order.
Note that a text node is not an element, but that every text node in
a regular document is a child of a text element (generally a paragraph,
a heading or a text span). So, the node-based parent() method may be
used to get the element that contains a given text node.
The second argument (optional) specifies a search filter. If it's
provided, only the matching text nodes are returned.
The example below uses getTextNodes() in order to count the text nodes
that contain "foo" and that belong to elements whose style is "bar" in
the whole document body (beware, this example uses methods which are
introduced in the OpenOffice::OODoc::Text manual chapter):
my $context = $doc->getBody;
my @list = ();
foreach my $tn ($doc->getTextNodes($context, "foo")) {
my $style = $doc->getAttribute
($tn->parent, 'style name');
next unless $style;
push @list, $tn if $style eq "bar";
}
=head3 getUserField(name [, context])
Returns the element (if defined) representing a user-defined field,
and corresponding to the given name. See also userFieldValue().
By default, this method works with the first user field declaration
matching the given name in the whole document. However, if the calling
object is a 'styles' document part, the search is restricted to a given
context (provided through an optional 2nd argument) or to the current
context. This feature allows the applications to look for user fields
whose declarations are associated to page styles.
=head3 getUserFields([context])
Returns the list of the declared user-defined fields.
The example below prints the names of all the user-defined fields:
foreach my $field ($doc->getUserFields)
{
print $doc->getObjectName($field), "\n";
}
By default, this method returns all the user fields at the document
level. However, if the active document part is 'styles', the search is
restricted to a given context (provided through an optional 2nd
argument) or to the current context. This feature allows the
applications to look for user fields whose declarations are associated
to page styles.
=head3 getVariable(name)
Returns the user-defined variable identified by the given name.
[Contribution by Andrew Layton]
=head3 getVariableElement()
See getVariable().
=head3 getXMLContent([filehandle])
Without argument, returns a document's entire XML content.
Exports the entire XML content of the current member to a flat file,
if a file handle is provided.
Note: the exported data are UTF8-encoded.
Example:
open my $fh, ">:utf8", "myfile.xml";
$doc->getXMLContent($fh);
close $fh;
Synonym: exportXMLContent()
=head3 getXPathValue(xpath_expression)
=head3 getXPathValue(context, xpath_expression)
=head3 getXPathValue(xpath_expression, context)
A low-level method which allows direct access to the value
corresponding to the given XPath expression in a document. Character
decoding is handled in the same way as with getText.
Example:
$expression = '//office:automatic-styles' .
'/style:style' .
'[@style:name="P1"]' .
'/@style:parent-style-name';
print $doc->getXPathValue($expression);
This sequence displays the name of the parent style of automatic
style "P1" (if it exists within the document). Remember that simpler
methods in the Text and/or Styles modules would produce the same
result.
The optional element reference "context" can be given as an argument
either in first or second place. In this case, the search is limited
to the section of the document tree below this given element. The
default search area is the entire document.
Just as with other methods which require XPath paths, this one is
primarily for internal use. It should not be used by the majority of
applications.
=head3 identifier(path, pos [, value])
=head3 identifier(element [, value])
Gets or sets the identifier of the given element.
If the value argument is not provided, does the same as getIdentifier().
If provided, the value argument replaces the previous element identifier
or creates it if it was not set.
This method can change the identifier, but can't remove it, unlike
setIdentifier().
See also getIdentifier(), setIdentifier(), getElementByIdentifier().
=head3 insertElement(path, position, name/xml [, options])
=head3 insertElement(element, name/xml [, options])
Inserts a new element before or after the element specified by
[path, position] or by reference.
If the "name" argument is a literal, a new element with the name
given is created and then inserted. If the same argument is a
reference to an existing element, this element is then simply
inserted at the position indicated. This method is useful either for
adding new elements or for copying elements from one document to
another or from one position to another within the same document.
The position option allows you to choose the insertion point of the
new element. Possible values are "before", "after" and "within" (the
default is "before").
If "position" is set to "within", the new element is inserted within
the text of the target element, so an additional "offset" option (i.e.
a numeric position in the string) is required.
However, for insertion within a text container, setChildElement(),
described later, is much more powerful.
Other options are:
text => "text of element"
attributes => $attributes
The "attributes" option is itself a hash reference containing one or
more attributes in the form [name => value] as in appendElement.
When successful, this method returns the inserted element's
reference (else undef).
Example:
my $attributes =
{
'text:style-name' => 'Heading 2',
'text:level' => '2'
};
$doc->insertElement
(
'//text:p', 4, 'text:h',
position => 'after',
text => 'New section',
attributes => $attributes
);
This sequence (in a text document) inserts a level 2 header
'New section' immediately after paragraph 4.
The $name argument can be replaced by an existing element. In this
case a new reference to the existing element is inserted, without
creating a whole new element. In this way, an element held in memory
only once can be displayed at several locations or in several
documents. See the appendElement section for the consequences of
having multiple references to the same physical element. It is better
to use replicateElement to insert separate copies of an element.
In the same conditions as in appendElement, the 'name' argument can
be replaced by an XML string which describes the element.
Note: to add an element to the end of a document, it would obviously
be better to use appendElement(), and to insert an element at a selected
position within an existing element, see setChildElement().
=head3 isOpenDocument()
Returns 1 (true) if the current document is an OASIS Open Document.
To be used every time the application needs to know the format of
the document, knowing that some differences between the two formats
can't be completely hidden by the API.
=head3 lineBreak
Returns a special line break element, available for insertion within
an existing text element (knowing that "\n" is not recognized as a
line break if stored "as is"). The returned element is free (not
attached to any document), so it should be inserted later within a
text element.
=head3 makeXPath(expression)
=head3 makeXPath(context, expression)
Low-level method allowing the creation or direct modification
without restriction (almost) of any document element. It allows
"query" expressions in a language similar to XPath. If the given
XPath expression crosses several levels of hierarchy, intermediate
nodes can be created or modified "on the fly" by creating the
necessary path which in turn creates the final node.
Example:
$doc->makeXPath
(
'//office:body/text:p[4 @text:style-name="Text body"]'
);
This "query" applies the "Text body" style to paragraph 4 in the
body of the document. (In reality you will probably never use it
because the setStyle method of the Text module would do the same
thing much more simply.)
If, as in the above example, a node is accompanied by a position
indicator, it cannot be created but must simply act as a mandatory
"passage". This method therefore cannot be used to create, for
example, an Nth paragraph, even if an (N-1)th paragraph already exists.
The only restrictions apply to namespaces which are given as
prefixes to element and attribute names. They must be defined in the
document i.e. conform to OpenDocument specifications. For the rest,
this method allows the creation of almost anything anywhere within a
document. Its use is reserved for OpenDocument XML specialists.
In its second form, a context node can be given as the first
argument. If present, the path is sought (and if necessary created)
starting from its position. By default, the path begins from the
root.
The returned value is the final node's reference (found or created).
The full "query language" syntax used in this method is not
documented here. makeXPath is designed to act more as a base for
other OpenOffice::OODoc methods than to be used in applications.
=head3 moveElements(target_element, element_list)
Moves a list of existing elements to a new attachment.
One or more elements are cut from their previous place and appended
as children of the target element.
This method can be used to move elements from one place to another
place in the same document, as well as from one document to another
one (caution, the elements are moved, not copied).
=head3 newTextNode(text)
Creates a free text node (to be inserted later within a text element).
A text node is a piece of flat text, without any attribute, that may be
a part of the text content of an element.
Note that it's a low-level method for special uses; there are various
text-oriented methods in the API (mainly described in the ::Text manual
page), and the explicit use of text nodes should be avoided.
=head3 objectName(element [, name])
Returns the name of the given element. Changes this name if a new name
is provided as the 2nd argument.
=head3 odfLocaltime()
Class method.
Converts the numeric time given as argument to an OpenOffice-compliant
date (ISO-8601). The argument type is the same as for the standard
Perl localtime() function, i.e. a number of seconds since the "epoch".
It can be, for example, a value previously returned by a time() call.
Without argument, returns the current local time in ISO-8601 format.
The result of this function can be used as is in order to set the
value of an ODF-compliant date-time element or attribute.
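A rough equivalent can be sketched with the core POSIX module, for illustration only. The helper name iso_localtime() is hypothetical, and the output layout (YYYY-MM-DDThh:mm:ss, local time, no time zone suffix) is an assumption about what odfLocaltime() returns, not a statement of its implementation:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical stand-in for odfLocaltime(), not the module's own code:
# format an epoch time (default: now) as an ISO-8601 local date-time.
sub iso_localtime {
    my $time = @_ ? shift : time();
    return strftime('%Y-%m-%dT%H:%M:%S', localtime($time));
}

# Current time, then the current time plus 15 minutes
print iso_localtime(), "\n";
print iso_localtime(time() + 15 * 60), "\n";
```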
=head3 odfTimelocal()
Class method.
Translates an ODF-formatted date (ISO-8601) into a regular Perl
numeric time format, i.e. a number of seconds since the "epoch". So,
the returned value can be processed with any Perl date formatting or
calculation function.
Example:
my $date_created = odfTimelocal($meta->creation_date());
my $lt = localtime($date_created);
my $elapsed = time() - $date_created;
print "This document has been created $lt\n";
print "$elapsed seconds ago\n";
This sequence prints the creation date of a document in local time
string format, then prints the number of seconds between the creation
date and now. Note that the creation_date() method used here works
with the meta-data document part only (see OpenOffice::OODoc::Meta for
details about this method).
Note: This function requires the Time::Local Perl module.
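The reverse conversion can be sketched in the same spirit with Time::Local. iso_timelocal() is a hypothetical helper, not part of OpenOffice::OODoc; it only accepts the YYYY-MM-DDThh:mm:ss layout assumed here and returns undef for anything else:

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Hypothetical stand-in for odfTimelocal(), not the module's own code:
# parse 'YYYY-MM-DDThh:mm:ss' as local time and return epoch seconds.
# Fractional seconds and time zone suffixes are simply ignored here.
sub iso_timelocal {
    my $date = shift;
    my ($y, $mo, $d, $h, $mi, $s) =
        $date =~ /^(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)/
        or return;
    # timelocal() takes a 0-based month; a 4-digit year is used as is
    return timelocal($s, $mi, $h, $d, $mo - 1, $y);
}

my $created = iso_timelocal('2024-03-01T08:30:00');
my $elapsed = time() - $created;
print scalar(localtime($created)), " ($elapsed seconds ago)\n";
```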
=head3 odfVersion([new_version])
See openDocumentVersion()
=head3 ooLocaltime([$time_value])
Class method.
See odfLocaltime()
=head3 ooTimelocal($oodate)
Class method.
See odfTimelocal()
=head3 openDocumentVersion([new_version])
Returns the version of the Open Document Format (ODF) in use in the
current document. If an argument is provided, it's used to set a
new version identifier.
Beware, this method doesn't really check the conformance of the
document to any version of the ODF standard. It just retrieves the
value of the version number attribute as it has been set by the
application which created or modified the document.
If openDocumentVersion() is used to set a new version number
declaration, the given value is not checked. So, this value could
be the number of a real or future ODF version (1.0, 1.1, 1.2, etc),
as well as any other arbitrary value (ex: 99, -1, ...).
=head3 raw_import(member, source)
Physically imports an external file into an OpenDocument archive
associated with an XPath object, if it exists i.e. if the object was
created using file or archive parameters. This method only transmits
the command to the OODoc::File's raw_import method. Caution: it must
not be used with an "active" element i.e. an XML member to which the
current XPath object or another XPath object is already associated.
Remember too that the import is not actually carried out by
OODoc::File until a save and the imported data is therefore not
immediately available.
=head3 raw_export(member, target)
Physically exports a member from an OpenDocument archive associated
with an XPath object, if it exists i.e. if the object was created
using file or archive parameters. This method only transmits the
command to the OODoc::File's raw_export method.
=head3 removeAttribute(path, position, attribute)
=head3 removeAttribute(element, attribute)
Deletes the "attribute" attribute (if found) of the given element by
[path, position] or by reference and returns "true". Has no physical
effect and returns undef if the attribute has not been defined or if
the element does not exist.
=head3 removeElement(path, position)
=head3 removeElement(element)
Deletes the given element (if found) by [path, position] or by
reference and returns "true". Returns undef if the element does not
exist.
=head3 removeIdentifier(path, pos)
=head3 removeIdentifier(element)
Deletes the identifier attribute ('text:id') of the given element.
Be careful: this method should only be used to delete temporary
element identifiers that don't comply with the ODF specification;
remember that the identifier is mandatory for some elements.
See also getIdentifier(), setIdentifier(), identifier().
=head3 replaceElement(path, position, replacement [, options])
=head3 replaceElement(old_element, new_element [, options])
Deletes the given element by [path, position] or by reference and
inserts another element in its place, either from another location
in the same document or from another document.
A new element can be supplied under the same conditions as for
insertElement.
By default or by using the mode => 'copy' option, it is a copy of
the new element which is inserted. With the mode => 'reference'
option, it is only a reference which is inserted. See the section on
appendElement for comments on the subject of multiple references to
a single physical element.
=head3 replicateElement(original_element, position_object [, options])
Makes a copy of the first given element and inserts it into the
current document at a position which depends on the second argument
and an optional parameter.
If the second argument is an existing object in the document, then
the copy is inserted according to an optional 'position' parameter:
- if no 'position' option is provided, then the copy is appended
as the last child of the position object;
- if 'position' => 'before' or 'after', then the copy is inserted at
the same hierarchical level as the position object, according to the
same logic as for insertElement().
If the second argument is not an object, but simply 'end', then the
new element is appended as the very last child of the physical root
of the document. See getRoot(). This option should generally be
avoided.
If the second argument is given as 'body', then the new element
is appended at the end of the document body (see getBody), as if it
were created through appendElement().
Example:
my $template = $doc_source->selectElementByAttribute
(
'//style:style',
'style:name',
'Text body'
);
my $position = $doc_target->getElement
('//office:styles', 0);
$doc_target->replicateElement($template, $position);
This sequence adds a style 'Text body' to the style set of $doc_target
which copies exactly the style of the same name in $doc_source.
Obviously, the section of code dealing with the search for the element
to copy and its position is the most laborious. (In a real application,
thanks to OODoc::Styles, a more user-friendly coding would be allowed
for style replication.)
This method creates a new element which is an exact copy of the given
element, but which is physically separate from it.
This method is slower than simply modifying an existing element or
inserting an element reference.
If the user needs only a "free" copy of the element (out of the
document structure, to be later attached), the XML::Twig::Elt copy()
method should be preferred:
my $new_element = $old_element->copy;
=head3 resetCurrentContext()
Resets the search context to its default value, which is the root of
the document. See currentContext().
=head3 save([filename|filehandle])
Saves the content of the current document through a physical
output, that is either a regular file specified by path/name, or
an open, application-provided IO::Handle. If no argument is
provided, the document which had been used as the source (if any)
is used as the default target.
Technically, whenever the document container is a regular ODF
file, this method is a stub for the save() method of the associated
OpenOffice::OODoc::File object, so all the related explanations and
recommendations given in the OpenOffice::OODoc::File manual chapter
apply. So, for example, be careful if the target is an open IO::Handle
instead of a file path/name.
The behaviour of this method depends on the way the current
OpenOffice::OODoc::XPath object has been created.
If the document is explicitly linked (through the 'file' option
of its constructor) to a regular OOo or OpenDocument file, the
document is saved either in the source file, or (if a filename
is provided as an argument) in a new file.
If the document is linked to the same file interface as one or
more other OpenOffice::OODoc::XPath objects, the behaviour is
the same as in the previous case, but all the changes made by
all the linked objects are automatically saved in the target
file. Example:
my $content = odfXPath
(
file => 'source.odt',
part => 'content'
);
my $styles = odfXPath
(
container => $content,
part => 'styles'
);
my $meta = odfXPath
(
container => $content,
part => 'meta'
);
# ... a lot of content processing
# ... a lot of style processing
# ... a lot of metadata processing
$content->save('target.odt');
At the end of the sequence above, all the changes made through
the $content, $styles and $meta objects are saved in 'target.odt'
because these objects share a common file interface. Note that
in such a situation, the save() method can be issued from any one
of the objects sharing the file interface (i.e. $content->save
could be replaced by $styles->save or $meta->save).
However, any XML part (content, styles, meta, ...) whose
'read_only' property is set to "true" is not saved. In the example
above, if, say, the $meta object is created (through odfXPath())
with a "read_only" option set to "true", only $content and $styles
are really saved by the last instruction.
If the document is not associated with a regular OpenDocument
compressed file (used through an OODoc::File object), it's saved
as "flat XML" to the given file. In such a situation, if the file name
is not provided, the source XML file (if any) is used as the target.
If the file is "flat XML", OODoc::XPath really effects the physical
output, without using any OODoc::File connector.
Note: if you need to save a document as flat XML while it's associated
with an OpenDocument file, you should use exportXMLContent() with an
application-provided file handle.
=head3 selectChildElementByName(path, position [, filter])
=head3 selectChildElementByName(element [, filter])
Returns the first (or only) element whose name matches "filter" from
within the child elements of the given element indicated by [path,
position] or by reference.
"filter" is taken to be a regular expression. If several values
match the filter, the first of these is returned (in the XML's
physical order which is not necessarily the logical order of the
document). See the comments about selectElementByAttribute if
wanting to select an exact name.
Returns undef if no elements match the condition.
If no filter is given, or if the filter is a catch-all wildcard (".*"),
returns the first (or only) child element, whatever its name.
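Because the filter is a regular expression, a short tag name may match longer ones. The following sketch (a hypothetical helper working on a plain list of tag names, not on real elements) shows why an anchored expression is needed to select an exact name:

```perl
use strict;
use warnings;

# Hypothetical model: children are represented by their tag names only.
# Returns the first name matching the filter (a regex), the first child
# if no filter is given, or undef if nothing matches.
sub first_child_by_name {
    my ($filter, @names) = @_;
    return $names[0] unless defined $filter;
    for my $name (@names) {
        return $name if $name =~ /$filter/;
    }
    return;
}

my @children = ('text:span', 'text:s', 'text:tab');
# 'text:s' is a regex, so it also matches 'text:span'
print first_child_by_name('text:s', @children), "\n";     # text:span
# Anchors restrict the filter to the exact name
print first_child_by_name('^text:s$', @children), "\n";   # text:s
```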
=head3 selectChildElementsByName(path, position [, filter])
=head3 selectChildElementsByName(element [, filter])
Like selectChildElementByName, but returns a list of all elements
which match the condition.
Example:
my @search_words =
$doc->selectChildElementsByName
('//text:p', 4, 'text:span');
returns the list of text span elements (text:span) in paragraph 4, i.e.
the pieces of text having particular attributes (colour, font, etc.)
that distinguish them from the rest of the paragraph.
=head3 selectElements([context,] path, filter)
=head3 selectElements([context,] path, filter, replacement)
=head3 selectElements([context,] path, filter, action [, arg1, ...])
Returns a list of elements corresponding to a given XPath path and
whose text matches the filter (regular expression). The "context"
argument, if given, is an element reference which limits the search
to its own child elements. The search is carried out in the entire
document by default.
An element is selected if the search string is found in its own text
or in the text of any element descended from it. For example, an image
element (draw:image) can be selected from the value of its attached
"description" field.
You can replace all strings matching the search criteria with the
'replacement' string, on the fly, if the latter is given as an
argument after the filter.
Lastly, instead of a replacement string, you can pass a subroutine's
reference which will run (in call back mode) each time the search
string is matched. If this subroutine returns a defined value, this
value is used as the replacement string. The subroutine will
automatically receive the rest of the arguments, in this order:
Caution: this method can't retrieve a character string which is
split into more than one text element or text span. So, for example,
it will never retrieve "My String" as long as "My" and "String" are
presented with different styles, even if the two parts of the string
belong to the same paragraph.
If, as is generally the case, you are working exclusively with text
elements (paragraphs, headings, etc.), you would be better off using
selectElementsByContent() of the Text module which is easier to use
and does not require an XPath expression.
Here is an example which returns the list of images whose
descriptors contain the word "landscape" and displays the name of
each selected image:
sub printMessage
{
my $doc = shift;
my $element = shift;
my $image = $element->parentNode;
print "Name: " . $image->find('@draw:name') . "\n";
}
my @list = $doc->selectElements
(
'//draw:image/svg:desc',
'landscape',
\&printMessage,
$doc
);
Never use this example of code in a real application as it is both
purely for demonstration and unnecessarily complex. You can perform
the same operation much more simply using the OODoc::Image module.
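The selection and replacement rules described above can be modelled on plain strings. This is a hypothetical sketch, not the module's code: select_texts() works on an array of strings instead of elements, and it passes the extra arguments followed by the matching string to the callback; that order only mirrors the printMessage() example and is an assumption:

```perl
use strict;
use warnings;

# Hypothetical model of selectElements() on plain strings: returns the
# strings matching $filter; $action is either a replacement string or a
# code reference whose defined return value becomes the replacement.
# Matching strings are modified in place through the array reference.
sub select_texts {
    my ($texts, $filter, $action, @args) = @_;
    my @selected;
    for my $text (@$texts) {              # $text aliases each array element
        next unless $text =~ /$filter/;
        my $replacement = ref $action eq 'CODE'
            ? $action->(@args, $text)     # callback: extra args, then text
            : $action;
        $text =~ s/$filter/$replacement/g if defined $replacement;
        push @selected, $text;
    }
    return @selected;
}

my @descriptions = ('a landscape photo', 'a portrait', 'landscape again');
print "$_\n" for select_texts(\@descriptions, 'landscape', 'seascape');
```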
=head3 selectElementByAttribute(path, attribute [, value [, context [, pos]]])
Like selectElementsByAttribute in a scalar context. By default, returns
the first element at the given path which has the given attribute
containing the given value. If the value is omitted, then returns the
first (or only) element that owns the attribute whatever the value.
The context optional argument allows one to restrict the search space to
a given container. The last optional argument, if set, is a positive
integer that specifies the index of the required element if more than
one element matches the other conditions (beware: if the specified
position is out of range, the result is undef).
The following example (which applies to the "styles" part of an ODF
document) prints a message if the "Times New Roman" font face is
declared:
print "The Times New Roman font is defined !"
if $styles->selectElementByAttribute (
'style:font-face',
'style:name',
"Times New Roman"
);
Returns undef if no element matches the conditions.
See also selectElementsByAttribute().
=head3 selectElementsByAttribute(path, attribute [, value [, context]])
Like selectElementByAttribute(), but for an array context. Returns
all the elements that match the path/attribute/value/context conditions
as a list.
The following example selects a document section whose name is
"Foreword" then selects the list of all the level 3 headings in this
section (note that $section is used as the optional context argument
in the second instruction):
my $section = $doc->selectElementByAttribute
('text:section', 'text:name', "Foreword");
my @headings3 =
$doc->selectElementsByAttribute
('text:h', 'text:outline-level', 3, $section);
(But remember that the same result could be obtained without knowledge of the
XML tags and attributes using more user-friendly methods introduced in
other manual chapters !)
See also selectElementByAttribute().
=head3 selectFrameElementByName(name)
Selects the first frame element whose name is exactly the given
argument. A frame is an OpenDocument container which can host a
rectangular object, such as an image or a text box.
=head3 selectNodesByXPath(xpath_expression)
This low-level method returns a list of nodes (which are not
necessarily elements) which match the given XPath expression. See
getNodeByXPath() for options and comments.
=head3 setAttribute(path, position, attribute, value)
=head3 setAttribute(element, attribute, value)
Modifies or adds an attribute to an element.
The element is indicated by reference or by [path, position].
The following arguments are the attribute name and the value.
If the name is provided without namespace prefix, it's automatically
concatenated to the element's namespace prefix. Every space in the
attribute name, if any, is automatically replaced by a '-'.
If the value is undef, the corresponding attribute is deleted if it
exists in the element; nothing is done otherwise.
=head3 setAttributes(path, position, attributes_table)
=head3 setAttributes(element, attributes_table)
Modifies or adds one or more attributes to an element.
The element is indicated by reference or by [path, position].
The list of attributes is given in the form of a hash name => value.
Example:
my $h = $doc->getElement('//text:h', 12);
$doc->setAttributes(
$h,
'text:style-name' => 'My Header',
'text:outline-level' => 3
);
This sequence gives the 'My Header' style and outline level 3 to the
13th heading (text:h) element in the document.
Any attribute name provided without namespace prefix is automatically
concatenated with the namespace prefix of the target element. So, the
"text:" prefix could have been omitted in the attribute hash of the
example above. In addition, every space in an attribute name is
automatically replaced by a '-'. So the code below produces the same
result as the previous example:
my $h = $doc->getElement('//text:h', 12);
$doc->setAttributes(
$h,
'style name' => 'My Header',
'outline level' => 3
);
An attribute provided as undef is deleted, if it exists.
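The naming rules and the undef-deletion rule can be summed up with a small model, in which an element is a hypothetical Perl hash holding a tag and its attributes; set_attributes() is an illustrative helper, not the module's own code:

```perl
use strict;
use warnings;

# Hypothetical model of setAttributes(): an element is a hash with a
# 'tag' and an 'attributes' hash; this is not the module's own code.
sub set_attributes {
    my ($element, %new) = @_;
    my ($prefix) = $element->{tag} =~ /^([^:]+):/;
    while (my ($name, $value) = each %new) {
        $name =~ s/ /-/g;                                # spaces become '-'
        $name = "$prefix:$name" if $name !~ /:/ && defined $prefix;
        if (defined $value) {
            $element->{attributes}{$name} = $value;
        } else {
            delete $element->{attributes}{$name};        # undef deletes
        }
    }
    return $element;
}

my $heading = { tag => 'text:h', attributes => { 'text:id' => 'h12' } };
set_attributes(
    $heading,
    'style name'    => 'My Header',
    'outline level' => 3,
    'text:id'       => undef,     # removes the existing attribute
);
print "$_ = $heading->{attributes}{$_}\n" for sort keys %{ $heading->{attributes} };
```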
=head3 setChildElement(context, tag/element [, options])
Creates a new child element within the text content of an existing one.
The context element may be provided like with insertElement(), either
by [path, position] or directly as the 1st argument. The next argument
is the XML tag of the element to be created, or an existing free
element.
The given context may be any element, including the whole document body;
however, it should be a simple text container in most cases.
If the provided tag doesn't include a namespace prefix, it's
automatically concatenated with the namespace prefix of the context
element (provided as 1st argument). In addition, every space (" ") is
regarded as a "-". For example, knowing that the ODF names of a line
break and a tab stop are respectively 'text:line-break' and 'text:tab',
they may be specified as 'line break' and 'tab' when they are inserted
in a regular text paragraph (that is their right place).
For alternative and very specific purposes, the tag argument may be
replaced by a function reference. If so, the corresponding application-
provided function will be triggered with the following arguments: the
containing document, a text node, a position, and possibly a string
(this last argument will be provided if setChildElement() is called with
a 'replace', 'after', 'before' or 'capture' argument introduced below
and will contain the matching substring). The application-provided
function is supposed to insert one or more contiguous new elements in
the text node at the given position (optionally using the given
substring); it must return an element. However, most users may safely
forget this feature...
Allowed options are
attributes => attribute/value hash for the new element
text => text content for the new element
offset => position
after => search string (regexp)
before => search string (regexp)
replace => search string (regexp)
capture => search string (regexp)
way => search way ('forward' or 'backward')
start_mark => element
end_mark => element
Some of them are mutually exclusive. They work according to the
following logic.
By default, the new element is created without text and attributes.
However, an initial content may be provided through a 'text' optional
parameter. In addition, an 'attributes' option allows one to provide a set
of attributes for the new element as a hash reference; note that every
attribute name provided without namespace prefix is automatically
concatenated with the same namespace prefix as the given element name.
The child element may be inserted at the beginning, at the end, or at a
position within the text content. In the last case, the position may be
specified by a given numeric argument, or looked for according to a given
expression.
By default, the new element is inserted at the beginning of the target
element. An arbitrary other position may be specified with the 'offset'
argument, that is either an integer (positive or negative) value, or one
of the 'start' and 'end' special indicators. If 'offset' is set to
'start' or 'end', the new element is inserted at the start or at the
end, and the other position options are ignored. If 'offset' is a
negative integer, the position is counted backward from the end.
Caution: if the text of the target container includes tab stops and/or
multiple contiguous spaces, the effective offset will be larger than
the given one (because ODF tab stops and multiple spaces are special
markup elements and not characters).
Whatever the value of 'offset', a 'way' option, whose possible values
are 'forward' (the default) or 'backward', specifies the search way.
If 'offset' is negative, the 'way' option is ignored because the
way is always backward. If 'offset' is positive and 'way' is 'backward',
then the result is the same as if 'offset' was negative. If 'offset'
is 0 or not set and 'way' is 'backward', then the search is done
backward from the end.
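The 'offset' and 'way' rules can be modelled on a plain string, returning the character position of the insert point. This is a hypothetical sketch: insert_position() is not part of the API, and it ignores the tab stop and multiple space elements mentioned above, so real offsets may differ:

```perl
use strict;
use warnings;

# Hypothetical model of the 'offset'/'way' rules on a plain string;
# returns the character position where a new element would be inserted.
sub insert_position {
    my ($text, %opt) = @_;
    my $length = length $text;
    my $offset = $opt{offset} // 0;
    my $way    = $opt{way}    // 'forward';
    return 0       if $offset eq 'start';
    return $length if $offset eq 'end';
    # a negative offset, or a 'backward' way, counts from the end
    my $backward = $offset < 0 || $way eq 'backward';
    my $pos = $backward ? $length - abs($offset) : $offset;
    # clamp to the bounds of the text
    $pos = 0       if $pos < 0;
    $pos = $length if $pos > $length;
    return $pos;
}

print insert_position('Clock: 10:30', offset => -5), "\n";   # 7
```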
A search string may be provided instead of or in combination with an
offset. If so, the insert point will depend on the position of the first
substring that matches the given optional search expression (if any).
The search expression may be provided through the 'after', 'before',
'replace' or 'capture' option. An expression provided with 'after' or
'before' means that the insert point is immediately after or before the
first matching substring. If the search string is provided through
'replace' or 'capture', the matching string will be replaced by the new
element. If the option is 'replace' the matching string is just deleted
while if 'capture' the same matching string is moved in the new element.
Of course, these search string options are mutually exclusive; if more
than one of them is set, only one is considered, and the priority order
is 'after', 'before', 'replace', and 'capture'. If both 'capture' and
'text' are set, the result is the same as with 'replace' and 'text'.
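The search string options can be modelled the same way on a plain string, with '[' and ']' standing for the new element. insert_by_search() is a hypothetical helper that only reproduces the documented rules (first match, priority order, 'capture' versus 'replace'); it is not the module's implementation:

```perl
use strict;
use warnings;

# Hypothetical plain-string model of the search options; '[' and ']'
# stand for the start and end of the new element. Only the first match
# is used, and the options are tried in the documented priority order.
sub insert_by_search {
    my ($text, %opt) = @_;
    my $content = $opt{text} // '';
    for my $mode (qw(after before replace capture)) {
        next unless defined $opt{$mode};
        return $text unless $text =~ /$opt{$mode}/;   # no match: unchanged
        my $pre   = substr($text, 0, $-[0]);
        my $match = substr($text, $-[0], $+[0] - $-[0]);
        my $post  = substr($text, $+[0]);
        return "$pre$match\[$content]$post" if $mode eq 'after';
        return "$pre\[$content]$match$post" if $mode eq 'before';
        # 'capture' moves the match into the new element, unless 'text'
        # is also set, in which case it behaves like 'replace'
        $content = $match if $mode eq 'capture' && !defined $opt{text};
        return "$pre\[$content]$post";
    }
    return $text;
}

print insert_by_search('about ODF files', capture => 'ODF'), "\n";   # about [ODF] files
```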
If the insertion point depends on a search string (i.e. if 'after',
'before', 'replace' or 'capture' is used), it's selected according to
the first match. However, it's possible to reverse the search way using
the 'way' option. In addition, the search area may be restricted by the
'offset' parameter: if 'offset' is used in combination with any search
string option, it specifies the limit of the search area instead of
an insertion point; if 'offset' is positive and 'way' is 'forward' (or
not set), the search is done from 'offset' to the end; if 'offset' is
negative or 'way' is 'backward', the search is done backward from the
given offset to the beginning.
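The way 'offset' restricts the search area can be sketched as follows. find_match() is hypothetical; it returns the start and end positions of the selected match, and the backward search is approximated by keeping the last non-overlapping match found before the limit:

```perl
use strict;
use warnings;

# Hypothetical model: returns the (start, end) character positions of
# the match selected by the offset/way rules, or an empty list.
sub find_match {
    my ($text, $re, %opt) = @_;
    my $length = length $text;
    my $offset = $opt{offset} // 0;
    my $way    = $opt{way}    // 'forward';
    if ($offset >= 0 && $way ne 'backward') {
        # forward: first match found between 'offset' and the end
        pos($text) = $offset > $length ? $length : $offset;
        return $text =~ /$re/g ? ($-[0], $+[0]) : ();
    }
    # backward: last match lying entirely before the limit
    my $limit = $length - abs($offset);
    my $area  = substr($text, 0, $limit < 0 ? 0 : $limit);
    my @last;
    while ($area =~ /$re/g) {
        @last = ($-[0], $+[0]);
    }
    return @last;
}

my @found = find_match('ODF and ODF again', 'ODF', way => 'backward');
print "@found\n";   # 8 11
```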
The 'start_mark' optional parameter is a child element that already
exists within the context element. If this parameter is set, it
specifies that the search will start from the position of this child
element and not from the beginning or the end of the context element.
If the search way is forward, the insert point (in case of success)
will be located after the start mark, but if the search way is backward
the insert point will be before the start mark. And if an offset is
provided, it's counted from the position of the start mark.
Another existing child element may be used in order to restrict the
search area, through an 'end_mark' parameter. If this parameter is set,
no search will be done beyond it. If both 'start_mark' and 'end_mark'
are provided, the search will run from the first one to the second one.
Of course, if the start mark is located after the end mark, nothing
will be done if the search way is not backward, and vice-versa.
The following example inserts a new 'text:time' element (i.e. an ODF
time field) immediately after the first "Clock:" substring appearing
between the 20th character and the end of a given paragraph (specified
by the 1st argument). The new element will be a 'text:time', knowing
that the namespace prefix of a paragraph element (text:p) is "text".
According to the given attributes, the field will display the current
time increased by 15 minutes:
$doc->setChildElement(
$paragraph, 'time',
offset => 20,
after => "Clock:",
attributes => {
'time-value' => odfLocaltime(time() + 15 * 60)
}
);
The variant below creates the same 'time' field after each occurrence
of "Clock:" (probably not very useful, but the aim is to illustrate the
use of 'start_mark' in order to ensure that every field but the first
one will be inserted after the previous field):
my $field = undef;
while (
$field = $doc->setChildElement(
$paragraph, 'time',
after => "Clock:",
start_mark => $field,
attributes => {
'time-value' => odfLocaltime()
}
)
) {}
Note that the loop body is empty; the start mark, which is undef at the
first round, is then the previously inserted child element.
Caution: without carefully designed offset and/or search option, such
a construct may produce a long or infinite loop (until memory fault);
in addition, the setChildElements() method (see below) is generally
more appropriate for such repetitive element insertions.
The next example creates a text span (i.e. a text area with a special
character style) for the last "ODF" substring of a given paragraph:
$doc->setChildElement(
$paragraph, 'span',
capture => "ODF",
way => 'backward',
attributes => {
'style-name' => "My Style"
}
);
These examples are shown to illustrate the general logic, not
necessarily to be reproduced in real applications, knowing that
setChildElement() is a common basis for more specialized methods
(mainly introduced in the OODoc::Text man page).
See also splitContent().
=head3 setChildElements(context, tag/element [, options])
Like setChildElement() but with a repetitive effect that depends on the
options.
If 'offset' is the only option, it's used as the interval between
successive insert points. If one of the search string options ('after',
'before', 'capture', or 'replace') is set, 'offset' is used once and for
all to exclude an area from the search space, and not as an interval
between the new elements. The other options work as with setChildElement().
The example below inserts a line break after every ";" in a given
paragraph (remember that an ODF line break is an element; it's neither
an end of paragraph nor a "\n" character):
$doc->setChildElements(
$paragraph, 'line break',
after => ";"
);
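The repetitive 'after' behaviour can be approximated on a plain string, with an XML-like string standing for each new line break element. insert_after_each() is a hypothetical helper; the real method inserts elements, not text:

```perl
use strict;
use warnings;

# Hypothetical plain-string model of setChildElements() with 'after':
# a mark (standing for the new element, here a line break) is inserted
# after every match; a numeric 'offset' excludes the leading area.
sub insert_after_each {
    my ($text, %opt) = @_;
    my $mark = $opt{mark} // '<text:line-break/>';
    my $from = $opt{offset} // 0;
    $from = length $text if $from > length $text;
    my $head = substr($text, 0, $from);
    my $tail = substr($text, $from);
    $tail =~ s/($opt{after})/$1$mark/g;
    return $head . $tail;
}

print insert_after_each('red; green; blue', after => ';'), "\n";
```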
=head3 setFlatText(path, position, text)
=head3 setFlatText(element, text)
Like setText() described below, but without translation of "\t"
and "\n".
For exceptional use only. Allows, for example, the use of the OODoc
API with non-OpenDocument XML files.
=head3 setIdentifier(path, pos, value)
=head3 setIdentifier(element, value)
Sets (or resets) the identifier of the given element. The identifier is
namely the 'text:id' attribute, that is allowed for some elements and
not for other elements by the ODF standard. OpenOffice::OODoc allows it
with any kind of element, and doesn't check its uniqueness, so it should be
used with care. A non-conformant element identifier is not an issue if,
for example, it's removed before editing or processing the resulting
documents through another application.
This method removes the identifier if the value argument is undef;
however, the removeIdentifier() method produces the same result in a
more self-documenting way.
=head3 setObjectCoordinates(object, coordinates)
Updates or creates the coordinates (X, Y) attributes of a visible
object (ex: image, text box, frame). See createFrameElement() for the
coordinates units and notation.
=head3 setObjectDescription(object, description)
Updates or creates the literal description of the given object.
Should be used for frames, images or text boxes. Caution: the
description is not the same as the printable content of a text
box.
=head3 setObjectName(element, name)
Sets or changes the name of the given element according to the given
new name. Deletes the name if the given name is undef.
=head3 setObjectSize(object, size)
Updates or creates the width and height attributes of a given object.
This method makes sense for visible, rectangular objects only, such
as the frames, images or text boxes.
See createFrameElement() for details about the size units and
notation.
=head3 setRangeMark(type, identifier, parameters)
Creates a pair of corresponding delimiting markup elements in place, in
order to set up an identified text range (such as a range bookmark,
an index mark or a table of contents mark).
The first argument specifies the type of range; it's mandatory but its
value is not checked. Examples of legal types are 'bookmark',
'toc-mark', 'alphabetical-index-mark'. If the provided type doesn't
contain a colon, it's automatically prefixed according to the
content of the 'prefix' parameter (whose default is 'text').
The identifier argument is mandatory; it's an arbitrary (preferably
unique) identifier for the pair. While this identifier is generally
invisible for the end-user, it's sometimes an explicit name (for
example in a range bookmark).
The 'prefix' optional parameter allows the applications to specify a
particular XML prefix; the default prefix for range marks is 'text'.
An arbitrary set of attributes may be provided as a hash through an
optional 'attributes' parameter. This hash will be processed according
to the same logic as with the common setAttributes() method.
The 'context' optional parameter, if provided, specifies the element
(which should be a text container, such as a paragraph, a heading or a
text span) containing the text range to be delimited. However, if
the covered text range is spread across two or more text containers,
this parameter must not be set, and a separate 'context' parameter must
be provided for the start mark and the end mark (see below).
If (and only if) the 'context' parameter is set (meaning that the whole
text content between the marks belongs to the same element), a
'content' optional parameter allows one to provide an expression; if so,
the setRangeMark() will look for the first substring that matches this
expression in the target element, and in case of success the range marks
will be inserted at the beginning and the end of this substring. The
search space of the substring may be restricted using the 'offset' and
'way' parameters, according to the same rules as setChildElement().
Note that the 'replace', 'before' and 'after' parameters don't apply
with setRangeMark().
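With 'context' and 'content', the all-or-nothing placement of the two marks can be sketched on a plain string, where '[id]' and '[/id]' are hypothetical stand-ins for the start and end mark elements; mark_range() is illustrative only:

```perl
use strict;
use warnings;

# Hypothetical plain-string model of setRangeMark() with 'context' and
# 'content': both marks are placed around the first match, or nothing
# is done at all.
sub mark_range {
    my ($text, $content, $id) = @_;
    return unless $text =~ /$content/;
    my ($start, $end) = ($-[0], $+[0]);
    substr($text, $end, 0)   = "[/$id]";   # end mark first: $start stays valid
    substr($text, $start, 0) = "[$id]";
    return $text;
}

my $marked = mark_range('see abc to xyz here', 'abc.*xyz', 'ind1234');
print "$marked\n" if defined $marked;
```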
Unless 'context' and 'content' are defined, there are two mandatory
parameters, namely 'start' and 'end'; each one is a hash of parameters
that apply to the start mark and the end mark, respectively. Each one
allows the same options as the option hash of setChildElement(), i.e.
'offset', 'before', 'after', 'replace' and/or 'way' as described above.
Note that the 'start' and 'end' structures are ignored as soon as
the 'context' and 'content' parameters are set at the first level.
In addition, if the start and end marks are not contained in the same
text element, separate 'context' parameters must be provided in the
'start' hash and in the 'end' hash. However, if the 'end' hash
doesn't contain any 'context' parameter, the end mark is assumed to
be in the same container as the start mark.
The method returns the new start and end marks as a list of elements
in list context, or the start mark only in scalar context.
In case of failure (due to wrong parameters), both are undef, since
setRangeMark() creates either the full pair of marks or nothing. Note
that the optional attributes (provided through the 'attributes'
parameter) are stored in the start mark element only.
By default, nothing prevents applications from creating a range mark
whose start point is (temporarily or not) located after the end point,
thus introducing an inconsistency. However, a 'check' boolean option
is available; if this option is 'true', an order check is done and, if
something is wrong, the range mark creation is cancelled and the method
fails. On the other hand, as long as the application can ensure that
the start is always set before the end, the order check should be
avoided for performance reasons.
Caution: Unless the 'check' option is set, the relative positions of
the two marks are not checked, so nothing prevents users from creating
a range whose start point is located after the end point in the
document. Without this option, the applications should ensure that the
'start' and 'end' options really specify two locations in the right order.
The following instruction creates an index mark covering a
text area within a single paragraph (previously selected); the range
starts before the "abc" substring and ends after the "xyz" substring;
the mark identifier is 'ind1234'. Nothing is done if one of these
substrings is not present in the target element:
$doc->setRangeMark(
'alphabetical-index-mark', 'ind1234',
context => $paragraph,
start => { before => "abc" },
end => { after => "xyz" }
);
The next example creates a range bookmark (i.e. a bookmark covering
a text area) that starts before the "abc" substring in a paragraph
and ends at the end of another paragraph:
$doc->setRangeMark(
'bookmark', 'bm0001',
start => { context => $p1, before => "abc" },
end => { context => $p2, offset => 'end' }
);
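The two calls above don't show the resulting markup. As an illustration only, the following core-Perl sketch mimics, on a flat string, a range delimited like in the first example (start before "abc", end after "xyz"), using the OpenDocument range bookmark elements text:bookmark-start and text:bookmark-end. It does nothing when a substring is missing and, a bit like the 'check' option, it refuses to place the end before the start. The wrap_range() helper is hypothetical: it is not part of OpenOffice::OODoc, it ignores any existing markup and it doesn't escape the name.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of OpenOffice::OODoc: it works on a
# flat string (no existing markup) and only shows where the two marks
# land. The name is inserted as is (no XML escaping).
sub wrap_range {
    my ($text, $name, $from, $to) = @_;

    # The range starts before the first occurrence of $from...
    my $start = index($text, $from);
    return undef if $start < 0;

    # ...and ends after the first occurrence of $to found from there,
    # so a $to located only before $from counts as missing.
    my $end = index($text, $to, $start);
    return undef if $end < 0;
    $end += length $to;

    return substr($text, 0, $start)
         . qq{<text:bookmark-start text:name="$name"/>}
         . substr($text, $start, $end - $start)
         . qq{<text:bookmark-end text:name="$name"/>}
         . substr($text, $end);
}

print wrap_range("12 abc 34 xyz 56", "bm0001", "abc", "xyz"), "\n";
```

Within a real document, setRangeMark() works on elements rather than strings, and the two marks may belong to different paragraphs, which this sketch doesn't attempt to model.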
=head3 setText(path, position, text)
=head3 setText(element, text)
Uses the given text as the content of the given element.
Any previous content (including formatting markup, bookmarks,
notes, references, etc) is replaced by the given text.
If the given text includes tab stops ("\t") or line breaks ("\n"),
they are replaced by the appropriate OpenDocument tags. If this
translation must be avoided, use setFlatText() instead.
Note: Strings containing repeated spaces are not properly processed
by default: a sequence of repeated spaces, whatever its length, is
rendered as a single space in the target document. So
$doc->setText($p, "Begin          End");
produces the same visible result as
$doc->setText($p, "Begin End");
It's possible to override this default behaviour using the
'multiple_spaces' document property. If 'multiple_spaces' is
set to 'on', the repeated spaces in the example above are properly
recorded. However, this optional feature comes at the price of some
other features and, above all, it has a negative impact on
performance (due to an additional processing of *every* space).
Of course, the 'multiple_spaces' feature may be activated temporarily,
as in the following example, which sets a content including multiple
spaces:
$doc->{'multiple_spaces'} = 'on';
$doc->setText($p, "Begin          End");
$doc->{'multiple_spaces'} = undef;
See spaces() and extendText() for a workaround if you
need to insert repeated spaces without using the 'multiple_spaces'
property.
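To make the notes above more concrete, the following core-Perl sketch shows the kind of OpenDocument markup involved: a tab becomes a text:tab element, a line break a text:line-break element, and a run of N spaces one regular space followed by a text:s element whose text:c attribute counts the N - 1 other spaces. The to_odf_markup() function is a hypothetical illustration working on strings; it is not the OpenOffice::OODoc implementation, which builds elements, and it targets the OASIS OpenDocument vocabulary only.

```perl
use strict;
use warnings;

# Hypothetical illustration, NOT the OpenOffice::OODoc code: turn a
# flat string into the ODF markup that represents it faithfully.
sub to_odf_markup {
    my ($text) = @_;

    # Escape the XML reserved characters first.
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;

    # Tabs and line breaks need dedicated elements.
    $text =~ s{\t}{<text:tab/>}g;
    $text =~ s{\n}{<text:line-break/>}g;

    # A run of N spaces (N > 1) keeps one regular space, followed by
    # a <text:s> element counting the N - 1 remaining ones.
    $text =~ s{[ ]([ ]+)}{' <text:s text:c="' . length($1) . '"/>'}ge;

    return $text;
}

print to_odf_markup("Begin     End"), "\n";
# prints: Begin <text:s text:c="4"/>End
```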
=head3 setUserFieldDeclaration(name [, options])
Creates a new user field declaration in the document.
The optional parameters are:
'type' => data type (default 'string')
'value' => initial value (default "")
'currency' => a 3-letter currency code (e.g. EUR, USD)
See also setTextField() in OpenOffice::OODoc::Text.
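For reference, in the OpenDocument format a user field declaration is a text:user-field-decl element carrying the field name, an office:value-type attribute and a type-dependent value attribute. The following core-Perl sketch shows how the 'type', 'value' and 'currency' options and their defaults could map onto such an element; the user_field_decl() function and its attribute table are illustrative assumptions, not the OpenOffice::OODoc code, and they ignore the older OpenOffice.org 1.0 format.

```perl
use strict;
use warnings;

# Value attribute used for each ODF value type (mapping assumed for
# this sketch; 'float', 'percentage' and 'currency' share office:value).
my %VALUE_ATTR = (
    string     => 'office:string-value',
    float      => 'office:value',
    percentage => 'office:value',
    currency   => 'office:value',
    date       => 'office:date-value',
    time       => 'office:time-value',
    boolean    => 'office:boolean-value',
);

# Hypothetical helper: build the declaration as an XML string.
sub user_field_decl {
    my ($name, %opt) = @_;
    my $type  = $opt{type}  // 'string';   # documented default
    my $value = $opt{value} // '';         # documented default
    my $attr  = $VALUE_ATTR{$type}
        or die "unsupported type '$type'\n";

    my $xml = qq{<text:user-field-decl text:name="$name"}
            . qq{ office:value-type="$type" $attr="$value"};
    $xml .= sprintf ' office:currency="%s"', $opt{currency}
        if defined $opt{currency};
    return "$xml/>";
}

print user_field_decl('Price', type => 'currency', value => 12.5,
                      currency => 'EUR'), "\n";
```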
=head3 spaces(length)
Returns a special element, available for insertion within a text
element, representing repeated contiguous blank spaces (knowing
that repeated spaces can't be properly displayed by an OpenDocument-
compliant application if stored as a flat string). The returned
element is free (i.e. not yet attached to the document), so it should
be inserted later within a text element. See extendText() for an
example of use.
=head3 splitContent(path, pos, tag, expression [, attributes])
=head3 splitContent(element, tag, expression [, attributes])
Moves some parts of the text content of the given element and its
descendants into new child elements.
The tag argument specifies the XML tag of the child elements to be
created. Unless this tag is provided with a namespace prefix (or more
precisely unless it contains a colon), it's automatically
prefixed with the namespace prefix of the host element.
The following argument is a regular expression that specifies the text
substrings to wrap in the new elements. An element is created for every
match in the context element and, if any, in its existing children.
After these arguments, additional attribute/value pairs may
optionally be provided; each one becomes an attribute of every created
child element (the same name and attributes apply to all). Every
attribute name provided without a namespace prefix is automatically
prefixed with the same namespace prefix as the new elements.
This method returns the new child elements as a list.
Note that splitContent() is a simplified interface for the mark()
method provided by XML::Twig, which may be directly used as an element
method for more advanced uses.
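The following core-Perl sketch mimics, on a flat string, two rules stated above: every match of the expression is wrapped in a new element, and the tag and attribute names given without a prefix inherit the prefix of the host element. The qualify() and split_content() helpers are hypothetical; the real method works on the element tree (through the XML::Twig mark() method) and returns the new elements rather than a string.

```perl
use strict;
use warnings;

# Hypothetical helpers, NOT the OpenOffice::OODoc code. They work on
# a flat string and ignore the existing children of the host element.

# Prefix a name with the namespace prefix of the host tag, unless it
# already contains a colon.
sub qualify {
    my ($name, $host_tag) = @_;
    return $name if $name =~ /:/;
    my ($prefix) = $host_tag =~ /^([^:]+):/;
    return defined $prefix ? "$prefix:$name" : $name;
}

# Wrap every match of $re in a new element with the given attributes.
sub split_content {
    my ($host_tag, $text, $tag, $re, %attr) = @_;
    my $qtag  = qualify($tag, $host_tag);
    my $attrs = join '',
        map { sprintf ' %s="%s"', qualify($_, $host_tag), $attr{$_} }
        sort keys %attr;
    $text =~ s{($re)}{<$qtag$attrs>$1</$qtag>}g;
    return $text;
}

print split_content('text:p', 'room 12 and 345', 'span', qr/\d+/,
                    'style-name' => 'Bold'), "\n";
```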
=head3 splitElement(element, offset)
Splits a text element at a given offset. This method is a wrapper
of the XML::Twig::Elt split_at() method, so, as said by Michel
Rodriguez in his documentation, it splits "a text element in 2" at
the given offset so "the original element now holds the first part
of the string and a new element holds the right part".
In addition, the new element is created with the same attributes (ex:
the style or the heading level, if any) as the original one.
The new element is inserted immediately after the old one.
In list context, the method returns both the original and the new
element; in scalar context, only the new element is returned.
Caution: splitElement() works properly on elements containing "flat
text" only. It's a bit complicated to use and probably doesn't
produce the right effects on elements containing line breaks, tab
stops, "styled spans" or any other kind of structure. If it's used with
an element containing more than one text segment, it works with the
first one only.
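The context-dependent return value can be illustrated on a plain string. In the following core-Perl sketch, split_text() is a hypothetical helper, not part of OpenOffice::OODoc; it assumes, like XML::Twig's split_at(), that the offset is the number of characters kept in the original part.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of OpenOffice::OODoc: split a flat
# string at a zero-based offset, with the same context-dependent
# return convention as splitElement().
sub split_text {
    my ($text, $offset) = @_;
    my @parts = (substr($text, 0, $offset), substr($text, $offset));
    return wantarray ? @parts : $parts[1];
}

my ($left, $right) = split_text("OpenDocument", 4);   # list context
my $new_part       = split_text("OpenDocument", 4);   # scalar context
print "$left|$right|$new_part\n";
# prints: Open|Document|Document
```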
=head3 tabStop
Returns a special tabulation mark element, available for insertion
within an existing text element (knowing that "\t" is not recognized
as a tab stop if stored "as is"). The returned element is free, so
it should be inserted later within a text element.
=head3 userFieldValue(user_field [, value])
Reads the stored value of a given user field or changes it if a
value is provided. The first argument can be either the name of the
field (as it appears to the end-user) or a previously loaded
user field element. See also getUserField().
This method doesn't create any new user field. It can only read or
update an existing one.
If the given user field is numeric (e.g. date, currency), the returned
and/or provided value is the internally stored value, not the
displayed one.
Warning: the changes made in a document using userFieldValue() don't
necessarily produce visible changes for the end-user. This method
can update the internal value of a field, but the displayable
representations of this field are not automatically refreshed (it
depends on a later field update).
=head3 variableValue(name/element [, newvalue])
Returns the current value of the given user-defined variable or, if
a new value is provided as the second argument, updates the variable
accordingly.
[Contribution by Andrew Layton]
=head2 Element methods
Every document element is an OpenOffice::OODoc::Element object,
and OpenOffice::OODoc::Element inherits all the rich features of
XML::Twig::Elt, including the very powerful copy(), cut(), paste(),
move() and replace() methods (look at the XML::Twig documentation
for details). Some additional methods, provided in the ::Element
package, are described below.
The "element methods" should be regarded as reserved for advanced
uses, possibly in combination with native XML::Twig::Elt methods
(not documented here, but the XML::Twig package itself is well
documented).
Remember that these methods belong to the element, not to the
document.
=head3 appendChild(newnode)
Appends a node as the last child of the calling node.
If the argument is an existing node, it's appended as is.
If the argument is a string, a new node is created, with the
given string as the XML tag name.
=head3 appendTextChild(text)
Appends a text node (PCDATA) as the last child of the calling
element.
=head3 flatten()
Converts in place the content of the calling element to a flat string,
removing any structure. All the children of the calling element are
removed and their text content is concatenated. The resulting string
becomes the only content of the element. For example, if the calling
element is a table, the tabular structure disappears and is replaced
by the concatenated contents of all the cells. Any possible internal
tab stop or line break element is removed, as well as any "styled"
text span (see setSpan() and removeSpan() in the OODoc::Text chapter
for information about styled text spans).
Be careful: many elements are not displayed by OpenDocument-compliant
software. For example, a section element becomes invisible
if it directly contains its text, without structure elements such as
paragraphs, headings, tables, and so on. To make the "flattened"
content of a previously complex element visible, its XML tag
should be replaced by the tag of a "displayable" element. In the
following example, a section is flattened, then tagged as a
paragraph, so its content remains visible:
my $s = $doc->getSection("AnySection");
$s->flatten;
$s->set_tag('text:p');
Note: getSection() belongs to OpenOffice::OODoc::Text and set_tag()
is provided by the underlying XML::Twig::Elt package.
Text flattening is sometimes required in order to allow
applications to retrieve strings which are split across more than one
text container. For example, a string such as "OpenDocument" can't
be retrieved using selectElements() or any other string search method
of the API if, say, "Open" and "Document" don't belong to the same text
span (i.e. if they have different styles; look at setSpan() in
OpenOffice::OODoc::Text to know more about text spans). In such a
situation, flatten() removes any text span markup, so the whole text
content of the element can be processed as a regular character string.
Caution: this method can produce disastrous results when misused.
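The effect described above can be simulated in core Perl on the serialized content of a paragraph. This is a hypothetical illustration, not the OpenOffice::OODoc code: the real method works on the element tree, whereas this sketch strips tags with a simple regular expression and ignores XML entities.

```perl
use strict;
use warnings;

# Hypothetical illustration, NOT the OpenOffice::OODoc code: the text
# of a paragraph whose two halves sit in different spans.
my $inner = 'Open<text:span text:style-name="T1">Document</text:span>'
          . ' format<text:tab/>!';

# Before flattening, the markup splits the searched word.
my $found_before = $inner =~ /OpenDocument/ ? 1 : 0;

# Flattening: drop every tag and keep the text content only
# (tab stops and line breaks simply disappear, as stated above).
(my $flat = $inner) =~ s/<[^>]*>//g;
my $found_after = $flat =~ /OpenDocument/ ? 1 : 0;

print "$found_before $found_after [$flat]\n";
# prints: 0 1 [OpenDocument format!]
```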
=head3 getLocalPosition([regexp])
Returns the position of the current element in the list of all
the children of the same parent with the same type.
Example:
$cell->getLocalPosition();
Assuming $cell is a table cell, this example returns the position
of the cell in the row without counting the covered cells (if any).
If a regular expression is provided as the optional argument, all
the siblings matching the expression are counted; but the method
returns zero if the calling element itself doesn't match the
expression.
Example:
$cell->getLocalPosition(qr'table:(covered-|)table-cell');
returns the position of the cell among all the cells (covered or not)
in the row.
Note: This method is a wrapper of the pos() method of XML::Twig::Elt,
but the returned values are zero-based in order to be consistent
with the other element addressing features of OpenOffice::OODoc.
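The counting rule can be sketched in core Perl on a simplified model where a row is just the list of the tag names of its children. local_position() is a hypothetical helper, not the OpenOffice::OODoc implementation (which relies on XML::Twig's pos()); it follows the zero-based convention and the "zero if the element itself doesn't match" rule described above.

```perl
use strict;
use warnings;

# Hypothetical model, NOT the OpenOffice::OODoc code: a row is reduced
# to the list of the tag names of its children.
my @row = qw(table:table-cell table:covered-table-cell
             table:table-cell table:table-cell);

# Zero-based position of child $index among the siblings matching
# $filter (by default: the same tag name); 0 if the child itself
# doesn't match, as documented above.
sub local_position {
    my ($siblings, $index, $filter) = @_;
    my $tag = $siblings->[$index];
    $filter //= qr/^\Q$tag\E$/;
    return 0 unless $tag =~ $filter;
    # grep in scalar context counts the matching previous siblings
    return scalar grep { $_ =~ $filter } @{$siblings}[0 .. $index - 1];
}

print local_position(\@row, 3), "\n";                                   # 2
print local_position(\@row, 3, qr'table:(covered-|)table-cell'), "\n"; # 3
```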
=head3 insertNewNode(xml_tag, position_flag [, offset])
Creates a new XML element, whose tag is passed as the 1st argument,
before, after or within the calling element. The 2nd argument
must be set to 'before', 'after', 'within', or any other value
accepted by the paste() method of XML::Twig. If the 2nd argument
is 'within', a 3rd one must be provided and indicate the offset.
=head3 replicateNode(count, position)
Produces one or more copies of the calling element and inserts
the copies before or after it. The position argument should be
'before' or 'after'; its default is 'after'. Technically, the
position argument could be any one of the position options of
the XML::Twig::Elt->paste method, including 'first_child',
'last_child' or 'within'; but positions other than 'before' and
'after' probably don't make sense in an OpenDocument-compliant data
structure.
Without any argument, the calling element is replicated once.
But if the count argument is provided and set to zero or a
negative value, nothing is done.
Example:
my $row = $doc->getTableRow("Table1", -1);
$row->replicateNode(5);
This sequence appends 5 more rows to a table; each new row is a
copy of the last original row, including each individual cell
and its content.
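The count rules can be sketched in core Perl on a simplified model where a table is an array of rows and each row an array of cell values. replicate_row() is a hypothetical helper, not the OpenOffice::OODoc code (which works on XML elements); like the example above, it appends 5 copies of the last row, each copy getting its own cells.

```perl
use strict;
use warnings;

# Hypothetical model, NOT the OpenOffice::OODoc code: a table is an
# array of rows, each row an array of cell values.
sub replicate_row {
    my ($rows, $index, $count, $position) = @_;
    $count    //= 1;          # replicated once by default
    $position //= 'after';
    return 0 if $count < 1;   # zero or negative: nothing is done

    # Copy the cell list, so that each new row owns its own cells.
    my @copies = map { [ @{ $rows->[$index] } ] } 1 .. $count;
    my $at = $position eq 'before' ? $index : $index + 1;
    splice @$rows, $at, 0, @copies;
    return $count;
}

my @table = (['a', 1], ['b', 2]);
replicate_row(\@table, $#table, 5);   # like $row->replicateNode(5)
print scalar(@table), "\n";           # 7
```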
=head3 selectChildElement(filter)
Like selectChildElements() below, but returns only the first node
matching the filter.
Note: the first_child() method of XML::Twig::Elt should be preferred
when the filter is the exact tag name of the needed element.
=head3 selectChildElements(filter)
Selects the children with XML tag names matching a given filter.
The filter is processed as a regexp.
Note: the children() method of XML::Twig::Elt should be preferred
if the filter is the exact tag name of the needed elements.
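Because the filter is a regular expression, a short unanchored filter may match more tags than intended. The following core-Perl sketch assumes that the filter is applied as a plain, unanchored pattern match on the tag names; select_children() is a hypothetical helper, not the OpenOffice::OODoc code. It shows how anchoring the pattern restricts the selection to an exact tag, which is safe whatever the actual matching mode.

```perl
use strict;
use warnings;

# Tag names of the children of a hypothetical office:text element.
my @children = qw(text:sequence-decls text:h text:p text:section
                  text:p table:table);

# Hypothetical helper: keep the children whose tag matches $filter.
sub select_children {
    my ($filter, @tags) = @_;
    return grep { $_ =~ /$filter/ } @tags;
}

print join(' ', select_children('text:s', @children)), "\n";
# prints: text:sequence-decls text:section
print join(' ', select_children('^text:section$', @children)), "\n";
# prints: text:section
```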
=head3 textLength()
Works with text nodes. In list context, returns the length of the text
and the text itself; in scalar context, returns the length only.
=head2 Properties
No class variables are exported; applications, if needed, must
access them using their full name ($OpenOffice::OODoc::XPath::XXX).
The following names should be prefixed explicitly with
"$OpenOffice::OODoc::XPath::"
CHARS_TO_ESCAPE
contains the list of reserved characters which, in XML, should be
replaced by escape sequences.
OO_CHARSET
indicates the character set used for OpenDocument document
encoding and whose default value is 'utf8' (it should not be changed).
LOCAL_CHARSET
indicates the user's character set, by default 'iso-8859-1'. It must
be changed according to the real user's needs (warning: there is no
automatic adaptation to the user's locale, so the application
must explicitly load the right value in this variable); this should be
done using the odfLocalEncoding() accessor (see the OpenOffice::OODoc
man page and, for the list of supported character sets, the Encode
module's documentation).
The content of these three variables should not normally be directly
modified by the applications.
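The following core-Perl sketch, which uses only the Encode module mentioned above, shows what translating a string stored in UTF-8 (the OO_CHARSET) into an 'iso-8859-1' local charset amounts to. It doesn't use OpenOffice::OODoc, which is assumed to perform a similar conversion internally according to the configured local encoding.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "café" as stored in an OpenDocument file: UTF-8 bytes.
my $stored = "caf\xc3\xa9";

# What a conversion to an 'iso-8859-1' local charset amounts to:
# decode the UTF-8 bytes into characters, then encode them again.
my $chars = decode('UTF-8', $stored);
my $local = encode('iso-8859-1', $chars);

printf "%d chars, %d bytes in UTF-8, %d in iso-8859-1\n",
    length $chars, length $stored, length $local;
# prints: 4 chars, 5 bytes in UTF-8, 4 in iso-8859-1
```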
Instance hash variables are:
'container' => <oodoc_file_object>
'file' => <OpenDocument file>
'part' => <name of the XML part in the ODF package>
'readable_XML' => <'true' or 'false'>
'local_encoding' => <user's output encoding>
'multiple_spaces' => <'on' or undef, see setText()>
'element' => <name of loaded XML element>
'xpath' => <XML::Twig, XPath-capable object>
'twig_options' => <XML::Twig options as a hash reference>
'opendocument' => <'true' or 'false'>
The 'xml' variable, which may hold the XML source provided to the
constructor, is cleared almost immediately after a successful
constructor call, in order to save memory: as soon as the
corresponding XPath object has been created, the XML source is no
longer required.
The 'xpath' variable of an OODoc::XPath object contains a reference
to the document structure as it's made available through XML::Twig
(see CPAN documentation). This object encompasses the entire current
XML tree. Each access to XML using OODoc::XPath objects is done via
XML::Twig. So, after having run the following command:
my $xp = $doc->{'xpath'};
the experienced programmer will be able to use $xp to access all the
functionality of the XML::Twig API, bearing in mind that all
operations using this interface will have a direct effect on the
content of the $doc object.
'twig_options' allows the user to provide a hash reference of
additional options to XML::Twig. These options can modify the way the
document is parsed during the execution of odfXPath(). For special
applications only (see the XML::Twig reference manual).
The 'opendocument' property, if true, means that the document is
declared as an OASIS Open Document. If this property is false or
undef, the document format is OpenOffice.org version 1. This property
should not be changed (as long as OpenOffice::OODoc can't change the
format of an existing document).
=head1 AUTHOR/COPYRIGHT
Developer/Maintainer: Jean-Marie Gouarne L<http://jean.marie.gouarne.online.fr>
Contact: jmgdoc@cpan.org
Copyright 2004-2010 by Genicorp, S.A. L<http://www.genicorp.com>
Initial English version of the reference manual by Graeme A. Hunter
(graeme.hunter@zen.co.uk).
License: GNU Lesser General Public License v2.1
=cut