/usr/bin/tv_grab_uk_rt is in xmltv-util 0.5.63-2.
This file is owned by root:root, with mode 0o755.
The actual contents of the file can be viewed below.
#!/usr/bin/perl
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
if 0; # not running under some shell
use warnings;
use strict;
use XMLTV::ProgressBar;
use XMLTV::Memoize; XMLTV::Memoize::check_argv('get_octets');
use XMLTV::Supplement qw/GetSupplement SetSupplementRoot/;
use XMLTV::Options qw/ParseOptions/;
use XMLTV::Configure::Writer;
use XMLTV::Ask;
use File::Path;
use File::Basename;
use LWP::UserAgent;
use HTTP::Cache::Transparent;
use Encode qw/decode encode/;
use DateTime;
use DateTime::Duration;
use DateTime::TimeZone;
use HTML::Entities;
use IO::Scalar; # used for configuration to write channels to string
use XML::LibXML; # used for lineups parsing/modification
##############################################
#################### TODO ####################
##############################################
# - wrap date/file tests in evals if there is a risk of failure
#
# - remove fixups from prog_titles_to_process that are now handled automatically
# e.g. duplicated title and/or episode in episode field
#
# - audio tag for audio described channels
#
# - include actor role data where available
###############################################
################## VARIABLES ##################
###############################################
# Grabber name
my $grabber_name = 'tv_grab_uk_rt';
my $grabber_spc = ' ';
# Grabber version
my $grabber_cvs_id = '$Id: tv_grab_uk_rt,v 1.28 2012/06/13 08:06:58 knowledgejunkie Exp $';
my $grabber_version;
if ($grabber_cvs_id =~ m!\$Id: [^,]+,v (\S+) ([0-9/: -]+)!) {
$grabber_version = "$1, $2";
}
else {
$grabber_version = "Unknown";
}
# Location of Radio Times channel index file
my $rt_root_dir = 'http://xmltv.radiotimes.com/xmltv';
my $rt_channels_uri = "$rt_root_dir/channels.dat";
# The format of the Radio Times source data (set to strict UTF-8)
my $source_encoding = "utf-8";
# Default XML output encoding to use (set to strict UTF-8). May be updated
# based on contents of configuration file
my $xml_encoding = "utf-8";
# The Radio Times requires this copyright notice to be displayed
my $rt_copyright
= "\n"
. " +-----------------------------------------------------+ \n"
. " | In accessing this XML feed, you agree that you will | \n"
. " | only access its contents for your own personal and | \n"
. " | non-commercial use and not for any commercial or | \n"
. " | other purposes, including advertising or selling | \n"
. " | any goods or services, including any third-party | \n"
. " | software applications available to the general | \n"
. " | public. <xmltv.radiotimes.com> | \n"
. " +-----------------------------------------------------+ \n"
. "\n";
my %tv_attributes = (
'source-info-name' => 'Radio Times XMLTV Service',
'source-info-url' => 'http://www.radiotimes.com',
'source-data-url' => "$rt_channels_uri",
'generator-info-name' => "XMLTV/$XMLTV::VERSION, $grabber_name $grabber_version",
'generator-info-url' => 'http://www.xmltv.org',
);
# Reciprocal XMLTV/RT ID hashes for the required channel_ids fields, allowing
# RT ID -> XMLTV ID and XMLTV ID -> RT ID lookups
my (%rt_to_xmltv, %xmltv_to_rt);
# Hashes for the optional channel_ids fields, keyed by XMLTV ID
my (%extra_dn, %icon_urls, %channel_offset, %broadcast_hours, %video_quality);
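# For example (hypothetical IDs, shown only to illustrate the structure built
# later from the channel_ids supplement):
#
#   $xmltv_to_rt{'example.channel.co.uk'}  =>  '123'
#   $rt_to_xmltv{'123'}                    =>  [ 'example.channel.co.uk', ... ]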
# Do the progress bars need a final update?
my $need_final_update;
# Lineup attributes: type, id, source-data-url, generator-info-name, generator-info-url
my %xmltv_lineup_attributes = (
'type' => 'DVB-T',
'version' => '1.00',
'id' => 'freeview.co.uk',
'source-data-url' => 'tv_grab_uk_rt FreeView channels',
'generator-info-name' => "XMLTV/$XMLTV::VERSION, $grabber_name $grabber_version",
'generator-info-url' => 'http://www.xmltv.org',
);
# Lineup writer
my $lineup_writer;
# Get default location to store cached listings data
my $default_cachedir = get_default_cachedir();
# Set up LWP::UserAgent
my $ua = LWP::UserAgent->new;
$ua->agent("xmltv/$XMLTV::VERSION");
$ua->env_proxy;
# Read all command line options
my ( $opt, $conf ) = ParseOptions( {
grabber_name => "$grabber_name",
version => "$grabber_cvs_id",
description => "United Kingdom/Republic of Ireland (Radio Times)",
capabilities => [qw/baseline manualconfig cache preferredmethod tkconfig apiconfig lineups/],
defaults => { days => 15, offset => 0, quiet => 0, debug => 0 },
preferredmethod => 'allatonce',
load_old_config_sub => \&load_old_config,
stage_sub => \&config_stage,
listchannels_sub => \&list_channels,
list_lineups_sub => \&list_lineups,
get_lineup_sub => \&get_lineup,
} );
################################################################
# At this point, the script takes over from ParseOptions #
################################################################
###############################################
############### GRAB THE DATA #################
###############################################
die "Error: You cannot specify --quiet with --debug, exiting"
if ($opt->{quiet} && $opt->{debug});
if (defined $conf->{lineup}) {
say("Channel selection: Lineup\n") if (! $opt->{quiet});
}
elsif (defined $conf->{channel}) {
say("Channel selection: Config file\n") if (! $opt->{quiet});
}
else {
print STDERR "No configured channels in config file ($opt->{'config-file'})\n" .
"Please run the grabber with --configure.\n";
exit 1;
}
# New-style config files must include a cachedir entry
if (not defined( $conf->{cachedir} )) {
print STDERR "No cachedir defined in configfile ($opt->{'config-file'})\n" .
"Please run the grabber with --configure.\n";
exit 1;
}
# Update encoding if seen in new-style config file
if (defined( $conf->{encoding} )) {
$xml_encoding = $conf->{encoding}[0];
}
# Enable title processing? Enable it by default if not explicitly disabled
my $title_processing;
if (defined( $conf->{'title-processing'} )) {
$title_processing = $conf->{'title-processing'}[0];
}
else {
$title_processing = 'enabled';
}
# Enable UTF-8 fixups? Enable by default if not explicitly disabled
my $utf8_fixups_status;
if (defined( $conf->{'utf8-fixups'} )) {
$utf8_fixups_status = $conf->{'utf8-fixups'}[0];
}
else {
$utf8_fixups_status = 'enabled';
}
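# After ParseOptions and the defaults applied above, $conf is expected to look
# roughly like this (a hedged sketch with hypothetical values; the real values
# come from the user's configuration file):
#
#   $conf = {
#       cachedir           => [ '/home/user/.xmltv/cache' ],
#       encoding           => [ 'utf-8' ],
#       'title-processing' => [ 'enabled' ],
#       'utf8-fixups'      => [ 'enabled' ],
#       channel            => [ 'example1.channel.co.uk', 'example2.channel.co.uk' ],
#   };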
# Initialise the cache-directory
init_cachedir( $conf->{cachedir}[0] );
# Set cache options
#
# MaxAge set to 15 days as Radio Times provides 14 days of listings
# NoUpdate set to 1hr as Radio Times data is updated once per day
#
HTTP::Cache::Transparent::init( {
BasePath => $conf->{cachedir}[0],
MaxAge => 15*24,
NoUpdate => 60*60,
Verbose => $opt->{debug},
ApproveContent => \&check_content_length,
}
);
# A reusable 1 day duration object
my $day_dur = DateTime::Duration->new( days => 1 );
# Variables for programme title manipulation
my $have_title_data = 0;
my %non_title_info; # key = title, value = title
my %mixed_title_subtitle; # key = title, value = title
my @mixed_subtitle_title; # array
my %reversed_title_subtitle; # key = title, value = title
my %replacement_titles; # key = old title, value = replacement title
my %replacement_episodes; # key = title, value = hash (where key = old ep, value = new ep)
my %replacement_cats; # key = title, value = category
my %replacement_title_eps; # key = 'old_title . '|' . old_ep', value = (new_title, new_ep)
my %replacement_title_desc; # key = 'old_title . '|' . old_ep' . '|' . 'old_desc', value = (new_title, new_ep)
my %flagged_title_eps; # key = old_title from title fixup routine 8
my %dotdotdot_titles; # key = replacement title ending with '...' seen in title fixup routine 8
my %replacement_ep_from_desc; # key = title, value = hash (where key = desc, value = new ep)
my %demoted_title; # key = title, value = title
my %uc_prog_titles; # key = title, value = title
my %title_in_subtitle_fixed; # key = 'title . '|' . episode', value = hashref (keys are title and episode)
my %title_ep_in_subtitle_fixed; # key = 'title . '|' . episode', value = hashref (keys are title and episode)
my %title_in_subtitle_notfixed; # key = 'title . '|' . episode', value = hashref (keys are title and episode)
my %colon_in_subtitle; # key = 'title . '|' . episode', value = hashref (keys are title and episode)
# Create global hashes to store programme/film titles for all programmes on all
# channels, as we will process these lists after grabbing to determine any
# titles which may need to be 'fixed up'
my %prog_titles;
my %film_titles;
# hash to store case/punctuation-insensitive variants of titles
my %case_insens_titles;
# Hash to store bad character strings and their replacements that are used when
# processing the source data to remove mis-encoded UTF-8 characters
my %utf8_fixups;
# Create hashes to store names/urls of channels with occurrences of mis-encoded
# UTF-8 data after our replacement routines have run
my %hasC27F9Fchars;
my %hadEFBFBD;
my %hadC3AFC2BFC2BD;
# Create hashes to store uncategorised programmes and available categories
# to potentially use for such programmes
my %uncategorised_progs;
my %categories;
my %cats_per_prog;
# Create hashes to store episode details that may still contain series, episode
# or part numbering after the processing that handles them has been carried out
my %possible_series_nums;
my %possible_episode_nums;
my %possible_part_nums;
# Hash to store titles containing text that should likely be removed
my %title_text_to_remove;
# Hash to store details of empty source listings
my %empty_listings;
# Track problems during listings retrieval. Currently we exit(1) only if
# listings data is missing for any requested channels
my $chan_warnings = 0;
my $prog_warnings = 0;
# Output XMLTV library and grabber versions
if (! $opt->{quiet}) {
say("Program/library version information:\n");
say("XMLTV library version: $XMLTV::VERSION");
say("$grabber_name version: $grabber_version");
say(" libwww-perl version: $LWP::VERSION\n");
}
# Determine the modification time of the source data on the RT servers
my $rt_mod_time = get_mod_time($rt_channels_uri);
if ($rt_mod_time) {
say("\nSource data last updated on: " . $rt_mod_time . "\n") if (! $opt->{quiet});
$tv_attributes{'date'} = $rt_mod_time;
}
# Retrieve list of all channels currently available
my $available_channels = load_available_channels($conf, $opt);
# Now ensure configured channels are still available to download
my $wanted_chs = get_wanted_channels_aref($conf, $opt, $available_channels);
# Configure output and write XMLTV data - header, channels, listings, and footer
my $writer;
setup_xmltv_writer($conf, $opt);
write_xmltv_header();
write_channel_list($wanted_chs);
write_listings_data($conf, $opt, $wanted_chs);
write_xmltv_footer();
# Print debug info for titles, categories, bad utf-8 chars
if ($opt->{debug}) {
print_titles_with_colons();
print_titles_with_hyphens();
print_new_titles();
print_uc_titles_post();
print_title_variants();
print_titles_inc_years();
print_titles_inc_bbfc_certs();
print_flagged_title_eps();
print_dotdotdot_titles();
print_title_in_subtitle();
print_categories();
print_uncategorised_progs();
print_cats_per_prog();
print_possible_prog_numbering();
print_misencoded_utf8_data();
}
# Give a useful exit status if data for some channels was not downloaded
if (! $opt->{quiet}) {
print_empty_listings();
if ($chan_warnings) {
say("\nFinished, but listings for some configured channels are missing. Check debug log.\n");
exit(1);
}
elsif ($prog_warnings) {
say("\nFinished, but listings for some programmes could not be processed. Check debug log.\n");
exit(0);
}
else {
say("\nFinished!\n");
exit(0);
}
}
###############################################
################ SUBROUTINES ##################
###############################################
sub get_default_cachedir {
my $winhome = undef;
if (defined $ENV{HOMEDRIVE} && defined $ENV{HOMEPATH}) {
$winhome = $ENV{HOMEDRIVE} . $ENV{HOMEPATH};
}
my $home = $ENV{HOME} || $winhome || ".";
my $dir = "$home/.xmltv/cache";
t("Using '$dir' as cache-directory for XMLTV listings");
return $dir;
}
sub load_old_config {
my ( $config_file ) = @_;
if (! $opt->{quiet}) {
say("Using old-style config file");
}
my @config_entries = XMLTV::Config_file::read_lines( $config_file );
my $conf = {};
# Use default cachedir as there was no support for choosing an alternative
# cache directory before ParseOptions support was added to the grabber.
$conf->{cachedir}[0] = $default_cachedir;
$conf->{channel} = [];
CONFIG_ENTRY:
foreach my $config_entry (@config_entries)
{
next CONFIG_ENTRY if (! defined $config_entry);
next CONFIG_ENTRY if ($config_entry =~ m/^#/ || $config_entry =~ m/^\s*$/);
if ($config_entry !~ m/^channel\s+(\S+)$/) {
if (! $opt->{quiet}) {
say("Bad line '$config_entry' in config file, skipping");
}
next CONFIG_ENTRY;
}
my( $command, $param ) = split( /\s+/, $config_entry, 2 );
$param =~ tr/\n\r//d;
$param =~ s/\s+$//;
# We only support channel entries in the old-style config
if ($command =~ m/^\s*channel\s*$/) {
push @{$conf->{channel}}, $param;
}
else {
die "Unknown command '$command' in config file $config_file"
}
}
return $conf;
}
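# A hypothetical old-style config file accepted by load_old_config() above
# (channel IDs are illustrative only; comments and blank lines are skipped,
# and only 'channel' entries are recognised):
#
#   # tv_grab_uk_rt configuration
#   channel example1.channel.co.uk
#   channel example2.channel.co.uk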
sub init_cachedir {
my $path = shift @_;
if (! -d $path) {
if (mkpath($path)) {
t("Created cache-directory '$path'");
}
else {
die "Error: Failed to create cache-directory $path: $@, exiting";
}
}
}
# Check whether data files on the RT website are empty but still online, or
# contain HTML/XML from the Radio Times' error page.
#
# These files will have a good HTTP response header as they exist, but they
# contain no data. Caching via HCT without checking for a non-zero content_length
# beforehand will therefore overwrite good data with bad. Any file having a
# content_length of 0 or seen to contain DOCTYPE info will not be cached and the
# existing cached copy of the file will be used instead.
#
# Support for this functionality requires using at least the 1.0 version of
# HTTP::Cache::Transparent, which can be obtained from CPAN.
#
sub check_content_length {
my $rt_file = shift @_;
if ($rt_file->is_success) {
# reject an empty (but available) file
if ($rt_file->content_length == 0) {
return 0;
}
# an empty source file containing only the RT disclaimer has a length
# of approx 300 bytes
elsif ($rt_file->content_length < 400) {
return 0;
}
# reject a likely HTML error page
elsif ($rt_file->content =~ m/DOCTYPE/) {
return 0;
}
# cache a likely good file
else {
return 1;
}
}
# reject file if retrieval failed
else {
return 0;
}
}
# Get the last-modified time of a successful HTTP Response object. Return
# undef on error
sub get_mod_time {
my $resp = $ua->get(shift @_);
if ($resp->is_error) {
return undef;
}
else {
return $resp->header('Last-Modified');
}
}
# Determine all currently available channels by reading the current Radio
# Times list of channels, and adding additional information from the
# grabber's channel_ids file. The content of both of these files is
# required in order to proceed with listings retrieval.
#
sub load_available_channels {
my ( $conf, $opt ) = @_;
# Update encoding if seen in new-style config file
if (defined( $conf->{encoding} )) {
$xml_encoding = $conf->{encoding}[0];
}
# First we read in the XMLTV channel_ids file to provide additional
# information (icon, display name) about available channels, and also
# provide the information necessary for timeshifted and part-time channel
# support.
#
# We use the hashes %rt_to_xmltv and %xmltv_to_rt to lookup the Radio
# Times and XMLTV channel IDs. These will deal sensibly with a new RT
# channel that isn't yet mentioned in the channel_ids file.
# Provide statistics for the number of usable, unusable, timeshifted,
# part-time, and part-time timeshifted channels listed in channel_ids.
my $num_good_channels = 0;
my $num_bad_channels = 0;
my $num_ts_channels = 0;
my $num_pt_channels = 0;
my $num_pt_ts_channels = 0;
# Retrieve grabber's channel_ids file via XMLTV::Supplement
my $xmltv_channel_ids = GetSupplement("$grabber_name", 'channel_ids');
die "Error: XMLTV channel_ids data is missing, exiting"
if (! defined $xmltv_channel_ids || $xmltv_channel_ids eq '');
my @lines = split /[\n\r]+/, $xmltv_channel_ids;
t("\nExtended XMLTV channel information:\n");
XMLTV_CHANID_ENTRY:
foreach my $line (@lines) {
# Skip blank lines. Comments are allowed if they are at the start
# of the line.
next XMLTV_CHANID_ENTRY if ($line =~ m/^#/ || $line =~ m/^$/);
my @fields = split /\|/, $line;
# We need at least 2 fields (xmltv_id,rt_id) to run the grabber.
# No check on maximum number of fields to support future updates
# to channel_ids now we are using XMLTV::Supplement.
if (scalar @fields < 2) {
t("Wrong number of fields in XMLTV channel_ids entry:\n"
. "\t" . $line);
next XMLTV_CHANID_ENTRY;
}
# The channel_ids fields are:
# 1) XMLTV ID
# 2) RT ID
# 3) Channel name
# 4) Channel icon URL
# 5) Timeshift offset
# 6) Broadcast hours
# 7) Video quality
#
# The RT channels.dat provides a channel name, but it may be out of
# date - here we provide an alternative or updated name if the
# channel name has changed
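# A hypothetical channel_ids entry showing the '|'-separated fields described
# above (all values are illustrative only, not taken from the real supplement):
#
#   example.channel.co.uk|123|Example Channel|http://example.com/icon.png|+1hour|0600-2200|SDTV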
my ($xmltv_id, $rt_id, $extra_dn,
$icon_url, $channel_offset, $broadcast_hours,
$video_quality) = @fields;
# Flag timeshifted and part-time channels for stats
my ($is_timeshift, $is_parttime);
# Check for required XMLTV ID and RT ID fields, skip if missing
if (! defined $xmltv_id || $xmltv_id eq '') {
t("Undefined XMLTV ID seen in channel_ids, skipping");
next XMLTV_CHANID_ENTRY;
}
if ($xmltv_id !~ m/\w+\.\w+.*/) {
t("Invalid XMLTV ID seen in channel_ids, skipping");
next XMLTV_CHANID_ENTRY;
}
if (! defined $rt_id || $rt_id eq '') {
t("Undefined RT ID seen in channel_ids, skipping");
next XMLTV_CHANID_ENTRY;
}
if ($rt_id !~ m/^\d+$/) {
t("Invalid RT ID seen in channel_ids, skipping");
next XMLTV_CHANID_ENTRY;
}
# Check for duplicate RT IDs having same associated XMLTV ID. As part of
# timeshifted/part-time channel support, we associate the same RT ID
# with different XMLTV IDs
foreach my $id (@{$rt_to_xmltv{$rt_id}}) {
if (defined $id && $id eq $xmltv_id) {
t("Radio Times ID '$rt_id' already seen in XMLTV "
. "channel_ids file, skipping");
next XMLTV_CHANID_ENTRY;
}
}
# Check whether current XMLTV ID has already been seen
if (defined $xmltv_to_rt{$xmltv_id}) {
t("XMLTV ID '$xmltv_id' already seen in XMLTV channel_ids file, skipping");
next XMLTV_CHANID_ENTRY;
}
# Store the XMLTV channel description, report if it is missing
if (defined $extra_dn) {
if ($extra_dn eq '' || $extra_dn !~ m/\w+/) {
$extra_dn = undef;
if (defined $extra_dn && $extra_dn !~ m/\(Do\ Not\ Use\)/) {
t("No XMLTV channel name associated with '$xmltv_id'");
}
}
else {
$extra_dn{$xmltv_id} = $extra_dn;
}
}
# Check for channel icon
if (defined $icon_url) {
if ($icon_url eq '' || $icon_url !~ m/^http/) {
$icon_url = undef;
if (defined $extra_dn && $extra_dn !~ m/\(Do\ Not\ Use\)/) {
t("No channel icon associated with '$xmltv_id'");
}
}
else {
$icon_urls{$xmltv_id} = $icon_url;
}
}
# Check for valid timeshift offset
if (defined $channel_offset) {
if ($channel_offset eq '' || $channel_offset !~ m/^(\+|\-)/) {
$channel_offset = undef;
}
else {
$channel_offset{$xmltv_id} = $channel_offset;
if (defined $extra_dn && $extra_dn !~ m/\(Do\ Not\ Use\)/) {
t("Channel '$xmltv_id' has timeshift of '$channel_offset'");
}
$is_timeshift = 1;
}
}
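# (A hypothetical offset value here would be of the form '+1hour'; it is later
# matched in write_listings_data with m/[+](\d+)hours?/ to build a
# DateTime::Duration for the channel's timeshift.)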
# Check for correct broadcast hours format (HHMM-HHMM)
if (defined $broadcast_hours) {
if ($broadcast_hours eq '' || $broadcast_hours !~ m/\d{4}-\d{4}/) {
$broadcast_hours = undef;
}
else {
$broadcast_hours{$xmltv_id} = $broadcast_hours;
if (defined $extra_dn && $extra_dn !~ m/\(Do\ Not\ Use\)/) {
t("Channel '$xmltv_id' is on air '$broadcast_hours'");
}
$is_parttime = 1;
}
}
# Check for presence of video quality information (SDTV or HDTV)
if (defined $video_quality) {
if ($video_quality eq '' || $video_quality !~ m/SDTV|HDTV/) {
$video_quality = undef;
}
else {
$video_quality{$xmltv_id} = $video_quality;
}
}
# Handle multiple XMLTV IDs associated with a single RT ID. Required
# after introduction of timeshifted and part-time channel support,
# which maps multiple XMLTV IDs to a single RT ID.
push @{$rt_to_xmltv{$rt_id}}, $xmltv_id;
$xmltv_to_rt{$xmltv_id} = $rt_id;
# Update the counts of part-time and timeshifted channels
if ($is_timeshift && $is_parttime) {
$num_pt_ts_channels++;
}
elsif ($is_timeshift) {
$num_ts_channels++;
}
elsif ($is_parttime) {
$num_pt_channels++;
}
# Finally, update count of good/bad channels
if ($extra_dn =~ m/\(Do\ Not\ Use\)/) {
$num_bad_channels++;
}
else {
$num_good_channels++;
}
}
t("\n");
# channel_ids processing finished
die "Error: No usable XMLTV channel definitions seen in channel_ids, exiting"
if (! defined $num_good_channels || $num_good_channels < 1);
# Read in the Radio Times channels.dat file, which is supplied in UTF-8
# format. We process the list of available channels and check for
# presence of duplicate IDs or names.
#
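# Each usable line in channels.dat is expected to have the form 'rt_id|name',
# for example (hypothetical values): 123|Example Channel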
# Grab the octets
t("Retrieving channel list from Radio Times website");
my $rt_channels_dat = get_octets( $rt_channels_uri );
die "Error: Radio Times channels.dat data is missing, exiting\n"
. "Please check $rt_channels_uri"
if (! defined $rt_channels_dat || $rt_channels_dat eq '');
# Decode source UTF-8 octets, process for HTML entities, and encode
# into configured output encoding
t("\nDecoding channel data from $source_encoding octets into Perl's internal format");
$rt_channels_dat = decode($source_encoding, $rt_channels_dat);
t("Processing for HTML entities seen in the channel data");
decode_entities($rt_channels_dat);
t("Encoding channel data from Perl's internal format into $xml_encoding octets\n");
$rt_channels_dat = encode($xml_encoding, $rt_channels_dat);
my @rt_channels = split /\n/, $rt_channels_dat;
my $num_rt_channels = scalar @rt_channels;
$need_final_update = 0;
my $chans_bar;
if (! $opt->{quiet} && ! $opt->{debug}) {
$chans_bar = new XMLTV::ProgressBar({name => 'Retrieving channels',
count => $num_rt_channels,
ETA => 'linear', });
}
# Hash to store details for <channel> elements
my %channels;
my (%seen_rt_id, %seen_name);
my $num_good_rt_channels = 0;
my $to_say = "";
RT_CHANDAT_ENTRY:
foreach my $rt_channel (@rt_channels) {
chomp $rt_channel;
# ignore empty lines and the disclaimer at the start of the file
if ($rt_channel =~ m/^\s*$/ || $rt_channel =~ /^In accessing this XML feed/) {
next RT_CHANDAT_ENTRY;
}
if ($rt_channel !~ m/^(\d+)\|(.+)/) {
t("Bad entry '$rt_channel' seen in RT channels.dat, skipping");
next RT_CHANDAT_ENTRY;
}
my ($rt_id, $rt_name) = ($1, $2);
if ($seen_rt_id{$rt_id}++) {
t("Duplicate channnel ID '$rt_id' seen in RT channels.dat, skipping");
next RT_CHANDAT_ENTRY;
}
if ($seen_name{$rt_name}++) {
t("Duplicate channel name '$rt_name' seen in RT channels.dat, skipping");
next RT_CHANDAT_ENTRY;
}
# Check whether there is at least one XMLTV ID associated with the RT ID
#
# If the current RT channel has a known XMLTV ID, check it against known bad
# channels and skip it if required. If the channel does not have an
# XMLTV ID, create one and continue.
#
my $xmltv_id = $rt_to_xmltv{$rt_id}[0];
if (defined $xmltv_id) {
# Skip any RT entries which have been flagged as bad in channel_ids file
if ($extra_dn{ $rt_to_xmltv{$rt_id}[0] } =~ m/\(Do\ Not\ Use\)/) {
t("Channel '$rt_name' ($rt_id) flagged as bad, skipping");
$need_final_update = 1;
next RT_CHANDAT_ENTRY;
}
}
else {
# Handle new channels available on RT site unknown to channel_ids file
$to_say .= "Unknown channel '$rt_name'. Will configure as 'C$rt_id.radiotimes.com'\n";
push @{$rt_to_xmltv{$rt_id}}, "C$rt_id.radiotimes.com";
}
foreach my $id (@{$rt_to_xmltv{$rt_id}}) {
# Use a name for the channel if defined in our channel_ids file,
# otherwise use the name supplied by the Radio Times.
my @names = ();
if (defined $extra_dn{$id}) {
@names = ([ $extra_dn{$id} ]);
}
else {
@names = ([ $rt_name ]);
}
# Add a URL for a channel icon if available.
my @icon;
my $icon_url = $icon_urls{$id};
if ($icon_url) {
@icon = { 'src' => $icon_url };
}
# Add the channel's details to the %channels hash, adding icon
# details if available.
if (@icon) {
$channels{$id} = {
id => $id,
rt_id => $rt_id,
'display-name' => \@names,
'icon' => \@icon,
};
}
else {
$channels{$id} = {
id => $id,
rt_id => $rt_id,
'display-name' => \@names,
};
}
}
# We have a usable channel definition at this point
$num_good_rt_channels++;
# Update the progress bar by one increment
if (defined $chans_bar) {
$chans_bar->update();
}
}
die "Error: No usable Radio Times channel definitions available, exiting"
if ($num_good_rt_channels < 1);
if (defined $chans_bar) {
# Only update the progress bar to 100% if we need to
if ($need_final_update) {
$chans_bar->update($num_rt_channels);
}
$chans_bar->finish();
if (! $opt->{quiet}) {
say( "\n" );
}
}
if (! $opt->{quiet} && $to_say) {
say( $to_say );
say("\n Please notify the maintainer to get the new channels added");
}
# Output statistics on the number of channels currently available
if (! $opt->{quiet}) {
say("\nThe Radio Times has usable data available for $num_good_rt_channels channels which we\n"
. "can use to generate TV listings for regular and some timeshifted\n"
. "channels. The tv_grab_uk_rt software also has support for an additional\n"
. "$num_ts_channels timeshifted, $num_pt_channels part-time, and $num_pt_ts_channels part-time timeshifted channels\n"
. "based on the Radio Times data.\n\n"
. "In total, tv_grab_uk_rt currently supports $num_good_channels channels.\n");
}
# Report any channels listed in channel_ids not seen on the Radio Times
# site
if (! $opt->{quiet}) {
XMLTV_ID:
foreach my $xmltv_id (keys %xmltv_to_rt) {
# Ignore channels flagged as bad in channel_ids
next XMLTV_ID if ($extra_dn{$xmltv_id} =~ m/.*Do\ Not\ Use.*/);
if (! exists $channels{$xmltv_id}) {
say("XMLTV channel '$xmltv_id' ($xmltv_to_rt{$xmltv_id}) "
. "not seen on RT site\n");
}
}
}
return \%channels;
}
# Determine options for, and create XMLTV::Writer object
sub setup_xmltv_writer {
my ( $conf, $opt ) = @_;
# Update encoding if seen in new-style config file
if (defined( $conf->{encoding} )) {
$xml_encoding = $conf->{encoding}[0];
}
# output options
my %g_args = ();
if (defined $opt->{output}) {
t("\nOpening XML output file '$opt->{output}'\n");
my $fh = new IO::File ">$opt->{output}";
die "Error: Cannot write to '$opt->{output}', exiting" if (! $fh);
%g_args = (OUTPUT => $fh);
}
# Determine how many days of listings are required and range-check them, applying
# default values if the requested values are out of range. If --days or --offset is specified we must
# ensure that values for days, offset and cutoff are passed to XMLTV::Writer
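# For example, running with '--offset 1 --days 3' would (per the checks below)
# result in days => 3, offset => 1 and cutoff => '000000' being passed to
# XMLTV::Writer.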
my %d_args = ();
if (defined $opt->{days} || defined $opt->{offset}) {
if (defined $opt->{days}) {
if ($opt->{days} < 1 || $opt->{days} > 15) {
if (! $opt->{quiet}) {
say("Specified --days option is not possible (1-15). "
. "Retrieving all available listings.");
}
$opt->{days} = 15
}
}
else {
$opt->{days} = 15;
}
if (defined $opt->{offset}) {
if ($opt->{offset} < 0 || $opt->{offset} > 14) {
if (! $opt->{quiet}) {
say("Specified --offset option is not possible (0-14). "
. "Retrieving all available listings.");
}
$opt->{offset} = 0;
}
}
else {
$opt->{offset} = 0;
}
$d_args{days} = $opt->{days};
$d_args{offset} = $opt->{offset};
# We currently don't provide a --cutoff option
$d_args{cutoff} = "000000";
}
t("Setting up XMLTV::Writer using \"" . $xml_encoding . "\" for output");
$writer = new XMLTV::Writer(%g_args, %d_args, encoding => $xml_encoding);
}
sub write_xmltv_header {
t("Writing XMLTV header");
$writer->start(\%tv_attributes);
}
sub write_channel_list {
my $wanted_chs = shift;
t("Started writing <channel> elements...");
foreach my $chan_href (@{$wanted_chs}) {
my %h = %$chan_href;
delete $h{'rt_id'};
$writer->write_channel(\%h);
}
t("Finished writing <channel> elements");
}
# Download listings data for configured channels that are available
sub write_listings_data {
my ( $conf, $opt, $wanted_chs ) = @_;
# Update encoding if seen in new-style config file
if (defined( $conf->{encoding} )) {
$xml_encoding = $conf->{encoding}[0];
}
my $num_req_chans = scalar @{$wanted_chs};
if (! $opt->{quiet}) {
display_copyright();
say("Will download listings for $num_req_chans configured channels\n");
}
my $listings_bar;
if (! $opt->{quiet} && ! $opt->{debug}) {
$listings_bar = new XMLTV::ProgressBar({name => 'Retrieving listings',
count => $num_req_chans,
ETA => 'linear', });
}
# Was title processing enabled in config file?
if ($title_processing eq 'enabled') {
t("Extra title processing is enabled\n");
load_prog_titles_to_process();
}
else {
t("Extra title processing is disabled\n");
}
# Were UTF-8 fixups enabled in config file?
if ($utf8_fixups_status eq 'enabled') {
t("UTF-8 fixups are enabled\n");
load_utf8_fixups();
}
else {
t("UTF-8 fixups are disabled\n");
}
# Hash to hold warnings of incorrect number of fields. The warning
# is given once per listings file if noticed more than once
my %warned_wrong_num_fields;
# Reset check for final progress bar update
$need_final_update = 0;
# Process all of the channels we want listings for
WANTED_CH:
foreach my $chan_href (@{$wanted_chs}) {
my $xmltv_id = $chan_href->{'id'};
my $rt_id = $chan_href->{'rt_id'};
my $rt_name = $chan_href->{'display-name'}[0][0];
if (! defined $rt_id) {
t("No Radio Times ID for channel '$rt_name', skipping");
next WANTED_CH;
}
# Create the channel's URL based on ID
my $rt_listings_uri = "$rt_root_dir/$rt_id.dat";
# Include the URL in any warn/die messages
local $SIG{__DIE__} = sub { die "$rt_listings_uri: $_[0]" };
local $SIG{__WARN__} = sub { warn "$rt_listings_uri: $_[0]" };
# Read in the listings data for the channel as UTF-8 octets. We will
# process the raw octets before decoding them to Perl's internal
# format below.
t("\nRetrieving listings for '$rt_name'");
my $page = get_octets( $rt_listings_uri );
if (! defined $page || $page eq '') {
if (! $opt->{quiet}) {
say("No listings data available for '$rt_name' ($xmltv_id), skipping");
}
$chan_warnings++;
next WANTED_CH;
}
if (! $opt->{quiet}) {
say("Processing listings for '$rt_name' ($xmltv_id)");
}
t("");
my $ts_dt;
if (defined $channel_offset{$xmltv_id}) {
t(" Detected a channel offset of '$channel_offset{$xmltv_id}' for '$rt_name'");
# Setup a reusable Duration object for this channel's timeshift
$channel_offset{$xmltv_id} =~ m/[+](\d+)hours?/;
$ts_dt = DateTime::Duration->new( hours => $1 );
}
# detect/correct UTF-8 errors in source data
$page = process_utf8_fixups($page, $rt_name, $rt_listings_uri);
# Decode source UTF-8 octets and process for HTML entities
t("\nDecoding listings data from $source_encoding octets into Perl's internal format");
$page = decode($source_encoding, $page);
t("Processing for HTML entities seen in the listings data");
decode_entities($page);
##### From this point, $page is a decoded Perl string #####
# Start to process individual programme entries found in listings
t("Started processing programmes for channel '$rt_name'\n");
# list to store programme elements for writing when each channel is parsed
my @programmes = ();
# Track number of programmes per channel
my $num_titles = 0;
PROGRAMME:
foreach my $prog (split /\n/, $page) {
# ignore empty lines and the disclaimer at the start of each file
if ($prog =~ m/^\s*$/ || $prog =~ m/^In accessing this XML feed/) {
next PROGRAMME;
}
my @fields = split /\~/, $prog;
if (scalar @fields < 23) {
if ($warned_wrong_num_fields{$xmltv_id}++) {
t(" Too few data fields (need at least 23) in line:\n $prog\n");
}
$prog_warnings++;
t("\n ----\n");
next PROGRAMME;
}
# Remove any spaces at start/end of fields
foreach my $field (@fields) {
$field =~ s/^\s+//;
$field =~ s/\s+$//;
undef $field if !length $field;
}
# Description of Radio Times data fields (23 in total):
#
# 1 title - the programme title (text)
# 2 sub_title - used to carry series/episode numbering (text)
# 3 episode - used to carry the name/subtitle of an episode of the
# programme (text)
# 4 year - the year of production (text)
# 5 director - the programme's director(s) (text)
# 6 cast - the programme's cast (may include character details) (text)
# 7 premiere - whether this is a film's first showing (boolean)
# 8 film - whether the programme is a film (boolean)
# 9 repeat - whether the programme has been shown before (boolean)
# 10 subtitles - whether subtitles are available (boolean)
# 11 widescreen - whether the broadcast is 16:9 widescreen (boolean)
# 12 new_series - whether the programme is the first episode in a
# series new (boolean)
# 13 deaf_signed - whether in-vision signing is available (boolean)
# 14 blank_and_white - whether the broadcast is not in colour (boolean)
# 15 star_rating - a star rating between 0 and 5 for films (text)
# 16 certificate - the BBFC certificate for the programme (text)
# 17 genre - the genre of the programme (text)
# 18 desc - a description of the programme. Can be a specific review
# by a Radio Times reviewer (text)
# 19 choice - whether the programme is recommended by the
# Radio Times (boolean)
# 20 date - the transmission date (text)
# 21 start - the transmission start time for the programme (text)
# 22 stop - the transmissions stop time for the programme (text)
# 23 duration_mins - the duration of the programme in minutes (text)
#
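# For reference, an illustrative (entirely hypothetical) raw listings
# line containing the 23 '~'-separated fields described above might
# look like:
#
# Some Title~~The Episode Name~~~Jane Doe*Ann Actor~false~false~true~true~true~false~false~false~~~Drama~A description.~false~01/01/2012~21:00~22:00~60
#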
# Hash to store all programme-specific variables. Initially store
# the channel's XMLTV ID and name.
my %prog = (channel => $xmltv_id, '_rt_name' => $rt_name);
# Store fields against temp keys. We will assign values to the XMLTV
# specific keys during processing. Key names starting with "_" are
# ignored by XMLTV::Writer.
( $prog{'_title'}, $prog{'_sub_title'}, $prog{'_episode'},
$prog{'_year'}, $prog{'_director'}, $prog{'_cast'},
$prog{'_premiere'}, $prog{'_film'}, $prog{'_repeat'},
$prog{'_subtitles'}, $prog{'_widescreen'}, $prog{'_new_series'},
$prog{'_deaf_signed'}, $prog{'_black_and_white'}, $prog{'_star_rating'},
$prog{'_certificate'}, $prog{'_genre'}, $prog{'_desc'},
$prog{'_choice'}, $prog{'_date'}, $prog{'_start'},
$prog{'_stop'}, $prog{'_duration_mins'},
) = @fields;
# Validate key fields (title/date/time) before processing
if (! validate_key_fields(\%prog)) {
$prog_warnings++;
t("\n ----\n");
next PROGRAMME;
}
# Check true/false fields for valid data
foreach my $field ('_premiere', '_film', '_repeat',
'_subtitles', '_widescreen', '_new_series',
'_deaf_signed','_black_and_white', '_choice', ) {
if (! validate_boolean_field(\%prog, $field) ) {
$prog_warnings++;
t("\n ----\n");
next PROGRAMME;
}
}
t(" Processing programme title '" . $prog{'_title'} . "'");
# Check for DST-related information in title
check_explicit_tz_in_title(\%prog);
# Remove any last-minute scheduling messages from desc
remove_updated_listing_desc(\%prog);
# Check for episode numbering in sub_title field
check_numbering_in_subtitle(\%prog);
# At this point, $prog{'_sub_title'} should be undefined with all
# text either parsed out or moved into $prog{'_episode'}
# Check for null or invalid release year
validate_year_field(\%prog);
# Remove production year information from $episode for films
remove_year_from_episode(\%prog);
# Tidy $title text before title processing
tidy_title_text(\%prog);
# Store uppercase titles for later analysis
check_uppercase_titles(\%prog);
# Debug output before any title processing takes place
my $ep_in = "<UNDEF>";
if (defined $prog{'_episode'}) {
$prog{'_episode'} =~ s/\s+/ /g; # tidy whitespace
$ep_in = $prog{'_episode'};
}
t(" Pre- processing title/episode details '"
. $prog{'_title'} . " / " . $ep_in . "'");
# Remove a duplicated programme title/ep if seen in episode field
remove_duplicated_title_and_ep_in_ep(\%prog);
# Remove a duplicated programme title if seen in episode field
remove_duplicated_title_in_ep(\%prog);
# Title and episode processing. We process titles if the user has
# not explicitly disabled title processing during configuration
# and we have supplement data to process programmes against.
process_title_fixups(\%prog);
# Look for series/episode/part numbering in programme title/subtitle
check_potential_numbering_in_text(\%prog);
# Tidy $episode text after title processing
tidy_episode_text(\%prog);
# Output updated title/episode information after processing
my $ep_out = "<UNDEF>";
if (defined $prog{'_episode'}) {
$prog{'_episode'} =~ s/\s+/ /g; # tidy whitespace
$ep_out = $prog{'_episode'};
}
t(" Post-processing title/episode details '"
. $prog{'_title'} . " / " . $ep_out . "'");
# Store title debug info for later analysis
store_title_debug_info(\%prog);
# Remove film title duplicated in $episode field
check_duplicated_film_title(\%prog);
# Check for film without a valid release year
check_missing_film_year(\%prog);
# Tidy $desc text after title processing
tidy_desc_text(\%prog);
# Check description for possible premiere/repeat hints
update_premiere_repeat_flags_from_desc(\%prog);
# Look for series/episode numbering in programme description
extract_numbering_from_desc(\%prog);
# Create episode numbering
my $ep_num = generate_episode_numbering(\%prog);
# Create cast list
my $cast_ref = generate_cast_list(\%prog);
# Store genre debug info for later analysis
store_genre_debug_info(\%prog);
# Create start/stop timings
if (! generate_start_stop_times(\%prog, $ts_dt)) {
$prog_warnings++;
t("\n ----\n");
next PROGRAMME;
}
# After processing is finished, create the %prog keys
# that will be written out by XMLTV::Writer, encoding the
# Perl string data back to the chosen output format
$prog{title} = [ [ encode($xml_encoding, $prog{'_title'}) ] ];
if (defined $prog{'_episode'} && $prog{'_episode'} !~ m/^\s*$/) {
$prog{'sub-title'} = [ [ encode($xml_encoding, $prog{'_episode'}) ] ];
}
if (defined $prog{'_desc'} && $prog{'_desc'} !~ m/^\s*$/) {
$prog{desc} = [ [ encode($xml_encoding, $prog{'_desc'}), 'en' ] ];
}
if (defined $ep_num) {
$prog{'episode-num'} = [ [ $ep_num, "xmltv_ns" ] ];
}
if (defined $prog{'_director'} && $prog{'_director'} !~ m/^\s*$/) {
$prog{credits}{director} = [ encode($xml_encoding, $prog{'_director'}) ];
}
if (defined $cast_ref) {
foreach my $cast (@{$cast_ref}) {
$cast = encode($xml_encoding, $cast);
}
$prog{credits}{actor} = $cast_ref;
}
if (defined $prog{'_year'}) {
$prog{date} = $prog{'_year'};
}
if (defined $prog{'_genre'} && ! $prog{'_film'}) {
push @{$prog{category}}, [ encode($xml_encoding, $prog{'_genre'}), 'en' ];
}
elsif ($prog{'_film'}) {
push @{$prog{category}}, [ 'Film', 'en' ];
}
if ($prog{'_widescreen'}) {
$prog{video}{aspect} = '16:9';
}
if ($prog{'_black_and_white'}) {
$prog{video}{colour} = 0;
}
if (defined $video_quality{$xmltv_id}) {
if ($video_quality{$xmltv_id} =~ m/HDTV/) {
$prog{video}{quality} = 'HDTV';
$prog{video}{aspect} = '16:9';
}
elsif ($video_quality{$xmltv_id} =~ m/SDTV/) {
$prog{video}{quality} = 'SDTV';
}
}
if ($prog{'_premiere'}) {
if (defined $channel_offset{$xmltv_id}) {
$prog{'_repeat'} = 1;
t(" Ignoring premiere flag on timeshifted channel");
}
else {
$prog{premiere} = [ '' ];
$prog{'_repeat'} = 0;
}
}
if ($prog{'_repeat'}) {
$prog{'previously-shown'} = {};
}
if ($prog{'_new_series'}) {
$prog{new} = 1;
}
if ($prog{'_subtitles'}) {
push @{$prog{subtitles}}, {type=>'teletext'};
}
if ($prog{'_deaf_signed'}) {
push @{$prog{subtitles}}, {type=>'deaf-signed'};
}
if (defined $prog{'_certificate'} && $prog{'_certificate'} !~ m/^\s*$/) {
$prog{rating} = [ [ $prog{'_certificate'}, 'BBFC' ] ];
}
if (defined $prog{'_star_rating'} && $prog{'_star_rating'} !~ m/^\s*$/ && $prog{'_film'}) {
push @{$prog{'star-rating'}}, [ "" . $prog{'_star_rating'} . "/5", 'Radio Times Film Rating' ];
}
if ($prog{'_choice'}) {
push @{$prog{'star-rating'}}, [ '1/1', 'Radio Times Recommendation' ];
}
# Finally, write the programme's XML data to programme list
push @programmes, \%prog;
$num_titles++;
t("\n ----\n");
}
if ($num_titles < 1) {
$empty_listings{$rt_name} = $rt_listings_uri;
$chan_warnings++;
t(" No programmes found for '$rt_name' - check source file");
}
else {
# Write the channel's programme elements to output
foreach my $prog (@programmes) {
$writer->write_programme($prog);
}
t(" Writing $num_titles <programme> elements for '$rt_name'");
}
t("Finished processing listings for '$rt_name' ($xmltv_id)\n");
t("----");
# Update the progress bar by one increment
if (defined $listings_bar) {
$listings_bar->update();
}
}
if (defined $listings_bar) {
# Only update the progress bar to 100% if we need to
if ($need_final_update) {
$listings_bar->update($num_req_chans);
}
$listings_bar->finish();
if (! $opt->{quiet}) {
say("\n");
}
}
}
sub write_xmltv_footer {
t("\nWriting XMLTV footer\n");
$writer->end;
}
# Convenience method for use with XMLTV::Memoize. Only return content
# after a successful response. We require access to the raw octets via
# $resp->content in order to be able to process the data for double and
# mis-encoded UTF-8 content. Calling $resp->decoded_content or using
# LWP::Simple::get() (versions of LWP >=5.827) would not permit this.
sub get_octets {
my $resp = $ua->get(shift @_);
if ($resp->is_error) {
return undef;
}
else {
return $resp->content;
}
}
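# Typical use (as in write_listings_data() above):
#   my $page = get_octets( $rt_listings_uri );   # undef on any HTTP error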
# Return the digit equivalent of its word, e.g. "one" -> "1",
# or return the word unchanged if it already contains digits
sub word_to_digit {
my $word = shift;
return undef if ! defined $word;
return $word if $word =~ m/\d+/;
for (lc $word) {
if (m/^one$/) { return 1 }
elsif (m/^two$/) { return 2 }
elsif (m/^three$/) { return 3 }
elsif (m/^four$/) { return 4 }
elsif (m/^five$/) { return 5 }
elsif (m/^six$/) { return 6 }
elsif (m/^seven$/) { return 7 }
elsif (m/^eight$/) { return 8 }
elsif (m/^nine$/) { return 9 }
elsif (m/^ten$/) { return 10 }
elsif (m/^eleven$/) { return 11 }
elsif (m/^twelve$/) { return 12 }
elsif (m/^thirteen$/) { return 13 }
elsif (m/^fourteen$/) { return 14 }
elsif (m/^fifteen$/) { return 15 }
elsif (m/^sixteen$/) { return 16 }
elsif (m/^seventeen$/) { return 17 }
elsif (m/^eighteen$/) { return 18 }
elsif (m/^nineteen$/) { return 19 }
elsif (m/^twenty$/) { return 20 }
# handle 1-8 in Roman numerals
elsif (m/^i$/) { return 1 }
elsif (m/^ii$/) { return 2 }
elsif (m/^iii$/) { return 3 }
elsif (m/^iv$/) { return 4 }
elsif (m/^v$/) { return 5 }
elsif (m/^vi$/) { return 6 }
elsif (m/^vii$/) { return 7 }
elsif (m/^viii$/) { return 8 }
# return undef if input unhandled
else { return undef }
}
}
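# Examples of the mapping performed above:
#   word_to_digit('seven')   returns 7
#   word_to_digit('IV')      returns 4 (Roman numerals, matched case-insensitively)
#   word_to_digit('Part 2')  returns 'Part 2' unchanged (already contains a digit)
#   word_to_digit('hundred') returns undef (unhandled word)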
# Display required copyright message from Radio Times
sub display_copyright {
say("$rt_copyright");
}
# Read in the prog_titles_to_process file
sub load_prog_titles_to_process {
my $prog_titles_to_process = undef;
# Retrieve prog_titles_to_process via XMLTV::Supplement
$prog_titles_to_process
= GetSupplement("$grabber_name", 'prog_titles_to_process');
if (defined $prog_titles_to_process) {
$prog_titles_to_process = decode($source_encoding, $prog_titles_to_process);
my @prog_titles_to_process = split /[\n\r]+/, $prog_titles_to_process;
t("\nTitle processing information:\n");
PROG_TITLE_ENTRY:
foreach my $line (@prog_titles_to_process) {
# Comments are allowed if they are at the start of the line
next PROG_TITLE_ENTRY if ($line =~ m/^#/);
my @fields = split /\|/, $line;
# Each entry requires 2 fields
if (scalar @fields != 2) {
t("Wrong number of fields in XMLTV prog_titles_to_process entry:\n"
. "\t" . $line);
next PROG_TITLE_ENTRY;
}
# The prog_titles_to_process fields are:
# 1) processing code
# 2) title/non-title text to process
#
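# Illustrative (hypothetical) entries in the 'code|text' format - the
# real entries are maintained in the prog_titles_to_process supplement
# file itself:
#   1|Some Strand:          remove the non-title text 'Some Strand:' from titles
#   5|Old Title~New Title   replace the inconsistent title 'Old Title'
#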
my ($code, $process_text) = @fields;
if (! defined $code || $code eq '' || $code !~ m/\d+/) {
t("Invalid title processing code: " . $line . "'");
next PROG_TITLE_ENTRY;
}
if (! defined $process_text || $process_text eq ''
|| $process_text !~ m/\w+/) {
t("Invalid title processing text: " . $line . "'");
next PROG_TITLE_ENTRY;
}
my $idx_char = lc(substr ($process_text, 0, 1));
# processing codes are documented in prog_titles_to_process file
if ($code eq '1') {
push @{$non_title_info{$idx_char}}, $process_text;
t("[1] Will remove '" . $process_text . "' from title if found");
$have_title_data = 1;
next PROG_TITLE_ENTRY;
}
elsif ($code eq '2') {
push @{$mixed_title_subtitle{$idx_char}}, $process_text;
t("[2] Will check for subtitle after title for '" . $process_text . "'");
$have_title_data = 1;
next PROG_TITLE_ENTRY;
}
elsif ($code eq '3') {
push @mixed_subtitle_title, $process_text;
t("[3] Will check for subtitle before title for '" . $process_text . "'");
$have_title_data = 1;
next PROG_TITLE_ENTRY;
}
elsif ($code eq '4') {
push @{$reversed_title_subtitle{$idx_char}}, $process_text;
t("[4] Will check for reversed title/subtitle for '" . $process_text . "'");
$have_title_data = 1;
next PROG_TITLE_ENTRY;
}
elsif ($code eq '5') {
my @fields = split( /~/, $process_text, 2);
if (scalar @fields != 2) {
t("[5] Invalid number of fields (need 2) in processing text: '" . $process_text . "'");
next PROG_TITLE_ENTRY;
}
my( $old_title, $new_title ) = @fields;
$replacement_titles{$old_title} = $new_title;
t("[5] Will check for inconsistent title '" . $old_title . "'");
$have_title_data = 1;
next PROG_TITLE_ENTRY;
}
elsif ($code eq '6') {
my @fields = split( /~/, $process_text, 2);
if (scalar @fields != 2) {
t("[6] Invalid number of fields (need 2) in processing text: '" . $process_text . "'");
next PROG_TITLE_ENTRY;
}
my( $uncat_title, $cat ) = @fields;
$replacement_cats{$uncat_title} = $cat;
t("[6] Will assign title '" . $uncat_title . "' to category '" . $cat . "'");
$have_title_data = 1;
next PROG_TITLE_ENTRY;
}
elsif ($code eq '7') {
my @fields = split( /~/, $process_text, 3);
if (scalar @fields != 3) {
t("[7] Invalid number of fields (need 3) in processing text: '" . $process_text . "'");
next PROG_TITLE_ENTRY;
}
my( $ep_title, $old_ep, $new_ep ) = @fields;
$replacement_episodes{$ep_title}->{$old_ep} = $new_ep;
t("[7] Will check for inconsistent episode data '" . $old_ep . "' for title '" . $ep_title . "'");
$have_title_data = 1;
next PROG_TITLE_ENTRY;
}
elsif ($code eq '8') {
my @fields = split( /~/, $process_text, 4);
if (scalar @fields != 4) {
t("[8] Invalid number of fields (need 4) in processing text: '" . $process_text . "'");
next PROG_TITLE_ENTRY;
}
foreach my $field (@fields) {
$field = "" if ! defined $field;
}
my( $old_title, $old_ep, $new_title, $new_ep ) = @fields;
if ($old_title eq '' or $new_title eq '') {
t("[8] Ignoring fixup '" . $process_text . "' as old/new title not given");
next PROG_TITLE_ENTRY;
}
# remember old title so that we can output a debug list of
# programmes that may also need to be handled via this fixup
$flagged_title_eps{$old_title} = $old_title;
my $key = ("" . $old_title . "|" . $old_ep);
$replacement_title_eps{$key} = [$new_title, $new_ep];
t("[8] Will update old title/subtitle '" . $old_title . " / " . $old_ep
. "' to '" . $new_title . " / " . $new_ep . "'");
if ($old_title eq $new_title) {
t("[8] Old/new title are the same - change to type 7 title fixup: '" . $process_text . "'");
}
if ($old_ep =~ m/^\Q$new_title\E/) {
t("[8] Old ep contains new title - change to type 11 title fixup? '" . $process_text . "'");
}
# store titles that are being corrected with an existing "some title..." fixup
# store the title without a leading "The" or "A" or the trailing "..." for later matching
if ($new_title =~ m/^(?:The\s+|A\s+)?(.*)\.\.\.$/) {
$dotdotdot_titles{$1} = $new_title;
}
$have_title_data = 1;
next PROG_TITLE_ENTRY;
}
elsif ($code eq '9') {
my @fields = split( /~/, $process_text, 3);
if (scalar @fields != 3) {
t("[9] Invalid number of fields (need 3) in processing text: '" . $process_text . "'");
next PROG_TITLE_ENTRY;
}
my( $title, $episode, $desc ) = @fields;
$replacement_ep_from_desc{$title}->{$desc} = $episode;
t("[9] Will update subtitle to '" . $episode . "' for title '" . $title
. "' based on given description '" . $desc . "'");
$have_title_data = 1;
next PROG_TITLE_ENTRY;
}
elsif ($code eq '10') {
my @fields = split( /~/, $process_text, 5);
if (scalar @fields != 5) {
t("[10] Invalid number of fields (need 5) in processing text: '" . $process_text . "'");
next PROG_TITLE_ENTRY;
}
foreach my $field (@fields) {
$field = "" if ! defined $field;
}
my( $old_title, $old_ep, $new_title, $new_ep, $desc ) = @fields;
if ($old_title eq '' or $new_title eq '' or $desc eq '') {
t("[10] Ignoring fixup '" . $process_text . "' as titles/desc not given");
next PROG_TITLE_ENTRY;
}
my $key = ("" . $old_title . "|" . $old_ep . "|" . $desc);
$replacement_title_desc{$key} = [$new_title, $new_ep];
t("[10] Will update old title/subtitle/desc '" . $old_title . " / " . $old_ep
. "' to '" . $new_title . " / " . $new_ep . "'");
# store titles that are being corrected with an existing "some title..." fixup
# store the title without a leading "The" or "A" or the trailing "..." for later matching
if ($new_title =~ m/^(?:The\s+|A\s+)?(.*)\.\.\.$/) {
$dotdotdot_titles{$1} = $new_title;
}
$have_title_data = 1;
next PROG_TITLE_ENTRY;
}
elsif ($code eq '11') {
my @fields = split( /~/, $process_text, 2);
if (scalar @fields != 2) {
t("[11] Invalid number of fields (need 2) in processing text: '" . $process_text . "'");
next PROG_TITLE_ENTRY;
}
my( $brand, $new_title ) = @fields;
if ($brand eq '' or $new_title eq '') {
t("[11] Ignoring fixup '" . $process_text . "' as brand/title not given");
next PROG_TITLE_ENTRY;
}
push @{$demoted_title{$brand}}, $new_title;
t("[11] Will check for demoted title '" . $new_title . "' for brand '" . $brand . "'");
$have_title_data = 1;
next PROG_TITLE_ENTRY;
}
else {
t("Unknown code seen in prog_titles_to_process file,"
. " skipping entry '" . $line . "'");
next PROG_TITLE_ENTRY;
}
}
}
else {
if (! $opt->{quiet}) {
say("Disabling title processing, no information found.");
}
}
if (! $opt->{quiet}) {
say("\n");
}
}
# Read in the utf8_fixups file
sub load_utf8_fixups {
my $utf8_fixups = undef;
# Retrieve utf8_fixups via XMLTV::Supplement
$utf8_fixups
= GetSupplement("$grabber_name", 'utf8_fixups');
if (defined $utf8_fixups) {
my @utf8_fixups = split /[\n\r]+/, $utf8_fixups;
t("\nLoading UTF-8 fixups\n");
UTF8_FIXUP_ENTRY:
foreach my $line (@utf8_fixups) {
# Comments are allowed if they are at the start of the line
next UTF8_FIXUP_ENTRY if ($line =~ m/^#/);
my @fields = split /\|/, $line;
# Each entry requires 2 fields
if (scalar @fields != 2) {
t("Wrong number of fields in XMLTV UTF-8 fixup entry:\n"
. "\t" . $line);
next UTF8_FIXUP_ENTRY;
}
# The utf8_fixups fields are:
# 1) bad utf-8 characters to find and replace (as hex)
# 2) the replacement characters (as hex)
#
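# An illustrative (hypothetical) entry in that format:
#   \xC2\x92|\x27
# which would replace a mis-encoded right single quotation mark
# ([C2][92]) with a plain apostrophe.
#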
my ($bad_chars, $replacement) = @fields;
if (! defined $bad_chars || $bad_chars eq '') {
t("Invalid UTF-8 fixup regex: '" . $line . "'");
next UTF8_FIXUP_ENTRY;
}
if (! defined $replacement || $replacement eq '') {
t("Invalid UTF-8 fixup replacement: '" . $line . "'");
next UTF8_FIXUP_ENTRY;
}
# ignore unknown fixup formats
if ($bad_chars !~ m/\\xEF\\xBF\\xBD/
&& $bad_chars !~ m/\\xC3\\xAF\\xC2\\xBF\\xC2\\xBD/
&& $bad_chars !~ m/^\\xC2\\x[8-9][0-9A-F]/) {
t("Ignoring UTF-8 fixup: '" . $line . "'");
next UTF8_FIXUP_ENTRY;
}
# Remove the \x chars read from the file leaving a simple hex string
# containing only [0-9A-F] chars
$replacement =~ s/\\x//g;
# Now convert each byte (2 hex chars) into its character equivalent
$replacement =~ s/([0-9A-F][0-9A-F])/chr(hex($1))/eig;
# Create hashes to store each type of fixup separately. This should
# improve processing speed by restricting number of fixups checked.
if ($bad_chars =~ m/\\xEF\\xBF\\xBD/) {
$utf8_fixups{'EFBFBD'}{$bad_chars} = $replacement;
}
elsif ($bad_chars =~ m/\\xC3\\xAF\\xC2\\xBF\\xC2\\xBD/) {
$utf8_fixups{'C3AFC2BFC2BD'}{$bad_chars} = $replacement;
}
elsif ($bad_chars =~ m/^\\xC2\\x[8-9][0-9A-F]/) {
$utf8_fixups{'C2809F'}{$bad_chars} = $replacement;
}
# Process the regex to get a character string to print. We use
# the preserved hex string during processing
my $bad_chars_chr = $bad_chars;
$bad_chars_chr =~ s/\\x//g;
$bad_chars_chr =~ s/([0-9A-F][0-9A-F])/chr(hex($1))/eig;
t("UTF-8 fixup: will replace \"" . $bad_chars_chr . "\" with \""
. $replacement . "\" if seen");
next UTF8_FIXUP_ENTRY;
}
}
else {
if (! $opt->{quiet}) {
say("No additional UTF-8 fixups were found.");
}
}
if (! $opt->{quiet}) {
say("\n");
}
}
# Tidy up any bad characters in the source data. Although the data is provided
# as UTF-8, the text may contain mis-encoded UTF-8 characters, NULL characters
# or other extraneous characters, which should be corrected where possible.
#
sub process_utf8_fixups {
# read in the data to be processed, a descriptive name and a URI for it
my $page = shift;
my $rt_name = shift;
my $rt_listings_uri = shift;
t(" Checking '$rt_name' listings data for bad UTF-8 chars...");
for ($page) {
# Programme entries containing RT reviews or updated information
# may have erroneous CR+SP characters which we tidy up here
#
t(" Looking for CR+SP characters...");
if (s/\x0D\x20//g) {
t(" Removed CR+SP characters from '$rt_name' listings data");
}
# Fix double-encoded UTF-8 characters (4 bytes)
# =============================================
#
# The ISO-8859-1 charset contains 256 codepoints (0x00-0xFF). When
# encoded into UTF-8, either 1 or 2 bytes are required to encode these
# characters as follows:
#
# ISO-8859-1 UTF-8 Chars in Bytes Notes
# range byte(s) range Range Required
#
# 0x00-0x1F [00]-[1F] 32 1 Non-printing
# 0x20-0x7F [20]-[7F] 96 1 Printing
# 0x80-0x9F [C2][80]-[C2][9F] 32 2 Non-printing
# 0xA0-0xBF [C2][A0]-[C2][BF] 32 2 Printing
# 0xC0-0xFF [C3][80]-[C3][BF] 64 2 Printing
#
# A double-encoded UTF-8 character that uses 4 bytes (but should use
# only 2 if properly encoded) uses the first 2 bytes to contain the
# UTF-8 representation of the first byte of the proper UTF-8
# representation of the character, and the second 2 bytes to contain
# the UTF-8 representation of the second byte.
#
# E.g.:
#
# The data contains a double-encoded UTF-8 encoding of the A-grave
# character using 4 bytes. The correct UTF-8 encoding of this character
# is [C3][80]. The data actually contains the 4 bytes [C3][83][C2][80].
# [C3][83] is the UTF-8 encoding of [C3], and [C2][80] is the UTF-8
# encoding of [80]. We therefore replace this 4-byte double-encoding
# with [C3][80] which is valid UTF-8 and can be successfully encoded
# into other character encodings if required.
#
# Unicode codepoints that encode into 2 bytes in UTF-8 lie in the byte
# range [C2-DF][80-BF].
#
# http://en.wikipedia.org/wiki/ISO/IEC_8859-1
# http://en.wikipedia.org/wiki/UTF-8
# http://www.eki.ee/letter/
#
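# A minimal sketch of the repair applied below, using the A-grave
# example above:
#
#   my $fixed = encode("iso-8859-1", decode("utf-8", "\xC3\x83"))
#             . encode("iso-8859-1", decode("utf-8", "\xC2\x80"));
#   # $fixed now holds "\xC3\x80", the correct UTF-8 bytes for A-grave
#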
t(" Looking for double-encoded UTF-8 characters...");
if (m/[\xC3][\x82-\x83][\xC2][\x80-\xBF]/) {
# first capture each set of double-encoded UTF-8 bytes
# (4 in total, 2 for each "real" UTF-8 char) into a list
my @double_bytes = ($page =~ m/[\xC3][\x82-\x83][\xC2][\x80-\xBF]/g);
# get a unique list of the different doubly encoded bytes
my %unique_double_bytes;
foreach(@double_bytes) {
$unique_double_bytes{$_} = $_;
}
# Get a list of unique 4-byte sequences
@double_bytes = sort values %unique_double_bytes;
foreach (@double_bytes) {
t(" Found double-encoded bytes: " . $_);
}
# process the list, reading 2 pairs of bytes in each iteration
foreach (@double_bytes) {
/([\xC3][\x82-\x83])([\xC2][\x80-\xBF])/;
my $badbytes_1 = $1;
my $badbytes_2 = $2;
# convert each pair of bytes from UTF-8 to ISO-8859-1 to get a single
# byte from the original pair
my $goodbyte_1 = encode("iso-8859-1", decode("utf-8", $badbytes_1) );
my $goodbyte_2 = encode("iso-8859-1", decode("utf-8", $badbytes_2) );
# finally globally replace each group of 4 bad bytes with
# the 2 correct replacement bytes
$page =~ s/$badbytes_1$badbytes_2/$goodbyte_1$goodbyte_2/g;
t(" Replaced bad bytes '" . $badbytes_1 . $badbytes_2
. "' with good bytes '" . $goodbyte_1 . $goodbyte_2 . "'");
}
}
# Fix double-encoded UTF-8 General Punctuation characters (6 bytes)
# =================================================================
#
# Occasionally in the listings we see double-encoded characters from
# the Unicode General Punctuation range of characters. When encoded
# into UTF-8 these characters should require 3 bytes. However, when
# double-encoded they take 6 bytes. During their handling we replace
# them with their ASCII equivalents which are how the characters are
# usually included in the listings.
#
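# For example, an en dash (U+2013, correctly [E2][80][93] in UTF-8)
# double-encodes to [C3][A2][C2][80][C2][93] and is replaced with a
# plain '-' below.
#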
t(" Looking for double-encoded UTF-8 General Punctuation characters...");
if (m/[\xC3][\xA2][\xC2][\x80-\x81]/) {
t(" Replaced double-encoded 6-byte UTF-8 General Punctuation chars");
s/\xC3\xA2\xC2\x80\xC2\x90/\x2D/g; # <2D> -> -
s/\xC3\xA2\xC2\x80\xC2\x91/\x2D/g; # <2D> -> -
s/\xC3\xA2\xC2\x80\xC2\x92/\x2D/g; # <2D> -> -
s/\xC3\xA2\xC2\x80\xC2\x93/\x2D/g; # <2D> -> -
s/\xC3\xA2\xC2\x80\xC2\x94/\x2D/g; # <2D> -> -
s/\xC3\xA2\xC2\x80\xC2\x95/\x2D/g; # <2D> -> -
s/\xC3\xA2\xC2\x80\xC2\x98/\x27/g; # <27> -> '
s/\xC3\xA2\xC2\x80\xC2\x99/\x27/g; # <27> -> '
s/\xC3\xA2\xC2\x80\xC2\x9A/\x27/g; # <27> -> '
s/\xC3\xA2\xC2\x80\xC2\x9C/\x22/g; # <22> -> "
s/\xC3\xA2\xC2\x80\xC2\x9D/\x22/g; # <22> -> "
s/\xC3\xA2\xC2\x80\xC2\x9E/\x22/g; # <22> -> "
s/\xC3\xA2\xC2\x80\xC2\x9F/\x22/g; # <22> -> "
s/\xC3\xA2\xC2\x80\xC2\xA6/\x2E\x2E\x2E/g; # <2E><2E><2E> -> ...
}
# Fix mis-encoded UTF-8 characters (6/8 bytes)
# ============================================
#
# Frequently seen in the data (especially in film listings) are completely
# mis-encoded sequences of UTF-8 characters. Each sequence of bad bytes
# starts with a correctly encoded 2-byte UTF-8 character but is then
# followed by 2 or 3 mis-encoded ASCII-range characters. When encoded into
# UTF-8 these ASCII chars should take 1 byte each, but in this situation
# use 2 bytes which then fail to decode or display correctly.
#
# This fixup looks for mis-encoded character sequences in the range
# [C3][A0-AF][C2][80-BF][C2][80-BF] (6 bytes)
#
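# Worked example (hypothetical bytes) of the correction applied below:
# the 6-byte sequence [C3][A9][C2][A5][C2][B3] keeps its valid leading
# pair [C3][A9] (e-acute) and has 0x40 subtracted from each trailing
# bad byte, [A5]->[65] 'e' and [B3]->[73] 's', giving the UTF-8 text
# "\xC3\xA9es".
#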
t(" Looking for mis-encoded [C3][A0-AF] bytes...");
if (m/[\xC3][\xA0-\xAF][\xC2][\x80-\xBF][\xC2][\x80-\xBF]/) {
# first capture each sequence of mis-encoded UTF-8 bytes
# (6 in total)
my @misencoded_bytes =
($page =~ m/[\xC3][\xA0-\xAF][\xC2][\x80-\xBF][\xC2][\x80-\xBF]/g);
# get a unique list of the different mis-encoded byte sequences
my %unique_misencoded_bytes;
MIS_ENC_BYTE:
foreach (@misencoded_bytes) {
# the Unicode Replacement Character is handled below, so ignore here
# (when double-encoded, it will match the regex above)
if (m/\xC3\xAF\xC2\xBF\xC2\xBD/) {
t(" Ignoring double-encoded Unicode Replacement Character (handled separately)");
next MIS_ENC_BYTE;
}
$unique_misencoded_bytes{$_} = $_;
}
# Get a new list of the unique 6-byte sequences
@misencoded_bytes = sort values %unique_misencoded_bytes;
foreach (@misencoded_bytes) {
t(" Found mis-encoded bytes: " . $_);
}
# process the list, handling one 6-byte sequence in each iteration.
# Bytes 1 and 2 are correct and left untouched; bytes 4 and 6 are
# extracted and corrected before being output
foreach (@misencoded_bytes) {
/([\xC3][\xA0-\xAF])[\xC2]([\x80-\xBF])[\xC2]([\x80-\xBF])/;
my $goodbytes = $1; # correct, and used in replacement
my $badbyte_1 = $2; # incorrect byte value
my $badbyte_2 = $3; # incorrect byte value
# the bad bytes require 0x40 (DEC 64) to be subtracted from the char
# value. 0xA0 is a special case and is always converted to a regular
# space char (0x20)
my $goodbyte_1;
if ($badbyte_1 !~ m/\xA0/) {
$goodbyte_1 = chr( (ord $badbyte_1) - 64);
}
else {
$goodbyte_1 = "\x20";
}
my $goodbyte_2;
if ($badbyte_2 !~ m/\xA0/) {
$goodbyte_2 = chr( (ord $badbyte_2) - 64);
}
else {
$goodbyte_2 = "\x20";
}
# finally globally replace each sequence of bad bytes with
# the correct replacement bytes
$page =~ s/$_/$goodbytes$goodbyte_1$goodbyte_2/g;
t(" Replaced mis-encoded [C3][A0-AF] bytes '" . $_
. "' with bytes '"
. $goodbytes . $goodbyte_1 . $goodbyte_2 . "'");
}
}
# This fixup looks for mis-encoded character sequences in the range
# [C3][B0-BF][C2][80-BF][C2][80-BF][C2][80-BF] (8 bytes)
#
t(" Looking for mis-encoded [C3][B0-BF] bytes...");
if (m/[\xC3][\xB0-\xBF][\xC2][\x80-\xBF][\xC2][\x80-\xBF][\xC2][\x80-\xBF]/) {
# first capture each sequence of mis-encoded UTF-8 bytes
# (8 in total)
my @misencoded_bytes =
($page =~ m/[\xC3][\xB0-\xBF][\xC2][\x80-\xBF][\xC2][\x80-\xBF][\xC2][\x80-\xBF]/g);
# get a unique list of the different mis-encoded byte sequences
my %unique_misencoded_bytes;
foreach(@misencoded_bytes) {
$unique_misencoded_bytes{$_} = $_;
}
# Get a new list of the unique 8-byte sequences
@misencoded_bytes = sort values %unique_misencoded_bytes;
foreach (@misencoded_bytes) {
t(" Found mis-encoded bytes: " . $_);
}
# process the list, handling one 8-byte sequence in each iteration.
# Bytes 1 and 2 are correct and left untouched; bytes 4, 6 and 8 are
# extracted and corrected before being output
foreach (@misencoded_bytes) {
/([\xC3][\xB0-\xBF])[\xC2]([\x80-\xBF])[\xC2]([\x80-\xBF])[\xC2]([\x80-\xBF])/;
my $goodbytes = $1; # correct, and used in replacement
my $badbyte_1 = $2; # incorrect byte value
my $badbyte_2 = $3; # incorrect byte value
my $badbyte_3 = $4; # incorrect byte value
# the bad bytes require 0x40 (DEC 64) to be subtracted from the char
# value. 0xA0 is a special case and is always converted to a regular
# space char (0x20)
my $goodbyte_1;
if ($badbyte_1 !~ m/\xA0/) {
$goodbyte_1 = chr( (ord $badbyte_1) - 64);
}
else {
$goodbyte_1 = "\x20";
}
my $goodbyte_2;
if ($badbyte_2 !~ m/\xA0/) {
$goodbyte_2 = chr( (ord $badbyte_2) - 64);
}
else {
$goodbyte_2 = "\x20";
}
my $goodbyte_3;
if ($badbyte_3 !~ m/\xA0/) {
$goodbyte_3 = chr( (ord $badbyte_3) - 64);
}
else {
$goodbyte_3 = "\x20";
}
# finally globally replace each sequence of bad bytes with
# the correct replacement bytes
$page =~ s/$_/$goodbytes$goodbyte_1$goodbyte_2$goodbyte_3/g;
t(" Replaced mis-encoded [C3][B0-BF] bytes '" . $_
. "' with bytes '"
. $goodbytes . $goodbyte_1 . $goodbyte_2 . $goodbyte_3 . "'");
}
}
# Manual Replacements
# ===================
#
# Here we replace specific sequences of characters seen in the source
# data that cannot be handled automatically above. These include
# occurrences of the Unicode Replacement Character (single and double
# encoded) and other mis-encoded characters.
#
# We use a supplemental file to store these fixups to allow updating
# without needing to update the grabber itself.
#
if ($utf8_fixups_status eq 'enabled') {
# Unicode Replacement Character (U+FFFD)
# ======================================
#
# The UTF-8 source data may also contain the bytes [EF][BF][BD] which
# are the UTF-8 encoding of the Unicode Replacement Character U+FFFD.
# It is likely that these are introduced during preparation of the
# listings data by the Radio Times, as any characters that cannot be
# understood are replaced by this character.
#
t(" Looking for Unicode Replacement Character...");
if (m/\xEF\xBF\xBD/) {
if (%utf8_fixups && exists $utf8_fixups{'EFBFBD'}) {
foreach my $bad_chars (keys %{$utf8_fixups{'EFBFBD'}}) {
my $replacement = $utf8_fixups{'EFBFBD'}{$bad_chars};
# Search for the regex string and replace with char string
if ($page =~ s/$bad_chars/$replacement/g) {
t(" Replaced Unicode Replacement Character with \""
. $replacement . "\"");
}
}
}
if ($page =~ s/\xEF\xBF\xBD/\x3F/g) {
t(" After fixups, data for '$rt_name' still contains Unicode "
. "Replacement character. Replaced with \"?\"\n");
$hadEFBFBD{$rt_name} = $rt_listings_uri;
}
}
# Double-encoded Unicode Replacement Character (6 bytes)
# ======================================================
#
# The correct encoding for the Unicode Replacement Character is
# [EF][BF][BD], however it has been seen double-encoded in the listings
# data as [C3][AF][C2][BF][C2][BD]. As with the normal replacement
# character, there is no way to determine which replacement character
# to use in this case, so we substitute a '?' char if we cannot handle
# the specific occurrence. This error needs to have been seen at least
# once in source data to be able to construct a suitable fixup.
#
t(" Looking for double-encoded Unicode Replacement Character...");
if (m/\xC3\xAF\xC2\xBF\xC2\xBD/) {
if (%utf8_fixups && exists $utf8_fixups{'C3AFC2BFC2BD'}) {
foreach my $bad_chars (keys %{$utf8_fixups{'C3AFC2BFC2BD'}}) {
my $replacement = $utf8_fixups{'C3AFC2BFC2BD'}{$bad_chars};
# Search for the regex string and replace with char string
if ($page =~ s/$bad_chars/$replacement/g) {
t(" Replaced double-encoded Unicode Replacement Character with \""
. $replacement . "\"");
}
}
}
if ($page =~ s/\xC3\xAF\xC2\xBF\xC2\xBD/\x3F/g) {
t(" After fixups, data for '$rt_name' still contains "
. "double-encoded Unicode Replacement character. "
. "Replaced with \"?\"\n");
$hadC3AFC2BFC2BD{$rt_name} = $rt_listings_uri;
}
}
# Mis-encoded characters in range [C2][80-9F]
# ===========================================
#
# Single characters that are seen in the source data as bytes in the
# range [C2][80-9F] that UTF-8 decode as non-printing characters
# instead of their intended character.
#
t(" Looking for mis-encoded characters in range [C2][80-9F]...");
if (m/\xC2[\x80-\x9F]/) {
if (%utf8_fixups && exists $utf8_fixups{'C2809F'}) {
foreach my $bad_chars (keys %{$utf8_fixups{'C2809F'}}) {
my $replacement = $utf8_fixups{'C2809F'}{$bad_chars};
# Search for the regex string and replace with char string
if ($page =~ s/$bad_chars/$replacement/g) {
t(" Replaced mis-encoded characters \"" . $bad_chars
. "\" with \"". $replacement . "\"");
}
}
}
}
}
# With the supplemental fixups handled above, we now make a few fixed
# string replacements and finally remove any remaining bad/non-printing
# characters we find
# Replacements for specific strings seen in source data
# =====================================================
#
t(" Looking for specific strings to replace...");
# Replacement for Pound Sterling symbol seen as {pound}
if (s/\x7B\x70\x6F\x75\x6E\x64\x7D/\xC2\xA3/g) {
t(" Replaced \"{pound}\" with Pound Sterling symbol");
}
# Replace any non-breaking (NBSP) space chars with regular spaces
if (s/\xC2\xA0/\x20/g) {
t(" Replaced non-breaking spaces with regular spaces");
}
# Remove any remaining non-printing control characters (keep
# \t \n and \r). Refer to above table for ISO-8859-1 and UTF-8 Unicode
# encodings for chars.
#
# First, chars in UTF-8 range [00-1F] (ISO-8859-1 range [00-1F])
if (s/[\x00-\x08\x0B-\x0C\x0E-\x1F]//g) {
t(" Removed non-printing characters (range [00]-[1F]) from "
. "'$rt_name' listings data");
}
# Next, remove any remaining byte pairs in UTF-8 range [C2][7F-9F]
# (ISO-8859-1 range [7F-9F]) (non-printing)
if (s/[\xC2][\x7F-\x9F]//g) {
t(" Removed non-printing characters (range [C2][7F-9F]) from "
. "'$rt_name' listings data");
$hasC27F9Fchars{$rt_name} = $rt_listings_uri;
}
}
return $page;
}
# Validate the key %prog fields (title/date/time) for a programme
sub validate_key_fields {
my $prog = shift;
if (! defined $prog->{'_title'}) {
t(" Missing title in entry, skipping");
return undef;
}
if (! defined $prog->{'_date'}) {
t(" Missing date in entry, skipping");
return undef;
}
if (! defined $prog->{'_start'}) {
t(" Missing start time in entry, skipping");
return undef;
}
if (! defined $prog->{'_duration_mins'}) {
t(" Missing duration in entry, skipping");
return undef;
}
if ($prog->{'_date'} !~ m{^\d\d/\d\d/\d{4}$}) {
t(" Bad date '" . $prog->{'_date'} . "' detected for '" . $prog->{'_title'} . "', skipping");
return undef;
}
if ($prog->{'_start'} !~ m/\d\d:\d\d/) {
t(" Bad start time '" . $prog->{'_start'} . "' detected for '" . $prog->{'_title'} . "', skipping");
return undef;
}
if ($prog->{'_duration_mins'} !~ m/\d+/) {
t(" Bad duration '" . $prog->{'_duration'} . "' detected for '" . $prog->{'_title'} . "', skipping");
return undef;
}
if ($prog->{'_duration_mins'} == 0) {
t(" Zero duration detected for '" . $prog->{'_title'} . "', skipping");
return undef;
}
return $prog;
}
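# A programme therefore passes the checks above with, for example
# (hypothetical values), _title 'Some Title', _date '01/01/2012',
# _start '21:00' and a non-zero _duration_mins such as 60.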
# Check boolean fields for valid data. Update 'true'/'false'
# strings to '1'/'0' values
sub validate_boolean_field {
my $prog = shift;
my $field = shift;
if (! defined $prog->{$field}) {
t(" A required true/false value was undefined for '"
. $prog->{'_title'} . "', skipping");
return undef;
}
if ($prog->{$field} !~ m/(true|false)/i) {
t(" A bad true/false value '$prog->{$field}' was seen for '"
. $prog->{'_title'} . "', skipping");
return undef;
}
$prog->{$field} = 1 if $prog->{$field} eq 'true';
$prog->{$field} = 0 if $prog->{$field} eq 'false';
return $prog;
}
# Check for any DST-related information the RT may include in the title
# for a programme. If we find any explicit DST information we store it
# for use later and remove it from the title.
sub check_explicit_tz_in_title {
my $prog = shift;
if ($prog->{'_title'} =~ s/^\((GMT|BST)\)\s*//) {
$prog->{'_explicit_tz'} = $1;
}
}
# Remove any last-minute scheduling info inserted into the regular
# description, as it will affect later regexes.
sub remove_updated_listing_desc {
my $prog = shift;
if (defined $prog->{'_desc'}) {
$prog->{'_desc'} =~ s/\s+/ /g;
if ($prog->{'_desc'} =~ s/\s?(?:UPDATED|UPADTED)\s+LISTING\s?(?:-|:|@)\s?(.*)$//i) {
$prog->{'_updated_listing_info'} = $1;
t(" Removed updated listing information:\n" . " '" . $1 . "'");
}
}
}
# Episode/series numbering is provided in the sub_title field in the source
# data and is parsed out if seen. Any text left in sub_title after parsing
# is treated as episode information and moved into (or merged with) $episode.
sub check_numbering_in_subtitle {
my $prog = shift;
if (defined $prog->{'_sub_title'}) {
extract_numbering_from_sub_title($prog);
# sub_title should be empty after successful parsing
if ($prog->{'_sub_title'} eq '') {
$prog->{'_sub_title'} = undef;
}
# text left in sub_title is most likely _episode info, so move it to _episode
else {
if (! defined $prog->{'_episode'}) {
t(" Using sub-title '" . $prog->{'_sub_title'} . "' as episode not given");
$prog->{'_episode'} = $prog->{'_sub_title'};
$prog->{'_sub_title'} = undef;
}
else {
t(" Merging episode '" . $prog->{'_episode'} . "' with sub_title '" . $prog->{'_sub_title'} . "'");
$prog->{'_episode'} = $prog->{'_episode'} . ": " . $prog->{'_sub_title'};
$prog->{'_sub_title'} = undef;
}
}
}
}
# Check for null or invalid release year
sub validate_year_field {
my $prog = shift;
if (defined $prog->{'_year'}) {
if ($prog->{'_year'} =~ m/null/i) {
t(" Null release year given for this programme.");
$prog->{'_year'} = undef;
}
elsif ($prog->{'_year'} !~ m/\d{4}/) {
t(" Invalid release year given for this programme.");
$prog->{'_year'} = undef;
}
}
}
# Remove production year from $episode for films only
# If we do not already have a valid prod year, use the year
# detected
sub remove_year_from_episode {
my $prog = shift;
if (defined $prog->{'_episode'}) {
if ($prog->{'_film'} && $prog->{'_episode'} =~ s/Prod Year (\d{4})//i) {
t(" Removed production year info from episode details");
$prog->{'_episode'} = undef;
if (! defined $prog->{'_year'}) {
$prog->{'_year'} = $1;
}
}
}
}
# Remove bad chars from $title text before further processing
sub tidy_title_text {
my $prog = shift;
# remove vertical bar/colon
if ($prog->{'_title'} =~ s/([|:])$//) {
t(" Removed '" . $1 . "' from end of title");
}
}
# Some listings appear to be added without being processed upstream
# to provide subtitle (episode) information. The titles of these
# programmes are uppercase and may contain season numbering. Here
# we monitor these before further title processing is carried out.
sub check_uppercase_titles {
my $prog = shift;
if ($opt->{debug} && ($prog->{'_title'} eq uc($prog->{'_title'}))) {
$uc_prog_titles{$prog->{'_title'}} = $prog->{'_title'};
}
}
# Remove duplicated programme title *and* episode from episode field
#
# Listings for programmes may be provided that contain the
# programme title *and* episode duplicated in the episode field:
# i) at the start separated from the episode by colon - $title: $episode: $episode
sub remove_duplicated_title_and_ep_in_ep {
my $prog = shift;
if (defined $prog->{'_episode'}) {
my $tmp_title = $prog->{'_title'};
my $tmp_episode = $prog->{'_episode'};
my $key = $tmp_title . "|" . $tmp_episode;
# Remove the duplicated title/ep from episode field if we find it
# Use a backreference to match the second occurrence of the episode text
if ($tmp_episode =~ m/^\Q$tmp_title\E:\s(.+):\s\1$/) {
$prog->{'_episode'} = $1;
t(" Removing duplicated title/ep text from episode field");
if ($opt->{debug}) {
$title_ep_in_subtitle_fixed{$key} = { 'title' => $tmp_title,
'episode' => $tmp_episode,
};
}
}
}
}
# Remove duplicated programme title from episode field
#
# Listings for programmes may be provided that contain the
# programme title duplicated in the episode field, either:
# i) at the start followed by the 'real' episode in parentheses (rare);
# ii) at the start separated from the episode by a colon/hyphen; or
# iii) at the end separated from the episode by a colon/hyphen
#
sub remove_duplicated_title_in_ep {
my $prog = shift;
if (defined $prog->{'_episode'}) {
my $tmp_title = $prog->{'_title'};
my $tmp_episode = $prog->{'_episode'};
my $key = $tmp_title . "|" . $tmp_episode;
# Remove the duplicated title from episode field if we find it
if ($tmp_episode =~ m/^\Q$tmp_title\E(?::\s|\s-\s)(.+)$/ || $tmp_episode =~ m/^\Q$tmp_title\E\s+\((.+)\)$/) {
$prog->{'_episode'} = $1;
t(" Removing title text from beginning of episode field");
if ($opt->{debug}) {
$title_in_subtitle_fixed{$key} = { 'title' => $tmp_title,
'episode' => $tmp_episode,
};
}
}
# Look for title appearing at end of episode field
elsif ($tmp_episode =~ m/^(.+)(?::\s|\s-\s)\Q$tmp_title\E$/) {
$prog->{'_episode'} = $1;
t(" Removing title text from end of episode field");
if ($opt->{debug}) {
$title_in_subtitle_fixed{$key} = { 'title' => $tmp_title,
'episode' => $tmp_episode,
};
}
}
}
}
# Process programme against supplemental title fixups
sub process_title_fixups {
my $prog = shift;
# Remove non-title text found in programme title.
#
# Applied to all titles in the source data (type = 1)
process_non_title_info($prog);
# Track when titles/subtitles have been updated to allow
# short-circuiting of title processing
$prog->{'_titles_processed'} = 0;
$prog->{'_subtitles_processed'} = 0;
# Next, process titles to make them consistent
#
# One-off demoted title replacements (type = 11)
if (! $prog->{'_titles_processed'}) {
process_demoted_titles($prog);
}
# One-off title and episode replacements (type = 10)
if (! $prog->{'_titles_processed'}) {
process_replacement_titles_desc($prog);
}
# One-off title and episode replacements (type = 8)
if (! $prog->{'_titles_processed'}) {
process_replacement_titles_episodes($prog);
}
# Look for $title:$episode in source title (type = 2)
if (! $prog->{'_titles_processed'}) {
process_mixed_title_subtitle($prog);
}
# Look for $episode:$title in source title (type = 3)
if (! $prog->{'_titles_processed'}) {
process_mixed_subtitle_title($prog);
}
# Look for reversed title and subtitle information (type = 4)
if (! $prog->{'_titles_processed'}) {
process_reversed_title_subtitle($prog);
}
# Look for inconsistent programme titles (type = 5)
#
# This fixup is applied to all titles (processed or not) to handle
# titles split out in fixups of types 2-4 above
process_replacement_titles($prog);
# Next, process subtitles to make them consistent
#
# Look for inconsistent programme subtitles (type = 7)
if (! $prog->{'_subtitles_processed'}) {
process_replacement_episodes($prog);
}
# Replace subtitle based on description (type = 9)
if (! $prog->{'_subtitles_processed'}) {
process_replacement_ep_from_desc($prog);
}
# Last, provide/update a programme's category based on 'corrected' title
# (type = 6)
process_replacement_genres($prog);
}
# Check for potential episode numbering that still remains in the title
# or episode fields
sub check_potential_numbering_in_text {
my $prog = shift;
extract_numbering_from_episode($prog);
extract_numbering_from_title($prog);
extract_part_numbering_from_episode($prog);
# after processing see if $title contains "season" text that should
# probably be removed
if ($opt->{debug} && $prog->{'_title'} =~ m/season/i) {
t(" Title text contains \"Season\": " . $prog->{'_title'});
$title_text_to_remove{$prog->{'_title'}} = $prog->{'_title'};
}
# after processing see if $episode contains "series" text
if ($opt->{debug} && defined $prog->{'_episode'} && $prog->{'_episode'} =~ m/series/i) {
t(" Possible series numbering still seen: " . $prog->{'_episode'});
$possible_series_nums{$prog->{'_episode'}} = $prog->{'_episode'};
}
# check for potential episode numbering left unprocessed
if ($opt->{debug} && defined $prog->{'_episode'}
&& ($prog->{'_episode'} =~ m/^\d{1,2}\D/ || $prog->{'_episode'} =~ m/\D\d{1,2}$/)
&& $prog->{'_episode'} !~ m/(Part|Pt(\.)?)(\d+|\s+\w+)/) {
t(" Possible episode numbering still seen: " . $prog->{'_episode'});
$possible_episode_nums{$prog->{'_episode'}} = $prog->{'_episode'};
}
}
# Set $episode to undefined if empty/whitespace
sub tidy_episode_text {
my $prog = shift;
if (defined $prog->{'_episode'} && $prog->{'_episode'} =~ m/^\s*$/) {
$prog->{'_episode'} = undef;
}
}
# Store a variety of title debugging information for later analysis
# and debug output
sub store_title_debug_info {
my $prog = shift;
if ($opt->{debug}) {
# Monitor for case/punctuation-insensitive title variations
my $title_nopunc = lc $prog->{'_title'};
$title_nopunc =~ s/^the\s+//;
$title_nopunc =~ s/(\s+and\s+|\s+&\s+)/ /g;
$title_nopunc =~ s/\s+No 1s$//g;
$title_nopunc =~ s/\s+No 1's$//g;
$title_nopunc =~ s/\s+Number Ones$//g;
$title_nopunc =~ s/' //g;
$title_nopunc =~ s/'s/s/g;
$title_nopunc =~ s/\W//g;
# count number of each variant by genre and channel name
my $tmp_genre;
$tmp_genre = $prog->{'_genre'}; $tmp_genre = "No Genre" if not defined $tmp_genre;
$case_insens_titles{$title_nopunc}{ $prog->{'_title'} }{$tmp_genre}{ $prog->{'_rt_name'} }++;
$case_insens_titles{$title_nopunc}{ $prog->{'_title'} }{'count'}++;
# Check for title text still present in episode details
if (defined $prog->{'_episode'} && $prog->{'_episode'} =~ m/^\Q$prog->{'_title'}\E.*$/) {
my $key = $prog->{'_title'} . "|" . $prog->{'_episode'};
$title_in_subtitle_notfixed{$key} = { 'title' => $prog->{'_title'},
'episode' => $prog->{'_episode'},
};
}
# Check for episode details that contain a colon/hyphen - these may indicate
# a title in the episode field that needs to be moved into the title field
if (defined $prog->{'_episode'} && $prog->{'_episode'} =~ m/(:|\s+-\s+)/) {
my $key = $prog->{'_title'} . "|" . $prog->{'_episode'};
$colon_in_subtitle{$key} = { 'title' => $prog->{'_title'},
'episode' => $prog->{'_episode'},
};
}
# Add title to the list of programme titles for later debugging.
# Likewise for films, but store in separate hash.
if (! $prog->{'_film'}) {
$prog_titles{$prog->{'_title'}} = $prog->{'_title'};
}
else {
$film_titles{$prog->{'_title'}} = $prog->{'_title'};
}
}
}
# Occasionally film listings contain the title duplicated in the
# $episode field, so we remove it here
sub check_duplicated_film_title {
my $prog = shift;
if ($prog->{'_film'} && defined $prog->{'_episode'}
&& (uc $prog->{'_title'} eq uc $prog->{'_episode'})) {
$prog->{'_episode'} = undef;
}
}
# Check for films without a valid release year (absent or null string
# seen in source data)
sub check_missing_film_year {
my $prog = shift;
if ($prog->{'_film'} && ! defined $prog->{'_year'}) {
t(" No release year given for this film.");
}
}
# Tidy description text
sub tidy_desc_text {
my $prog = shift;
if (defined $prog->{'_desc'}) {
$prog->{'_desc'} =~ s/\s+/ /g;
}
}
# Update the premiere/repeat flags based on contents of programme desc
sub update_premiere_repeat_flags_from_desc {
my $prog = shift;
if (defined $prog->{'_desc'}) {
# Check if desc start with "Premiere.". Remove if found and set flag
if ($prog->{'_desc'} =~ s/^Premiere\.\s*//) {
t(" Setting premiere flag based on description (Premiere. )");
$prog->{'_premiere'} = 1;
}
# Flag showings described as repeats
elsif ($prog->{'_desc'} =~ m/^Another chance/) {
t(" Setting repeat flag based on description (Another chance...)");
$prog->{'_repeat'} = 1;
}
# Check if desc starts with "New series..."
elsif ($prog->{'_desc'} =~ m/^New series/) {
t(" Setting premiere flag based on description (New series...)");
$prog->{'_premiere'} = 1;
# Now check if desc starts with "New series [(x/y)]. "
# Remove text and preserve numbering for processing below
if ($prog->{'_desc'} =~ m/^New series(\s*\d+\/\d+\s*)?\.\s*/i) {
$prog->{'_desc'} =~ s/^New series\s*//i;
$prog->{'_desc'} =~ s/^\s*\.\s*//i;
}
}
}
}
# Check for potential season/episode numbering in description. Only
# use numbering found in the desc if we have not already found it
# elsewhere (i.e. prefer data provided in the subtitle field of the
# raw data).
#
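# For example, a (hypothetical) description beginning "3/10; series two. "
# sets _episode_num to 2, _num_episodes to 10 and _series_num to 1
# (episode and series numbers are stored zero-indexed), and the
# numbering text is removed from the description.
#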
sub extract_numbering_from_desc {
my $prog = shift;
if (defined $prog->{'_desc'}) {
# Extract episode and series info from start of description
# "1/6. ..."
# "1/6; series one. ..."
if ($prog->{'_desc'} =~
s{
^ # start at beginning of episode details
(\d+) # CAPTURE the first number(s) found ($episode_num)
\s* # ignore any whitespace
(?:&\d+)? # check for "&2" details relating to following episode
\s* # ignore any whitespace
\/ # forward slash
\s* # ignore any whitespace
(\d+) # CAPTURE the second number(s) found ($num_episodes)
\s* # ignore any whitespace
(?:\.|;)? # check for punctuation characters
\s* # ignore any whitespace
(?:series\s+(\w+)\s*\.)?
# check for series number information ($series_num)
\s* # ignore any whitespace
}
{}ix ) {
my $updated_from_desc = 0;
if ( ! defined $prog->{'_episode_num'} && ! defined $prog->{'_num_episodes'} ) {
$prog->{'_episode_num'} = $1 - 1;
$prog->{'_num_episodes'} = $2;
t(" Episode number found: episode $1 of $2 (desc)");
$updated_from_desc++;
}
else {
t(" Ignoring episode numbering seen in desc (episode $1 of $2)");
}
if ( defined $3 ) {
my $series_digits = word_to_digit($3);
if ( ! defined $prog->{'_series_num'} ) {
if (defined $series_digits and $series_digits > 0) {
$prog->{'_series_num'} = $series_digits - 1;
t(" Series number found: series $series_digits (parsed as $3 from desc)");
$updated_from_desc++;
}
}
else {
t(" Ignoring series numbering seen in desc (series $series_digits)");
}
}
$prog->{'_processed'} = 1 if $updated_from_desc;
}
}
}
# Create episode numbering based on information extracted from $episode
# and $desc fields.
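# For example, episode 1 of 6 in series 2 (as broadcast, one-indexed)
# is returned in xmltv_ns form as "1.0/6.".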
sub generate_episode_numbering {
my $prog = shift;
# series number is zero-indexed
if (! defined $prog->{'_series_num'} || $prog->{'_series_num'} < 0) {
$prog->{'_series_num'} = '';
}
# episode number is zero-indexed
if (! defined $prog->{'_episode_num'} || $prog->{'_episode_num'} < 0) {
$prog->{'_episode_num'} = '';
}
# episode total is one-indexed and should always be greater than the
# max episode number (which is zero-indexed)
if (defined $prog->{'_num_episodes'}
&& $prog->{'_num_episodes'} > 0
&& $prog->{'_num_episodes'} > $prog->{'_episode_num'} ) {
$prog->{'_num_episodes'} = "/" . $prog->{'_num_episodes'};
}
else {
$prog->{'_num_episodes'} = '';
}
# Create details if we have the series and/or episode numbers
if ($prog->{'_series_num'} ne '' || $prog->{'_episode_num'} ne '') {
return "" . $prog->{'_series_num'} . "." . $prog->{'_episode_num'}
. $prog->{'_num_episodes'} . "." . "";
}
return undef;
}
# Create cast list based on various cast formats seen in source data
sub generate_cast_list {
my $prog = shift;
# The Radio Times data includes cast information in 2 formats:
#
# a) pairings of 'character*actor' with subsequent pairings
# separated by '|' - '*' does not appear in any text
# b) a comma separated list of actors with no character details
#
# If 'Director' appears in the character entry, this is to be used
# as a regular cast member, not the programme's director
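# Illustrative (hypothetical) cast strings and the actor lists the code
# below produces from them:
#   'Jane Doe*Ann Actor|John Doe*Bob Actor'  ->  ('Ann Actor', 'Bob Actor')
#   'Ann Actor, Bob Actor'                   ->  ('Ann Actor', 'Bob Actor')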
if (defined $prog->{'_cast'}) {
my @cast;
$prog->{'_cast'} =~ s/\s+/ /g; # remove extra spaces
$prog->{'_cast'} =~ s/\|\|/\|/g; # remove empty pipe-separated fields
$prog->{'_cast'} =~ s/,,/,/g; # remove empty comma-separated fields
# First we check for 'character*actor' entries
if ($prog->{'_cast'} =~ tr/*//) {
# Multiple 'character*actor' entries
if ($prog->{'_cast'} =~ tr/|//) {
@cast = split /\|/, $prog->{'_cast'};
}
# Single 'character*actor' entry
else {
push @cast, $prog->{'_cast'};
}
# Now process the list of cast entries
foreach my $cast (@cast) {
# Replace any actor given as Himself/Herself with the
# character name given
if ($cast =~ m/^(.*)[*](Himself|Herself)$/) {
$cast = "$1*$1";
}
# Remove the 'character*' portion of the entry
if ($cast !~ s/^.*[*]//) {
t(" Bad cast entry for '" . $prog->{'_title'} . "': " . $cast);
}
}
}
# Next we check for CSV-style actor entries
elsif ($prog->{'_cast'} =~ tr/,//) {
@cast = split /,/, $prog->{'_cast'};
}
# Finally assume a single name that contains neither '*' nor ','
else {
push @cast, $prog->{'_cast'};
}
# Trim whitespace from beginning/end of actor names
foreach my $cast (@cast) {
$cast =~ s/^\s+//;
$cast =~ s/\s+$//;
}
return \@cast if scalar @cast > 0;
}
return undef;
}
# Store details of uncategorised programmes and programmes having different
# genres throughout the listings for later analysis
sub store_genre_debug_info {
my $prog = shift;
if ($opt->{debug}) {
if (defined $prog->{'_genre'} && ! $prog->{'_film'}) {
$categories{$prog->{'_genre'}} = $prog->{'_genre'};
if ($prog->{'_genre'} =~ m/^(No Genre)$/
&& $prog->{'_title'} !~ m/^(To Be Announced|TBA|Close)$/) {
$uncategorised_progs{$prog->{'_title'}} = $prog->{'_title'};
}
$cats_per_prog{$prog->{'_title'}}{$prog->{'_genre'}}++;
}
elsif (! defined $prog->{'_genre'} && $prog->{'_title'} !~ m/^(To Be Announced|TBA|Close)$/) {
$uncategorised_progs{$prog->{'_title'}} = $prog->{'_title'};
}
}
}
# Broadcast date, start/stop times, and timezone adjustments.
#
# For each programme entry, the Radio Times data includes the
# date at start of broadcast, the start time and the stop time.
#
# The Radio Times sometimes explicitly flags a programme's start/stop
# times as being in a specific timezone (GMT or BST). We parse this
# information out when processing the programme's title and apply it
# to the start time of any such programmes. Flagged programmes are
# seen in the data in March and October, when British Summer Times
# begins and ends.
#
# We calculate the programme's stop time using the
# UTC-offset-corrected start time and its stated length. This allows
# us to handle occasions when programmes having mixed GMT/BST
# timings are not flagged.
#
# The Summer Time Order of 2002 defines British Summer Time as
# "...the period beginning at one o'clock, Greenwich mean time, in
# the morning of the last Sunday in March and ending at one o'clock,
# Greenwich mean time, in the morning of the last Sunday in October."
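# For example, a programme whose title arrived as "(GMT) Some Title"
# (hypothetical) has had the prefix removed by check_explicit_tz_in_title()
# and its start time is written below with a forced +0000 offset,
# regardless of the Europe/London default.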
sub generate_start_stop_times {
my $prog = shift;
my $ts_dt = shift;
my ($dd, $mm, $yyyy) = ($prog->{'_date'} =~ m{(\d\d)/(\d\d)/(\d{4})});
my ($start_hr, $start_mn) = ($prog->{'_start'} =~ m/(\d\d):(\d\d)/);
t(" Start time given as '" . $yyyy . "/" . $mm . "/" . $dd . " "
. $start_hr . ":" . $start_mn . "', duration " . $prog->{'_duration_mins'} . " mins");
# Use explicit GMT/BST information if found in title
my $tz = 'Europe/London';
if (defined $prog->{'_explicit_tz'}) {
t(" Explicit timezone '" . $prog->{'_explicit_tz'} . "' detected in title");
if ($prog->{'_explicit_tz'} eq 'GMT') {
t(" Forcing timezone to GMT/+0000");
$tz = '+0000';
}
elsif ($prog->{'_explicit_tz'} eq 'BST') {
t(" Forcing timezone to BST/+0100");
$tz = '+0100';
}
}
# Determine start time with correct UTC offset
my $start_dt = DateTime->new(
year => $yyyy,
month => $mm,
day => $dd,
hour => $start_hr,
minute => $start_mn,
second => 0,
time_zone => $tz,
);
$prog->{start} = "" . $start_dt->ymd('') . $start_dt->hms('') . ' ' . DateTime::TimeZone->offset_as_string($start_dt->offset);
t(" " . $prog->{start} . " - Start time");
# Determine stop time with correct UTC offset by adding duration of
# programme to start time
my $dur = DateTime::Duration->new( minutes => $prog->{'_duration_mins'} );
my $stop_dt = $start_dt + $dur;
$prog->{stop} = "" . $stop_dt->ymd('') . $stop_dt->hms('') . ' ' . DateTime::TimeZone->offset_as_string($stop_dt->offset);
t(" " . $prog->{stop} . " - Stop time");
# Now we have determined the correct start/stop times for the programme
# add any required timeshift defined in channel_ids and preserve the
# correct timezone information
#
if (defined $channel_offset{ $prog->{channel} }) {
$start_dt = $start_dt + $ts_dt;
$stop_dt = $stop_dt + $ts_dt;
$prog->{start} = "" . $start_dt->ymd('') . $start_dt->hms('') . ' ' . DateTime::TimeZone->offset_as_string($start_dt->offset);
$prog->{stop} = "" . $stop_dt->ymd('') . $stop_dt->hms('') . ' ' . DateTime::TimeZone->offset_as_string($stop_dt->offset);
t(" " . $prog->{start} . " - Start time after applying '" . $channel_offset{ $prog->{channel} } . "' timeshift");
t(" " . $prog->{stop} . " - Stop time after applying '" . $channel_offset{ $prog->{channel} } . "' timeshift");
}
# Now check to see whether the channel broadcasting the programme is a
# part-time channel, and if so, see whether this programme's timeslot
# times fall within the broadcast window. If a channel broadcasts
# through the night, we also need to test against the next day's
# broadcast times.
#
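# For illustration only, the broadcast window is assumed to be stored as an
# 'HHMM-HHMM' string (matching the regex below), e.g. a hypothetical
# $broadcast_hours{'example.channel'} = '0600-0100', where a stop time
# earlier than the start time indicates broadcasting past midnight.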
if (defined $broadcast_hours{ $prog->{channel} }) {
$broadcast_hours{ $prog->{channel} } =~ m/(\d\d)(\d\d)-(\d\d)(\d\d)/;
my ($chan_start_hr, $chan_start_mn, $chan_stop_hr, $chan_stop_mn) = ($1, $2, $3, $4);
my $chan_start_dt = DateTime->new(
year => $yyyy,
month => $mm,
day => $dd,
hour => $chan_start_hr,
minute => $chan_start_mn,
second => 0,
time_zone => 'Europe/London',
);
my $chan_start = "" . $chan_start_dt->ymd('') . $chan_start_dt->hms('') . ' ' . DateTime::TimeZone->offset_as_string($chan_start_dt->offset);
my $chan_stop_dt = DateTime->new(
year => $yyyy,
month => $mm,
day => $dd,
hour => $chan_stop_hr,
minute => $chan_stop_mn,
second => 0,
time_zone => 'Europe/London',
);
# Correct the stop time if it is earlier than the start time
my $chan_stop_next_day = 0;
if ($chan_start_dt >= $chan_stop_dt) {
$chan_stop_dt = $chan_stop_dt + $day_dur;
$chan_stop_next_day = 1;
}
my $chan_stop = "" . $chan_stop_dt->ymd('') . $chan_stop_dt->hms('') . ' ' . DateTime::TimeZone->offset_as_string($chan_stop_dt->offset);
t(" " . $chan_start . " - Start time of channel (current day)");
t(" " . $chan_stop . " - Stop time of channel (current day)");
# Include the current programme if its timeslot lies inside the
# channel's broadcast window
if ($start_dt >= $chan_start_dt && $stop_dt <= $chan_stop_dt) {
t(" '" . $prog->{'_title'} . "' shown whilst channel is on-air, adding");
}
# If the channel starts and stops broadcasting on the same
# calendar day and the programme's timeslot is outside the
# broadcast window, skip it
elsif (( $start_dt < $chan_start_dt || $stop_dt > $chan_stop_dt)
&& $chan_stop_next_day == 0 ) {
t(" '" . $prog->{'_title'} . "' shown whilst channel is off-air, skipping\n");
return undef;
}
else {
# If the channel broadcasts through the night, and the channel
# start time is later than the stop time (i.e. 2300-0600), it is
# possible for a program shown at or after midnight to result in
# the generation of incorrect channel start/stop times (shifted
# +1day forward). We therefore generate another pair of channel
# start/stop times for the previous day to match against.
#
# Example: consider a 30min programme broadcast on 20120101 at 00:30
# and we're comparing it to a channel that broadcasts between
# 2300 and 0600. We generate start/stop times of 201201012300 and
# 201201020600 for the channel, but the programme starts/stops
# at 201201010030 and 201201010100. These times occur whilst the
# channel is on-air the _previous_ day.
#
$chan_start_dt = $chan_start_dt - $day_dur;
$chan_stop_dt = $chan_stop_dt - $day_dur;
my $chan_start = "" . $chan_start_dt->ymd('') . $chan_start_dt->hms('') . ' ' . DateTime::TimeZone->offset_as_string($chan_start_dt->offset);
my $chan_stop = "" . $chan_stop_dt->ymd('') . $chan_stop_dt->hms('') . ' ' . DateTime::TimeZone->offset_as_string($chan_stop_dt->offset);
t(" " . $chan_start . " - Start time of channel (previous day)");
t(" " . $chan_stop . " - Stop time of channel (previous day)");
# Test again to see if the programme falls between the adjusted
# channel broadcast times
if ($start_dt >= $chan_start_dt && $stop_dt <= $chan_stop_dt) {
t(" '" . $prog->{'_title'} . "' shown whilst channel is on-air, adding");
} else {
t(" '" . $prog->{'_title'} . "' shown whilst channel is off-air, skipping\n");
return undef;
}
}
}
return [ $prog->{start}, $prog->{stop} ];
}
# Remove non-title text found in programme title. This text is placed at the
# start of the 'real' title, separated from it by a colon.
#
# Text to try and match against the programme title is stored in a hash of arrays
# to shortcut the list of possible matches to those beginning with the same
# first character as the title. It would seem to be quicker to use a regex
# to match some amount of text up to a colon character in the programme title,
# and then use a hash lookup against the matched text. However, there is no
# limit to the number of colons in the text to remove, so this approach cannot
# be used. NOTE: this indexing method is also used by several of the title consistency
# routines in order to speed up processing.
#
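# Minimal sketch of the assumed %non_title_info shape (keys are lower-cased
# first characters, values are arrays of non-title strings; example entries
# are hypothetical):
#
#   $non_title_info{'c'} = [ 'Cricket', 'Carols from Kings' ];
#
# With such an entry, a listed title of 'Cricket: Fourth Test' would be
# reduced to 'Fourth Test' by the substitution below.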
sub process_non_title_info {
my $prog = shift;
if ($have_title_data && %non_title_info && $prog->{'_title'} =~ m/:/) {
my $idx_char = lc(substr $prog->{'_title'}, 0, 1);
NON_TITLE_TEXT:
foreach my $non_title_info (@{$non_title_info{$idx_char}}) {
if ($prog->{'_title'} =~ s/^(\Q$non_title_info\E)\s*:\s*//i) {
t(" Removed '" . $non_title_info
. "' from title. New title '" . $prog->{'_title'} . "'");
last NON_TITLE_TEXT;
}
}
}
}
# Promote a demoted title from the subtitle field to the title field, replacing
# whatever text is in the title field at the time. If the demoted title is
# followed by a colon and subtitle text, that subtitle text is preserved in the
# subtitle field.
#
# A title can be demoted to the subtitle field if the programme's "brand"
# is present in the title field, as can happen with data output from Atlas.
#
# Text to try and match against the programme subtitle is stored in a hash of
# arrays keyed on the programme brand found in the title field, shortcutting
# the list of possible matches to those recorded for that brand.
#
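# Minimal illustration under assumed fixup data: if $demoted_title{'Spooks'}
# contains 'Spooks: Code 9', a programme listed with title 'Spooks' and
# episode 'Spooks: Code 9: Episode 1' is given the title 'Spooks: Code 9'
# and the episode 'Episode 1' by the code below.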
sub process_demoted_titles {
my $prog = shift;
my $brand = $prog->{'_title'};
if ($have_title_data && %demoted_title && defined $prog->{'_episode'}
&& defined $demoted_title{ $brand }) {
DEMOTED_TITLE:
foreach my $demoted_title (@{$demoted_title{ $brand }}) {
my $new_ep;
if ($prog->{'_episode'} =~ m/^\Q$demoted_title\E$/i) {
$new_ep = '';
}
elsif ($prog->{'_episode'} =~ m/^\Q$demoted_title\E(?::|\s-)\s(.*)$/i) {
$new_ep = $1;
}
else {
next DEMOTED_TITLE;
}
$prog->{'_title'} = $demoted_title;
$prog->{'_episode'} = $new_ep;
t(" Promoted title '" . $demoted_title . "' from subtitle for brand '"
. $brand . "'. New subtitle '" . $prog->{'_episode'} . "'");
$prog->{'_titles_processed'} = 1;
$prog->{'_subtitles_processed'} = 1;
last DEMOTED_TITLE;
}
}
}
# Allow arbitrary replacement of one title/episode pair with another, based
# on a given description.
#
# Intended to be used where previous title/episode replacement routines
# do not allow a specific enough correction to the listings data (i.e. for
# one-off changes).
#
# *** THIS MUST BE USED WITH CARE! ***
#
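# Assumed lookup format, inferred from the key built below: the
# %replacement_title_desc hash is keyed on 'old title|old episode|desc'
# and each value is an arrayref of [ new title, new episode ], with an
# empty episode string standing in for undef.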
sub process_replacement_titles_desc {
my $prog = shift;
if ($have_title_data && %replacement_title_desc && defined $prog->{'_desc'} ) {
my $tmp_ep;
my $tmp_ep_num;
my $tmp_ep_num_text = '';
# Handle potential undef episode value, as the empty string
# was used in place of an undef episode during concatenation
# in the replacement hash
if (not defined $prog->{'_episode'}) {
$tmp_ep = '';
}
# Also handle an episode number that may be present in source
# data but not in replacement text
elsif ($prog->{'_episode'} =~ m/^\d+\s*(?:&\d+)?\s*\/\s*\d+\s*(?:\.|\/|,|;|:)?\s*-?\s*(?:series\s*(\w+)\s*(?:;|\.)?)?\s*$/) {
$tmp_ep = '';
$tmp_ep_num = $prog->{'_episode'};
$tmp_ep_num_text = " (Preserving existing numbering)";
}
else {
$tmp_ep = $prog->{'_episode'};
}
my $key = "" . $prog->{'_title'} . "|" . $tmp_ep . "|" . $prog->{'_desc'};
# Check whether we have matched the old programme title/episode/desc combo
if (defined $replacement_title_desc{$key}) {
# Now replace the old title/ep values with new ones
my ($old_title, $old_ep) = ($prog->{'_title'}, $tmp_ep);
my ($new_title, $new_ep) = @{$replacement_title_desc{$key}};
# update the title
$prog->{'_title'} = $new_title;
# if new episode value is empty string, replace with undef;
# otherwise use new value
if ($new_ep eq '') {
if (defined $tmp_ep_num) {
$prog->{'_episode'} = $tmp_ep_num;
}
else {
$prog->{'_episode'} = undef;
}
}
else {
if (defined $tmp_ep_num) {
$prog->{'_episode'} = $tmp_ep_num . ": " . $new_ep;
}
else {
$prog->{'_episode'} = $new_ep;
}
}
t(" Replaced old title/ep '" . $old_title . " / " . $old_ep
. "' with new title/ep '" . $new_title . " / " . $new_ep
. "' using desc" . $tmp_ep_num_text);
$prog->{'_titles_processed'} = 1;
}
}
}
# Allow arbitrary replacement of one title/episode pair with another.
# Intended to be used where previous title/episode replacement routines
# do not allow the desired correction (i.e. for one-off changes).
#
# *** THIS MUST BE USED WITH CARE! ***
#
sub process_replacement_titles_episodes {
my $prog = shift;
if ($have_title_data && %replacement_title_eps) {
my $tmp_ep;
my $tmp_ep_num;
my $tmp_ep_num_text = '';
# Handle potential undef episode value, as the empty string
# was used in place of an undef episode during concatenation
# in the replacement hash
if (not defined $prog->{'_episode'}) {
$tmp_ep = '';
}
# Also handle an episode number that may be present in source
# data but not in replacement text
elsif ($prog->{'_episode'} =~ m/^\d+\s*(?:&\d+)?\s*\/\s*\d+\s*(?:\.|\/|,|;|:)?\s*-?\s*(?:series\s*(\w+)\s*(?:;|\.)?)?\s*$/) {
$tmp_ep = '';
$tmp_ep_num = $prog->{'_episode'};
$tmp_ep_num_text = " (Preserving existing numbering)";
}
else {
$tmp_ep = $prog->{'_episode'};
}
my $key = "" . $prog->{'_title'} . "|" . $tmp_ep;
# Check whether we have matched the old programme title/episode combo
if (defined $replacement_title_eps{$key}) {
# Now replace the old title/ep values with new ones
my ($old_title, $old_ep) = ($prog->{'_title'}, $tmp_ep);
my ($new_title, $new_ep) = @{$replacement_title_eps{$key}};
# update the title
$prog->{'_title'} = $new_title;
# if new episode value is empty string, replace with undef;
# otherwise use new value
if ($new_ep eq '') {
if (defined $tmp_ep_num) {
$prog->{'_episode'} = $tmp_ep_num;
}
else {
$prog->{'_episode'} = undef;
}
}
else {
if (defined $tmp_ep_num) {
$prog->{'_episode'} = $tmp_ep_num . ": " . $new_ep;
}
else {
$prog->{'_episode'} = $new_ep;
}
}
t(" Replaced old title/ep '" . $old_title . " / " . $old_ep
. "' with new title/ep '" . $new_title . " / " . $new_ep
. "'" . $tmp_ep_num_text);
$prog->{'_titles_processed'} = 1;
}
}
}
# Some given programme titles contain both the title and episode data,
# separated by a colon ($title:$episode) or a hyphen ($title - $episode).
# Here we reassign the episode to the $episode element, leaving only the
# programme's title in the $title element
#
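# For example (illustrative titles only): a listed title of
# 'Horizon: The Secret Life of the Cat' with no existing sub-title becomes
# title 'Horizon' and sub-title 'The Secret Life of the Cat', provided
# 'Horizon' appears in the %mixed_title_subtitle fixup data.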
sub process_mixed_title_subtitle {
my $prog = shift;
if ($have_title_data && %mixed_title_subtitle && $prog->{'_title'} =~ m/:|-/) {
my $idx_char = lc(substr $prog->{'_title'}, 0, 1);
MIXED_TITLE_SUBTITLE:
foreach my $mixed_title_subtitle (@{$mixed_title_subtitle{$idx_char}}) {
if ($prog->{'_title'} =~ m/^(\Q$mixed_title_subtitle\E)\s*(?::|-)\s*(.*)/) {
# store the captured text
my $new_title = $1;
my $new_episode = $2;
$prog->{'_titles_processed'} = 1;
if (! defined $prog->{'_episode'}) {
t(" Moved '" . $new_episode . "' to sub-title,"
. " new title is '" . $new_title . "'");
$prog->{'_title'} = $new_title;
$prog->{'_episode'} = $new_episode;
last MIXED_TITLE_SUBTITLE;
}
elsif ($prog->{'_episode'} eq $new_episode) {
t(" Sub-title '" . $new_episode . "' seen in "
. "title already exists, new title is '"
. $new_title . "'");
$prog->{'_title'} = $new_title;
last MIXED_TITLE_SUBTITLE;
}
# concat subtitle after any episode numbering (x/y)
elsif ($prog->{'_episode'} =~ m/^\d+\s*(?:&\d+)?\s*\/\s*\d+\s*(?:\.|\/|,|;|:)?\s*-?\s*(?:series\s*(\w+)\s*(?:;|\.)?)?\s*$/) {
t(" Concatenating sub-title '" . $new_episode
. "' seen in title after existing episode numbering '"
. $prog->{'_episode'} . "'");
$prog->{'_title'} = $new_title;
$prog->{'_episode'} = $prog->{'_episode'} . ": " . $new_episode;
last MIXED_TITLE_SUBTITLE;
}
else {
t(" Concatenating sub-title '" . $new_episode
. "' seen in title with existing episode info '"
. $prog->{'_episode'} . "'");
$prog->{'_title'} = $new_title;
$prog->{'_episode'} = $new_episode . ": " . $prog->{'_episode'};
last MIXED_TITLE_SUBTITLE;
}
}
}
}
}
# Some given programme titles contain both the episode and title data,
# separated by a colon ($episode:$title) or a hyphen ($episode - $title).
# Here we reassign the episode to the $episode element, leaving only the
# programme's title in the $title element
#
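# For example (illustrative titles only): given the fixup text 'Storyville',
# a listed title of 'The Kingmaker: Storyville' with no existing sub-title
# becomes title 'Storyville' and sub-title 'The Kingmaker', as the 'real'
# title follows the separator here.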
sub process_mixed_subtitle_title {
my $prog = shift;
if ($have_title_data && @mixed_subtitle_title && $prog->{'_title'} =~ m/:|-/) {
MIXED_SUBTITLE_TITLE:
foreach my $mixed_subtitle_title (@mixed_subtitle_title) {
if ($prog->{'_title'} =~ m/^(.*)\s*(?::|-)\s*(\Q$mixed_subtitle_title\E)/) {
# store the captured text
my $new_title = $2;
my $new_episode = $1;
$prog->{'_titles_processed'} = 1;
if (! defined $prog->{'_episode'}) {
t(" Moved '" . $new_episode . "' to sub-title, "
. "new title is '" . $new_title . "'");
$prog->{'_title'} = $new_title;
$prog->{'_episode'} = $new_episode;
last MIXED_SUBTITLE_TITLE;
}
elsif ($prog->{'_episode'} eq $new_episode) {
t(" Identical sub-title '" . $new_episode
. "' also seen in title, new title is '"
. $new_title . "'");
$prog->{'_title'} = $new_title;
last MIXED_SUBTITLE_TITLE;
}
# concat subtitle after any episode numbering (x/y)
elsif ($prog->{'_episode'} =~ m/^\d+\s*(?:&\d+)?\s*\/\s*\d+\s*(?:\.|\/|,|;|:)?\s*-?\s*(?:series\s*(\w+)\s*(?:;|\.)?)?\s*$/) {
t(" Concatenating sub-title '" . $new_episode
. "' seen in title after existing episode numbering '"
. $prog->{'_episode'} . "'");
$prog->{'_title'} = $new_title;
$prog->{'_episode'} = $prog->{'_episode'} . ": " . $new_episode;
last MIXED_SUBTITLE_TITLE;
}
else {
t(" Concatenating sub-title '" . $new_episode
. "' seen in title with existing episode info '"
. $prog->{'_episode'} . "'");
$prog->{'_title'} = $new_title;
$prog->{'_episode'} = $new_episode . ": " . $prog->{'_episode'};
last MIXED_SUBTITLE_TITLE;
}
}
}
}
}
# Listings for some programmes may have reversed title and sub-title information
# ($title = 'real' episode and $episode = 'real' title). In order to create more
# consistent data, we check for flagged programme titles and reverse the given
# title and sub-title when found.
#
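# For example (illustrative values only): a programme listed with title
# 'Dalek' and sub-title 'Doctor Who', where 'Doctor Who' is flagged in the
# reversed title-subtitle fixup data, is corrected to title 'Doctor Who'
# and sub-title 'Dalek'.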
sub process_reversed_title_subtitle {
my $prog = shift;
if ($have_title_data && %reversed_title_subtitle && defined $prog->{'_episode'}) {
my $idx_char = lc(substr $prog->{'_title'}, 0, 1);
REVERSED_TITLE_SUBTITLE:
foreach my $reversed_title_subtitle (@{$reversed_title_subtitle{$idx_char}}) {
if ($reversed_title_subtitle eq $prog->{'_episode'}) {
t(" Seen reversed title-subtitle for '"
. $prog->{'_title'} . ":" . $prog->{'_episode'} . "' - reversing" );
$prog->{'_episode'} = $prog->{'_title'};
$prog->{'_title'} = $reversed_title_subtitle;
t(" New title is '" . $prog->{'_title'} . "' and new "
. "sub-title is '" . $prog->{'_episode'} . "'");
$prog->{'_titles_processed'} = 1;
last REVERSED_TITLE_SUBTITLE;
}
}
}
}
# Process inconsistent titles, replacing any flagged bad titles with good
# titles. A straightforward hash lookup against the programme title is used.
#
sub process_replacement_titles {
my $prog = shift;
if ($have_title_data && %replacement_titles) {
my $bad_title = $prog->{'_title'};
if (defined $replacement_titles{$bad_title}) {
$prog->{'_title'} = $replacement_titles{$bad_title};
t(" Replaced title '" . $bad_title . "' with '" . $prog->{'_title'} . "'");
$prog->{'_titles_processed'} = 1;
}
}
}
# Process inconsistent episodes. The %replacement_episodes data structure
# is a hash of hashes.
#
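# Assumed shape, inferred from the lookups below (illustrative values):
#
#   $replacement_episodes{'Some Title'}{'bad episode text'} = 'Good Episode';
#
# i.e. keyed first on programme title, then on the episode text to replace.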
sub process_replacement_episodes {
my $prog = shift;
if ($have_title_data && %replacement_episodes && defined $prog->{'_episode'}) {
my $bad_episode_title = $prog->{'_title'};
my $bad_episode = $prog->{'_episode'};
# First, check whether we have matched the programme title
if (defined $replacement_episodes{$bad_episode_title}) {
# Now look for a specific episode match for the title
if (defined $replacement_episodes{$bad_episode_title}->{$bad_episode}) {
$prog->{'_episode'} = $replacement_episodes{$bad_episode_title}->{$bad_episode};
t(" Replaced episode info '" . $bad_episode . "' with '" . $prog->{'_episode'} . "'");
$prog->{'_subtitles_processed'} = 1;
}
}
}
}
# Replace an inconsistent or missing episode subtitle based on a given description.
# The description should therefore be unique for each episode of the programme.
# The %replacement_ep_from_desc data structure is a hash of hashes.
#
sub process_replacement_ep_from_desc {
my $prog = shift;
if ($have_title_data && %replacement_ep_from_desc && defined $prog->{'_desc'}) {
my $bad_episode_title = $prog->{'_title'};
my $bad_ep_desc = $prog->{'_desc'};
# First, check whether we have matched the programme title
if (defined $replacement_ep_from_desc{$bad_episode_title}) {
# Now look for a specific desc match for the title
if (defined $replacement_ep_from_desc{$bad_episode_title}->{$bad_ep_desc}) {
my $old_ep = defined $prog->{'_episode'} ? $prog->{'_episode'} : '';
$prog->{'_episode'} = $replacement_ep_from_desc{$bad_episode_title}->{$bad_ep_desc};
t(" Updated episode from '" . $old_ep . "' to '" . $prog->{'_episode'}
. "' for title '" . $bad_episode_title . "', based on desc '");
$prog->{'_subtitles_processed'} = 1;
}
}
}
}
# Process programmes that may not be categorised, or are categorised with
# various categories in the source data. Untitled programmes ("To Be Announced")
# and films are ignored. Different programmes having identical titles should not
# be replaced using this routine as it may cause such programmes to be given
# inaccurate genres.
#
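# Assumed shape, inferred from the lookup below (illustrative values):
#
#   $replacement_cats{'Some Title'} = 'Documentary';
#
# i.e. a straight title-to-genre mapping applied whenever the title matches.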
sub process_replacement_genres {
my $prog = shift;
if ($have_title_data && %replacement_cats && $prog->{'_title'} !~ m/^(To Be Announced|TBA)/i && ! $prog->{'_film'}) {
if (defined $replacement_cats{$prog->{'_title'}}) {
$prog->{'_genre'} = $replacement_cats{$prog->{'_title'}};
t(" Assigned title '" . $prog->{'_title'} . "' to category '" . $prog->{'_genre'} . "'");
}
}
}
# Extract series/episode numbering found in $prog->{'_episode'}. Series
# and episode numbering are parsed out of the text and eventually made
# available in the <episode-num> element, being stored in intermediary
# variables during processing (as when parsing $prog->{'_sub_title'}).
# As most numbering is parsed out of $prog->{'_sub_title'} directly
# from the source data, this routine mainly extracts numbering inserted
# via the title/episode update/consistency routines.
#
sub extract_numbering_from_episode {
my $prog = shift;
if (defined $prog->{'_episode'}
&& $prog->{'_episode'} =~ m/\d+|episode/i) {
# ) check for "x/y" format covering following formats
#
# "1/6 - ..."
# "1/6, series 1 - ..."
# "1, series 1 - ..."
# "1/6, series one - "...
# "1, series one - ..."
#
if ($prog->{'_episode'} =~
s{
^ # start at beginning of episode details
(\d+) # CAPTURE the first number(s) found ($episode_num)
\s* # ignore any whitespace
\/? # forward slash
\s* # ignore any whitespace
(\d+)? # CAPTURE the second number(s) found ($num_episodes)
\s* # ignore any whitespace
(?:,)? # check for punctuation characters
\s* # ignore any whitespace
(?:series\s*(\w+|\d+))? # check for series number information ($series_num)
\s* # ignore any whitespace
(?:-) # hyphen to separate numbering from episode text
\s* # ignore any whitespace
}
{}ix ) {
$prog->{'_episode_num'} = $1 - 1;
# Check that the source episode number is not greater than the number of episodes.
# Rather than discard the episode number, we discard the total instead, which
# is more likely to be incorrect based on observation.
if (defined $2) {
if ($1 <= $2) {
$prog->{'_num_episodes'} = $2;
t(" Episode number/total found: episode $1 of $2 (subtitle, x/y)");
}
else {
t(" Bad episode total found: episode $1 of $2, discarding total (subtitle, x/y)");
}
}
else {
t(" Episode number found: episode $1 (episode, x/y)");
}
if (defined $3) {
my $digits = word_to_digit($3);
if (defined $digits and $digits > 0) {
t(" Series number found: series $digits (parsed as $3, episode, x/y)");
$prog->{'_series_num'} = $digits - 1;
}
}
$prog->{'_processed'} = 1;
}
# ) check for "Episode x" format covering following formats:
#
# "Episode 1"
# "Episode one"
#
elsif ($prog->{'_episode'} =~
s{
^ # start at beginning of episode details
(?:Episode|Ep) # ignore "Episode" text
\s* # ignore any whitespace
(\w+|\d+) # CAPTURE the first number(s) found ($episode_num)
$ # finish at end of episode details
}
{}ix ) {
my $digits = word_to_digit($1);
if (defined $digits and $digits > 0) {
t(" Episode number found: episode $digits (parsed as $1, episode, episode x)");
$prog->{'_episode_num'} = $digits - 1;
}
$prog->{'_processed'} = 1;
}
}
}
# Check for potential season numbering in title
#
sub extract_numbering_from_title {
my $prog = shift;
if ($prog->{'_title'} =~ m/Series|Season/i) {
# this regex looks for season numbering given in parentheses
# in the title
#
# "Wheeler Dealers - (Series 1)"
# "Wheeler Dealers (Season 1)"
if ($prog->{'_title'} =~
m{
^ # start at beginning of title details
(.*\b[\x21-\x2F\x3F]?) # CAPTURE the title details before season numbering
\s* # ignore any whitespace
(?:,|;|:|-)? # check for optional punctuation characters
\s* # ignore any whitespace
(?:\() # opening paren
(?:Series|Season) # check for Series/Season text
\s* # ignore any whitespace
(\d+) # CAPTURE season number
(?:,)? # ignore comma if present
\s* # ignore any whitespace
(\d+)? # CAPTURE episode number if present
\s* # ignore any whitespace
(?:\)) # closing paren
$ # finish at end of title details
}ix )
{
$prog->{'_title'} = $1;
if (defined $prog->{'_series_num'} && $prog->{'_series_num'} != $2) {
t(" Season number (" . $prog->{'_series_num'} . ") already defined. "
. "Ignoring different season number (" . $2 . ") in title.");
}
else {
t(" Season number found: Season $2 (title regex)");
$prog->{'_series_num'} = $2 - 1;
}
$prog->{'_episode_num'} = $3 - 1 if $3;
$prog->{'_processed'} = 1;
}
# this regex looks for season numbering in the title without
# parentheses
#
# "Wheeler Dealers Series 1"
# "Wheeler Dealers Series 1, 3"
elsif ($prog->{'_title'} =~
m{
^ # start at beginning of title details
(.*\b[\x21-\x2F\x3F]?) # CAPTURE the title details before season numbering
\s* # ignore any whitespace
(?:,|;|:|-)? # check for optional punctuation characters
\s* # ignore any whitespace
(?:Season|Series) # check for Season/Series text
\s* # ignore any whitespace
(\d+) # CAPTURE season number
(?:,)? # ignore comma if present
\s* # ignore any whitespace
(\d+)? # CAPTURE episode number if present
\s* # ignore any whitespace
$ # finish at end of title details
}ix )
{
$prog->{'_title'} = $1;
if (defined $prog->{'_series_num'} && $prog->{'_series_num'} != $2) {
t(" Season number (" . $prog->{'_series_num'} . ") already defined. "
. "Ignoring different season number (" . $2 . ") in title.");
}
else {
t(" Season number found: Season $2 (title regex)");
$prog->{'_series_num'} = $2 - 1;
}
$prog->{'_episode_num'} = $3 - 1 if $3;
$prog->{'_processed'} = 1;
}
}
}
# Part numbering is parsed but unused. However, when part numbering is
# seen in the text it is processed to make its format consistent.
#
# FIXME should we export part number in <episode-num> and remove
# it from the text?
#
sub extract_part_numbering_from_episode {
my $prog = shift;
if (defined $prog->{'_episode'}
&& $prog->{'_episode'} =~ m/Part|Pt|\d\s*$/i) {
# this regex looks for part numbering in parentheses
#
# "Dead Man's Eleven (Part 1)"
# "Dead Man's Eleven - (Part 1)"
# "Dead Man's Eleven - (Part 1/2)"
# "Dead Man's Eleven (Pt 1)"
# "Dead Man's Eleven - (Pt. 1)"
# "Dead Man's Eleven - (Pt. 1/2)"
if ($prog->{'_episode'} =~
m{
^ # start at beginning of episode details
(.*\b[\x21-\x2F\x3F]?) # CAPTURE the episode details before part numbering
\s* # ignore any whitespace
(?:,|;|:|-)? # check for optional punctuation characters
\s* # ignore any whitespace
(?:\() # opening paren
(?:Part|Pt(?:\.)?) # check for Part/Pt text
\s* # ignore any whitespace
(\d+) # CAPTURE part number
\s* # ignore any whitespace
(?:\/\s*\d+)? # ignore any total part number
(?:\)) # closing paren
$ # finish at end of episode details
}ix )
{
t(" Part number found: part $2 (regex #1)");
$prog->{'_episode'} = $1 . " (Part " . $2 . ")";
$prog->{'_part_num'} = $2 - 1;
$prog->{'_processed'} = 1;
}
# this regex looks for part numbering with no other episode information
#
# "Part 1"
# "Part 1/3"
# "Pt 2"
# "Pt 2/3"
# "Pt. 3"
elsif ($prog->{'_episode'} =~
m{
^ # start at beginning of episode details
(?:Part|Pt(?:\.)?) # check for Part/Pt text
\s* # ignore any whitespace
(\d+) # CAPTURE part number
\s* # ignore any whitespace
(?:\/\s*\d+)? # ignore any total part number
$ # finish at end of episode details
}ix )
{
t(" Part number found: part $1 (regex #2)");
$prog->{'_episode'} = "Part " . $1;
$prog->{'_part_num'} = $1 - 1;
$prog->{'_processed'} = 1;
}
# this regex looks for bare part numbering after a comma, semicolon,
# colon or hyphen
#
# "Dead Man's Eleven - Part 1"
# "Dead Man's Eleven: Part 1"
# "Dead Man's Eleven; Pt 1"
# "Dead Man's Eleven, Pt. 1"
elsif ($prog->{'_episode'} =~
m{
^ # start at beginning of episode details
(.*\b[\x21-\x2F\x3F]?) # CAPTURE the episode details before part numbering
\s* # ignore any whitespace
(?:,|;|:|-) # punctuation characters
\s* # ignore any whitespace
(?:Part|Pt(?:\.)?) # check for Part/Pt text
\s* # ignore any whitespace
(\d+) # CAPTURE part number
\s* # ignore any whitespace
(?:\/\s*\d+)? # ignore any total part number
$ # finish at end of episode details
}ix )
{
t(" Part number found: part $2 (regex #3)");
$prog->{'_episode'} = $1 . " (Part " . $2 . ")";
$prog->{'_part_num'} = $2 - 1;
$prog->{'_processed'} = 1;
}
# this regex looks for part numbering immediately following episode info
#
# "Dead Man's Eleven Part 1"
# "Dead Man's Eleven Pt 1"
# "Dead Man's Eleven Pt 1/2"
# "Dead Man's Eleven Pt. 1"
elsif ($prog->{'_episode'} =~
m{
^ # start at beginning of episode details
(.*\b[\x21-\x2F\x3F]?) # CAPTURE the episode details before part numbering
\s* # ignore any whitespace
(?:Part|Pt(?:\.)?) # check for Part/Pt text
\s* # ignore any whitespace
(\d+) # CAPTURE part number
\s* # ignore any whitespace
(?:\/\s*\d+)? # ignore any total part number
$ # finish at end of episode details
}ix )
{
t(" Part number found: part $2 (regex #4)");
$prog->{'_episode'} = $1 . " (Part " . $2 . ")";
$prog->{'_part_num'} = $2 - 1;
$prog->{'_processed'} = 1;
}
# this regex looks for a digit (conservatively between 1 and 6) following
# the episode details, a colon and at least one space
#
# "Dead Man's Eleven: 1"
elsif ($prog->{'_episode'} =~
m{
^ # start at beginning of episode details
(.*) # CAPTURE the episode details before part numbering
\s* # ignore any whitespace
(?::) # colon
\s+ # ignore any whitespace - min 1 space
(\d{1}) # CAPTURE single digit part number between 1 and 6
$ # finish at end of episode details
}ix )
{
if ($2 >= 1 && $2 <= 6) {
t(" Part number found: part $2 (regex #5, range 1-6)");
$prog->{'_episode'} = $1 . " (Part " . $2 . ")";
$prog->{'_part_num'} = $2 - 1;
$prog->{'_processed'} = 1;
}
}
# this regex looks for worded part numbering with no other episode information
#
# "Part One"
# "Pt Two"
# "Pt. Three"
elsif ($prog->{'_episode'} =~
m{
^ # start at beginning of episode details
(?:Part|Pt(?:\.)?) # check for Part/Pt text
\s+ # ignore any whitespace
(\w+) # CAPTURE part number wording
$ # finish at end of episode details
}ix )
{
my $part_digits = word_to_digit($1);
if (defined $part_digits and $part_digits > 0) {
t(" Part number found: part $part_digits (regex #6, parsed as $1)");
$prog->{'_episode'} = "Part " . $part_digits;
$prog->{'_part_num'} = $part_digits - 1;
$prog->{'_processed'} = 1;
}
}
# this regex looks for bare part numbering after a comma, semicolon,
# colon or hyphen, where the numbering is given in words
#
# "Dead Man's Eleven - Part One"
# "Dead Man's Eleven: Part One"
# "Dead Man's Eleven; Pt One"
# "Dead Man's Eleven, Pt. One"
elsif ($prog->{'_episode'} =~
m{
^ # start at beginning of episode details
(.*\b[\x21-\x2F\x3F]?) # CAPTURE the episode details before part numbering
\s* # ignore any whitespace
(?:,|;|:|-) # punctuation characters
\s* # ignore any whitespace
(?:Part|Pt(?:\.)?) # check for Part/Pt text
\s+ # ignore any whitespace
(\w+) # CAPTURE part number wording
$ # finish at end of episode details
}ix )
{
my $part_digits = word_to_digit($2);
if (defined $part_digits and $part_digits > 0) {
t(" Part number found: part $part_digits (regex #7, parsed as $2)");
$prog->{'_episode'} = $1 . " (Part " . $part_digits . ")";
$prog->{'_part_num'} = $part_digits - 1;
$prog->{'_processed'} = 1;
}
}
# this regex looks for worded part numbering immediately following episode info
#
# "Dead Man's Eleven Part One"
# "Dead Man's Eleven Pt One"
# "Dead Man's Eleven Pt. One"
elsif ($prog->{'_episode'} =~
m{
^ # start at beginning of episode details
(.*\b[\x21-\x2F\x3F]?) # CAPTURE the episode details before part numbering
\s* # ignore any whitespace
(?:Part|Pt(?:\.)?) # check for Part/Pt text
\s* # ignore any whitespace
(\w+) # CAPTURE part number wording
$ # finish at end of episode details
}ix )
{
my $part_digits = word_to_digit($2);
if (defined $part_digits and $part_digits > 0) {
t(" Part number found: part $part_digits (regex #8, parsed as $2)");
$prog->{'_episode'} = $1 . " (Part " . $part_digits . ")";
$prog->{'_part_num'} = $part_digits - 1;
$prog->{'_processed'} = 1;
}
}
# this regex looks for worded part numbering in parentheses
#
# "Dead Man's Eleven (Part One)"
# "Dead Man's Eleven - (Part One)"
# "Dead Man's Eleven (Pt One)"
# "Dead Man's Eleven - (Pt. One)"
elsif ($prog->{'_episode'} =~
m{
^ # start at beginning of episode details
(.*\b[\x21-\x2F\x3F]?) # CAPTURE the episode details before part numbering
\s* # ignore any whitespace
(?:,|;|:|-)? # check for optional punctuation characters
\s* # ignore any whitespace
(?:\() # opening paren
(?:Part|Pt(?:\.)?) # check for Part/Pt text
\s* # ignore any whitespace
(\w+) # CAPTURE part number wording
\s* # ignore any whitespace
(?:\)) # closing paren
$ # finish at end of episode details
}ix )
{
my $part_digits = word_to_digit($2);
if (defined $part_digits and $part_digits > 0) {
t(" Part number found: part $part_digits (regex #9, parsed as $2)");
$prog->{'_episode'} = $1 . " (Part " . $part_digits . ")";
$prog->{'_part_num'} = $part_digits - 1;
$prog->{'_processed'} = 1;
}
}
# check for potential part numbering left unprocessed
#
# we do this at the end of the if-else because the (Part x) text is
# not (yet) removed from the episode details, only made consistent
elsif ($opt->{debug} && $prog->{'_episode'} =~ m/\b(Part|Pt(\.)?)(\d+|\s+\w+)/i) {
t(" Possible part numbering still seen: " . $prog->{'_episode'});
$possible_part_nums{$prog->{'_episode'}} = $prog->{'_episode'};
}
}
}
# Extract series/episode numbering found in $prog->{'_sub_title'}. Series
# and episode numbering are parsed out of the text and eventually made
# available in the <episode-num> element, being stored in intermediary
# variables during processing.
#
sub extract_numbering_from_sub_title {
my $prog = shift;
if (defined $prog->{'_sub_title'}
&& $prog->{'_sub_title'} =~ m/\d+|series|episode/i) {
# ) check for most common "x/y, series z" format first
#
# "1/6, series 1"
#
if ($prog->{'_sub_title'} =~
s{
^ # start at beginning of sub_title details
(\d+) # CAPTURE the first number(s) found ($episode_num)
\/ # forward slash
(\d+) # CAPTURE the second number(s) found ($num_episodes)
, # comma
\s # whitespace
series\s(\d+) # check for series number information ($series_num)
$ # stop at end of sub_title details
}
{}ix ) {
$prog->{'_episode_num'} = $1 - 1;
# Check that the source episode number is not greater than the number of episodes.
# Rather than discard the episode number, we discard the total instead, which
# is more likely to be incorrect based on observation.
if ($1 <= $2) {
$prog->{'_num_episodes'} = $2;
t(" Episode number/total found: episode $1 of $2 (subtitle: x/y, series z)");
}
else {
t(" Bad episode total found: episode $1 of $2, discarding total (subtitle: x/y, series z)");
}
t(" Series number found: series $3 (subtitle: x/y, series z)");
$prog->{'_series_num'} = $3 - 1;
$prog->{'_processed'} = 1;
}
# ) check for "x/y" formats covering other formats
#
# "1"
# "1/6"
# "1, series 1"
# "1/6, series 1"
# "1`/3, series 1"
# "1&2/6, series 1" - second episode unused
# "7&1, series 1&2" - second episode and series unused
# "1&2"
# "1 and 2/6, series 1"
# "1A/6, series 1" - episode "part A or B" currently unused
#
elsif ($prog->{'_sub_title'} =~
s{
^ # start at beginning of sub_title details
(\d+) # CAPTURE the first number(s) found ($episode_num)
`? # ignore any erroneous chars
(?:A|B)? # ignore optional episode "part"
\s* # ignore any whitespace
(?:(?:&|and)\s*\d+)? # check for "&2", "and 2" details relating to following episode
\s* # ignore any whitespace
\/? # forward slash
\s* # ignore any whitespace
(\d+)? # CAPTURE the second number(s) found ($num_episodes)
\s* # ignore any whitespace
(?:,)? # check for punctuation characters
\s* # ignore any whitespace
(?:series\s*(\d+)(?:&\d+)?)? # check for series number information ($series_num)
\s* # ignore any whitespace
$ # stop at end of sub_title details
}
{}ix ) {
$prog->{'_episode_num'} = $1 - 1;
# Check that the source episode number is not greater than the number of episodes.
# Rather than discard the episode number, we discard the total instead, which
# is more likely to be incorrect based on observation.
if (defined $2) {
if ($1 <= $2) {
$prog->{'_num_episodes'} = $2;
t(" Episode number/total found: episode $1 of $2 (subtitle, x/y)");
}
else {
t(" Bad episode total found: episode $1 of $2, discarding total (subtitle, x/y)");
}
}
else {
t(" Episode number found: episode $1 (subtitle, x/y)");
}
if (defined $3) {
t(" Series number found: series $3 (subtitle, x/y)");
$prog->{'_series_num'} = $3 - 1;
}
$prog->{'_processed'} = 1;
}
# ) check for special case of "x/y/z, series n" format where two parts of a series have
# been edited into a single programme for transmission. Only the first episode number
# given is output in episode-num
#
# "1/2/6, series 1"
#
elsif ($prog->{'_sub_title'} =~
s{
^ # start at beginning of sub_title details
(\d+) # CAPTURE the first number(s) found ($episode_num)
\s* # ignore any whitespace
\/ # forward slash
\s* # ignore any whitespace
(\d+) # CAPTURE the second number(s) found (unused at present)
\s* # ignore any whitespace
\/ # forward slash
\s* # ignore any whitespace
(\d+) # CAPTURE the third number(s) found ($num_episodes)
\s* # ignore any whitespace
, # check for punctuation characters
\s* # ignore any whitespace
series\s*(\d+) # check for series number information ($series_num)
\s* # ignore any whitespace
$ # stop at end of sub_title details
}
{}ix ) {
$prog->{'_episode_num'} = $1 - 1;
# Check that the source episode number is not greater than the number of episodes.
# Rather than discard the episode number, we discard the total instead, which
# is more likely to be incorrect based on observation.
if (defined $3) {
if ($1 <= $3) {
$prog->{'_num_episodes'} = $3;
t(" Episode number/total found: episode $1 of $3 (subtitle, x/y/z. series n)");
}
else {
t(" Bad episode total found: episode $1 of $3, discarding total (subtitle, x/y/z, series n)");
}
}
if (defined $4) {
t(" Series number found: series $4 (subtitle, x/y/z, series n)");
$prog->{'_series_num'} = $4 - 1;
}
$prog->{'_processed'} = 1;
}
# ) check for "Series x" format covering following formats:
#
# "Series 1"
#
elsif ($prog->{'_sub_title'} =~
s{
^ # start at beginning of sub_title details
(?:Series) # ignore "Series" text
\s* # ignore any whitespace
(\d+) # CAPTURE the first number(s) found ($series_num)
$ # finish at end of sub_title details
}
{}ix ) {
if (defined $1) {
t(" Series number found: series $1 (subtitle, series x)");
$prog->{'_series_num'} = $1 - 1;
}
$prog->{'_processed'} = 1;
}
# ) check for "Series " format where series number is missing. Here
# we remove the text from the sub_title field
#
elsif ($prog->{'_sub_title'} =~
s{
^ # start at beginning of sub_title details
Series # "Series" text
\s* # ignore any whitespace
$ # finish at end of sub_title details
}
{}ix ) {
t(" Missing series number found (subtitle, series)");
}
# ) check for "Episode x" format covering following formats:
#
# "Episode 1"
# "Episode one"
#
elsif ($prog->{'_sub_title'} =~
s{
^ # start at beginning of sub_title details
(?:Episode|Ep|Epiosde) # ignore "Episode"/"Ep" text (also matches 'Epiosde' typo)
\s* # ignore any whitespace
(\w+|\d+) # CAPTURE the first number(s) found ($episode_num)
$ # finish at end of sub_title details
}
{}ix ) {
my $digits = word_to_digit($1);
if (defined $digits and $digits > 0) {
t(" Episode number found: episode $digits (parsed as $1, subtitle, episode x)");
$prog->{'_episode_num'} = $digits - 1;
}
$prog->{'_processed'} = 1;
}
}
}
sub config_stage {
my ( $stage, $conf ) = @_;
# Update encoding if seen in new-style config file
if (defined( $conf->{encoding} )) {
$xml_encoding = $conf->{encoding}[0];
}
my $result;
my $writer = new XMLTV::Configure::Writer( OUTPUT => \$result,
encoding => $xml_encoding );
$writer->start( { grabber => "$grabber_name" } );
if ($stage eq 'start') {
$writer->start_selectone( {
id => 'encoding',
title => [ [ 'Encoding', 'en' ] ],
description => [
[ "Select which output format to use",
'en' ] ],
} );
$writer->write_option( {
value => 'utf-8',
text => [ [ 'UTF-8 (Unicode)', 'en' ] ],
} );
$writer->write_option( {
value => 'iso-8859-1',
text => [ [ 'ISO-8859-1 (Latin-1)', 'en' ] ],
} );
$writer->end_selectone();
$writer->end('select-cachedir');
}
elsif ($stage eq 'select-cachedir') {
$writer->write_string( {
id => 'cachedir',
title => [ [ 'Enter the directory to store the listings cache in', 'en' ] ],
description => [
[ "$grabber_name uses a cache with files that it has already " .
"downloaded. Please specify where the cache shall be stored.",
'en' ] ],
default => $default_cachedir,
} );
$writer->end('select-title-processing');
}
elsif ($stage eq 'select-title-processing') {
$writer->start_selectone( {
id => 'title-processing',
title => [ [ 'Enable title processing?', 'en' ] ],
description => [
[ "In a bid to provide more consistent listings data, $grabber_name " .
"can further process programme and episode titles.",
'en' ] ],
} );
$writer->write_option( {
value => 'enabled',
text => [ [ 'Enable title processing', 'en' ] ],
} );
$writer->write_option( {
value => 'disabled',
text => [ [ 'Disable title processing', 'en' ] ],
} );
$writer->end_selectone();
$writer->end('select-utf8-fixups');
}
elsif ($stage eq 'select-utf8-fixups') {
$writer->start_selectone( {
id => 'utf8-fixups',
title => [ [ 'Enable UTF-8 fixups?', 'en' ] ],
description => [
[ "The source data can be processed to detect and correct any mis-encoded " .
"UTF-8 characters. Although such errors are rare, it is recommended to " .
"enable this option.",
'en' ] ],
} );
$writer->write_option( {
value => 'enabled',
text => [ [ 'Enable UTF-8 fixups', 'en' ] ],
} );
$writer->write_option( {
value => 'disabled',
text => [ [ 'Disable UTF-8 fixups', 'en' ] ],
} );
$writer->end_selectone();
$writer->end('ask-autoconfig');
}
elsif ($stage eq 'ask-autoconfig') {
my $use_lineups = ask_boolean('Do you want channels auto-configured based on your chosen platform?', 1);
if ($use_lineups) {
$writer->end('select-country');
}
else {
$writer->end('select-channels');
}
}
elsif ($stage eq 'select-country') {
$writer->start_selectone( {
id => 'country',
title => [ [ 'Choose your location', 'en' ] ],
description => [
[ "$grabber_name can use your location in " .
"order to determine which national channels to " .
"receive listings for.",
'en' ] ],
} );
$writer->write_option( {
value => 'England',
text => [ [ 'England', 'en' ] ],
} );
$writer->write_option( {
value => 'Scotland',
text => [ [ 'Scotland', 'en' ] ],
} );
$writer->write_option( {
value => 'Wales',
text => [ [ 'Wales', 'en' ] ],
} );
$writer->write_option( {
value => 'Northern Ireland',
text => [ [ 'Northern Ireland', 'en' ] ],
} );
$writer->write_option( {
value => 'Ireland',
text => [ [ 'Republic of Ireland', 'en' ] ],
} );
$writer->end_selectone();
$writer->end('select-postcode');
}
elsif ($stage eq 'select-postcode') {
my $country = $conf->{'country'}[0];
if ($country =~ m/^Ireland$/i) {
$writer->end('select-lineup');
}
else {
$writer->write_string( {
id => 'postcode',
title => [ [ 'Enter the first part of your postcode', 'en' ] ],
description => [
[ "$grabber_name can use the first part of your postcode in " .
"order to determine which regional channels to receive listings for",
'en' ] ],
default => 'W12',
} );
$writer->end('select-lineup');
}
}
elsif ($stage eq 'select-lineup') {
my $lineups_doc = parse_lineup_xml_doc( 'lineups.xml' );
my $ns = $lineups_doc->find( "//xmltv-lineup" );
LINEUP:
foreach my $lineup ($ns->get_nodelist) {
# filter available lineups by country
if ($lineup->findnodes( "availability[\@area='country']" )) {
my $country = $conf->{'country'}[0];
my $path = "availability[\@area='country'][.='$country']";
if (! $lineup->findnodes( $path ) ) {
remove_node($lineup);
next LINEUP;
}
}
else {
# include the lineup if no specific availability given
}
}
# refresh list
$ns = $lineups_doc->find( "//xmltv-lineup" );
$writer->start_selectone( {
id => 'lineup',
title => [ [ 'Select which TV platform you use', 'en' ] ],
description => [
[ "When choosing which channels to download listings for, $grabber_name " .
"can show only those channels available on your TV platform.",
'en' ] ],
} );
LINEUP:
foreach my $lineup ($ns->get_nodelist) {
my $id = $lineup->findvalue( '@id' );
my $dn = $lineup->findvalue( 'display-name' );
my $type = $lineup->findvalue( 'type' );
$writer->write_option( {
value => $id,
text => [ [ "$dn ($type)", 'en' ] ],
} );
}
$writer->end_selectone();
$writer->end( 'select-packages' );
}
#FIXME
#filter by country/postcode before determining available packages
elsif ($stage eq 'select-packages') {
my $lineup = $conf->{lineup}[0];
my $lineup_doc = parse_lineup_xml_doc("$lineup.xml");
# First check for any basic packages that should be chosen first.
# These can be FTA/FTV channels for channels available on a platform
# without a subscription, or the basic package required for a
# subscription platform like Virgin TV.
my $basic_nodes = $lineup_doc->find( "//lineup-entry/package[\@type='basic']" );
# We include FTA/FTV channels by default and only ask a
# user to choose a basic subscription package
my %basic_subs_pkgs;
foreach my $package_node ($basic_nodes->get_nodelist) {
my $id = $package_node->textContent;
if ($id =~ m/Free-to-air/i) {
}
elsif ($id =~ m/Free-to-view/i) {
}
else {
$basic_subs_pkgs{$id} = $id;
}
}
if (%basic_subs_pkgs) {
$writer->start_selectone( {
id => 'basic-package',
title => [ [ 'Select which basic subscription package you have', 'en' ] ],
description => [
[ "Please choose from one of the following basic subscription packages",
'en' ] ],
} );
foreach my $package (keys %basic_subs_pkgs) {
$writer->write_option( {
value => $package,
text => [ [ "$package", 'en' ] ],
} );
}
$writer->end_selectone();
}
# Check for any available premium subscription packages
my $premium_sub_nodes = $lineup_doc->find( "//lineup-entry/package[\@type='subscription'][. != 'Premium']" );
my %premium_subs_pkgs;
foreach my $package_node ($premium_sub_nodes->get_nodelist) {
my $id = $package_node->textContent;
if ($id =~ m/Pay-per-view/i) { # Ignore PPV channels
}
else {
$premium_subs_pkgs{$id} = $id;
}
}
if (%premium_subs_pkgs) {
$writer->start_selectmany( {
id => 'subscription-package',
title => [ [ 'Select which packages you subscribe to', 'en' ] ],
description => [
[ "Please choose from the following subscription packagess",
'en' ] ],
} );
foreach my $package (sort keys %premium_subs_pkgs) {
$writer->write_option( {
value => $package,
text => [ [ "$package", 'en' ] ],
} );
}
$writer->end_selectmany();
}
# Check for any individual premium channels
my $premium_chan_name_nodes = $lineup_doc->find( "//lineup-entry/package[\@type='subscription'][.='Premium']/../station/name" );
my %premium_channels;
foreach my $name_node ($premium_chan_name_nodes->get_nodelist) {
my $name = $name_node->textContent;
$premium_channels{$name} = $name;
}
if (%premium_channels) {
$writer->start_selectmany( {
id => 'premium-channel',
title => [ [ 'Select which premium channels you subscribe to', 'en' ] ],
description => [
[ "Please choose from the following premium channels",
'en' ] ],
} );
foreach my $channel (sort keys %premium_channels) {
$writer->write_option( {
value => $channel,
text => [ [ "$channel", 'en' ] ],
} );
}
$writer->end_selectmany();
}
#FIXME
# currently we ignore PPV channels during config as listings for these
# are not available. We can still output them during get-lineup though.
$writer->end( 'select-full-lineups' );
}
elsif ($stage eq 'select-full-lineups') {
$writer->start_selectone( {
id => 'full-lineups',
title => [ [ 'Include unsupported channels in lineup?', 'en' ] ],
description => [
[ "When generating a lineup you " .
"can choose to include channels which are not currently " .
"supported with listings. You should choose this option " .
"if you can receive listings for these channels from elsewhere " .
"(e.g. EIT) and/or want the full channel lineup available.",
'en' ] ],
} );
$writer->write_option( {
value => 'enabled',
text => [ [ 'Include unsupported channels', 'en' ] ],
} );
$writer->write_option( {
value => 'disabled',
text => [ [ 'Exclude unsupported channels', 'en' ] ],
} );
$writer->end_selectone();
# The select-channels stage must be the last stage called
$writer->end('select-channels');
}
else {
die "Unknown stage $stage";
}
return $result;
}
sub list_channels {
my ( $conf, $opt ) = @_;
# Update encoding if seen in new-style config file
if (exists $conf->{encoding}) {
$xml_encoding = $conf->{encoding}[0];
}
my $channels = load_available_channels($conf, $opt);
# only filter available channels if we see a lineup entry
if (exists $conf->{lineup}) {
my $lineup = $conf->{lineup}[0];
my $doc = parse_lineup_xml_doc("$lineup.xml");
$channels = get_supported_lineup_channels($conf, $opt, $doc, $channels);
}
my $result = "";
my $fh = new IO::Scalar \$result;
my $oldfh = select( $fh );
my %g_args = (OUTPUT => $fh);
# Write XMLTV to $result, rather than STDOUT
my $writer = new XMLTV::Writer(%g_args, encoding => $xml_encoding);
$writer->start(\%tv_attributes);
my $sorted_channels = sort_wanted_channels_by_name($channels);
foreach my $channel (@{$sorted_channels}) {
delete $channel->{'rt_id'};
$writer->write_channel($channel);
}
$writer->end;
select( $oldfh );
$fh->close();
return $result;
}
sub list_lineups {
my $doc = parse_lineup_xml_doc( 'lineups.xml' );
return $doc->toString();
}
#FIXME - option to include all channels inc supported
sub get_lineup {
my $conf = shift;
my $opt = shift;
if (! exists $conf->{'lineup'}) {
die "Error: No lineup is configured";
}
my $lineup = $conf->{'lineup'}[0];
my $unfiltered_lineup_doc = parse_lineup_xml_doc( "$lineup.xml" );
my $filtered_lineup_doc = filter_channels($conf, $opt, $unfiltered_lineup_doc);
return pretty_print_xml($filtered_lineup_doc);
}
# Retrieve a given lineup XML file via XMLTV::Supplement and return
# a reference to the parsed XML document
#
sub parse_lineup_xml_doc {
my $file = shift;
my $string = GetSupplement("$grabber_name/lineups", "$file");
die "Error: XML lineup document 'lineups/$file' is missing or empty, cannot continue"
if (! defined $string || $string eq '');
my $xml = XML::LibXML->new;
my $doc;
eval { $doc = $xml->parse_string( $string ) };
die "Error: Could not parse XML lineup document 'lineups/$file'" if ($@);
return $doc;
}
sub get_wanted_channels_aref {
my $conf = shift;
my $opt = shift;
my $available_channels = shift;
my $lineup;
if (exists $conf->{lineup}) {
$lineup = $conf->{lineup}[0];
my $doc = parse_lineup_xml_doc("$lineup.xml");
$available_channels = get_supported_lineup_channels($conf, $opt, $doc, $available_channels);
}
else {
$available_channels = get_supported_config_channels($conf, $opt, $available_channels);
}
return sort_wanted_channels_by_name($available_channels);
}
# Parse an XML lineup document and extract a list of channels supported by the
# grabber. Return a hashref of the available channels.
sub get_supported_lineup_channels {
my $conf = shift;
my $opt = shift;
my $unfiltered_lineup_doc = shift;
my $xmltv_channels_href = shift;
my $filtered_lineup_doc = filter_channels($conf, $opt, $unfiltered_lineup_doc);
my @stations = $filtered_lineup_doc->findnodes( "//station" );
LINEUP_STATION:
foreach my $station (@stations) {
my $id = $station->findvalue( '@rfc2838' );
if (exists $xmltv_channels_href->{$id}) {
if ($opt->{'debug'}) {
say(" Channel '$id' is available in the grabber");
}
$xmltv_channels_href->{$id}{'_matched'} = 1;
next LINEUP_STATION;
}
}
# remove any channels not flagged
foreach my $id (keys %{$xmltv_channels_href}) {
unless (exists $xmltv_channels_href->{$id}{'_matched'}) {
delete $xmltv_channels_href->{$id};
}
}
if ($opt->{'debug'}) {
say(" A total of " . scalar (keys %{$xmltv_channels_href})
. " lineup entries are supported by the grabber");
}
return $xmltv_channels_href;
}
# Iterate over the lineup document and remove any channels that are unavailable
# in the configured country and postcode, or unsupported by the grabber (i.e. no
# associated XMLTV ID). Return the document containing the remaining lineup entries.
sub filter_channels {
my $conf = shift;
my $opt = shift;
my $lineup_doc = shift;
$lineup_doc = filter_channels_by_location($conf, $opt, $lineup_doc);
$lineup_doc = filter_channels_by_package($conf, $opt, $lineup_doc);
# By default, generate a full lineup including unsupported channels. Only
# remove unsupported channels if explicitly configured
if (exists $conf->{'full-lineups'} && $conf->{'full-lineups'}[0] eq 'disabled') {
$lineup_doc = filter_channels_by_xmltv_support($conf, $opt, $lineup_doc);
}
# remove nodes used for filtering purposes above from final lineup
$lineup_doc = remove_filter_ndoes_from_lineup($conf, $opt, $lineup_doc);
return $lineup_doc;
}
sub filter_channels_by_location {
my $conf = shift;
my $opt = shift;
my $lineup_doc = shift;
my @entries = $lineup_doc->findnodes( "//lineup-entry" );
LINEUP_ENTRY:
foreach my $entry (@entries) {
# keep channels that do not have availability information
if (! $entry->exists( "availability" )) {
next LINEUP_ENTRY;
}
my $matched; # have we matched this channel against our configured location?
# keep channels matched by configured postcode
my $user_postcode = $conf->{'postcode'}[0];
if (defined $user_postcode && lc $user_postcode ne 'none') {
my @area_postcodes = $entry->findnodes( "availability[\@area='postcode']" );
foreach my $postcode_node (@area_postcodes) {
my $value = $postcode_node->textContent;
my @postcodes = split /,/, $value;
foreach my $chan_postcode (@postcodes) {
if (lc $chan_postcode eq lc $user_postcode) {
$matched++;
}
}
}
}
# keep channels that are matched by country
my $user_country = $conf->{'country'}[0];
if (defined $user_country && lc $user_country ne 'none') {
my @area_countries = $entry->findnodes( "availability[\@area='country']" );
foreach my $country_node (@area_countries) {
my $value = $country_node->textContent;
my @countries = split /,/, $value;
foreach my $chan_country (@countries) {
if (lc $chan_country eq lc $user_country) {
$matched++;
}
}
}
}
# remove this channel if we haven't matched it
remove_node($entry) unless $matched;
}
return $lineup_doc;
}
sub filter_channels_by_package {
my $conf = shift;
my $opt = shift;
my $lineup_doc = shift;
my @entries = $lineup_doc->findnodes( "//lineup-entry" );
my $conf_basic_pkg = $conf->{'basic-package'}[0];
LINEUP_ENTRY:
foreach my $entry (@entries) {
my $matched; # have we matched this channel against our configured packages?
# FTA/FTV and basic packages
my @basic_pkg_nodes = $entry->findnodes( "package[\@type='basic']" );
PACKAGE:
foreach my $pkg_node (@basic_pkg_nodes) {
my $pkg = $pkg_node->textContent;
if ($pkg =~ m/Free-to-air/i) {
$matched = 1;
last PACKAGE;
}
elsif ($pkg =~ m/Free-to-view/i) {
$matched = 1;
last PACKAGE;
}
# not a FTA/FTV basic package, first look for Virgin TV basic pkgs
elsif ($pkg =~ m/^(M|M\+|L|XL)$/i) {
if ($conf_basic_pkg eq 'XL') {
$matched = 1;
last PACKAGE;
}
if ($conf_basic_pkg eq 'L' && $pkg =~ m/^(M|M\+|L)$/i) {
$matched = 1;
last PACKAGE;
}
if ($conf_basic_pkg eq 'M+' && $pkg =~ m/^(M|M\+)$/i) {
$matched = 1;
last PACKAGE;
}
if ($conf_basic_pkg eq 'M' && $pkg =~ m/^M$/i) {
$matched = 1;
last PACKAGE;
}
}
# next look for UPC Ireland basic pkgs
elsif ($pkg =~ m/^(Value|Select|Select Extra|Max)$/i) {
if ($conf_basic_pkg eq 'Max') {
$matched = 1;
last PACKAGE;
}
if ($conf_basic_pkg eq 'Select Extra' && $pkg =~ m/^(Value|Select|Select Extra)$/i) {
$matched = 1;
last PACKAGE;
}
if ($conf_basic_pkg eq 'Select' && $pkg =~ m/^(Value|Select)$/i) {
$matched = 1;
last PACKAGE;
}
if ($conf_basic_pkg eq 'Value' && $pkg =~ m/^Value$/i) {
$matched = 1;
last PACKAGE;
}
}
else {
foreach my $conf_pkg (@{$conf->{'basic-package'}}) {
if ($pkg eq $conf_pkg) {
$matched = 1;
last PACKAGE;
}
}
}
}
# Subscription packages
my @subs_pkg_nodes = $entry->findnodes( "package[\@type='subscription'][. != 'Premium']" );
PACKAGE:
foreach my $pkg_node (@subs_pkg_nodes) {
my $pkg = $pkg_node->textContent;
foreach my $conf_pkg (@{$conf->{'subscription-package'}}) {
if ($pkg eq $conf_pkg) {
$matched = 1;
last PACKAGE;
}
}
}
# Premium channels
my @prem_chan_name_nodes = $entry->findnodes( "package[\@type='subscription'][.='Premium']/../station/name" );
CHANNEL:
foreach my $name_node (@prem_chan_name_nodes) {
my $name = $name_node->textContent;
foreach my $conf_name (@{$conf->{'premium-channel'}}) {
if ($name eq $conf_name) {
$matched = 1;
last CHANNEL;
}
}
}
# remove this channel if we haven't matched it
remove_node($entry) unless $matched;
}
return $lineup_doc;
}
sub filter_channels_by_xmltv_support {
my $conf = shift;
my $opt = shift;
my $lineup_doc = shift;
my @entries = $lineup_doc->findnodes( "//lineup-entry" );
LINEUP_ENTRY:
foreach my $entry (@entries) {
my $dn = $entry->findvalue( 'station/name' );
if (! $entry->exists( "station[\@rfc2838]" ) ) {
remove_node($entry);
next LINEUP_ENTRY;
}
if ($entry->exists( "station[\@rfc2838='unknown']" ) ) {
remove_node($entry);
next LINEUP_ENTRY;
}
}
return $lineup_doc;
}
sub remove_filter_ndoes_from_lineup {
my $conf = shift;
my $opt = shift;
my $lineup_doc = shift;
my @availability_nodes = $lineup_doc->findnodes( "//availability" );
foreach my $node (@availability_nodes) {
remove_node($node);
}
my @package_nodes = $lineup_doc->findnodes( "//package" );
foreach my $node (@package_nodes) {
remove_node($node);
}
return $lineup_doc;
}
sub get_supported_config_channels {
my $conf = shift;
my $opt = shift;
my $xmltv_channels_href = shift;
CONFIG_CHANNEL:
foreach my $chan_id (@{$conf->{'channel'}}) {
t(" Read channel '$chan_id'");
if (! exists $xmltv_channels_href->{$chan_id}) {
if (! $opt->{'quiet'}) {
say(" Configured channel '$chan_id' is unavailable");
}
next CONFIG_CHANNEL;
}
if ($opt->{'debug'}) {
say(" Channel '$chan_id' is available in the grabber");
}
$xmltv_channels_href->{$chan_id}{'_matched'} = 1;
next CONFIG_CHANNEL;
}
# remove any channels not flagged
foreach my $id (keys %{$xmltv_channels_href}) {
unless (exists $xmltv_channels_href->{$id}{'_matched'}) {
delete $xmltv_channels_href->{$id};
}
}
if ($opt->{'debug'}) {
say(" A total of " . scalar (keys %{$xmltv_channels_href})
. " config file channels are supported by the grabber");
}
return $xmltv_channels_href;
}
# Take a hashref of configured channel hashes and return a listref of channel
# hashes sorted in ascending display name order
sub sort_wanted_channels_by_name {
my $channels = shift;
my $sorted_channels = [];
my %chan_id_to_name;
# Only add the non-RT sourced timeshifted channels during configuration,
# otherwise the configuration could include both Radio Times-sourced
# timeshifted data, and the timeshifted data we create internally from a
# regular +0 channel
foreach my $chan_id (keys %{$channels}) {
my $chan_name = $channels->{$chan_id}{'display-name'}[0][0];
if ($chan_name !~ m/\(RT\)$/) {
$chan_id_to_name{$chan_id} = $chan_name;
}
}
# Create a sorted list of xmltv_ids in ascending order of the
# corresponding display name (case-insensitive)
my @sorted_chan_ids = sort {uc($chan_id_to_name{$a}) cmp uc($chan_id_to_name{$b})}
keys %chan_id_to_name;
foreach my $chan_id (@sorted_chan_ids) {
push @{$sorted_channels}, $channels->{$chan_id};
}
return $sorted_channels;
}
# Remove the given node object from an XML document
sub remove_node {
my $node = shift;
my $parent = $node->parentNode;
$parent->removeChild($node);
}
# Prettify XML output. Required when the XML document structure has been
# modified, as removing elements from a parsed XML file will leave line breaks
# in the toString output
sub pretty_print_xml {
my $doc = shift;
my $doc_string = $doc->toString();
$doc_string =~ s/>\s+</></g;
my $xml = XML::LibXML->new;
eval { $doc = $xml->parse_string( $doc_string ) };
die "Error: Could not parse XML string" if ($@);
return $doc->toString(1);
}
###############################################
############# DEBUG SUBROUTINES ###############
###############################################
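# The debug subroutines below print summaries of the programme data seen
# during a run. Several of them also emit suggested "fixup" lines of the
# form "<type>|<text>" or "<type>|<title>~<replacement>" (see the push
# statements below for the type numbers used); these appear to be candidate
# entries for the title/category fixup lists used during title processing.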
sub t {
my ($message) = @_;
if ($opt->{debug}) {
print STDERR $message . "\n";
}
}
sub print_titles_with_colons {
if (%prog_titles) {
my @titles_colons;
my %precolon; # store the title elements that appear before and
my %postcolon; # after the first colon with the full title
foreach my $title (sort keys %prog_titles) {
if ($title =~ m/^([^:]+)\s*:\s*(.*)$/) {
push @titles_colons, $title;
push @{$precolon{$1}}, $title;
push @{$postcolon{$2}}, $title;
}
}
if (@titles_colons) {
say("\nStart of list of titles containing colons");
say(" " . $_) foreach @titles_colons;
# now store the possible fixups if we see more than 1 title having
# common pre/post colon text.
my @prefixups;
foreach my $text (sort keys %precolon) {
if (@{$precolon{$text}} > 1) {
push @prefixups, "2|" . $text
}
}
if (@prefixups) {
say("\nPossible fixups for title:episode :\n");
say($_) foreach sort @prefixups;
say("");
}
my @postfixups;
foreach my $text (sort keys %postcolon) {
if (@{$postcolon{$text}} > 1) {
push @postfixups, "3|" . $text
}
}
if (@postfixups) {
say("\nPossible fixups for episode:title :\n");
say($_) foreach sort @postfixups;
say("");
}
say("End of list of titles containing colons");
}
}
}
sub print_titles_with_hyphens {
if (%prog_titles) {
my @titles_hyphens;
my @fixups;
foreach my $title (sort keys %prog_titles) {
if ($title =~ m/\s+-\s+/) {
push @titles_hyphens, $title;
my $idx_hyphen = index($title, "-");
my $idx_colon = index($title, ":"); # -1 = no colon
# Do not suggest title fixup if colon precedes hyphen
if ($idx_colon == -1 || $idx_hyphen < $idx_colon) {
my $rec_title = $title;
$rec_title =~ s/\s+-\s+/: /;
push @fixups, "5|" . $title . "~" . $rec_title;
}
}
}
if (@titles_hyphens) {
say("\nStart of list of titles containing hyphens");
say(" " . $_) foreach @titles_hyphens;
if (@fixups) {
say("\nPossible fixups for hyphenated titles:\n");
say($_) foreach sort @fixups;
say("");
}
say("End of list of titles containing hyphens");
}
}
}
sub print_new_titles {
if (%prog_titles) {
my @titles_special;
foreach my $title (sort keys %prog_titles) {
push(@titles_special, $title) if ($title =~ m/Special\b/i);
}
if (@titles_special) {
say("\nStart of list of titles containing \"Special\"");
say(" " . $_) foreach @titles_special;
say("End of list of titles containing \"Special\"");
}
my @titles_new;
my @titles_premiere;
my @titles_finale;
my @titles_anniv;
foreach my $title (sort keys %prog_titles) {
push(@titles_new, $title) if ($title =~ m/^(All New|New)\b/i);
push(@titles_premiere, $title) if ($title =~ m/Premiere\b/i);
push(@titles_finale, $title) if ($title =~ m/Final\b/i);
push(@titles_finale, $title) if ($title =~ m/Finale/i);
push(@titles_anniv, $title) if ($title =~ m/Anniversary/i);
}
if (@titles_new || @titles_premiere || @titles_finale || @titles_anniv) {
say("\nStart of list of titles containing \"New/Premiere/Finale/etc...\"");
if (@titles_new) {
say(" " . $_) foreach @titles_new;
say("");
}
if (@titles_premiere) {
say(" " . $_) foreach @titles_premiere;
say("");
}
if (@titles_finale) {
say(" " . $_) foreach @titles_finale;
say("");
}
if (@titles_anniv) {
say(" " . $_) foreach @titles_anniv;
say("");
}
say("End of list of titles containing \"New/Premiere/Finale/etc...\"");
}
my @titles_day;
my @titles_night;
my @titles_week;
foreach my $title (sort keys %prog_titles) {
push(@titles_day, $title) if ($title =~ m/\bDay\b/i);
push(@titles_night, $title) if ($title =~ m/\bNight\b/i);
push(@titles_week, $title) if ($title =~ m/\bWeek\b/i);
}
if (@titles_day || @titles_night || @titles_week) {
say("\nStart of list of titles containing \"Day/Night/Week\"");
if (@titles_day) {
say(" " . $_) foreach @titles_day;
say("");
}
if (@titles_night) {
say(" " . $_) foreach @titles_night;
say("");
}
if (@titles_week) {
say(" " . $_) foreach @titles_week;
say("");
}
say("End of list of titles containing \"Day/Night/Week\"");
}
my @titles_christmas;
my @titles_newyear;
foreach my $title (sort keys %prog_titles) {
push(@titles_christmas, $title) if ($title =~ m/\bChristmas\b/i);
push(@titles_newyear, $title) if ($title =~ m/\bNew\s+Year/i);
}
if (@titles_christmas || @titles_newyear) {
say("\nStart of list of titles containing \"Christmas/New Year\"");
if (@titles_christmas) {
say(" " . $_) foreach @titles_christmas;
say("");
}
if (@titles_newyear) {
say(" " . $_) foreach @titles_newyear;
say("");
}
say("End of list of titles containing \"Christmas/New Year\"");
}
my @titles_bestof;
my @titles_highlights;
my @titles_results;
my @titles_top;
foreach my $title (sort keys %prog_titles) {
push(@titles_bestof, $title) if ($title =~ m/Best of\b/i);
push(@titles_highlights, $title) if ($title =~ m/Highlights/i);
push(@titles_results, $title) if ($title =~ m/Results?/i);
push(@titles_top, $title) if ($title =~ m/\bTop\b/i);
}
if (@titles_bestof || @titles_highlights || @titles_results || @titles_top) {
say("\nStart of list of titles containing \"Results/Best of/Highlights/etc...\"");
if (@titles_bestof) {
say(" " . $_) foreach @titles_bestof;
say("");
}
if (@titles_highlights) {
say(" " . $_) foreach @titles_highlights;
say("");
}
if (@titles_results) {
say(" " . $_) foreach @titles_results;
say("");
}
if (@titles_top) {
say(" " . $_) foreach @titles_top;
say("");
}
say("End of list of titles containing \"Results/Best of/Highlights/etc...\"");
}
}
}
sub print_uc_titles_post {
if (%prog_titles) {
my @titles_uc_post;
foreach my $title (sort keys %prog_titles) {
if ($title eq uc($title) && $title !~ m/^\d+$/) {
push @titles_uc_post, $title;
}
}
if (@titles_uc_post) {
say("\nStart of list of uppercase titles after processing");
say(" " . $_) foreach @titles_uc_post;
say("End of list of uppercase titles after processing");
}
}
}
sub print_title_variants {
if (%prog_titles) {
# iterate over each unique "normalised" title
my @titles_variants;
my @fixups;
foreach my $unique_title (sort keys %case_insens_titles) {
if (scalar keys %{$case_insens_titles{$unique_title}} > 1) {
my %variants;
# iterate over each actual title seen in listings
foreach my $title (sort keys %{$case_insens_titles{$unique_title}}) {
# need to remove 'count' key before genre processing later
my $title_cnt = delete $case_insens_titles{$unique_title}{$title}{'count'};
# hash lists of title variants keyed on frequency
push @{$variants{$title_cnt}}, $title;
my $line = " $title (";
# iterate over each title's genres
foreach my $genre (sort keys %{$case_insens_titles{$unique_title}{$title}}) {
# iterate over each title's channel availability by genre
foreach my $chan (sort keys %{$case_insens_titles{$unique_title}{$title}{$genre}}) {
$line .= $genre . "/" . $chan . " [" . $case_insens_titles{$unique_title}{$title}{$genre}{$chan} . " occurrences], ";
}
}
$line =~ s/,\s*$//; # remove last comma
$line .= ")";
push @titles_variants, $line;
}
push @titles_variants, "";
# now find list of titles with highest freq and check if it contains
# a single entry to use in suggested fixups
my @title_freqs = sort {$a <=> $b} keys %variants;
my $highest_freq = $title_freqs[-1];
my $best_title;
if (@{$variants{$highest_freq}} == 1) {
# grab the title and remove key from $case_insens_titles{$unique_title}
$best_title = shift @{$variants{$highest_freq}};
delete $case_insens_titles{$unique_title}{$best_title};
# now iterate over remaining variations of title and generate fixups
foreach my $rem_title (keys %{$case_insens_titles{$unique_title}}) {
push @fixups, "5|" . $rem_title . "~" . $best_title;
}
}
}
}
if (@titles_variants) {
say("\nStart of possible title variations");
say(" " . $_) foreach @titles_variants;
if (@fixups) {
say("\nPossible fixups for title variations:\n");
say($_) foreach sort @fixups;
say("");
}
say("End of possible title variations");
}
}
}
sub print_titles_inc_years {
if (%prog_titles) {
my @titles_years;
foreach my $title (sort keys %prog_titles) {
if ($title =~ m/\b\d{4}\b/) {
push @titles_years, $title;
}
}
if (@titles_years) {
say("\nStart of list of titles including possible years");
say(" " . $_) foreach @titles_years;
say("End of list of titles including possible years");
}
}
}
sub print_titles_inc_bbfc_certs {
if (%film_titles) {
my @titles_certs;
foreach my $title (sort keys %film_titles) {
if ($title =~ m/\((?:U|PG|12A|15|18)\)$/) {
push @titles_certs, $title;
}
}
if (@titles_certs) {
say("\nStart of list of film titles including possible BBFC certificates");
say(" " . $_) foreach @titles_certs;
say("End of list of film titles including possible BBFC certificates");
}
}
}
sub print_flagged_title_eps {
if (%flagged_title_eps && scalar keys %flagged_title_eps > 0) {
my %titles_to_output; # temp hash to store matches
foreach my $flagged_title (sort keys %flagged_title_eps) {
foreach my $title (sort keys %prog_titles) {
if (lc $flagged_title eq lc $title) {
$titles_to_output{$flagged_title} = $flagged_title;
}
}
}
# only output something if at least 1 matching title
if (%titles_to_output && scalar keys %titles_to_output > 0) {
say("\nStart of list of titles that may need fixing individually");
foreach my $title (sort keys %titles_to_output) {
say(" $title");
}
say("End of list of titles that may need fixing individually");
}
}
}
sub print_dotdotdot_titles {
if (%dotdotdot_titles && scalar keys %dotdotdot_titles > 0) {
my %titles_to_output; # temp hash to store matches
if (%prog_titles) {
DOTDOTDOT_TITLE:
# In %dotdotdot_titles, the key is the 'normalised' title to match,
# value is the full title to use in replacement
foreach my $dotdotdot_title (sort keys %dotdotdot_titles) {
PROG_TITLE:
foreach my $title (sort keys %prog_titles) {
# ignore title having ellipses already
next PROG_TITLE if $title =~ m/.*\.\.\.$/;
if ($title =~ m/\b\Q$dotdotdot_title\E\b/i) {
$titles_to_output{$title} = $dotdotdot_titles{$dotdotdot_title};
}
}
}
}
# only output something if at least 1 matching title
if (%titles_to_output && scalar keys %titles_to_output > 0) {
say("\nStart of list of potential \"...\" titles that may need fixing individually");
foreach my $title (sort keys %titles_to_output) {
say(" Title '$title' may need to be fixed based on fixup '$titles_to_output{$title}'");
}
say("End of list of potential \"...\" titles that may need fixing individually");
}
}
}
sub print_title_in_subtitle {
if (%title_ep_in_subtitle_fixed && scalar keys %title_ep_in_subtitle_fixed > 0) {
say("\nStart of list of programmes where title/ep was removed from sub-title field");
foreach my $prog_ref (sort keys %title_ep_in_subtitle_fixed) {
say(" $title_ep_in_subtitle_fixed{$prog_ref}->{'title'} / $title_ep_in_subtitle_fixed{$prog_ref}->{'episode'}");
}
say("\nEnd of list of programmes where title/ep was removed from sub-title field");
}
if (%title_in_subtitle_fixed && scalar keys %title_in_subtitle_fixed > 0) {
say("\nStart of list of programmes where title was removed from sub-title field");
foreach my $prog_ref (sort keys %title_in_subtitle_fixed) {
say(" $title_in_subtitle_fixed{$prog_ref}->{'title'} / $title_in_subtitle_fixed{$prog_ref}->{'episode'}");
}
say("\nEnd of list of programmes where title was removed from sub-title field");
}
if (%title_in_subtitle_notfixed && scalar keys %title_in_subtitle_notfixed > 0) {
say("\nStart of list of programmes where title is still present in sub-title field");
foreach my $prog_ref (sort keys %title_in_subtitle_notfixed) {
say(" $title_in_subtitle_notfixed{$prog_ref}->{'title'} / $title_in_subtitle_notfixed{$prog_ref}->{'episode'}");
}
say("\nEnd of list of programmes where title is still present in sub-title field");
}
if (%colon_in_subtitle && scalar keys %colon_in_subtitle > 0) {
say("\nStart of list of programmes where sub-title contains colon/hyphen");
foreach my $prog_ref (sort keys %colon_in_subtitle) {
say(" $colon_in_subtitle{$prog_ref}->{'title'} / $colon_in_subtitle{$prog_ref}->{'episode'}");
}
say("\nEnd of list of programmes where sub-title contains colon/hyphen");
}
}
sub print_categories {
if (%categories && scalar keys %categories > 0) {
say("\nStart of list of programme categories seen");
foreach my $category (sort keys %categories) {
say(" $category");
}
say("End of list of programme categories seen");
}
}
sub print_uncategorised_progs {
if (%uncategorised_progs && scalar keys %uncategorised_progs > 0) {
say("\nStart of list of uncategorised programmes");
foreach my $title (sort keys %uncategorised_progs) {
say(" $title");
}
say("End of list of uncategorised programmes");
}
}
sub print_cats_per_prog {
if (%cats_per_prog) {
my @titles_cats;
my @fixups;
foreach my $title (sort keys %cats_per_prog) {
if (scalar keys %{$cats_per_prog{$title}} > 1) {
push @titles_cats, " '" . $title . "' is categorised as:";
my $best_cat_cnt = 1;
my $best_cat = '';
foreach my $cat (sort keys %{$cats_per_prog{$title}}) {
push @titles_cats, " $cat (" . $cats_per_prog{$title}{$cat} . " occurences)";
if ($cats_per_prog{$title}{$cat} > $best_cat_cnt) {
$best_cat = $cat;
$best_cat_cnt = $cats_per_prog{$title}{$cat};
}
}
push @titles_cats, "";
if ($best_cat_cnt > 1) {
push @fixups, "6|" . $title . "~" . $best_cat;
}
}
}
if (@titles_cats) {
say("\nStart of programmes with multiple categories");
say(" " . $_) foreach @titles_cats;
if (@fixups) {
say("\nPossible fixups for programme categories:\n");
say($_) foreach sort @fixups;
say("");
}
say("End of programmes with multiple categories");
}
}
}
sub print_possible_prog_numbering {
if (%possible_series_nums && scalar keys %possible_series_nums > 0) {
say("\nStart of list of possible series numbering seen in listings");
foreach my $poss (sort keys %possible_series_nums) {
say(" $poss");
}
say("End of list of possible series numbering seen in listings");
}
if (%possible_episode_nums && scalar keys %possible_episode_nums > 0) {
say("\nStart of list of possible episode numbering seen in listings");
foreach my $poss (sort keys %possible_episode_nums) {
say(" $poss");
}
say("End of list of possible episode numbering seen in listings");
}
if (%possible_part_nums && scalar keys %possible_part_nums > 0) {
say("\nStart of list of possible part numbering seen in listings");
foreach my $poss (sort keys %possible_part_nums) {
say(" $poss");
}
say("End of list of possible part numbering seen in listings");
}
if (%title_text_to_remove && scalar keys %title_text_to_remove > 0) {
say("\nStart of list of titles containing \"Season\"");
foreach my $t (sort keys %title_text_to_remove) {
say(" $t");
}
say("End of list of titles containing \"Season\"");
}
}
sub print_misencoded_utf8_data {
if (%hasC27F9Fchars && scalar keys %hasC27F9Fchars > 0) {
say("\nStart of list of channels containing unhandled bytes in range [C2][7F-9F]");
foreach my $chan (sort keys %hasC27F9Fchars) {
say(" $chan ($hasC27F9Fchars{$chan})");
}
say("End of list of channels");
}
if (%hadEFBFBD && scalar keys %hadEFBFBD > 0) {
say("\nStart of list of channels containing Unicode Replacement Character");
foreach my $chan (sort keys %hadEFBFBD) {
say(" $chan ($hadEFBFBD{$chan})");
}
say("End of list of channels");
}
if (%hadC3AFC2BFC2BD && scalar keys %hadC3AFC2BFC2BD > 0) {
say("\nStart of list of channels containing double-encoded Unicode Replacement Character");
foreach my $chan (sort keys %hadC3AFC2BFC2BD) {
say(" $chan ($hadC3AFC2BFC2BD{$chan})");
}
say("End of list of channels");
}
}
sub print_empty_listings {
if (%empty_listings && scalar keys %empty_listings > 0) {
say("\nStart of list of channels providing no listings");
foreach my $chan (sort keys %empty_listings) {
say(" $chan ($empty_listings{$chan})");
}
say("End of list of channels providing no listings");
}
}
__END__
=head1 NAME
tv_grab_uk_rt - Grab TV listings for United Kingdom/Republic of Ireland
=head1 SYNOPSIS
tv_grab_uk_rt --help
tv_grab_uk_rt --version
tv_grab_uk_rt --capabilities
tv_grab_uk_rt --description
tv_grab_uk_rt [--config-file FILE]
[--days N] [--offset N]
[--output FILE] [--quiet] [--debug]
tv_grab_uk_rt --configure [--config-file FILE]
tv_grab_uk_rt --configure-api [--stage NAME]
[--config-file FILE] [--output FILE]
tv_grab_uk_rt --list-channels [--config-file FILE]
[--output FILE] [--quiet] [--debug]
tv_grab_uk_rt --list-lineups
tv_grab_uk_rt --get-lineup [--config-file FILE]
=head1 DESCRIPTION
Output TV listings in XMLTV format for many channels available in the
United Kingdom and Republic of Ireland. Source data comes from
machine-readable files made available from the Radio Times website.
=head1 USAGE
First run B<tv_grab_uk_rt --configure> to choose which channels you want to
receive listings for. Then run B<tv_grab_uk_rt> (with optional arguments) to get
around 14 days of listings for your configured channels.
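For example, a typical first run followed by a grab of one week of listings
(the output filename here is purely illustrative) might be:

    tv_grab_uk_rt --configure
    tv_grab_uk_rt --days 7 --output listings.xml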
=head1 OPTIONS
B<--help> Print a help message and exit.
B<--version> Show the versions of the XMLTV libraries, the grabber, and the
key modules used for processing listings.
B<--capabilities> Show which capabilities the grabber supports. For more
information, see L<http://wiki.xmltv.org/index.php/XmltvCapabilities>
B<--description> Show a brief description of the grabber.
B<--config-file FILE> Specify the name of the configuration file to use.
If not specified, a default of B<~/.xmltv/tv_grab_uk_rt.conf> is used. This
is also the default file written by B<--configure> and read when grabbing.
B<--output FILE> When grabbing, write output to FILE rather than to standard
output.
B<--days N> When grabbing, grab N days of data instead of all available.
Supported values are 1-15.
B<--offset N> Start grabbing at today + N days. Supported values are 0-14.
Note that due to the format of the source data, tv_grab_uk_rt always downloads
data for all available days and then filters for days specified with --days and
--offset. Specifying --days and/or --offset in order to speed up downloads or
reduce data transfer will therefore have no effect.
B<--quiet> Suppress all progress messages normally written to standard error.
B<--debug> Provide detailed progress messages to standard error. Due to the
volume of debug information produced, this option is not recommended during
normal grabber operation.
B<--gui OPTION> Use this option to enable a graphical user interface.
OPTION may be 'Tk', or left blank for the best available choice.
Additional allowed values of OPTION are 'Term' for normal terminal output
(default) and 'TermNoProgressBar' to disable the use of Term::ProgressBar.
B<--configure> Prompt for which channels/lineup to download listings for,
where to store the cache directory for retrieved listings, what character
encoding to use for output, and whether to enable programme title and bad
character fixups.
B<--list-channels> Outputs an XML document containing all channels
available to the grabber.
B<--list-lineups> Outputs an XML document containing all channel lineups
available to the grabber.
B<--get-lineup> Outputs an XML document containing the configured channel
lineup.
=head1 SOURCE DATA TERMS OF USE
All data is the copyright of the Radio Times and the use of this data is
restricted to personal use only. Commercial use of this data is forbidden.
L<http://www.radiotimes.com/>
In accessing this XML feed, you agree that you will only access its contents
for your own personal and non-commercial use and not for any commercial
or other purposes, including advertising or selling any goods or services,
including any third-party software applications available to the general public.
=head1 CHANNEL LINEUPS
A channel lineup is a list of TV and radio channels that are available in a
particular location on a particular TV platform (e.g. Freeview).
Whilst configurations containing individual "channel=..." entries are still
supported, the grabber allows a user to select their location and TV platform
at configuration time and have their channel lineup generated dynamically
at runtime. This means that if a new channel launches or a channel ceases
broadcasting, an update to the relevant lineup (stored on the XMLTV server)
will result in such channel changes appearing in the listings without any reconfiguration by the user.
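For illustration only, a legacy configuration file contains one line per
channel; the identifiers below are placeholders rather than real channel ids:

    channel=first.channel.example
    channel=second.channel.example

A lineup-based configuration instead resolves the channel list dynamically at
runtime, as described above.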
=head1 CHARACTER ENCODING
During configuration, the software asks the user to choose the character
encoding to be used for output. Currently supported encodings are UTF-8 and
ISO-8859-1.
=head1 TITLE PROCESSING
Over time, listings may contain inconsistent programme details, such as
the programme title combined with episode details for some showings of a
programme, but separate for others; or the episode title being given as the
programme title, and the programme title given as the episode title. Some
programme titles may also change slightly over time, or across channels.
Enabling title processing during configuration allows this software to check
programme titles against a list of flagged titles. The software will correct
such programme titles, which in turn should result in better performance of
PVR software that relies on consistent programme data.
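As a purely illustrative example (the titles here are hypothetical, not taken
from real listings), a programme supplied in the source data as

    Some Programme - The Episode

for one showing and as

    Some Programme: The Episode

for another might be normalised so that all showings share the same title and
sub-title details.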
Please be aware that enabling title processing will result in the grabber
taking slightly longer to complete its operation due to the extra
processing overhead.
N.B. Title updates can alter programme titles close to transmission time, and
it is therefore possible for PVR schedules to fail if they were configured
using an old title. Whilst care is taken to ensure title updates are made as
far ahead of transmission as possible, be aware that last-minute updates can
occur.
=head1 MIS-ENCODED UTF-8 SOURCE DATA
Prior to the transition of the XMLTV service to metabroadcast.com in December
2011, there was an ongoing issue with source data containing mis-encoded UTF-8
characters. Since the transition, the source data should be UTF-8 safe and
automatic processing of the data may not be required. A configuration option
is provided to permit detection and correction of such character encoding
errors, and users are recommended to enable this option during configuration.
=head1 PERFORMANCE
Improvements to date and time handling in the grabber have increased performance
by a factor of 6-7. Grabbing 14 days of listings with UTF-8 output and title
fixups enabled should take about 2 seconds per configured channel on a typical
machine.
=head1 ERROR HANDLING
tv_grab_uk_rt will only terminate early if it is impossible to continue with grabbing
data. This can be due to a lack of channel configuration data, a bad/missing
configuration file, or filesystem permission problems. Running the grabber in
non-quiet mode should report why the grabber failed.
Non-fatal errors are reported during a grabber run, and can result in listings
for a channel being skipped either in part, or entirely. Progress messages
will state why data is missing when it is possible to do so. A non-zero exit
status will normally be given when the grabber has encountered problems
during listings retrieval.
=head1 ENVIRONMENT VARIABLES
The environment variable HOME can be set to change where the configuration
file is stored. All configuration is stored in $HOME/.xmltv/ by default. On
Windows it might be necessary to set HOME to a pathname containing no spaces.
The environment variable XMLTV_SUPPLEMENT can be set to change where the
supplemental XMLTV files are retrieved from. By default, the file is
retrieved from the XMLTV supplement server. See L<XMLTV::Supplement> for
more information.
If you want the grabber to use customised local copies of the supplemental
files, you should set XMLTV_SUPPLEMENT to the path of the directory containing
a tv_grab_uk_rt/ directory containing the supplement files. For example, if
your local supplement files are stored in /usr/local/share/xmltv/tv_grab_uk_rt/,
you should `export XMLTV_SUPPLEMENT="/usr/local/share/xmltv/"` before running the
grabber.
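In that example, the expected directory layout is:

    /usr/local/share/xmltv/                <-- XMLTV_SUPPLEMENT points here
    /usr/local/share/xmltv/tv_grab_uk_rt/  <-- local supplement files for this grabber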
=head1 RADIO LISTINGS
Ironically, the Radio Times feed does not offer listings for radio. The Radio
Times has been asked about the possibility of adding radio listings, but stated
that this would require significant development effort. It has not been
ruled out entirely, but is unlikely to be added soon.
Users who would like to obtain BBC radio listings in XMLTV format are advised
to investigate a new grabber that obtains listings from the BBC Backstage
service. See L<http://wiki.xmltv.org/index.php/BBC_Backstage> for more
information.
=head1 MAILING LIST
You can subscribe to and read the XMLTV users mailing list by visiting
L<http://lists.sourceforge.net/lists/listinfo/xmltv-users>. This is a source
of help and advice for new users. A searchable archive of the list is
available at L<http://news.gmane.org/gmane.comp.tv.xmltv.general>.
=head1 SEE ALSO
L<xmltv(5)>, L<http://wiki.xmltv.org>, L<http://www.radiotimes.com/>
=head1 BUGS
If you encounter a reproducible bug, please report it on the XMLTV bug
tracker at L<http://sourceforge.net/tracker/?group_id=39046&atid=424135>,
making sure you assign the bug to the tv_grab_uk_rt category. Please check
that the bug has not already been reported.
The source data on the Radio Times website is generated daily before 0600.
Occasionally the source data may not get recreated, leaving
the source files for some (or all) channels empty. Users are encouraged
to wait at least 1 day before reporting an issue with missing listings,
as missing listings frequently reappear in the next update or later the same day. If listings continue to
be missing from the Radio Times website, please report the fact on the XMLTV users
mailing list.
There have been several occasions in the past when the Radio Times channel index has been
missing from the Radio Times website. This file is essential for running the
grabber, as it contains the list of channels with available listings
data. If this file is missing or empty, and there is no locally-cached copy of
the file, it will not be possible to run the grabber. The file usually
regenerates automatically over the course of the next day, at which point it
will be possible to run the grabber successfully.
There are no other reported ongoing issues.
=head1 AUTHOR
Since 2007 the maintainer has been Nick Morrott (knowledgejunkie at gmail dot com).
The original author was Ed Avis (ed at membled dot com). Parts of this code
were copied from tv_grab_se_swedb by Mattias Holmlund, and from the XMLTV
wiki L<http://wiki.xmltv.org/>. Regional postcode information was kindly
made available from L<http://www.ukfree.tv>.
=cut
|