/usr/share/doc/htdig-doc/html/RELEASE.html is in htdig-doc 1:3.2.0b6-12.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<title>
ht://Dig: Release notes
</title>
</head>
<body bgcolor="#eef7ff">
<h1>
Release notes
</h1>
<p>
ht://Dig Copyright © 1995-2004 <a href="THANKS.html">The ht://Dig Group</a><br>
Please see the file <a href="COPYING">COPYING</a> for
license information.
</p>
<hr size="4" noshade>
<p>
These are notes that go with each release of ht://Dig. There
is also a <a href="ChangeLog">ChangeLog</a> file which has
more details on the code changes.
</p>
<p>
<strong>Release notes for htdig-3.2.0b6</strong> 20 Jun 2004<br>
The next beta release of ht://Dig, 3.2.0b6, is now available.
It fixes several bugs from 3.2.0b5, and runs somewhat faster,
although still much slower than 3.1.6. (No significant speed
improvements are expected in the near future, although we are
working on it.) Calling this release a "beta" simply means
that exhaustive testing, especially on non-Linux platforms, is
not yet complete. However, we consider it stable enough for
most production use.
</p>
<p>
As with 3.2.0b5, if you are upgrading
from a previous version, you should read the <a
href="upgrade.html">upgrade guide</a> first.
</p>
Bug fixes:
<ul>
<li>Correctly handle empty <code>disallow</code> entries in
robots.txt</li>
<li>No longer compile regular expressions for
every URL (improves performance)</li>
<li>Allow compressed databases on Cygwin</li>
<li>Fixed bugs in phrase searching</li>
<li>Improved parsing of the configuration file</li>
<li>bin/rundig -a handles multiple database directories</li>
<li>Ellipsis displayed correctly by htsearch</li>
<li>Allow '-' argument to '-m' ('minimal') runtime option to
htdig</li>
<li>Check validity of first URL from each server</li>
<li>No longer ignore empty configuration attributes</li>
<li>fixed bug in handling the 'http_proxy', 'http_proxy_authorization' and
'authorization' attributes</li>
<li>remove stale md5_db if '-i' specified</li>
<li>Make 'server_alias' case insensitive</li>
<li>fixed bugs with zlib</li>
<li>Allow &amp;euro; HTML entity</li>
<li>fixed other minor bugs</li>
</ul>
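<p>
The empty <code>disallow</code> case from the bug-fix list can be
illustrated with a short sketch. It uses Python's standard
<code>urllib.robotparser</code> module rather than htdig's own parser,
and the robots.txt text, agent name and URL are made up for the example.
An empty value means that nothing is disallowed for that user agent.
</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the empty Disallow value forbids nothing.
ROBOTS_TXT = """\
User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "htdig", "http://www.example.com/docs/index.html"))
```

<p>
Changing the rule to <code>Disallow: /</code> makes the same call
report that the URL may not be fetched.
</p>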
New features:
<ul>
<li>added <a
href="attrs.html#allow_space_in_url">allow_space_in_url</a>
attribute: if set to true, htdig will handle URLs that
contain embedded spaces</li>
<li>added <a
href="attrs.html#store_phrases">store_phrases</a> attribute:
if it is false, htdig only stores the first occurrence
of each word in a document</li>
<li>added an improved version of RTF2HTML into the
contrib section</li>
<li>added <a href="http://www.openoffice.org/">OpenOffice.org</a>
support to doc2html in contrib section</li>
<li>improved date factor formula</li>
<li>improved tests</li>
<li>improved documentation</li>
<li>added man pages</li>
</ul>
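<p>
The effect of <code>store_phrases</code> described above can be modelled
roughly as follows. This is only a sketch of the idea, not htdig's word
database code: the function name, the tokenizer and the dictionary layout
are assumptions made for the illustration.
</p>

```python
import re


def word_positions(text, store_phrases=True):
    """Map each lowercased word to the positions where it occurs.

    With store_phrases=False only the first occurrence of each word
    is kept, which gives a smaller index.
    """
    index = {}
    for position, word in enumerate(re.findall(r"\w+", text.lower())):
        positions = index.setdefault(word, [])
        if store_phrases or not positions:
            positions.append(position)
    return index


if __name__ == "__main__":
    print(word_positions("the cat and the hat"))
    print(word_positions("the cat and the hat", store_phrases=False))
```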
<p>
<strong>Release notes for htdig-3.2.0b5</strong> 10 Nov 2003<br>
This version was slated to be 3.2.0rc1, but some final testing
is still required. It primarily fixes many bugs in 3.2.0b3, with
some limited new functionality.
As with 3.2.0b1 and 3.2.0b2, if you are upgrading
from a previous version, you should read the <a
href="upgrade.html">upgrade guide</a> first.
</p>
<ul>
<li>Fixed database bugs. Introduced zlib compression to replace
buggy internal compression.</li>
<li>Forward-ported functionality from 3.1.6
(description_meta_tag_names, use_doc_date, ignore_alt_text,
ignore_dead_servers, boolean_keywords, boolean_syntax_errors,
multimatch_factor, translate_latin1)</li>
<li>Fixed bugs in phrase searching</li>
<li>Fixed compile problems due to deprecated C++ includes</li>
<li>Fixed bugs handling double slashes in URLs</li>
<li>Suppress display of matches with weight zero</li>
<li>Fixed bugs in nesting of tags which turn off indexing</li>
</ul>
<ul>
<li>Added Native Win32 support</li>
<li>Added http_proxy_authorization attribute</li>
<li>Improved networking code, with improved cookie handling and
accept_language support</li>
<li>Implemented field-restricted searches (e.g. title:word)</li>
<li>Handle noindex_start/noindex_end as string lists</li>
<li>Implemented external converters,
text/html->text/html-internal</li>
<li>Improved support for MIME types</li>
<li>Changed licence to LGPL from GPL</li>
</ul>
<p>
<strong>Release notes for htdig-3.2.0b4</strong><br>
This beta was never issued.
</p>
<p>
<strong>Release notes for htdig-3.2.0b3</strong> 22 Feb 2001<br>
This version is still marked beta because it has received only
limited testing and there are revisions pending
for the 3.2 releases. However, it adds more functionality and
should address all serious bugs in the 3.2.0b2 release.
As with 3.2.0b1 and 3.2.0b2, if you are upgrading
from a previous version, you should read the <a
href="upgrade.html">upgrade guide</a> first.
</p>
<p>
<strong>Please note:</strong> if you are updating from a prior
release (3.1 or 3.2), the htmerge program has changed syntax, as noted
below. You will probably want to call htpurge instead of htmerge
after htdig.
</p>
<ul>
<li>Fixed several non-exploitable bugs in handling external
parsers or transport agents.</li>
<li>Fixed a bug where changes in the robots.txt would be
ignored. If a URL was indexed and later the robots.txt
changed to forbid it, the URL would be checked anyway.</li>
<li>Fixed scoring bugs introduced in 3.2.0b2.</li>
<li>Fixed a non-exploitable security issue where content-type
headers were passed incorrectly to external parsers or converters.</li>
<li>Fixed bugs in the accents fuzzy algorithm, cutting down
on the size of the accent database.</li>
<li>Fixed a bug where duplicate documents would be generated when
merging a database with itself.</li>
<li>Fixed a bug in the new regex handling for indexing limits
where large patterns could fail and would be silently ignored.</li>
<li>Fixed minor bugs with the HTTP/1.1 implementation.</li>
<li>Fixed a bug where an extra config= portion of a URL would
be output when using collections.</li>
<li>Fixed a bug with content-type declarations in external parsers
with combined content-type; charset declarations.</li>
<li>Fixed a bug in the config parser that did not correctly
handle relative config <a
href="attrs.html#include">include</a> statements.</li>
<li>Fixed a bug in htfuzzy which would append to an existing
synonyms database rather than creating it anew.</li>
<li>Fixed problems with the configure script ignoring
--enable-bigfile flags.</li>
<li>Fixed problems with retrieval order--this could
potentially foul things up when limiting indexing by
hopcount.</li>
<li>Fixed some problems with the HTML in the included sample files.</li>
<li>Made the -l flag to <a href="htdig.html">htdig</a>
obsolete, since its behavior is now the default: the program
will intercept many signals and write a log file for a restart.</li>
<li>Updated database format from the mifluz/htword project.</li>
<li>Changed syntax of <a href="htmerge.html">htmerge</a>. The
program now <em>only</em> merges databases. The <a
href="htpurge.html">htpurge</a> program will "clean
up" databases after running htdig. The included
"rundig" script reflects this.</li>
<li>htload now properly loads ASCII word databases.</li>
<li>Enhanced <a
href="attrs.html#build_select_lists">build_select_lists</a>
attribute.</li>
<li>Added support for controlling the number of Page buttons
in htsearch with <a
href="attrs.html#maximum_page_buttons">maximum_page_buttons</a>.</li>
<li>Added the METADESCRIPTION htsearch template variable for
displaying the &lt;META&gt; description field in output along
with the normal description, instead of using the <a
href="attrs.html#use_meta_description">use_meta_description</a>
attribute.</li>
<li>Added support for permanent URL rewriting with the <a
href="attrs.html#url_rewrite_rules">url_rewrite_rules</a>
attribute. (As opposed to the <a
href="attrs.html#url_part_aliases">url_part_aliases</a>
attribute which can provide a different URL to htsearch and htdig.)</li>
<li>Added support for restricting a search to match only
documents between two dates as specified in the <a
href="hts_form.html">search form</a> as well as the <a
href="hts_templates.html">template variables</a> STARTYEAR,
STARTMONTH, STARTDAY, ENDYEAR, ENDMONTH, ENDDAY.</li>
<li>Added support for limiting duplicates based on MD5
signatures with the new attributes <a
href="attrs.html#check_unique_md5">check_unique_md5</a>, <a
href="attrs.html#check_unique_date">check_unique_date</a>, <a
href="attrs.html#md5_db">md5_db</a>.</li>
<li>The documentation has been revised to include a block:
portion to note if attributes can be included in URL or
Server blocks. See the <a href="confindex.html"
target="_top">configuration</a> documentation for more
information.</li>
<li>More attributes are set on a per-server or per-URL basis.</li>
<li>Added support for the nntp:// protocol.</li>
<li>Added support for auto-generating directory listings for
file:// URLs.</li>
<li>Set the default compilation to enable tests that can be
run with "make check"</li>
<li>Greatly improved htnotify program with one message per
e-mail address and support for message
templates using the new attributes <a
href="attrs.html#htnotify_webmaster">htnotify_webmaster</a>,
<a href="attrs.html#htnotify_replyto">htnotify_replyto</a>, <a
href="attrs.html#htnotify_prefix_file">htnotify_prefix_file</a>,
<a href="attrs.html#htnotify_suffix_file">htnotify_suffix_file</a>.</li>
<li>There is the usual variety of other fixes and
changes. See the <a href="ChangeLog">ChangeLog</a> for
more details.</li>
<li>Once again, a huge thank you to everyone who
contributed bug reports, fixes and patches!</li>
</ul>
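<p>
The MD5-based duplicate limiting listed above can be pictured with a
minimal sketch. It only shows the general technique of comparing content
digests; the function name and the in-memory set are assumptions, and
the way htdig actually uses <code>md5_db</code> and
<code>check_unique_date</code> is not modelled here.
</p>

```python
import hashlib


def unique_documents(documents):
    """Yield (url, body) pairs whose body digest has not been seen before."""
    seen = set()  # stands in for a persistent digest database
    for url, body in documents:
        digest = hashlib.md5(body.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same content already indexed under another URL
        seen.add(digest)
        yield url, body


if __name__ == "__main__":
    docs = [
        ("http://www.example.com/a.html", "same text"),
        ("http://www.example.com/b.html", "same text"),
        ("http://www.example.com/c.html", "other text"),
    ]
    for url, _ in unique_documents(docs):
        print(url)
```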
<p>
<strong>Release notes for htdig-3.2.0b2</strong> 11 Apr 2000<br>
This version is still marked beta because it has still only
received limited testing. However, it adds more functionality
and should fix all known bugs in the previous 3.2.0b1 release,
including the security hole fixed in version 3.1.5 in
production versions. As with 3.2.0b1, if you are upgrading
from a previous version, you should read the <a
href="upgrade.html">upgrade guide</a> first.
</p>
<ul>
<li>Fixed several bugs in the new HTTP/1.1 implementation that would
cause problems with so-called "Chunked" data.</li>
<li>Fixed a bug in the new regex-based configuration options that
would ignore the case_sensitive attribute.</li>
<li>Fixed the robots.txt parsing to more rigorously stick to the
standard.</li>
<li>Fixed a bug where upper-case META robots directives would be
ignored.</li>
<li>Fixed a bug that could leave a connection open when it failed.</li>
<li>Fixed the timeout in the connection code to ensure that hung
connections are killed properly.</li>
<li>Fixed a bug where duplicates of modified documents could pile up
over time.</li>
<li>Fixed a bug in the SGML entity handling where numeric entities
would be ignored (e.g. &amp;#162; -&gt; ¢).</li>
<li>Fixed a bug in the new configuration parser that
wouldn't accept lists including numbers</li>
<li>Fixed a potential infinite loop in the phrase
searching parser that came up when fuzzy algorithms were
used.</li>
<li>The HTML parser now ignores anything between &lt;script&gt; tags,
much like it does for &lt;style&gt; tags.</li>
<li>Fixed some performance problems in the new word database code.</li>
<li>Removed the attributes translate_quot, translate_lt, translate_gt
and translate_amp since all SGML entities are now encoded and decoded
when displayed.</li>
<li>Removed the attribute uncoded_db_compatible since the 3.2
databases are no longer compatible with previous versions anyway.</li>
<li>Removed the attribute word_list because the db.wordlist file is no
longer generated. To get an ASCII version of the database, use the
word_dump attribute.</li>
<li>Removed the pdf_parser attribute. It is now preferred to use the
external parser or external converter support with xpdf.</li>
<li>The <a
href="attrs.html#wordlist_compress">wordlist_compress</a>
attribute is now turned on by default.</li>
<li>The output from htsearch and the default and included templates
should now be more HTML-4.0 compliant.</li>
<li>Added support for searching collections of multiple
databases. To use this, supply multiple config fields or
config names separated by "|" characters. Also
see the <a
href="attrs.html#collection_names">collection_names</a> attribute.</li>
<li>Added a new accents fuzzy algorithm, which treats
accented and unaccented words the same. You must create an
<a href="attrs.html#accents_db">accents_db</a> with
htfuzzy after indexing.</li>
<li>Added new attributes <a
href="attrs.html#tcp_max_retries">tcp_max_retries</a> and
<a href="attrs.html#tcp_wait_time">tcp_wait_time</a> to
control how many times a low-level connection is retried
and how long to wait on a hung connection.</li>
<li>Add <a href="attrs.html#any_keywords">any_keywords</a>
attribute to OR the keywords field in a search form
instead of AND-ing them together.</li>
<li>Add the attributes <a
href="attrs.html#search_results_order">search_results_order</a>
and <a href="attrs.html#url_seed_score">url_seed_score</a>
to control result ranking and scoring based on URL patterns.</li>
<li>Moved the htnotify program into the new httools directory.</li>
<li>Added the programs <a href="htdump.html">htdump</a>,
<a href="htload.html">htload</a>, <a
href="htstat.html">htstat</a> and <a
href="htpurge.html">htpurge</a>.</li>
<li>There is the usual variety of other fixes and
changes. See the <a href="ChangeLog">ChangeLog</a> for
more details.</li>
<li>Once again, a huge thank you to everyone who
contributed bug reports, fixes and patches!</li>
</ul>
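<p>
The idea behind the accents fuzzy algorithm mentioned above, treating
accented and unaccented words the same, can be sketched as follows. This
is not htfuzzy's implementation: it relies on Unicode decomposition from
Python's standard library, and the function names and the dictionary
standing in for the accents database are assumptions.
</p>

```python
import unicodedata


def strip_accents(word):
    """Return `word` with combining accent marks removed."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_accents_map(words):
    """Group indexed words under their unaccented form."""
    table = {}
    for word in words:
        table.setdefault(strip_accents(word), set()).add(word)
    return table


if __name__ == "__main__":
    table = build_accents_map(["café", "cafe", "naïve"])
    # A search for either spelling looks up the same unaccented key.
    print(sorted(table[strip_accents("café")]))
```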
<p>
<strong>Release notes for htdig-3.1.5</strong> 25 Feb 2000<br>
This version cleans up some remaining bugs in the 3.1.4
release. As the latest stable release of ht://Dig, it is
recommended for all production servers.
</p>
<ul>
<li>Fixed a nasty security hole in htsearch, which would allow
users to view any file on your site that had read permission.</li>
<li>Fixed a bug that could cause problems with 8-bit
characters on some systems.</li>
<li>Made some attempts to get htsearch's output to be more HTML 4.0
compliant. It quotes all HTML tag parameters, and uses ";"
instead of "&amp;" as parameter separator in URLs for next
pages. Reserved characters in parameters are now
encoded. Please note that this may break a variety of CGI
wrappers, for example, those written in PHP3.</li>
<li>Fixed handling of SGML entities: htdig will still decode
them to store as single characters in the database, but
htsearch now encodes some of them back for compliant results.</li>
<li>Added two new formats for variables in htsearch templates,
$%(var), which escapes the variable for a URL, and $&amp;(var),
which HTML-escapes the variable as necessary.</li>
<li>Fixed htdig's handling of robots.txt, such that only the first
applicable User-agent field bearing its name will be used, rather
than only the last.</li>
<li>Fixed htdig's handling of servers that return 2-digit years.</li>
<li>Fixed handling of embedded quotes in quoted string lists.</li>
<li>Fixed handling of relative URLs with trailing ".." or leading
"//".</li>
<li>Fixed handling of the
<a href="attrs.html#valid_extensions">valid_extensions</a>
attribute, which sometimes failed in the previous version.</li>
<li>Enhanced the handling of local filesystem indexing with the
<a href="attrs.html#local_urls">local_urls</a>,
<a href="attrs.html#local_user_urls">local_user_urls</a> or
<a href="attrs.html#local_default_doc">local_default_doc</a>
attributes, which now allow multiple directory or file names to
be tried.</li>
<li>Added the <a
href="attrs.html#build_select_lists">build_select_lists</a>
attribute to allow the config file to specify
&lt;select&gt; form elements in htsearch output as a
template variable, much like $(SORT) and $(METHOD).</li>
<li>Added support for two additional configuration attributes:
<a href="attrs.html#max_keywords">max_keywords</a>, and
<a href="attrs.html#nph">nph</a>.</li>
<li>A variety of other bug fixes, and many documentation updates.
See the <a href="ChangeLog">ChangeLog</a> for details.</li>
<li>Once again, thanks to everyone who reported bugs and bug
fixes.</li>
</ul>
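<p>
The two template variable formats listed above, $%(var) and $&amp;(var),
can be pictured with a small sketch. It is not htsearch's template code:
the <code>expand</code> function, its regular expression and the choice
of Python's <code>quote</code> and <code>html.escape</code> as the
escaping rules are assumptions made for the illustration.
</p>

```python
import html
import re
from urllib.parse import quote

# Matches $(NAME), $%(NAME) and $&(NAME).
_VARIABLE = re.compile(r"\$([%&]?)\((\w+)\)")


def expand(template, variables):
    """Expand template variables, escaping them according to their prefix."""

    def substitute(match):
        mode, name = match.groups()
        value = variables.get(name, "")
        if mode == "%":
            return quote(value, safe="")  # escape for use inside a URL
        if mode == "&":
            return html.escape(value)     # escape for use inside HTML text
        return value                      # plain $(NAME): inserted as-is

    return _VARIABLE.sub(substitute, template)


if __name__ == "__main__":
    values = {"WORDS": "a b&c"}
    print(expand('<a href="search?words=$%(WORDS)">$&(WORDS)</a>', values))
```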
<p>
<strong>Release notes for htdig-3.2.0b1</strong> 4 Feb 2000<br>
This marks the first beta version of the 3.2.0 codebase,
over a year in the works. Since it has not received as much
testing as the 3.1.x series, it is *not* recommended for
production environments. A full description of how to upgrade
is provided <a href="upgrade.html">here</a>.
</p>
<blockquote><strong>NOTE:</strong> Read this document before
upgrading. You have been warned.</blockquote>
<ul>
<li>Fixed a bug in htdig where hopcounts could be calculated
incorrectly between multiple servers.</li>
<li>Fixed a bug that could cause problems with 8-bit
characters on some systems.</li>
<li>Fixed handling of unreachable servers. First, the new <a
href="attrs.html#max_retries">max_retries</a> attribute allows
htdig to attempt multiple connections. Secondly, if the server
is not available, htdig will stop trying to connect.</li>
<li>Fixed handling of SGML entities: htdig will still decode
them to store as single characters in the database, but
htsearch now encodes them back for compliant results.</li>
<li>Rewrote the database formats, allowing room for more
sophisticated searches and compression of the word database
using the new attribute <a
href="attrs.html#wordlist_compress">wordlist_compress</a>.
These changes include the removal of the word_list file
(db.wordlist) and the addition of the new <a
href="attrs.html#doc_excerpt">doc_excerpt</a> database.</li>
<li>Cleaned up many parts of the code, including the URL and
HTML parsers. Additionally, on platforms that support it, much
of the code will be built as shared libraries, which should
help memory utilization, especially under high load.</li>
<li>Removed the modification_time_is_now attribute, which is
now on by default. This means the time at indexing is taken as
the date of the document if the server does not return a
date.</li>
<li>Added the new attribute <a
href="attrs.html#use_doc_date">use_doc_date</a> to use the
date specified in a META date tag.</li>
<li>Merged all heading_factor attributes into one new
attribute, <a
href="attrs.html#heading_factor">heading_factor</a>.</li>
<li>As a result of the new database format, all _factor
attributes (like <a
href="attrs.html#title_factor">title_factor</a> and <a
href="attrs.html#keywords_factor">keywords_factor</a>) are
now dynamic--you do not have to rebuild your database to
change the scaling.</li>
<li>Changed attributes <a
href="attrs.html#bad_querystr">bad_querystr</a>, <a
href="attrs.html#exclude_urls">exclude_urls</a>, <a
href="attrs.html#limit_urls_to">limit_urls_to</a>, <a
href="attrs.html#limit_normalized">limit_normalized</a>,
<a
href="attrs.html#http_proxy_exclude">http_proxy_exclude</a>
to allow full regular expressions when the regex is
surrounded by [ and ].</li>
<li>Changed htsearch fields restrict and exclude to allow
regular expressions when the regex is surrounded by [ and
].</li>
<li>Added phrase searching support to htsearch--queries
enclosed in quotes will be checked to ensure the words
occur in that exact order in the documents.</li>
<li>Added the <a
href="attrs.html#build_select_lists">build_select_lists</a>
attribute to allow the config file to specify
&lt;select&gt; form elements in htsearch output as a
template variable, much like $(SORT) and $(METHOD).</li>
<li>Added a regex fuzzy method. This will allow searches to
include regex that match words. The fuzzy method will
return up to <a
href="attrs.html#regex_max_words">regex_max_words</a> matches.</li>
<li>Added a speling [sic] fuzzy method. This attempts several
simple spelling mistakes (like transposed letters and
extra letters) to find matches. This adds the new
attribute <a
href="attrs.html#minimum_speling_length">minimum_speling_length</a>
to restrict whether small words should be
checked. Transposing letters in smaller words can give
unrelated correctly-spelled words.</li>
<li>Added support for external transport methods, using the <a
href="attrs.html#external_protocols">external_protocols</a>
attribute, an analogue of the external_parsers system.</li>
<li>Added support for HTTP/1.1, including persistent
connections. This can be configured using the new attributes <a
href="attrs.html#persistent_connections">persistent_connections</a>,
<a href="attrs.html#head_before_get">head_before_get</a>,
and <a href="attrs.html#max_connection_requests">max_connection_requests</a>.
</li>
<li>Added support for file:// URLs and support for using the
<a href="attrs.html#mime_types">mime_types</a> file to
decide whether local files are parsable.</li>
<li>Added two new formats for variables in htsearch templates,
$%(var), which escapes the variable for a URL, and $&amp;(var),
which HTML-escapes the variable as necessary.</li>
<li>Added support for reading the list of URLs to index with
<a href="htdig.html">htdig</a> by supplying the
command-line option -.</li>
<li>Added a flag -m to <a href="htdig.html">htdig</a> to index <em>only</em> the
files given in the filename.</li>
<li>There are many more changes especially to the internal
code structure, so a huge thank you goes out to everyone
who helped make this release!</li>
</ul>
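<p>
The speling fuzzy method described in the list above can be illustrated
with a rough sketch that generates candidate words by transposing
adjacent letters and by dropping one letter. This is not htfuzzy's code:
the function name and the default length of 5 are assumptions, with the
<code>minimum_length</code> parameter standing in for
<code>minimum_speling_length</code>.
</p>

```python
def speling_variants(word, minimum_length=5):
    """Return candidate corrections for `word`.

    Words shorter than `minimum_length` are skipped, because small
    changes to short words tend to produce unrelated real words.
    """
    if len(word) < minimum_length:
        return set()
    variants = set()
    for i in range(len(word) - 1):
        # Undo a transposition: swap two adjacent letters.
        variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    for i in range(len(word)):
        # Undo an extra letter: drop one letter.
        variants.add(word[:i] + word[i + 1:])
    variants.discard(word)
    return variants


if __name__ == "__main__":
    indexed_words = {"receive", "speling"}
    for query in ("recieve", "spelling", "teh"):
        print(query, sorted(speling_variants(query) & indexed_words))
```

<p>
Only the candidates that actually occur in the word database would be
kept as matches, which is what the set intersection in the usage lines
stands for.
</p>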
<p>
<strong>Release notes for htdig-3.1.4</strong> 9 Dec 1999<br>
This version cleans up some remaining bugs in the 3.1.3
release. As the latest stable release of ht://Dig, it is
recommended for all production servers.
</p>
<ul>
<li>Fixed a nasty bug in URL parameter parsing, which was gobbling
up bare ampersands (&amp;) and CGI parameter names.</li>
<li>Fixed a bug where htdig would go into an infinite loop if an
entry in <a href="attrs.html#local_urls">local_urls</a>,
<a href="attrs.html#local_user_urls">local_user_urls</a> or
<a href="attrs.html#server_aliases">server_aliases</a> was
missing the "=".</li>
<li>Fixed a bug in htsearch, where it failed when reading long
queries via the POST method.</li>
<li>Fixed a bug in htdig, where it failed to close the connection
after certain errors.</li>
<li>Fixed a bug that clobbered the hop count of initial documents.</li>
<li>Fixed bugs in HTML parser's handling of META tags. It no longer
continues indexing meta tags when indexing is turned off for the
document, and it no longer gets confused by punctuation in META
descriptions and keywords.</li>
<li>Fixed a bug in the handling of the
<a href="attrs.html#case_sensitive">case_sensitive</a>
attribute, so that it's not limited to robots.txt
parsing. Now, if false, it causes URLs to be mapped to
lowercase, to avoid mixed case duplicates as expected.</li>
<li>HTML parser now indexes text in alt parameter of img tags, and
calculates word locations more accurately than before.</li>
<li>Digging via the local filesystem can now be done even without
an HTTP server running, and a few more file types can be indexed
locally, without having to rely on the server.</li>
<li>Sender name in htnotify's e-mail messages is now quoted.</li>
<li>The <a href="attrs.html#external_parsers">external_parsers</a>
attribute is now extended to support external converters, to avoid
a lot of the complications of writing external parsers.</li>
<li>Added support for several new configuration attributes:
<a href="attrs.html#authorization">authorization</a>,
<a href="attrs.html#start_highlight">start_highlight</a>,
<a href="attrs.html#end_highlight">end_highlight</a>,
<a href="attrs.html#local_urls_only">local_urls_only</a>,
<a href="attrs.html#page_number_separator">page_number_separator</a>,
<a href="attrs.html#script_name">script_name</a>,
<a href="attrs.html#template_patterns">template_patterns</a>, and
<a href="attrs.html#valid_extensions">valid_extensions</a>.</li>
<li>The keywords input parameter to htsearch is now propagated to
followup searches, as for other input parameters.</li>
<li>The query string can now be passed to htsearch as a single
command line argument, for use in scripts.</li>
<li>Added better examples and comments in sample htdig.conf, and
added boolean match type to sample search.html form.</li>
<li>The HTML parser in htdig now turns off indexing between
&lt;style&gt; and &lt;/style&gt; tags.</li>
<li>A variety of other bug fixes, and many documentation updates.
See the <a href="ChangeLog">ChangeLog</a> for details.</li>
<li>Once again, thanks to everyone who reported bugs and bug
fixes.</li>
</ul>
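<p>
The <code>case_sensitive</code> behavior described above, mapping URLs
to lowercase to avoid mixed-case duplicates, amounts to something like
the following sketch. The function names are hypothetical and the code
lowercases the whole URL string, which is a simplification of whatever
htdig does internally.
</p>

```python
def normalize_url(url, case_sensitive=True):
    """Return the form of `url` used to detect duplicates."""
    return url if case_sensitive else url.lower()


def dedupe(urls, case_sensitive=True):
    """Keep the first URL of each group that normalizes to the same key."""
    seen = set()
    kept = []
    for url in urls:
        key = normalize_url(url, case_sensitive)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept


if __name__ == "__main__":
    urls = ["http://www.example.com/Index.html", "http://www.example.com/index.html"]
    print(dedupe(urls, case_sensitive=True))
    print(dedupe(urls, case_sensitive=False))
```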
<p>
<strong>Release notes for htdig-3.1.3</strong> 22 Sep 1999<br>
This version fixes a number of bugs in the 3.1.2 release and
is the latest stable release of ht://Dig. It is the only version
recommended for production servers, and users of all previous
versions are encouraged to upgrade.
</p>
<ul>
<li>Fixed a long-standing bug where search queries containing
punctuation would not be highlighted in excerpts.</li>
<li>Fixed a bug where SGML entities inside HTML tags were not
expanded.</li>
<li>Fixed the <a
href="attrs.html#server_aliases">server_aliases</a>
attribute to default to port 80 if omitted.</li>
<li>Fixed a bug in URL parsing, where documents ending in the
value used for remove_default_doc were ignored. For
example, a URL ending in /left_index.html would become /.</li>
<li>Fixed META robot parsing to correctly parse multiple
directives.</li>
<li>Fixed a coredump when generating the metaphone fuzzy
database on some systems.</li>
<li>Fixed the behavior of the <a
href="attrs.html#modification_time_is_now">modification_time_is_now</a>
attribute to work as documented.</li>
<li>Fixed the behavior of htdig to block out the
username/password set on the command-line in process
listing.</li>
<li>Fixed a bug with external parsers to prevent shell escapes
in filenames.</li>
<li>Fixed a bug on some systems, where printing a date might
crash.</li>
<li>Handles the ispell endings lists better so that suffixes
more closely match grammatical rules.</li>
<li>Changed the maximum word length to a run-time option, set
with the new attribute <a
href="attrs.html#maximum_word_length">maximum_word_length</a>.</li>
<li>Tests for the presence of alloca.h, which would cause
problems with compiling the regex code under non-GNU
compilers.</li>
<li>Added support for &lt;EMBED&gt;, &lt;OBJECT&gt;, and
&lt;LINK&gt; HTML tags.</li>
<li>A variety of other bugs were fixed; see the
<a href="ChangeLog">ChangeLog</a> for details.</li>
<li>When indexing, htdig should now attempt to index compound
words as separate words in addition to a compound word. For
example, "pdf_parser" would also be indexed as "pdf" and "parser."</li>
<li>Once again, thanks to everyone who reported bugs and bug
fixes.</li>
</ul>
<p>
<strong>Release notes for htdig-3.1.2</strong> 21 Apr 1999<br>
This version fixes a number of bugs in the 3.1.1 release and
is the latest stable release of ht://Dig. It is highly
recommended for production servers.
</p>
<ul>
<li>Fixed a bug that ignored META description tags when they
were also added to the meta_keywords attribute.</li>
<li>Fixed the HTML comment parsing to be more lenient about
non-standard comments.</li>
<li>Fixed problems in the date-parsing code that made it Y2K
incompatible. In particular, it forgot that 2000 is a leap
year and wouldn't correctly parse dates after 29 Feb
2000.</li>
<li>Fixed a variety of bugs in the HTML parser.</li>
<li>Fixed an old bug that would exclude <strong>all</strong> URLs if
the exclude_urls attribute was left empty.</li>
<li>Fixed display of META description tags. Now it always
shows the top of a description. If no description exists, it
looks for the search terms in the excerpt as usual.</li>
<li>Fixed some small memory leaks.</li>
<li>Changed the htfuzzy endings algorithm to use a more
efficient regex system. Generation for non-English
languages is noticeably faster, now taking minutes where it
previously took days!</li>
<li>Changed the noindex_start and noindex_end attributes to
allow case-insensitive matching.</li>
<li>Added on-disk versions of the builtin templates to make it
more obvious how to change the results templates.</li>
<li>Added <a href="attrs.html#date_format">date_format</a>
attribute to change the format of dates output in search results.</li>
<li>Added <a href="attrs.html#extra_word_characters">extra_word_characters</a>
attribute that defines extra characters that should be
considered part of a word, rather than punctuation.</li>
<li>Several other, relatively minor bugs were also
fixed. Many thanks to those who sent in bug reports and to
Gilles Detillieux for coordinating this release.</li>
</ul>
<p>
<strong>Release notes for htdig-3.1.1</strong> 17 Feb 1999<br>
This version cleans up some remaining bugs in the 3.1.0
release. As the latest stable release of ht://Dig, it is
recommended for all production servers.
</p>
<ul>
<li>Fixed a bug in the configure script under IRIX and Solaris 7.
</li>
<li>Fixed a minor bug with the Berkeley database code under
AlphaLinux.</li>
<li>Fixed a serious bug causing bus errors on several platforms,
notably Solaris SPARC, caused by unaligned access to database
structures.</li>
<li>Fixed some bugs in the boolean search parser.</li>
<li>Replaced the contributed parse_word_doc.pl script with a
more capable parse_doc.pl script.</li>
<li>Fixed the htnotify program to parse dates as mentioned in the
<a href="notification.html">documentation</a>.</li>
<li>Cleaned up some minor mistakes in the documentation and moved
to HTML 4.0 Transitional syntax.</li>
<li>Fixed the documentation for the <a
href="attrs.html#pdf_parser">pdf_parser</a> attribute that was
changed in version 3.1.0. This attribute must call the parser with
all command-line options.</li>
</ul>
<p>
<strong>Release notes for htdig-3.1.0</strong> 9 Feb 1999<br>
This version marks the "full release" of version
3.1.0. Naturally, this version adds a few new features and fixes a
large number of remaining bugs. This version is the latest stable
release of ht://Dig and is recommended for all production servers
for current bug-fixes and oft-requested
features.
</p>
<blockquote>
<p>
<strong>NOTE:</strong> You <em>must</em> rebuild
your databases from scratch after updating to this
version. Several database-related bugs were fixed and will remain
unless you rebuild from scratch. We're sorry for any
inconvenience.
</p>
</blockquote>
<ul>
<li>Fixed a variety of small memory leaks.</li>
<li>Fixed a bug that could duplicate documents in the document
databases.</li>
<li>Fixed a bug that would not remove documents marked as deleted.</li>
<li>Fixed a bug that could dump core with incorrectly defined
template_map attributes.</li>
<li>Fixed a bug that could dump core or produce bogus dates when
a server returns the date in an incorrect format.</li>
<li>Fixed a variety of string-matching bugs that caused problems
with restricting indexing and searching.</li>
<li>Fixed a bug that could dump core if logging searches and CGI
environment variables were not set.</li>
<li>Fixed a bug that would not highlight searches properly if they
contained punctuation.</li>
<li>Fixed PDF parsing to support programs beyond acroread.</li>
<li>Fixed a bug that caused problems with large robots.txt files.</li>
<li>Fixed a bug in the sample rundig script from a non-portable
test for the age of databases.</li>
<li>Fixed bugs in the fuzzy matching code that could prevent
searches from completing if fuzzy databases were not present.</li>
<li>Fixed bugs in the soundex and metaphone algorithms that
would only return the first word of several matching
words. <strong>Note</strong> that to completely fix this bug, you must
rebuild your soundex and metaphone databases.</li>
<li>Fixed up many compilation warnings and errors.</li>
<li>Fixed a performance slowdown in htsearch when
<a href="attrs.html#backlink_factor">backlink_factor</a> and
<a href="attrs.html#date_factor">date_factor</a> are zero and can
be ignored.</li>
<li>Improved performance when a server ignores the
If-Modified-Since request during update digs.</li>
<li>Added a warning message if the locale: option is set
to a locale that is not present.</li>
<li>Some minor performance improvements.</li>
<li>Allow "include" keyword in <a href="cf_general.html">config
file</a> to include other config files.</li>
<li>Uses latest (2.6.4) version of the Berkeley database.</li>
<li>Two databases may be merged together using
<a href="htmerge.html">htmerge</a>.</li>
<li>The <a href="htdig.html">htdig</a> program can be safely
stopped and restarted in the middle of a dig. The dig will write
the progress to the file specified by the new
<a href="attrs.html#url_log">url_log</a> option.</li>
<li>Added support for anchors in excerpts with the
<a href="attrs.html#add_anchors_to_excerpt">add_anchors_to_excerpt</a>
option and the ANCHOR template variable.</li>
<li>Added support for sorting results in increasing or
decreasing order of document date, size, title and score using
the <a href="hts_form.html">search form</a>. Note that changing
sort from the default of score will result in a performance
decrease.</li>
<li>Added config options <a href="attrs.html#sort">sort</a> and
<a href="attrs.html#sort_names">sort_names</a> to change the
default sort and names used in the SORT template variable.</li>
<li>Added the option <a
href="attrs.html#compression_level">compression_level</a> to
compress the document database if the zlib library is
present.</li>
<li>Added the options
<a href="attrs.html#noindex_start">noindex_start</a> and
<a href="attrs.html#noindex_end">noindex_end</a> to delimit
sections of HTML documents to be ignored.</li>
<li>Added the option
<a href="attrs.html#allow_in_form">allow_in_form</a> to allow
specific config options to be set in the search form.</li>
<li>Added the option
<a href="attrs.html#bad_querystr">bad_querystr</a> to ignore URLs
containing specified CGI queries.</li>
<li>Added the option
<a href="attrs.html#search_results_wrapper">search_results_wrapper</a>
to replace separate header and footer files. For more
information, see the <a href="hts_general.html">general
htsearch</a> documentation.</li>
<li>Added option
<a href="attrs.html#no_title_text">no_title_text</a> to allow
configuration of the text used when no title is found.</li>
<li>Added option
<a href="attrs.html#url_part_aliases">url_part_aliases</a> to allow
rewriting portions of URLs.</li>
<li>Added option
<a href="attrs.html#common_url_parts">common_url_parts</a> to
compress common portions of URLs. Requires rebuilding
databases when changed.</li>
<li>Added option
<a href="attrs.html#remove_default_doc">remove_default_doc</a> to
control whether ht://Dig strips off the default document in a
folder. Setting it to empty prevents problems with servers that
treat / and /index.html as different URLs.</li>
<li>Of course there are many other bug-fixes and small
enhancements. Many thanks to everyone who reported a bug or
contributed code for this release!</li>
</ul>
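<p>
As a rough sketch of two of the 3.1.0 options listed above, a
configuration fragment might look like the following. The path
<code>/etc/htdig/common.conf</code> is hypothetical, and the fragment
assumes that the "include" keyword is written in the same
<code>name: value</code> form as ordinary attributes; check the
<a href="cf_general.html">config file</a> and
<a href="attrs.html">attribute</a> documentation before relying on it.
</p>
<blockquote>
<pre>
# Pull shared settings in from another config file (hypothetical path)
include: /etc/htdig/common.conf

# Leave the value empty so that / and /index.html stay distinct URLs
remove_default_doc:
</pre>
</blockquote>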
<p>
<strong>Release notes for htdig-3.1.0b4</strong> 22 Dec 1998<br>
This version fixes a security hole in htnotify. The hole has been
present in previous versions but was inadvertently made worse in
the 3.1.0 beta releases. Malicious users could construct pages
that executed commands running under the shell of the user running
htnotify. <strong>It is highly recommended that users of previous
versions switch to this release.</strong>
</p>
<ul>
<li>Fixed a memory leak in htnotify and htsearch.</li>
<li>Updated the contributed parse_word_doc.pl script.</li>
</ul>
<p>
<strong>Release notes for htdig-3.1.0b3</strong> 15 Dec 1998<br>
This version adds only a few features and a significant number of
bug fixes. This version has been pretty thoroughly tested. Though
there are a few remaining issues, it is hoped that this will be
near the end of the beta releases before version 3.1.0. Note that
it's recommended to update your databases to eliminate the
possibility of subtle changes in the database format.
</p>
<ul>
<li>Fixed a bug which would ignore the proxy settings,
introduced in version 3.1.0b2.</li>
<li>Fixed a bug where words would remain from deleted
documents.</li>
<li>Fixed a bug where SGML &lt; was considered part of a tag
in the HTML parser, introduced in version 3.1.0b2.</li>
<li>Fixed a bug where empty boolean searches would dump
core.</li>
<li>Fixed a bug where boolean "and," "or," and "not" would be
removed from a search string, causing a syntax error.</li>
<li>Fixed a bug which wouldn't keep track of the hopcounts
correctly.</li>
<li>Added support for META refresh tags, contributed by Aidas
Kasparas.</li>
<li>Added support for using CGI
<a href="http://hoohoo.ncsa.uiuc.edu/cgi/">environment
variables</a> in the search templates, contributed by Gilles
Detillieux.</li>
<li>Improved memory requirements <strong>slightly</strong> through
fixing a memory leak in htdig and a general system-wide
adjustment.</li>
<li>Improved support for multiple exclude and restrict items
through htsearch, contributed by William Rhee and Gilles.</li>
<li>Improved support to compile under CygWinB20, contributed
by Klaus Mueller.</li>
<li>Upgraded to the latest version (2.5.9) of the
<a href="http://www.sleepycat.com/">Berkeley DB</a>.</li>
<li>Added a new option
<a href="attrs.html#server_wait_time">server_wait_time</a> to
give a delay between connections to a server. Currently this
can also affect local filesystem digging if set.</li>
<li>Added a new option
<a href="attrs.html#server_max_docs">server_max_docs</a> to limit
the number of documents pulled down from a server in one dig.</li>
<li>Added a new option
<a href="attrs.html#http_proxy_exclude">http_proxy_exclude</a>
to ignore the proxy setting on certain URLs.</li>
<li>Added a new option
<a href="attrs.html#no_excerpt_show_top">no_excerpt_show_top</a> to
show the top of a document when there is no excerpt.</li>
<li>Added new options
<a href="attrs.html#date_factor">date_factor</a>,
<a href="attrs.html#backlink_factor">backlink_factor</a>, and
<a href="attrs.html#description_factor">description_factor</a> to
improve search rankings. Respectively, they can give higher
rankings to more recent documents, documents with a high
number of links pointing to them, and documents with relevant
URL descriptions pointing to them. See the documentation for
more information.</li>
<li>Added a set of contributed scripts called multidig to help
work with multiple sets of URLs and databases.</li>
<li>Fixed many compilation problems under AIX, thanks to
Alexander Bergolth!</li>
<li>
Many other bugs were fixed, so a big thanks to everyone
who submitted a bug report, patch or gave other feedback! See the
<a href="ChangeLog">ChangeLog</a> for more details.
</li>
</ul>
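<p>
To illustrate the new per-server limits described above, a
configuration fragment could look like this. The values are arbitrary
examples chosen for this sketch, not recommendations, and the unit of
<code>server_wait_time</code> is assumed here to be seconds; see the
<a href="attrs.html">attribute documentation</a> for the authoritative
definitions.
</p>
<blockquote>
<pre>
# Pause between connections to the same server (assumed to be seconds)
server_wait_time: 2

# Fetch at most this many documents from any one server per dig
server_max_docs: 500
</pre>
</blockquote>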
<p>
<strong>Release notes for htdig-3.1.0b2</strong> 1 Nov 1998<br>
This version adds a few minor features as well as many
bugfixes. It is still considered beta as some bug reports have not
been fully examined.
</p>
<ul>
<li>
Fixed a <strong>major</strong> database corruption
problem. Since this bug corrupted the document databases, to
completely fix it, you will need to rebuild your databases from
scratch.
</li>
<li>
Fixed many problems with the Makefiles and configure
scripts. Using <code>./configure --prefix=</code> now works.
</li>
<li>
Added fixes for connection problems with Digital Alpha-based
systems contributed by Paul J. Meyer!
</li>
<li>
Added support for syslog-based htsearch logging. See the
<a href="attrs.html#logging">config documentation</a> for more
details. Thanks to Leo Bergolth for this!
</li>
<li>
Added fixes to work with DNS aliases (as opposed to virtual
hosts) through the
<a href="attrs.html#server_aliases">server_aliases</a> and
<a href="attrs.html#limit_normalized">limit_normalized</a> options
as contributed by Leo Bergolth.
</li>
<li>
Added cleanups of the HTML parser and the connection timeout
code contributed by René Seindal.
</li>
<li>
Now supports case insensitive servers through the
<a href="attrs.html#case_sensitive">case_sensitive</a> option.
</li>
<li>
Now supports ISO 8601 date format, using the
<a href="attrs.html#iso_8601">iso_8601</a> option.
</li>
<li>
Added a wrapper to emulate Excite for Web Servers (EWS)
contributed by John Grohol.
</li>
<li>
Added fixes to the contrib whatsnew.pl script to work with DB2
contributed by Jacques Reynes.
</li>
<li>
Added a new contributed synonyms file from John Banbury.
</li>
<li>
Added a new template variable: CURRENT, the number of the
current match, from a patch by René Seindal.
</li>
<li>
Many other minor bugs were fixed, so a big thanks to everyone
who submitted a bug report or a patch! See the
<a href="ChangeLog">ChangeLog</a> for more details.
</li>
</ul>
<br>
<p>
<strong>Release notes for htdig-3.1.0b1</strong> 8 Sep
1998<br>
This version adds several major new features as well as some
bug-fixes. It is considered a beta release since it has only seen
limited testing.
</p>
<blockquote>
<p>
<font face="Helvetica" size="+1">It is <strong>
extremely</strong> important that you rebuild all your databases made
with previous versions. This version no longer uses the GDBM database
format and databases produced with it will be incompatible with other
versions. Do not blame me for anything if you didn't do this. You have
been warned...</font>
</p>
</blockquote>
<ul>
<li>
Added patches made by Pasi Eronen to support local filesystem access
</li>
<li>
Added a PDF parser contributed by Sylvain Wallez
</li>
<li>
Added support for META description and robots tags
</li>
<li>
Converted the database code to use the BerkeleyDB format, contributed
by Esa Ahola and Jesse op den Brouw.
</li>
<li>
Added a prefix fuzzy algorithm, contributed by Esa and Jesse.
</li>
<li>
Various other bugs were fixed. Thanks for all the patches
that were sent to me and the mailing list!
</li>
</ul>
<br>
<p>
<strong>Release notes for htdig-3.0.8b2</strong> 15 Aug
1997<br>
This new version contains most of the patches that Pasi Eronen
has posted to the list plus some other random fixes.
</p>
<p>
<strong>Release notes for htdig-3.0.8b1</strong>
27-Apr-1997<br>
I consider this a beta release since I have not had time to
test everything. Use at your own risk...
</p>
<ul>
<li>
Base tag problem fixed
</li>
<li>
URL parser somewhat more robust
</li>
<li>
Date parsing bug fixed
</li>
<li>
Added Substring fuzzy algorithm.
</li>
<li>
Various other bugs were fixed. Thanks for all the patches
that were sent to me!
</li>
</ul>
<p>
<strong>Release notes for htdig-3.0.7</strong> 12-Jan-1997<br>
More bug fixes and some minor new functionality. Hopefully,
I'll be able to finish up work on version 3.1 at some point in
the near future.<br>
I have recently received some more patches for various things,
but I have not incorporated those, yet. Next version.
</p>
<ul>
<li>
The problem with the missing words has been fixed. This was
a problem in the Dictionary class.
</li>
<li>
htsearch is a *lot* faster due to a patch by Esa Ahola.
</li>
<li>
htfuzzy has some work done to it. With the addition of the
new rx-1.4 library, the endings algorithm now actually
works for languages other than English... It still takes an
awfully long time to build the tables for languages with
lots of rules.
</li>
<li>
URLs now can be of the dubious form http:foo.html. I have
never seen this used and think it is bogus, but alas, it
works now.
</li>
<li>
A search form can now manually add words to any search
using the new <em>keywords</em> form attribute.
</li>
<li>
A problem in the plaintext parser used to cause bogus HTML
in search results. This has been fixed.
</li>
<li>
New documentation format. Lots of new documentation, as
well.
</li>
<li>
New robotstxt_name attribute. Used to match the
'user-agent' lines in robots.txt files.
</li>
<li>
The &lt;base&gt; tag is now properly supported.
</li>
<li>
Preliminary support for lots of new features, including:
<ul>
<li>
External document parsers. You'll be able to write your
own document parser for that special document type that
ht://Dig doesn't know about.
</li>
<li>
New fuzzy search algorithms: substring, regex,
globbing, etc.
</li>
</ul>
</li>
</ul>
<p>
<strong>Release notes for htdig-3.0.6</strong> 26-Oct-1996<br>
Just a single bug fix and one additional feature in this
release.
</p>
<ul>
<li>
Fixed the problem that caused frequent crashes with virtual
memory exhausted.
</li>
<li>
Added a new attribute, keywords_meta_tag_names, which
should contain a list of meta tag names for which the
content should be used as keywords. The default is set to
"keywords htdig-keywords"
</li>
</ul>
<p>
<strong>Release notes for htdig-3.0.5</strong> 13-Oct-1996<br>
This release consists of more bug fixes.<br>
I want to thank Elliot Lee &lt;sopwith@cuc.edu&gt; for his
help with tracking down several bugs.
</p>
<ul>
<li>
Fixed problem with accent characters. Words with SGML
entities and iso-8859-1 characters will now be indexed
correctly.
</li>
<li>
Changed the auto configuration to detect the need for a
prototype for the gethostname() function. (This was
supposed to be fixed before, but wasn't)
</li>
<li>
Reduced the memory requirements for all the programs by
changing the rehash() method in the Dictionary class.
Access to hashes may be a little slower, but the memory
requirements were reduced by a factor of 10 or so.
</li>
<li>
Hopefully fixed a problem with the time related functions
on certain platforms. More checks are done to make sure the
functions that are used are actually available.
</li>
</ul>
<p>
<strong>Release notes for htdig-3.0.4</strong> 2-Sep-1996<br>
The previous version failed to build under Linux. This should
be fixed now.
</p>
<ul>
<li>
Fixed problem with the time stuff which caused the build of
htdig to fail.
</li>
<li>
Fixed a memory problem in htdig
</li>
</ul>
<p>
<strong>Release notes for htdig-3.0.3</strong> 2-Sep-1996<br>
Bugs bugs bugs... Will they <em>ever</em> all be found?
</p>
<p>
<strong>NOTE</strong>: I made extensive changes to the htdig.conf file
that gets installed. I would advise you to remove or rename
your existing htdig.conf and let the installation process
create a new one for you that you can then modify.
</p>
<p>
Also, since the rundig script has changed, you should remove
the old one before installing ht://Dig. (The installation
will refuse to overwrite existing files...)
</p>
<ul>
<li>
The problem with htsearch crashing on some machines has
been fixed.
</li>
<li>
A bug caused the &lt;AREA&gt; tag to be ignored. Fixed.
</li>
<li>
A bug in SunOS caused dates to be all screwed up.
</li>
<li>
Added lots of comments to the example htdig.conf file. Also
added some additional example attributes.
</li>
<li>
Fixed a bug in the installation process which caused rundig
to be created incorrectly.
</li>
<li>
Added a sample synonyms file. Also modified rundig to
create a synonyms database for it.
</li>
</ul>
<p>
<strong>Release notes for htdig-3.0.2</strong> 22-Aug-1996<br>
More bug fixes.
</p>
<ul>
<li>
Multiple start URLs now actually work. Before they were
just documented to work, but didn't actually work.
</li>
<li>
htmerge now will refuse to remove database files if it
detects that the call to /bin/sort failed.
</li>
<li>
htmerge can now tell /bin/sort to use a specific temporary
directory. This is done by setting the TMPDIR environment
variable.
</li>
<li>
htsearch can now search for words with non-ASCII characters
in them.
</li>
<li>
Added support for finding URLs in the &lt;frame&gt; and
&lt;area&gt; tags.
</li>
<li>
There is a problem with htsearch under Linux. It causes a
segmentation violation after the first search result is
displayed. Don't know what the problem is, yet.
</li>
<li>
Fixed bug in the auto configuration which always set the
value for NEED_PROTO_GETHOSTNAME to 1. For most systems
this actually needs to be 0.
</li>
</ul>
<p>
<strong>Release notes for htdig-3.0.1</strong>
16-Aug-1996<br>
This is a maintenance release in response to several bug
reports.
</p>
<ul>
<li>
htdig now will display a list of errors when the
statistics option (-s) is used. The list gives the URL
that caused the error and a URL that referred to it.
Hopefully this information is useful for site
maintainers.
</li>
<li>
Some problems with the SGML character entities were
fixed. The major symptom was that the ';' that ends an
entity used to be included as well.
</li>
<li>
Major problems with htnotify were fixed. There were
many hardcoded things in this program that made it very
specific to SDSU and to me.
</li>
<li>
malloc.h should not be included anymore. All references
to it were replaced with stdlib.h instead. This should
make compiles on some platforms work better.
</li>
<li>
htsearch now will use the CONFIG_DIR environment
variable to override the compiled in default. (set in
the CONFIG file...) This was done so that htsearch can
be called from a simple wrapper that sets that
environment variable. Only the wrapper needs to be
modified to get different CONFIG_DIR values.
</li>
</ul>
<p>
<strong>Release notes for htdig-3.0</strong>
17-Jul-1996<br>
I decided to make this the <em>official</em> 3.0 release.
</p>
<blockquote>
<blockquote>
<font face="Helvetica" size="+1">It is <strong>
extremely</strong> important that you remove all traces
of earlier beta versions of the software before
installing this version or that you install in a
completely different location. Do not blame me for
anything if you didn't do this. You have been
warned...</font>
</blockquote>
</blockquote>
<ul>
<li>
htwrapper is no more. htsearch is now the CGI program.
</li>
<li>
<a href="htsearch.html" target="_top">htsearch</a> now
uses templates to display the results. A template is
simply a piece of HTML code for a single match. The
HTML code includes variables that will be expanded to
the various items that are unique to each match, like
URL, EXCERPT, TITLE, etc. The template can be selected
at search time (through a menu). There are two builtin
templates: <code>builtin-short</code> and
<code>builtin-long</code>. The <code>builtin-short</code> template
just lists the stars and title while the
<code>builtin-long</code> template lists results in a similar
fashion to the way Alta Vista displays results.
</li>
<li>
Many runtime configuration options have been removed
and many new ones have been added. Check the
<a href="attrs.html">configuration file</a> documentation for
details. There are also some enhancements to the format
of the configuration file.
<ul>
<li>
Attribute values can now span multiple lines by
ending each line that needs to be continued with a
backslash ('\').
</li>
<li>
Attribute values can now include the contents of
files. Just put the filename in back-quotes. The
filename can use the normal variable expansion so
that things like the following work:
<blockquote>
<code>someattribute: `${common_dir}/somefile`</code>
</blockquote>
The file that is specified is read
in and all newlines and leading and trailing
whitespace are reduced to a single space. If the
file is not found, nothing is included and no error
is flagged.<br>
Note that the backquote character is used, not the
regular quote character.
</li>
</ul>
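<p>
As a small sketch of the two format enhancements above, the fragment
below sets <code>start_url</code> first with a backslash continuation
and then, as a commented-out alternative, from a file. The URLs and the
file name <code>start_urls</code> are made up for illustration, and
<code>common_dir</code> is assumed to be defined elsewhere in the same
configuration file.
</p>
<blockquote>
<pre>
# One value continued over two lines with a trailing backslash
start_url: http://www.example.com/ \
           http://www.example.org/docs/

# Alternative: read the same list from a file named in backquotes
# (hypothetical file; a missing file is silently treated as empty)
# start_url: `${common_dir}/start_urls`
</pre>
</blockquote>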
Notable attribute changes:
<ul>
<li>
All the attributes that set the heading text have
been removed. These attributes include:
<ul>
<li>
accessed_heading_text
</li>
<li>
datesize_heading_text
</li>
<li>
descriptions_heading_text
</li>
<li>
excerpt_heading_text
</li>
<li>
modified_heading_text
</li>
<li>
score_heading_text
</li>
<li>
size_heading_text
</li>
<li>
url_heading_text
</li>
<li>
wordlist_heading_text
</li>
<li>
field_order
</li>
</ul>
</li>
<li>
New attributes added:
<dl>
<dt>
<strong>http_proxy</strong>
</dt>
<dd>
Added to support the use of a HTTP proxy server
to index documents
</dd>
<dt>
<strong>locale</strong>
</dt>
<dd>
Added to support international character sets
</dd>
<dt>
<strong>match_method</strong>
</dt>
<dd>
New way of specifying if a search is an 'or',
'and', or 'boolean' search
</dd>
<dt>
<strong>matches_per_page</strong>
</dt>
<dd>
The new paged results uses this
</dd>
<dt>
<strong>max_doc_size</strong>
</dt>
<dd>
Limit the size of documents retrieved
</dd>
<dt>
<strong>next_page_text</strong>
</dt>
<dd>
Used in the navigation between pages
</dd>
<dt>
<strong>no_excerpt_text</strong>
</dt>
<dd>
Text displayed if no excerpt was available
(this used to be hard-coded)
</dd>
<dt>
<strong>no_next_page_text</strong>
</dt>
<dd>
Used in the navigation between pages
</dd>
<dt>
<strong>no_prev_page_text</strong>
</dt>
<dd>
Used in the navigation between pages
</dd>
<dt>
<strong>prev_page_text</strong>
</dt>
<dd>
Used in the navigation between pages
</dd>
<dt>
<strong>star_patterns</strong>
</dt>
<dd>
Allow different star images to be used
depending on the match URL
</dd>
<dt>
<strong>synonym_dictionary</strong>
</dt>
<dd>
Support for the new synonyms fuzzy algorithm
</dd>
<dt>
<strong>synonym_db</strong>
</dt>
<dd>
Support for the new synonyms fuzzy algorithm
</dd>
<dt>
<strong>syntax_error_file</strong>
</dt>
<dd>
HTML file displayed if there was a boolean
expression syntax error
</dd>
<dt>
<strong>template_map</strong>
</dt>
<dd>
Used in the support for the new result display
templates
</dd>
<dt>
<strong>template_name</strong>
</dt>
<dd>
Sets the default template name
</dd>
<dt>
<strong>text_factor</strong>
</dt>
<dd>
Added to allow normal text to have a variable
weight (0, for example...)
</dd>
</dl>
</li>
</ul>
<ul>
<li>
Some form tag names have changed. The list of
recognized form tags is in the
<a href="htsearch.html" target="_top">htsearch</a>
documentation.
</li>
<li>
Multiple start urls can be specified as a value to the
'start_url' attribute. This could be combined with the
file inclusion to read in a file of URLs to start with.
</li>
<li>
<a href="htdig.html">htdig</a> now sends the 'Referer:'
header in HTTP requests so that any link errors will be
logged in the server's log files.
</li>
<li>
In addition to the "htdig-keywords" META tag name,
<a href="htdig.html">htdig</a> now also supports just
"keywords". This is to make it more compatible with the
Alta Vista search engine.
</li>
<li>
The verbose display of <a href="htdig.html">htdig</a>
was enhanced to show '+' for a link that will be
followed and '-' for a link that was discarded.
</li>
<li>
<a href="htmerge.html">htmerge</a> was changed to use
the Unix sort program instead of doing its own sorting.
It no longer uses mmap() to map the words into memory.
This was causing problems on systems with limited
virtual memory available. (What??? You mean you DON'T
have at least a 1GB disk dedicated to swap???)
</li>
<li>
The Endings algorithm was fixed up to work properly
now. There were several well hidden bugs that made the
algorithm come up with illegal words.
</li>
<li>
The <strong>synonyms</strong> fuzzy algorithm was
added. This is simply a mapping of words to other
words. The input file is just a list of words which
causes the first word on a line to be mapped to the
rest of the words on that line. (We use this to map
course abbreviations to full course names)
</li>
<li>
SGML entities are now supported. They are translated to
their equivalent ISO-8859-1 encoding.
</li>
</ul>
</ul>
<p>
<strong>Release notes for htdig-3.0b5</strong>
</p>
<ul>
<li>
The configuration has changed. There is now a CONFIG
file which contains all the variables which control
where things get installed. 'make install' will now
actually attempt to set everything up with default or
example files.<br>
Note that some default directories have changed. For
example, the default configuration file location is not
/usr/local/etc/htdig.conf anymore. Instead it is now
defined in terms of CONFIG_DIR.
</li>
<li>
The htfuzzy/createDict.pl Perl program has been
obsoleted. Creating the endings database is now done by
htfuzzy itself. If you already have endings databases,
you don't need to recreate them, they will still work.
</li>
<li>
GNU rx-1.0 is now included with the distribution. This
is used by htfuzzy to create the endings databases.
</li>
<li>
The name of the whole search system has changed from
<em>HTDig</em> to <em>ht://Dig</em>.
</li>
<li>
The HTML documentation got a big facelift! This
includes the new logo for ht://Dig. (Thanks go to
Keith Parks for the images!)
</li>
<li>
htsearch got a new option '-r' which will allow it to
produce raw output. This output can easily be parsed by a
wrapper program to produce custom HTML or other output
for the search results.
</li>
</ul>
<hr size="4" noshade>
Last modified: $Date: 2004/06/12 13:39:12 $
</body>
</html>