/usr/share/doc/enca/TODO is in enca 1.18-1.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
#============================================================================
# Enca v1.18 (2016-01-07) guess and convert encoding of text files
# Copyright (C) 2000-2003 David Necas (Yeti) <yeti@physics.muni.cz>
# Copyright (C) 2009-2016 Michal Cihar <michal@cihar.com>
#============================================================================
TO THE NEXT RELEASE:
(this list must be empty at the time of release)
IN FUTURE:
(should be done, but maybe not right now)
* LCUC check for cyrillic charsets.
* Backups -- like cp, mv, etc. This will be hard to get right with all the
silly converters.
* More tests
* Structured documentation (the manual page is ugly)
- keep a reasonably brief manual page
- put all the boring doc stuff somewhere else, there are possibilities:
info: searchable, has links, partly portable, has console viewers
HTML: poorly searchable, has links, most portable, has console viewers
TeX (ps): not searchable, no links, portable, most pleasant to read,
no console viewers
=> use SGML (or info itself?) and generate the others
MAYBE SOMEDAY:
(when I am in the mood for it; items are freely moved here and removed again)
* Detect all-caps texts correctly.
After several experiments it seems we have to
- use pair occurrences, at least, with specifically computed
difference-maximising weights
- guess in two steps
- first with uncapitalization and pair weights, and check whether the
sample looks like natural text (garbageness test, but better)
- if the first approach fails, do it as we do it now
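
The two-step idea above could be sketched roughly as follows. This is only an illustration, not enca code: `struct pair_weight`, `struct charset_model`, `pair_score()` and `guess_by_pairs()` are hypothetical names, the weights are assumed to be precomputed elsewhere, and the "looks like natural text" check is reduced to a plain score threshold.

```c
#include <stddef.h>

/* Hypothetical model types; none of these names exist in enca. */
struct pair_weight {
    unsigned char first, second;   /* byte pair, after uncapitalization */
    int weight;                    /* assumed to be precomputed elsewhere */
};

struct charset_model {
    const char *name;
    unsigned char lower[256];      /* charset-specific lowercase map */
    const struct pair_weight *pairs;
    size_t npairs;
};

/* Sum the weights of all adjacent byte pairs, lowercasing each byte first. */
static long pair_score(const struct charset_model *m,
                       const unsigned char *buf, size_t len)
{
    long score = 0;
    for (size_t i = 0; i + 1 < len; i++) {
        unsigned char a = m->lower[buf[i]];
        unsigned char b = m->lower[buf[i + 1]];
        for (size_t k = 0; k < m->npairs; k++)
            if (m->pairs[k].first == a && m->pairs[k].second == b)
                score += m->pairs[k].weight;
    }
    return score;
}

/* Step one: return the index of the best-scoring model, or -1 when no
 * model reaches min_score. On -1 the caller would fall back to the
 * existing guessing method (step two). */
static int guess_by_pairs(const struct charset_model *models, size_t nmodels,
                          const unsigned char *buf, size_t len, long min_score)
{
    int best = -1;
    long best_score = 0;
    for (size_t i = 0; i < nmodels; i++) {
        long s = pair_score(&models[i], buf, len);
        if (s >= min_score && (best < 0 || s > best_score)) {
            best_score = s;
            best = (int)i;
        }
    }
    return best;
}
```

The linear search over the pair table keeps the sketch short; a real table would presumably be indexed directly.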
* design better levels of verbosity/warnings (or: remove the --verbose option,
keep important messages and remove all others?)
0: only messages followed by exit(EXIT_FAILURE) (or abort()) are printed
plus `cannot convert...'
1: all nonfatal errors/warnings
2: what converters are tried, what language gets detected (do not duplicate
--details)
>2: debug
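
One possible shape for these levels, as a minimal sketch: the names `enum verbosity`, `verbosity_level`, `should_print()` and `report()` are made up for illustration and are not part of enca. A message is printed only when its level does not exceed the current verbosity.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical level names mirroring the list above. */
enum verbosity {
    V_FATAL = 0,   /* messages followed by exit/abort, plus `cannot convert...' */
    V_WARN  = 1,   /* all nonfatal errors and warnings */
    V_INFO  = 2,   /* converters tried, language detected */
    V_DEBUG = 3    /* anything above 2 is debug output */
};

static int verbosity_level = V_FATAL;

/* A message is shown when its level is at or below the current setting. */
static int should_print(int msg_level)
{
    return msg_level <= verbosity_level;
}

/* printf-like reporting to stderr; returns the number of characters
 * written, or 0 when the message is suppressed. */
static int report(int msg_level, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (!should_print(msg_level))
        return 0;
    va_start(ap, fmt);
    n = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return n;
}
```

With this layout every call site states its own level, so the filtering lives in one place.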
* _real_ paranoiac behaviour assuring that nothing gets lost and that
conversion output is either correctly converted text or untouched original
(requires major redesign of all the conversion stuff)
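
For the in-place case, the usual way to get "converted or untouched" is to write the result to a temporary file and rename it over the original only after every write has succeeded. The sketch below uses ISO C only. `replace_file_safely()` and the `.enca-tmp` suffix are hypothetical, and a real implementation would probably want a unique temporary name and an explicit sync as well. Note that ISO C leaves rename() onto an existing file implementation-defined; POSIX specifies that it replaces the target.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write data to a temporary file next to path, then rename it over path.
 * On any failure the original file is left untouched.
 * Returns 0 on success, -1 on failure. */
static int replace_file_safely(const char *path, const char *data, size_t len)
{
    static const char suffix[] = ".enca-tmp";   /* hypothetical suffix */
    size_t plen = strlen(path);
    char *tmp = malloc(plen + sizeof suffix);
    FILE *out;
    int ok;

    if (tmp == NULL)
        return -1;
    memcpy(tmp, path, plen);
    memcpy(tmp + plen, suffix, sizeof suffix);  /* copies the trailing NUL too */

    out = fopen(tmp, "wb");
    if (out == NULL) {
        free(tmp);
        return -1;
    }
    ok = fwrite(data, 1, len, out) == len;
    ok = (fflush(out) == 0) && ok;
    ok = (fclose(out) == 0) && ok;   /* always close, even after an error */

    /* Only a fully written temporary file may replace the original. */
    if (ok && rename(tmp, path) != 0)
        ok = 0;
    if (!ok)
        remove(tmp);
    free(tmp);
    return ok ? 0 : -1;
}
```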
NEVER:
(you can do anything GNU GPL v2 allows, but I'll refrain)
* features that nobody needs (mm, well, ... ok, let it be)
* duplicate other tools' functionality more than necessary, use them instead
* dependency on anything that is not ISO C and/or POSIX (moreover do not use
braindead features of both); important functionality must be present
everywhere nevertheless, enca can be smaller, faster or cleverer on some
(GNU) systems
* localization; please correct my English instead ;->
* converter calling generalization (would require including the whole wordexp
thing in enca, and: launching an external converter is a Bad Thing(TM) anyway)
* data in run-time files (needs parser (could live with) and disallows hooks
(can't live without))
* loadable module support (it's not very portable)
-------------
KNOWN ISO C CONFLICTS:
(perhaps to be solved someday)
All constants and typedefs. They start with ENCA_ and Enca, but:
Names beginning with a capital `E' followed by a digit or uppercase
letter may be used for additional error code names. [errno.h]
And additionally inside libenca (i.e. not so serious):
* libenca.h: #define EPSILON [errno.h]
* filters.c: isvbox[] [ctype.h]
* guess.c: #define isbinary [ctype.h]
* guess.c: #define istext [ctype.h]
* multibyte.c: is_valid_utf7() [ctype.h]
* multibyte.c: is_valid_utf8() [ctype.h]
Some probably can't conflict.
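
The two reservation rules involved can be checked mechanically. The sketch below assumes the rules as commonly stated: `E` followed by a digit or uppercase letter for errno.h, and `is` or `to` followed by a lowercase letter for ctype.h. The function names `reserved_by_errno()` and `reserved_by_ctype()` are hypothetical. Under this reading, names such as is_valid_utf8 are among those that cannot conflict, because the third character is an underscore.

```c
#include <ctype.h>

/* errno.h: names starting with 'E' followed by a digit or an uppercase
 * letter may be used for additional error codes. */
static int reserved_by_errno(const char *name)
{
    return name[0] == 'E'
        && (isdigit((unsigned char)name[1]) || isupper((unsigned char)name[1]));
}

/* ctype.h: names starting with "is" or "to" followed by a lowercase
 * letter may be added in the future. */
static int reserved_by_ctype(const char *name)
{
    return ((name[0] == 'i' && name[1] == 's')
            || (name[0] == 't' && name[1] == 'o'))
        && islower((unsigned char)name[2]);
}
```

For instance, EPSILON matches the errno.h rule and isbinary matches the ctype.h rule, while is_valid_utf7 matches neither.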