/usr/lib/R/site-library/stringi/NEWS is in r-cran-stringi 1.1.2-1.
This file is owned by root:root, with mode 0o644.
stringi package NEWS and CHANGELOG
===============================================================================
## Known bugs in ICU
* [REGEX] #147: regex look-behind assertions may fail to find a multibyte
Unicode search pattern [solved in ICU4C 57m1, see
http://bugs.icu-project.org/trac/ticket/11554]
-------------------------------------------------------------------------------
## 1.1.2 (2016-09-30) **CRAN**
* [BUGFIX] Fixed the use of `round()` and `snprintf()`, which are not part of C++98.
-------------------------------------------------------------------------------
## 1.1.1 (2016-05-25) **CRAN**
* [BUGFIX] #214: allow a regex pattern like `.*` to match an empty string.
* [BUGFIX] #210: `stri_replace_all_fixed(c("1", "NULL"), "NULL", NA)`
now results in `c("1", NA)`.
* [NEW FEATURE] #199: `stri_sub<-` now allows for ignoring `NA` locations
(a new `omit_na` argument added).
* [NEW FEATURE] #207: `stri_sub<-` now allows for substring insertions
(via `length=0`).
* [NEW FUNCTION] #124: `stri_subset<-` functions added.
* [NEW FEATURE] #216: `stri_detect`, `stri_subset`, `stri_subset<-` gained
a `negate` argument.
* [NEW FUNCTION] #175: `stri_join_list` concatenates all strings
in a list of character vectors. Useful with, e.g., `stri_extract_all_regex`
or `stri_extract_all_words`.
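A minimal usage sketch of several 1.1.1 changes, assuming `stringi` >= 1.1.1 is installed. The input strings are arbitrary illustrative values (not taken from the package manual), and the comments give the values these calls are expected to return:

```r
library(stringi)

# #207: length=0 inserts the replacement before position 3, deleting nothing
x <- "abcde"
stri_sub(x, 3, length=0) <- "XYZ"
x                                                  # "abXYZcde"

# #216: negate=TRUE inverts the detection/subsetting criterion
fruits <- c("apple", "banana", "cherry")
stri_detect_fixed(fruits, "an", negate=TRUE)       # TRUE FALSE TRUE
stri_subset_fixed(fruits, "an", negate=TRUE)       # "apple" "cherry"

# #175: join the strings within each list element
stri_join_list(list(c("a", "b"), c("c", "d", "e")), sep="-")  # "a-b" "c-d-e"

# #210: an NA replacement only affects strings in which the pattern occurs
stri_replace_all_fixed(c("1", "NULL"), "NULL", NA)  # "1" NA
```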
-------------------------------------------------------------------------------
## 1.0-1 (2015-10-22) **CRAN**
* [GENERAL] #88: C API is now available for use in, e.g., Rcpp packages, see
https://github.com/gagolews/ExampleRcppStringi for an example.
* [BUGFIX] #183: Floating point exception raised in `stri_sub()` and
`stri_sub<-()` when `to` or `length` was a zero-length numeric vector.
* [BUGFIX] #180: `stri_c()` incorrectly warned about the recycling rule
when used with more than 2 elements.
-------------------------------------------------------------------------------
## 0.5-5 (2015-06-28) **CRAN**
* [BACKWARD INCOMPATIBILITY] `stri_install_check()` and `stri_install_icudt()`
are now deprecated. From now on they are supposed to be used only
by the `stringi` installer.
* [BUGFIX] #176: a patch for `sys/feature_tests.h` no longer included
(the original file was copyrighted by Sun Microsystems); fixed the *Compiler
or options invalid for pre-UNIX 03 X/Open applications and pre-2001 POSIX
applications* error by forcing (conditionally) `_XPG6` conformance.
* [BUGFIX] #174: `stri_paste()` did not generate any warning when
the recycling rule was violated and `sep==""`.
* [BUGFIX] #170: `icu::setDataDirectory` no longer called if our ICU source bundle
is not used (this used to cause build problems on openSUSE).
* [BUILD TIME] #169: `./configure` now tries to switch to the "standard"
C++ compiler if a C++11 one is not properly configured.
* [BUILD TIME] `configure.win` (`Biarch: TRUE`) now mimics `autoconf`'s
`AC_SUBST` and `AC_CONFIG_FILES` so that the build process is
more similar across different platforms.
* [NEW FEATURE] `stri_info()` now also gives information about which version
of ICU4C is in use (system or bundle).
-------------------------------------------------------------------------------
## 0.5-2 (2015-06-21) **CRAN**
* [BACKWARD INCOMPATIBILITY] The second argument to `stri_pad_*()` has
been renamed `width`.
* [GENERAL] #69: `stringi` is now bundled with ICU4C 55.1.
* [NEW FUNCTIONS] `stri_extract_*_boundaries()` extract text between text
boundaries.
* [NEW FUNCTION] #46: `stri_trans_char()` is a `stringi`-flavoured
`chartr()` equivalent.
* [NEW FUNCTION] #8: `stri_width()` approximates the *width* of a string
in a more Unicodish fashion than `nchar(..., "width")`
* [NEW FEATURE] #149: `stri_pad()` and `stri_wrap()` are now by default based
on code point widths instead of the number of code points. Moreover, the default
behavior of `stri_wrap()` is now such that it does not get rid
of non-breaking, zero width, etc. spaces.
* [NEW FEATURE] #133: `stri_wrap()` silently allows for `width <= 0`
(for compatibility with `strwrap()`).
* [NEW FEATURE] #139: `stri_wrap()` gained a new argument: `whitespace_only`.
* [NEW FUNCTIONS] #137: date-time formatting/parsing:
    * `stri_timezone_list()` - lists all known time zone identifiers
    * `stri_timezone_set()`, `stri_timezone_get()` - manage current default time zone
    * `stri_timezone_info()` - basic information on a given time zone
    * `stri_datetime_symbols()` - localizable date-time formatting data
    * `stri_datetime_fstr()` - convert a `strptime`-like format string
      to an ICU date/time format string
    * `stri_datetime_format()` - convert date/time to string
    * `stri_datetime_parse()` - convert string to date/time object
    * `stri_datetime_create()` - construct date-time objects
      from numeric representations
    * `stri_datetime_now()` - return current date-time
    * `stri_datetime_fields()` - get values for date-time fields
    * `stri_datetime_add()` - add specific number of date-time units
      to a date-time object
* [GENERAL] #144: Performance improvements in handling ASCII strings
(these affect `stri_sub()`, `stri_locate()` and other string index-based
operations)
* [GENERAL] #143: Searching for short fixed patterns (`stri_*_fixed()`) now
relies on the current `libC`'s implementation of `strchr()` and `strstr()`.
This is very fast, e.g., on `glibc`, which utilizes the `SSE2/3/4` instruction sets.
* [BUILD TIME] #141: a local copy of `icudt*.zip` may be used on package
install; see the `INSTALL` file for more information.
* [BUILD TIME] #165: the `./configure` option `--disable-icu-bundle`
forces the use of system ICU when building the package.
* [BUGFIX] locale specifiers are now normalized in a more intelligent way:
e.g. `@calendar=gregorian` expands to `DEFAULT_LOCALE@calendar=gregorian`.
* [BUGFIX] #134: `stri_extract_all_words()` did not accept `simplify=NA`.
* [BUGFIX] #132: incorrect behavior in `stri_locate_regex()` for zero-length
matches.
* [BUGFIX] stringr/#73: `stri_wrap()` returned `CHARSXP` instead of `STRSXP`
on empty string input with `simplify=FALSE` argument.
* [BUGFIX] #164: building against `libicu-dev` used to fail on Ubuntu
(`LIBS` must be passed after `LDFLAGS` and the list of `.o` files).
* [BUGFIX] #168: Build now fails if `icudt` is not available.
* [BUGFIX] #135: C++11 is now used by default (see the `INSTALL` file,
however) to build `stringi` from sources. This is because ICU4C uses the
`long long` type which is not part of the C++98 standard.
* [BUGFIX] #154: Dates and other objects with a custom class attribute
were not coerced to the character type correctly.
* [BUGFIX] Force ICU `u_init()` call on `stringi` dynlib load.
* [BUGFIX] #157: many overfull hboxes in the package PDF manual have been
corrected.
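A minimal usage sketch of some 0.5-2 additions, assuming a `stringi` version >= 0.5-2 is installed. The inputs are arbitrary illustrative values, and the comments show the expected results; U+4E00 is used only as an example of an East Asian "wide" character:

```r
library(stringi)

# #46: character-by-character translation, like base R's chartr()
stri_trans_char("hello", "el", "ip")     # "hippo"

# #8: a CJK ideograph occupies two terminal columns
stri_width(c("abc", "\u4e00"))           # 3 2

# the second argument of stri_pad_*() is now called `width`
stri_pad_left("7", width=3, pad="0")     # "007"

# #149: padding is based on widths, so only 2 spaces are added here
stri_pad_left("\u4e00", width=4)         # "  \u4e00"
```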
-------------------------------------------------------------------------------
## 0.4-1 (2014-12-11) **CRAN**
* [IMPORTANT CHANGE] `n_max` argument in `stri_split_*()` has been renamed `n`.
* [IMPORTANT CHANGE] `simplify=TRUE` in `stri_extract_all_*()` and
`stri_split_*()` now calls `stri_list2matrix()` with `fill=""`.
`fill=NA_character_` may be obtained by using `simplify=NA`.
* [IMPORTANT CHANGE, NEW FUNCTIONS] #120: `stri_extract_words` has been
renamed `stri_extract_all_words`, `stri_locate_boundaries` has been renamed
`stri_locate_all_boundaries`, and `stri_locate_words` has been renamed
`stri_locate_all_words`. The following new functions are now available:
`stri_locate_first_boundaries`, `stri_locate_last_boundaries`,
`stri_locate_first_words`, `stri_locate_last_words`,
`stri_extract_first_words`, `stri_extract_last_words`.
* [IMPORTANT CHANGE] #111: `opts_regex`, `opts_collator`, `opts_fixed`, and
`opts_brkiter` can now be supplied individually via `...`.
In other words, you may now simply call e.g.
`stri_detect_regex(str, pattern, case_insensitive=TRUE)` instead of
`stri_detect_regex(str, pattern, opts_regex=stri_opts_regex(case_insensitive=TRUE))`.
* [NEW FEATURE] #110: The fixed pattern search engine's settings can
now be supplied via the `opts_fixed` argument in `stri_*_fixed()`;
see `stri_opts_fixed()`. Simple (not suitable for natural language
processing) yet very fast `case_insensitive` pattern matching can now be
performed. `stri_extract_*_fixed` is available again.
* [NEW FEATURE] #23: `stri_extract_all_fixed`, `stri_count`, and
`stri_locate_all_fixed` may now also look for overlapping pattern
matches, see `?stri_opts_fixed`.
* [NEW FEATURE] #129: `stri_match_*_regex` gained a `cg_missing` argument.
* [NEW FEATURE] #117: `stri_extract_all_*()`, `stri_locate_all_*()`,
`stri_match_all_*()` gained a new argument: `omit_no_match`.
Setting it to `TRUE` makes these functions compatible with their
`stringr` equivalents.
* [NEW FEATURE] #118: `stri_wrap()` gained `indent`, `exdent`, `initial`,
and `prefix` arguments. Moreover, Knuth's dynamic word wrapping algorithm
now assumes that the cost of printing the last line is zero, see #128.
* [NEW FEATURE] #122: `stri_subset()` gained an `omit_na` argument.
* [NEW FEATURE] `stri_list2matrix()` gained an `n_min` argument.
* [NEW FEATURE] #126: `stri_split()` can now also act
just like `stringr::str_split_fixed()`.
* [NEW FEATURE] #119: `stri_split_boundaries()` now has
`n`, `tokens_only`, and `simplify` arguments. Additionally,
`stri_extract_all_words()` is now equipped with a `simplify` argument.
* [NEW FEATURE] #116: `stri_paste()` gained a new argument:
`ignore_null`. Setting it to `TRUE` makes this function more compatible
with `paste()`.
* [OTHER] #123: `useDynLib` is used to speed up symbol look-up in
the compiled dynamic library.
* [BUGFIX] #114: `stri_paste()` could return results in an incorrect order.
* [BUGFIX] #94: Run-time errors on Solaris caused by setting
`-DU_DISABLE_RENAMING=1`: memory allocation errors in, among others, ICU's
UnicodeString. This setting also caused some ABSan sanity check
failures within ICU code.
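A minimal usage sketch of some 0.4-1 changes, assuming a `stringi` version >= 0.4-1 is installed. The input strings are arbitrary illustrative values, and the comments show the expected results:

```r
library(stringi)

str <- c("Stringi", "STRINGR", "base R")
# #111: a regex option passed directly via `...`
stri_detect_regex(str, "^string", case_insensitive=TRUE)   # TRUE TRUE FALSE
# ... which is equivalent to the longer form
stri_detect_regex(str, "^string",
  opts_regex=stri_opts_regex(case_insensitive=TRUE))       # TRUE TRUE FALSE

# #23: overlapping matches of a fixed pattern
stri_count_fixed("aaaa", "aa")                 # 2
stri_count_fixed("aaaa", "aa", overlap=TRUE)   # 3

# simplify=TRUE fills the missing matrix cell with "", simplify=NA with NA
stri_split_fixed(c("a,b", "c"), ",", simplify=TRUE)
stri_split_fixed(c("a,b", "c"), ",", simplify=NA)
```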
-------------------------------------------------------------------------------
## 0.3-1 (2014-11-06) **CRAN**
* [IMPORTANT CHANGE] #87: `%>%` clashed with the pipe operator from
the `magrittr` package; therefore, operators like `%>%` have been renamed
by prepending an `s`, e.g., `%>%` is now `%s>%`.
* [IMPORTANT CHANGE] #108: Now the BreakIterator (for text boundary analysis)
may be better controlled via `stri_opts_brkiter()` (see options `type`
and `locale` which aim to replace now-removed `boundary` and `locale` parameters
to `stri_locate_boundaries`, `stri_split_boundaries`, `stri_trans_totitle`,
`stri_extract_words`, `stri_locate_words`).
* [NEW FUNCTIONS] #109: `stri_count_boundaries()` and `stri_count_words()`
count the number of text boundaries in a string.
* [NEW FUNCTIONS] #41: `stri_startswith_*()` and `stri_endswith_*()`
determine whether a string starts or ends with a given pattern.
* [NEW FEATURE] #102: `stri_replace_all_*()` gained a `vectorize_all` parameter,
which defaults to `TRUE` for backward compatibility.
* [NEW FUNCTION] #91: `stri_subset_*()`, a convenient and more efficient
substitute for `str[stri_detect_*(str, ...)]`, added.
* [NEW FEATURE] #100: `stri_split_fixed()`, `stri_split_charclass()`,
`stri_split_regex()`, `stri_split_coll()` gained a `tokens_only` parameter,
which defaults to `FALSE` for backward compatibility.
* [NEW FUNCTION] #105: `stri_list2matrix()` converts lists of atomic vectors
to character matrices, useful in connection with `stri_split`
and `stri_extract`.
* [NEW FEATURE] #107: `stri_split_*()` now allow setting an `omit_empty=NA` argument.
* [NEW FEATURE] #106: `stri_split()` and `stri_extract_all()` gained a `simplify`
argument (if `TRUE`, then `stri_list2matrix(..., byrow=TRUE)`
is called on the resulting list).
* [NEW FUNCTION] #77: `stri_rand_lipsum()` generates
(pseudo)random dummy *lorem ipsum* text.
* [NEW FEATURE] #98: `stri_trans_totitle()` gained an `opts_brkiter`
parameter; it indicates which ICU BreakIterator should be used when
performing case mapping.
* [NEW FEATURE] `stri_wrap()` gained a new parameter: `normalize`.
* [BUGFIX] #86: `stri_*_fixed()`, `stri_*_coll()`, and `stri_*_regex()` could
give incorrect results if one of the search strings was of length 0.
* [BUGFIX] #99: `stri_replace_all()` did not use the `replacement` arg.
* [BUGFIX] #112: Some of the objects were not PROTECTed from
being garbage collected, which might have caused spontaneous SEGFAULTS.
* [BUGFIX] Some collator options were not passed correctly to ICU services.
* [BUGFIX] Memory leaks detected by
`valgrind --tool=memcheck --leak-check=full` have been removed.
* [DOCUMENTATION] Significant extensions/clean ups in the `stringi` manual.
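A minimal usage sketch of some 0.3-1 additions, assuming a `stringi` version >= 0.3-1 is installed. The inputs are arbitrary illustrative values, and the comments show the expected results:

```r
library(stringi)

# #41: prefix/suffix tests
stri_startswith_fixed(c("stringi", "stringr", "base"), "string")  # TRUE TRUE FALSE
stri_endswith_fixed("data.csv", ".csv")                           # TRUE

# #109: count words via ICU's BreakIterator
stri_count_words("The quick brown fox")                           # 4

# #102: vectorize_all=FALSE applies every pattern/replacement pair to each string
stri_replace_all_fixed("The quick fox", c("quick", "fox"), c("slow", "dog"),
  vectorize_all=FALSE)                                             # "The slow dog"

# #105: a ragged list becomes a character matrix; the missing cell is NA
stri_list2matrix(list(c("a", "b"), "c"))
```

With the default `vectorize_all=TRUE`, the same `stri_replace_all_fixed()` call would instead recycle the arguments and return one result per pattern/replacement pair.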
-------------------------------------------------------------------------------
## 0.2-5 (2014-05-16) **CRAN**
* ~~icudt-dependent examples are no longer run if `icudt` is not available.~~
-------------------------------------------------------------------------------
## 0.2-4 (2014-05-15) **CRAN**
* [BUGFIX] Issues with loading of misaligned addresses in `stri_*_fixed()`.
-------------------------------------------------------------------------------
## 0.2-3 (2014-05-14) **CRAN**
* [IMPORTANT CHANGE] `stri_cmp*()` no longer allow passing
`opts_collator=NA`. From now on, `stri_cmp_eq`, `stri_cmp_neq`,
and the new operators `%===%`, `%!==%`, `%stri===%`, and `%stri!==%`
are locale-independent operations based on code point comparisons.
New functions `stri_cmp_equiv` and `stri_cmp_nequiv`
(and from now on also `%==%`, `%!=%`, `%stri==%`, and `%stri!=%`)
test for canonical equivalence.
* [IMPORTANT CHANGE] `stri_*_fixed()` search functions now perform
a locale-independent exact (byte-wise, of course after conversion to UTF-8)
pattern search. All the Collator-based, locale-dependent search routines
are now available via `stri_*_coll()`. The reason for this is that
ICU USearch currently has very poor performance, and in many search tasks
exact pattern matching is in fact sufficient.
* [GENERAL] `stri_*_fixed` now use a tweaked Knuth-Morris-Pratt search
algorithm, which improves the search performance drastically.
* [IMPORTANT CHANGE] `stri_enc_nf*()` and `stri_enc_isnf*()` function families
have been renamed to `stri_trans_nf*()` and `stri_trans_isnf*()`,
respectively. This is because they deal with text transforming,
and not with character encoding. Moreover, all such operations may
also be performed by ICU's Transliterator (see below).
* [NEW FUNCTION] `stri_trans_general()` and `stri_trans_list()` give access
to ICU's Transliterator, which may be used to perform very general
text transforms.
* [NEW FUNCTION] `stri_split_boundaries()` utilizes ICU's BreakIterator
to split strings at specific text boundaries. Moreover,
`stri_locate_boundaries()` indicates the positions of these boundaries.
* [NEW FUNCTION] `stri_extract_words()` uses ICU's BreakIterator to
extract all words from a text. Additionally, `stri_locate_words`
locates start and end positions of words in a text.
* [NEW FUNCTION] `stri_pad()`, `stri_pad_left()`, `stri_pad_right()`,
and `stri_pad_both` pad a string with a specific code point.
* [NEW FUNCTION] `stri_wrap()` breaks paragraphs of text into lines.
Two algorithms (greedy and minimal raggedness) are available.
* [IMPORTANT CHANGE] `stri_*_charclass()` search functions now
rely solely on ICU's UnicodeSet patterns. All previously accepted
charclass identifiers became invalid. However, new patterns
should now be more familiar to the users (they are regex-like).
Moreover, we observe a very nice performance gain.
* [IMPORTANT CHANGE] `stri_sort()` no longer includes `NA`s
in output vectors by default, for compatibility with `sort()`.
Moreover, currently none of the input vector's attributes are preserved.
* [NEW FUNCTION] `stri_unique()` extracts unique elements from
a character vector.
* [NEW FUNCTIONS] `stri_duplicated()` and `stri_duplicated_any()`
determine duplicate elements in a character vector.
* [NEW FUNCTION] `stri_replace_na()` replaces `NA`s in a character vector
with a given string, useful for emulating e.g. R's `paste()` behavior.
* [NEW FUNCTION] `stri_rand_shuffle()` generates a random permutation
of code points in a string.
* [NEW FUNCTION] `stri_rand_strings()` generates random strings.
* [NEW FUNCTIONS] New functions and binary operators for string comparison:
`stri_cmp_eq()`, `stri_cmp_neq()`, `stri_cmp_lt()`, `stri_cmp_le()`,
`stri_cmp_gt()`, `stri_cmp_ge()`, `%==%`, `%!=%`, `%<%`, `%<=%`, `%>%`, `%>=%`.
* [NEW FUNCTION] `stri_enc_mark()` reads declared encodings of character
strings as seen by `stringi`.
* [NEW FUNCTION] `stri_enc_tonative(str)` is an alias to
`stri_encode(str, NULL, NULL)`.
* [NEW FEATURE] `stri_order()` and `stri_sort()` now have an additional argument
`na_last` (defaults to `TRUE` and `NA`, respectively).
* [NEW FEATURE] `stri_replace_all_charclass()`, `stri_extract_all_charclass()`,
and `stri_locate_all_charclass()` now have a new arg, `merge`
(defaults to `FALSE` for backward-compatibility). It may be used
to e.g. replace sequences of white spaces with a single space.
* [NEW FEATURE] `stri_enc_toutf8()` now has a new `validate` arg (defaults
to `FALSE` for backward-compatibility). It may be used in the (rare) case
in which a user wants to fix an invalid UTF-8 byte sequence.
`stri_length()` (among others) now detects invalid UTF-8 byte sequences.
* [NEW FEATURE] All binary operators `%???%` now also have aliases `%stri???%`.
* [GENERAL] Performance improvements in `StriContainerUTF8`
and `StriContainerUTF16` (they affect most other functions).
* [GENERAL] Significant performance improvements in `stri_join()`,
`stri_flatten()`, `stri_cmp()`, `stri_trans_to*()`, and others.
* [GENERAL] Added 3rd mirror site for our `icudt` binary distribution.
* `U_MISSING_RESOURCE_ERROR` message in `StriException` now suggests
calling `stri_install_check()`.
* [BUGFIX] UTF-8 BOMs are now silently removed from input strings.
* [BUGFIX] no more attempts to re-encode UTF-8 encoded strings
if native encoding=UTF-8 in `StriContainerUTF8`.
* [BUGFIX] possible memory leaks when throwing errors via `Rf_error()`.
* [BUGFIX] `stri_order()` and `stri_cmp()` could return incorrect results
for `opts_collator=NA`.
* [BUGFIX] `stri_sort()` did not guarantee to return strings in UTF-8.
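A minimal usage sketch of some 0.2-3 changes, assuming a recent `stringi` is installed. U+0105 (a precomposed "a with ogonek") and the sequence "a" + U+0328 (combining ogonek) are used only as an example of two canonically equivalent strings; the comments show the expected results:

```r
library(stringi)

# code point comparison vs. canonical equivalence
stri_cmp_eq("\u0105", "a\u0328")       # FALSE: different code point sequences
stri_cmp_equiv("\u0105", "a\u0328")    # TRUE: canonically equivalent
stri_trans_nfc("a\u0328") == "\u0105"  # TRUE after NFC normalization

# merge=TRUE: each run of white spaces is replaced as a whole
stri_replace_all_charclass("a  b\t\tc", "\\p{WHITE_SPACE}", " ", merge=TRUE)  # "a b c"

# duplicate detection
stri_duplicated(c("a", "b", "a"))      # FALSE FALSE TRUE
```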
-------------------------------------------------------------------------------
## 0.1-25 (2014-03-12) **CRAN**
* LICENSE tweaks.
* Initial CRAN release.
-------------------------------------------------------------------------------
## 0.1-24 (2014-03-11) **devel**
* Fixed bugs detected with `ASan` and `UBSan`,
e.g. fixed `CharClass::gcmask` type (`enum` -> `uint32_t`)
(reported by `UBSan`).
* Fixed array over-runs detected with `valgrind` in `string8.h`.
* Fixed uninitialized class fields in `StriContainerUTF8`
(reported by `valgrind`).
-------------------------------------------------------------------------------
## 0.1-23 (2014-03-11) **devel**
* License changed to BSD-3-clause, COPYRIGHTS updated.
* `icudt` is not shipped with `stringi` anymore;
it is now downloaded in `install.libs.R` from one of our servers.
* New functions: `stri_install_check()`, `stri_install_icudt()`.
-------------------------------------------------------------------------------
## 0.1-22 (2014-02-20) **devel**
* System ICU is used on systems which do have one (version >= 50 needed).
ICU is autodetected with `pkg-config` in `./configure`.
Pass `'--disable-pkg-config'` to `./configure` to force building
ICU from sources.
* `icudt52b` (custom subset) is now shipped with `stringi`
(for big-endian, ASCII systems).
-------------------------------------------------------------------------------
## 0.1-21 (2014-02-19) **devel**
* Fixed some Solaris-related issues while preparing `stringi`
for CRAN submission.
-------------------------------------------------------------------------------
## 0.1-20 (2014-02-17) **devel**
* ICU4C 52.1 sources included (common, i18n, stubdata + icu52dt.dat
loaded dynamically). Compilation via Makevars.
* `stringi` now does not depend on any external libraries.
-------------------------------------------------------------------------------
## 0.1-11 (2013-11-16) **devel**
* ICU4C is now statically linked on Windows.
* First OS X binary build.
* The package is being intensively tested by our students @ FMIS WUT.
-------------------------------------------------------------------------------
## 0.1-10 (2013-11-13) **devel**
* Using `pkg-config` via `./configure` to look for ICU4C libs.
-------------------------------------------------------------------------------
## 0.1-6 (2013-07-05) **devel**
* First Windows binary build.
* Compilation passed on Oracle Sun Studio compiler collection.
* By now we have implemented most of the functionality
scheduled for milestone 0.1.
-------------------------------------------------------------------------------
## 0.1-1 (2013-01-05) **devel**
* The `stringi` project has been established on GitHub.
-------------------------------------------------------------------------------