diff --git a/NEWS.md b/NEWS.md
index 7cf63f0b7..6877436d5 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -40,6 +40,8 @@
 
 5. Non-equi joins combining an equality condition with two inequality conditions on the same column (e.g., `on = .(id == id, val >= lo, val <= hi)`) no longer error, [#7641](https://github.com/Rdatatable/data.table/issues/7641). The internal `chmatchdup` remapping of duplicate `rightcols` was overwriting the original column indices, causing downstream code to reference non-existent columns. Thanks @tarun-t for the report and fix, and @aitap for the diagnosis.
 
+6. By-reference sub-assignments of strings to factor columns now _actually_ match the levels in UTF-8 when required and now don't result in invalid factors being created, [#7648](https://github.com/Rdatatable/data.table/issues/7648), amending a previous incomplete fix to [#6886](https://github.com/Rdatatable/data.table/issues/6886) in v1.17.2. Thanks @BASS-JN for the report and @aitap for the fix.
+
 ### Notes
 
 1. {data.table} now depends on R 3.5.0 (2018).
diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw
index f144532cf..293eed980 100644
--- a/inst/tests/tests.Rraw
+++ b/inst/tests/tests.Rraw
@@ -20671,9 +20671,11 @@ DT = data.table(factor(rep("\uf8", 3)))
 # identical() to V1's only level but stored in a different CHARSXP
 samelevel = iconv(levels(DT$V1), from = "UTF-8", to = "latin1")
 DT[1, V1 := samelevel]
-test(2311.1, nlevels(DT$V1), 1L) # used to be 2
+# used to fail to look up the new level, resulting in an invalid factor, #7648
+test(2311.1, as.integer(DT$V1), rep(1L, 3))
+test(2311.2, nlevels(DT$V1), 1L) # used to be 2
 DT[1, V1 := factor("a", levels = c("a", samelevel))]
-test(2311.2, nlevels(DT$V1), 2L) # used to be 3
+test(2311.3, nlevels(DT$V1), 2L) # used to be 3
 
 # avoid translateChar*() in OpenMP threads, #6883
 DF = list(rep(iconv("\uf8", from = "UTF-8", to = "latin1"), 2e6))
diff --git a/src/assign.c b/src/assign.c
index 05a55cb5a..1b474072c 100644
--- a/src/assign.c
+++ b/src/assign.c
@@ -806,9 +806,9 @@ const char *memrecycle(const SEXP target, const SEXP where, const int start, con
           newSourceD[i] = val==NA_INTEGER ? NA_INTEGER : -hash_lookup(marks, sourceLevelsD[val-1], 0); // retains NA factor levels here via TL(NA_STRING); e.g. ordered factor
         }
       } else {
-        const SEXP *sourceD = STRING_PTR_RO(source);
         for (int i=0; i