Introduce datatable.unique.names policy for duplicate handling in setnames() #4044 #7647
No obvious timing issues in HEAD=issuenight. Generated via commit 88085d1. Artifact containing the test results: atime-results.zip
if (!length(new)) return(invisible(x)) # no changes
if (length(i) != length(new)) internal_error("length(i)!=length(new)") # nocov
}
# update the key if the column name being change is in the key
\item{\code{datatable.enlist}}{Experimental feature. Default is \code{NULL}. If set to a function
(e.g., \code{list}), the \code{j} expression can return a \code{list}, which will then
be "enlisted" into columns in the result.}
\item{\code{datatable.unique.names}}{A character string, default \code{"off"}.
The documentation should state that this currently applies only to setnames() and not to other functions that could create duplicate names, like merge, cbind, ...
options(datatable.unique.names = "error")
test(2366.3, setnames(copy(DT), "Petal.Length", "Sepal.Length"), error = "Duplicate column names created")
options(datatable.unique.names = "rename")
test(2366.4, names(setnames(copy(DT), "Petal.Length", "Sepal.Length")), c("Sepal.Length", "Sepal.Width", "Sepal.Length.1", "Petal.Width", "Species"))
if (anyDuplicated(names_vec)) {
  dups = unique(names_vec[duplicated(names_vec)])
  msg = sprintf("Duplicate column names created: %s. This may cause ambiguity.", brackify(dups))
Is this problematic for a table like dt = data.table('%s'=1, b=2)?
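To make the concern concrete, here is a minimal base-R sketch of where a column named `%s` could cause trouble. It assumes, hypothetically, that the finished `msg` is later handed to a function that treats it as a format string; the excerpt above does not show what happens to `msg` afterwards. `paste()` stands in for the internal `brackify()` helper, which is not reproduced here.

```r
# Column names containing a format specifier, as in the example table above.
dups = c("%s", "b")

# Passing the names as an *argument* is safe: sprintf() only parses its format string.
# paste() is a stand-in for data.table's internal brackify().
msg = sprintf("Duplicate column names created: %s.", paste(dups, collapse = ", "))

# Reusing the finished message as a *format string* is where it would break:
# the embedded "%s" now asks for an argument that was never supplied.
res = tryCatch(sprintf(msg), error = function(e) "error: too few arguments")

# Wrapping the message in a literal "%s" format keeps it inert.
safe = sprintf("%s", msg)
```

So the `sprintf()` call on its own should be fine; the risk would only arise if `msg` is formatted a second time downstream.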
process_name_policy = function(names_vec) {
  policy = getOption("datatable.unique.names", "off")
  if (is.null(policy) || policy == "off") return(names_vec)
On second thought, NULL might be a better default to mean "off".
# onLoad.R
datatable.unique.names = NULL # NULL means off
#4044
DT = as.data.table(iris)
options(datatable.unique.names = "off")
test(2366.1, names(setnames(copy(DT), "Petal.Length", "Sepal.Length")), c("Sepal.Length", "Sepal.Width", "Sepal.Length", "Petal.Width", "Species"))
We now have the nice options parameter in test(), so it would be good to use it!
There are also some linter issues which need to be taken care of. We also have

closes #4044
This PR introduces a configurable policy for handling duplicate column names created by setnames().
Changes introduced:
Added a new global option datatable.unique.names (default: "off") to preserve backward compatibility.
Supported policies:
Added a centralized helper process_name_policy() in utils.R to handle duplicate detection and enforcement.
Integrated the policy check into setnames() before reference updates to ensure keys and indices are not corrupted in "error" or "rename" modes.
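As a rough illustration of the behaviour described above, the following base-R sketch mimics the three policies exercised by the tests in this PR ("off", "error", "rename"). The function name `apply_name_policy()` is hypothetical and this is not the PR's actual `process_name_policy()`; in particular, using `make.unique()` for the "rename" mode is an assumption that happens to reproduce the `Sepal.Length.1` name expected by test 2366.4.

```r
# Hypothetical stand-in for the PR's process_name_policy(); not the actual implementation.
apply_name_policy = function(names_vec, policy = getOption("datatable.unique.names", "off")) {
  # NULL or "off" leaves names untouched, preserving backward compatibility.
  if (is.null(policy) || identical(policy, "off")) return(names_vec)
  # Nothing to enforce when all names are already unique.
  if (!anyDuplicated(names_vec)) return(names_vec)
  dups = unique(names_vec[duplicated(names_vec)])
  switch(policy,
    error = stop("Duplicate column names created: ", paste(dups, collapse = ", "), call. = FALSE),
    # Assumption: base R's make.unique() appends ".1", ".2", ... to later duplicates.
    rename = make.unique(names_vec),
    stop("Unknown policy: ", policy, call. = FALSE)
  )
}

nm = c("Sepal.Length", "Sepal.Width", "Sepal.Length", "Petal.Width", "Species")
kept    = apply_name_policy(nm, "off")     # duplicate is kept as-is
renamed = apply_name_policy(nm, "rename")  # third name becomes "Sepal.Length.1"
failed  = tryCatch(apply_name_policy(nm, "error"), error = function(e) e)
```

Running the check on the proposed names before anything is modified by reference means the "error" mode can abort without leaving the table half-updated, which is the motivation given in the last bullet.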
Hi @ben-schwen, when you have time, could you please take a look?
Thanks.