Skip to content

Introduce datatable.unique.names policy for duplicate handling in setnames() #4044#7647

Open
venom1204 wants to merge 3 commits intomasterfrom
issuenight
Open

Introduce datatable.unique.names policy for duplicate handling in setnames() #4044#7647
venom1204 wants to merge 3 commits intomasterfrom
issuenight

Conversation

@venom1204
Copy link
Contributor

closes #4044
This PR introduces a configurable policy for handling duplicate column names created by setnames().

Changes introduced:

Added a new global option datatable.unique.names (default: "off") to preserve backward compatibility.

Supported policies:

  • "off" – Allow duplicates silently (current behavior).
  • "warn" – Issue a warning when duplicates are created.
  • "error" – Stop execution if duplicates would be created.
  • "rename" – Automatically enforce uniqueness using make.unique().

Added a centralized helper process_name_policy() in utils.R to handle duplicate detection and enforcement.

Integrated the policy check into setnames() before reference updates to ensure keys and indices are not corrupted in "error" or "rename" modes.

  • Added validation for invalid option values with a warning and safe fallback to "off".
  • The default behavior remains unchanged, and performance is preserved in the "off" fast path.

hi @ben-schwen , when you have time could you please take a look?
thanks.

@github-actions
Copy link

github-actions bot commented Feb 25, 2026

No obvious timing issues in HEAD=issuenight
Comparison Plot

Generated via commit 88085d1

Download link for the artifact containing the test results: ↓ atime-results.zip

Task Duration
R setup and installing dependencies 5 minutes and 44 seconds
Installing different package versions 11 minutes and 46 seconds
Running and plotting the test cases 4 minutes and 18 seconds

if (!length(new)) return(invisible(x)) # no changes
if (length(i) != length(new)) internal_error("length(i)!=length(new)") # nocov
}
# update the key if the column name being change is in the key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please dont delete comments

\item{\code{datatable.enlist}}{Experimental feature. Default is \code{NULL}. If set to a function
(e.g., \code{list}), the \code{j} expression can return a \code{list}, which will then
be "enlisted" into columns in the result.}
\item{\code{datatable.unique.names}}{A character string, default \code{"off"}.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should somehow state that this currently only holds for setnames and not other functions that could create duplicate names like merge, cbind, ...

options(datatable.unique.names = "error")
test(2366.3, setnames(copy(DT), "Petal.Length", "Sepal.Length"), error = "Duplicate column names created")
options(datatable.unique.names = "rename")
test(2366.4, names(setnames(copy(DT), "Petal.Length", "Sepal.Length")), c("Sepal.Length", "Sepal.Width", "Sepal.Length.1", "Petal.Width", "Species")) No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing newline


if (anyDuplicated(names_vec)) {
dups = unique(names_vec[duplicated(names_vec)])
msg = sprintf("Duplicate column names created: %s. This may cause ambiguity.", brackify(dups))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this problematic for a table like dt = data.table('%s'=1, b=2)?

process_name_policy = function(names_vec) {
policy = getOption("datatable.unique.names", "off")

if (is.null(policy) || policy == "off") return(names_vec)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on a second thought NULL might be a better default for "off"

# onLoad.R
datatable.unique.names = NULL   # NULL means off

#4044
DT = as.data.table(iris)
options(datatable.unique.names = "off")
test(2366.1, names(setnames(copy(DT), "Petal.Length", "Sepal.Length")), c("Sepal.Length", "Sepal.Width", "Sepal.Length", "Petal.Width", "Species"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we now the nice options parameter in test() so it would be good to use it!

@ben-schwen
Copy link
Member

There are also some linters issues which need to be taken care of.

We also have check_duplicate_names in utils.R so it might need be worth checking if we can unify both into a single function...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Should setnames() produce duplicate column names without warning?

2 participants