Conversation
Force-pushed 3bed38d to 8665c4a
Force-pushed eabfce9 to 0c903ac
Codecov Report: ❌ Patch coverage is
Let's make this a draft too to cut down on CI thrash
Force-pushed f5857b3 to 32e182d
Force-pushed 32e182d to 813e2c2
Force-pushed f5faaf6 to 2359d28
lkdvos left a comment:
Left some comments throughout; there are some things that I am not entirely convinced by, but the rest looks great, thanks for working through all of this!
For the similarstoragetype(tensor, storagetype) calls that you added, this seems like something we should probably discuss over a separate PR, and it would be great if we could consolidate this one to get the remainder of the fixes in.
Would you be up for splitting these two things, and then getting this merged?
The same kind of thing holds for some of the comments I made too: if we can postpone the things that are not obvious but already get the other parts in, that would probably be helpful.
(Note that I am very much aware that none of this is your fault and this PR has lived for too long so the design shifts a bit, for which I do apologize!)
```julia
function TensorKit.allocate_buffers(
        tdst::CuTensorMap, tsrc::CuTensorMap, transformer::TensorKit.GenericTreeTransformer
    )
    sz = TensorKit.buffersize(transformer)
    # force zeros to ensure the buffers are empty
    # otherwise memory re-use can fill them with garbage data
    return CUDA.zeros(eltype(tdst.data), sz), CUDA.zeros(eltype(tsrc.data), sz)
end
```
This is slightly confusing to me, the zeros shouldn't be necessary (in fact, the implementation reuses the start of the buffer for each of the different blocks anyways), so I would have guessed that the similar(tsrc.data, sz) calls should be sufficient and correctly allocate device arrays here?
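If it helps, a minimal sketch of the alternative being suggested (this is an assumption about the intended change, not the merged implementation):

```julia
# Sketch of the suggested alternative: `similar` on the underlying CuArray
# already allocates device memory of the matching eltype, so no explicit
# zero-fill is needed as long as each transform overwrites the buffer
# region before reading from it.
function TensorKit.allocate_buffers(
        tdst::CuTensorMap, tsrc::CuTensorMap, transformer::TensorKit.GenericTreeTransformer
    )
    sz = TensorKit.buffersize(transformer)
    return similar(tdst.data, sz), similar(tsrc.data, sz)
end
```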
```julia
    Mb = storagetype(T.b)
    return promote_storagetype(Ma, Mb)
    return promote_storagetype(T.a, T.b)
elseif eltype(T) isa Union
```
Is this to better support BlockTensorMap? Do we ever have tensors with union scalartypes?
Yes, it's for the block case. I don't think we can have scalar unions?
It's a bit weird to support that here, since for generic AbstractTensorMap eltype would return a Number, which is why I asked about the scalartype thing. Maybe we can just copy this definition and overload in BlockTensorKit for AbstractBlockTensorMap?
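For concreteness, a hedged sketch of what that BlockTensorKit overload could look like (the function name and surrounding logic are assumptions inferred from the diff excerpt above, not BlockTensorKit's actual API):

```julia
# Hypothetical overload: handle Union eltypes only for block tensors,
# so the generic TensorKit definition can keep assuming eltype <: Number.
function TensorKit.storagetype(::Type{TT}) where {TT <: AbstractBlockTensorMap}
    T = eltype(TT)  # a tensor-map type (or a Union of them) for block tensors
    if T isa Union
        # promote the storage types of both branches of the union
        return promote_storagetype(storagetype(T.a), storagetype(T.b))
    end
    return storagetype(T)
end
```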
```julia
function TensorKit._add_general_kernel_nonthreaded!(
        tdst::CuTensorMap, tsrc::CuTensorMap, p, transformer::TensorKit.GenericTreeTransformer, α, β, backend...
    )
    # preallocate buffers
    buffers = TensorKit.allocate_buffers(tdst, tsrc, transformer)

    for subtransformer in transformer.data
        # Special case without intermediate buffers whenever there is only a single block
        if length(subtransformer[1]) == 1
            TensorKit._add_transform_single!(tdst, tsrc, p, subtransformer, α, β, backend...)
        else
            cu_subtransformer = tuple(CUDA.adapt(CuArray, subtransformer[1]), subtransformer[2:end]...)
            TensorKit._add_transform_multi!(tdst, tsrc, p, cu_subtransformer, buffers, α, β, backend...)
        end
    end
    return nothing
end
```
I guess the only change here is to promote the unitary basis transformation into a CuArray, which probably makes more sense to just support at the mul callsite (which I think @kshyatt already fixed, so this might no longer be required?)
Let's remove it and see! 😈
```julia
function TensorKit.blocktype(::Type{<:CuTensorMap{T, S}}) where {T, S}
    return CuMatrix{T, CUDA.DeviceMemory}
end
```
Suggested change: remove this blocktype definition.

I think this is now more properly addressed through type inference.
```julia
function similarstoragetype(::Type{TT}, ::Type{T}) where {TT <: AbstractTensorMap, T <: Number}
    return similarstoragetype(storagetype(TT), T)
end
```
This is just a formatting change, right?
```julia
    twistB = false
end

TTC = storagetype(C)
```
I guess this effectively means that we are deciding to promote inputs to the storagetype of the output. I'm not sure if I am fully convinced that we should solve this automatically at all, since I think that is also inconsistent with how regular matrices work (same for adding):
```julia
julia> CUDA.rand(2, 2) * rand(Float32, 2, 2)
ERROR: Scalar indexing is disallowed.
```

I do think that this might be the right approach, and requiring explicit conversions in the cases of mixed inputs seems like the right call to me. (Even though I can see how that is annoying for MPSKit 😉 )
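For what it's worth, a small sketch of the explicit-conversion style this would imply for callers (assuming CUDA.jl; `CuArray(B)` copies the host matrix to the device):

```julia
using CUDA

A = CUDA.rand(Float32, 2, 2)   # device matrix
B = rand(Float32, 2, 2)        # host matrix

# `A * B` hits the scalar-indexing error shown above;
# converting the host operand explicitly keeps the multiply on the GPU:
C = A * CuArray(B)
```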
It's completely fine!! This has stayed open as I work through adding more tests for MPSKit, so I think we can pare off the simpler stuff we agree on, and then discuss things that are more contentious.
Your PR no longer requires formatting changes. Thank you for your contribution!
Needed to get more MPSKit examples working