samba.sambatensor¶
SambaTensor Class¶
- class SambaTensor(torch_tensor: Tensor | None = None, shape: Iterable[int] | None = None, dtype: dtype | None = None, name: str | None = None, batch_dim: int | None = None, named_dims: Iterable[str | None] | None = None, sized_dims: Iterable[str | None] | None = None, materializer: Callable[[SambaTensor, MultithreadedRNG], Tensor] | None = None, is_complex: bool | None = None, region_name: str | None = None)¶
The SambaTensor is the base tensor data structure for SambaFlow. It wraps torch.Tensor and adds custom data members and methods to support graph tracing and interfacing with the device. Any application that runs a model on the RDU (the device) must use SambaTensor, which supports:
Static tracing, which saves memory and compute resources
Getting and setting device memory
A SambaTensor can be constructed from a torch.Tensor:
using samba.from_torch_tensor(torch_tensor) (similar to constructing a torch.Tensor from a numpy.ndarray with torch.from_numpy())
directly with the SambaTensor constructor, that is, SambaTensor(torch_tensor).
With either construction method, the new SambaTensor and the original PyTorch tensor share the same memory, so any change to the original PyTorch tensor is reflected in the new SambaTensor. This is different from torch.Tensor(np.ndarray), which copies the data.
A SambaTensor can be empty to use less memory. You can construct an empty SambaTensor with an empty PyTorch tensor using the methods listed above, or with the shape and dtype parameters. An empty SambaTensor is especially helpful for graph tracing, where you only need tensor shapes and dtypes. An entire model can be instantiated with empty SambaTensors by using lazy parameters with samba.lazy_param. When an empty SambaTensor is used on the RDU, the SambaTensor's materializer is used to initialize the data.
Use SambaTensor.torch() to retrieve the original PyTorch tensor, similar to torch.Tensor.numpy(), which returns the original numpy.ndarray. SambaTensors can be used on the host CPU just like PyTorch tensors, though the supported methods are limited to the functions in samba.functional.
Accessing device-side weight and gradient data
The SambaTensor provides APIs to directly access tensor data in RDU device memory. For example:

# samba_tensor.sn_data and samba_tensor.sn_grad copy data
# from the device to host-memory PyTorch tensors
print('data on device memory:', samba_tensor.sn_data)
print('gradient data on device memory if it exists:', samba_tensor.sn_grad)

# A data copy happens any time sn_data and sn_grad are accessed
sn_weight = weight.sn_data
sn_grad = weight.sn_grad
Modifying device-side weight and gradient data
The SambaTensor provides APIs to modify tensor data in RDU device memory:

samba_tensor.sn_data = tensor  # works as long as isinstance(tensor, torch.Tensor), e.g. torch.Tensor or SambaTensor
or
# transfer the host data to the device
samba_tensor.rdu()
You can assign a PyTorch tensor or SambaTensor to sn_data, which will copy the data to the tensor on the device. Similarly, you can assign a PyTorch tensor or SambaTensor to sn_grad, which will copy the data to the tensor's gradient on the device.

# Weight and its gradient are updated on the host and then copied to the device
weight.sn_data = sn_weight / torch.norm(sn_weight)  # weight normalization on host
weight.sn_grad = sn_grad / torch.norm(sn_grad)      # grad normalization on host
Alternatively, we can use rdu() and cpu() to synchronize the data between the host memory (on CPU) and the device memory (on RDU) of a SambaTensor, for example:

# Modify weights on the host only; weights on the device
# remain unchanged
weight.data = weight / torch.norm(weight)
weight.grad = weight.grad / torch.norm(weight.grad)

# Print device-side weights before synchronizing host and device memory
print(weight.sn_data)

# Copy host memory to device memory
# Note: both data and grad are synchronized
weight.rdu()

# Print device-side weights after the host-to-device copy
print(weight.sn_data)

# Modify the weight gradient on the device directly from the host
weight.sn_grad = torch.zeros_like(sn_grad)

# Print host-side weight gradients before synchronizing host and device memory
print(weight.grad)

# Copy device memory to host memory
weight.cpu()

# Print host-side weight gradients after the device-to-host copy
print(weight.grad)
The sn_data and sn_grad members of the SambaTensor class are Python data descriptors with custom setter and getter methods. When you access sn_data and sn_grad from a SambaTensor, they return a torch.Tensor that represents the data in device memory. Any modification to this returned torch.Tensor is not reflected in RDU memory.
Note
Tensor manipulation on the host is expensive because the computations are performed by the CPU and data synchronization between host and device is bandwidth-heavy. Do not use these four SambaTensor APIs (sn_data, sn_grad, rdu(), and cpu()) unless necessary, e.g. when checkpointing models.
SambaTensor has similar methods and attributes to torch.Tensor. In addition, SambaTensor has methods and members that are specific to the RDU dataflow architecture.
When an operation involves input SambaTensors of different data types, SambaFlow follows the dtype promotion rules that PyTorch uses for the computation (see the information on promotion at https://pytorch.org/docs/stable/tensor_attributes.html#torch.dtype for details). For example, when calling samba.add with one input of dtype bfloat16 and the other input of dtype float32, the bfloat16 SambaTensor will be promoted to float32.
- Parameters:
torch_tensor – A torch.Tensor object used to construct the SambaTensor.
shape – Shape of the tensor, used to implement tracing. Cannot be specified with torch_tensor.
dtype – Data type of the tensor. Should be a torch.dtype object. Cannot be specified with torch_tensor.
name – User-provided name for the SambaTensor, similar to tf.Placeholder.
batch_dim – Deprecated.
named_dims – Deprecated.
sized_dims – Experimental. This argument is for a feature in development.
materializer – Function to initialize this tensor with values when transferring this tensor to the RDU. Only applicable if this tensor was lazily initialized (see samba.session.enable_lazy_param). The function should accept parameters shape, dtype, and requires_grad and return a torch.Tensor. The materializer does not accept a torch.Tensor.
is_complex – Experimental. Whether this tensor represents a complex tensor or not.
region_name – Name for the tensor's location in memory. See sn_region_name.
Example
>>> import torch
>>> import sambaflow.samba as samba
>>> # Initialize SambaTensor with constructor
>>> torch_tensor = torch.Tensor([1, 2])
>>> samba_tensor0 = samba.SambaTensor(torch_tensor)
>>> # Initialize SambaTensor with samba.from_torch_tensor
>>> samba_tensor1 = samba.from_torch_tensor(torch_tensor, name="samba_tensor1")
>>> # 3 ways to initialize an empty SambaTensor with shape (2, 3)
>>> empty_samba_tensor0 = samba.SambaTensor(torch.empty(2, 3), name="empty_samba_tensor0")
>>> empty_samba_tensor1 = samba.SambaTensor(shape=(2, 3), dtype=torch.bfloat16, name="empty_samba_tensor1")
>>> empty_samba_tensor2 = samba.from_torch_tensor(torch.empty(2, 3), name="empty_samba_tensor2")
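The dtype promotion described above can be illustrated with a minimal sketch, assuming samba.add follows the PyTorch promotion rules as stated:

>>> # bfloat16 + float32 promotes to float32
>>> a = samba.SambaTensor(torch.ones(2, dtype=torch.bfloat16), name="a")
>>> b = samba.SambaTensor(torch.ones(2, dtype=torch.float32), name="b")
>>> samba.add(a, b).dtype
torch.float32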
- __getitem__(x: int | slice | None | SambaTensor | Tuple[int | slice | None | SambaTensor]) SambaTensor¶
Indexes this SambaTensor. This function can be called with the [] operator.
Currently supported index types are:
an integer, to retrieve a single element along that dimension.
a slice, to retrieve some subset of elements along that dimension.
None, to indicate that the tensor should be unsqueezed at that index.
a SambaTensor, to gather indices indicated by the tensor. SambaTensor currently does not support indexing with multidimensional SambaTensors or multiple SambaTensors.
a list, to retrieve some elements by indices along that dimension.
- Parameters:
x – the index object
Example:
>>> samba_tensor = samba.randn(2, 3)
>>> samba_tensor.data
tensor([[ 0.7102, -0.8594, -0.5047],
        [ 0.8140, -0.4194,  1.5488]])
>>> samba_tensor[:, 2].data
tensor([-0.5047,  1.5488])
>>> index_tensor = samba.SambaTensor(torch.Tensor([0, 2]))
>>> samba_tensor[None, :, index_tensor].data
tensor([[[ 0.7102, -0.5047],
         [ 0.8140,  1.5488]]])
- __setitem__(x: int | slice | None | Tuple[int | slice | None], update: int | float | SambaTensor)¶
Indexes this SambaTensor and sets the data. This function can be called with the [] operator.
Currently supported index types are:
an integer, to retrieve a single element along that dimension.
a slice, to retrieve some subset of elements along that dimension.
None, to indicate that the tensor should be unsqueezed at that index.
a SambaTensor, to gather indices indicated by the tensor. SambaTensor currently does not support indexing with multidimensional SambaTensors or multiple SambaTensors.
- Parameters:
x – the index object
Example:
>>> samba_tensor = samba.zeros(2, 3)
>>> samba_tensor.data
tensor([[0., 0., 0.],
        [0., 0., 0.]])
>>> samba_tensor[:, 2] = samba.ones(2)
>>> samba_tensor.data
tensor([[0., 0., 1.],
        [0., 0., 1.]])
>>> index_tensor = samba.SambaTensor(torch.Tensor([0, 2]))
>>> samba_tensor[None, :, index_tensor] = 2 * samba.ones(1, 2, 2)
>>> samba_tensor.data
tensor([[2., 0., 2.],
        [2., 0., 2.]])
- backward(gradient: SambaTensor | Tensor | None = None, retain_graph: bool | None = None) None¶
Calls torch.Tensor.backward() on the underlying PyTorch tensor and computes the gradient of the PyTorch tensor with respect to the graph leaves.
The graph is differentiated using the chain rule. If the tensor is non-scalar (i.e. its data has more than one element) and requires gradient, backward() also requires specifying the gradient. gradient should be a tensor of the same type and location as self that contains the gradient of the differentiated function w.r.t. self.
This function accumulates gradients in the leaves - you might need to zero the .grad attributes or set them to None before calling it. See Default gradient layouts for details on the memory layout of accumulated gradients.
- Parameters:
gradient – Gradient with respect to the tensor. If gradient is a tensor, it is automatically converted to a tensor that does not require a gradient. None can be specified if self is a scalar tensor or a tensor that doesn't require a gradient. If a None value is acceptable, then this argument is optional. Defaults to None.
retain_graph – If False, the graph used to compute the grads is freed. In nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way. Defaults to None.
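Example
A minimal sketch of calling backward() on a non-scalar output; this assumes that samba.add (mentioned above) records autograd history on the host the same way torch.add does:

>>> x = samba.SambaTensor(torch.tensor([1.0, 2.0]), name="x")
>>> x.requires_grad_()
>>> y = samba.add(x, x)
>>> y.backward(torch.ones(2))   # non-scalar output, so a gradient must be supplied
>>> x.grad
tensor([2., 2.])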
- bfloat16() SambaTensor¶
self.bfloat16() is equivalent to self.type(torch.bfloat16). See type().
- bool() SambaTensor¶
self.bool() is equivalent to self.type(torch.bool). See type().
- clear_data() None¶
Clear the tensor data on the host.
- cpu(inplace: bool = False) None¶
Copies the data from device memory to the host. Avoid using cpu() because this operation is bandwidth intensive.
- Parameters:
inplace – whether to modify the underlying host memory in-place. Defaults to False.
- data_ptr() int¶
Returns the address of the first element of the associated PyTorch tensor.
- dim() Size | int¶
Returns the number of dimensions of the self tensor.
- element_size() int¶
Returns the element size in bytes
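Example
A brief sketch, assuming samba.ones accepts size arguments and a dtype keyword together, as the examples elsewhere on this page suggest:

>>> samba.ones(2, 2, dtype=torch.bfloat16).element_size()
2
>>> samba.ones(2, 2, dtype=torch.float32).element_size()
4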
- float() SambaTensor¶
self.float() is equivalent to self.type(torch.float). See type().
- int() SambaTensor¶
self.int() is equivalent to self.type(torch.int). See type().
- static is_fast_access(name: str) bool¶
Returns True if the SambaTensor whose sn_name is name is a fast access tensor. Returns False otherwise. See fast_access for details.
- is_floating_point() bool¶
Returns True if self is a floating-point tensor, otherwise returns False.
- item() float | int¶
Returns the value of this tensor as a standard Python number. This only works for tensors with one element. This operation is not differentiable.
Example
>>> x = samba.SambaTensor(torch.tensor([1.0]))
>>> x.item()
1.0
- long() SambaTensor¶
self.long() is equivalent to self.type(torch.long). See type().
- materialize_() None¶
If the tensor does not have data, materializes the tensor. Otherwise, does nothing.
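Example
A minimal sketch of lazy initialization with a user-provided materializer, assuming a materializer with the (shape, dtype, requires_grad) signature described in the constructor parameters above:

>>> def init_fn(shape, dtype, requires_grad):
...     return torch.zeros(shape, dtype=dtype, requires_grad=requires_grad)
>>> lazy = samba.SambaTensor(shape=(2, 3), dtype=torch.float32, name="lazy", materializer=init_fn)
>>> lazy.materialize_()   # fills the tensor using init_fn
>>> lazy.torch().shape
torch.Size([2, 3])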
- new_empty(size: int | Iterable[int], dtype: dtype | None = None, requires_grad: bool = False) SambaTensor¶
Returns a SambaTensor of size size filled with uninitialized data. By default, the returned SambaTensor has the same torch.dtype as this tensor.
- Parameters:
size – a list, tuple, or torch.Size of integers defining the shape of the output tensor.
dtype – the desired type of the returned tensor. If None, same torch.dtype as this SambaTensor.
requires_grad – if autograd should record operations on the returned tensor. Defaults to False.
Example:
>>> sambatensor = samba.ones((), dtype=torch.float32)
>>> sambatensor.new_empty((2, 3)).data
tensor([[0.0000e+00, 1.4405e-41, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00]])
- new_full(size: Tuple[int], fill_value: int | float, dtype: dtype | None = None, device: device | None = None, requires_grad: bool | None = False) SambaTensor¶
Returns a SambaTensor of size size filled with fill_value. By default, the returned SambaTensor has the same torch.dtype as this tensor.
- Parameters:
size – a list, tuple, or torch.Size of integers defining the shape of the output tensor.
fill_value – the number to fill the output tensor with.
dtype – the desired type of the returned tensor. If None, same torch.dtype as this SambaTensor.
device – the desired device of the returned tensor. If None, same torch.device as this tensor.
requires_grad – if autograd should record operations on the returned tensor. Defaults to False.
Note
The PyTorch API optional keyword arg device is not supported on RDU and has no effect.
Example
>>> sambatensor = samba.ones((), dtype=torch.float32)
>>> sambatensor.new_full((2, 3), 5.0).data
tensor([[5., 5., 5.],
        [5., 5., 5.]])
>>> # new_full with explicit data type
>>> sambatensor = samba.ones((), dtype=torch.float32)
>>> sambatensor.new_full((2, 3), 5.0, dtype=torch.bfloat16).data
tensor([[5., 5., 5.],
        [5., 5., 5.]], dtype=torch.bfloat16)
- new_ones(size: int | Iterable[int], dtype: dtype | None = None, requires_grad: bool = False) SambaTensor¶
Returns a SambaTensor of size size filled with 1. By default, the returned SambaTensor has the same torch.dtype as this tensor.
- Parameters:
size – a list, tuple, or torch.Size of integers defining the shape of the output tensor.
dtype – the desired type of the returned tensor. If None, same torch.dtype as this SambaTensor.
requires_grad – if autograd should record operations on the returned tensor. Defaults to False.
Example
>>> sambatensor = samba.randn((), dtype=torch.bfloat16)
>>> sambatensor.new_ones((2, 3)).data
tensor([[1., 1., 1.],
        [1., 1., 1.]], dtype=torch.bfloat16)
- new_zeros(size: int | Iterable[int], dtype: dtype | None = None, requires_grad: bool = False) SambaTensor¶
Returns a SambaTensor of size size filled with 0. By default, the returned SambaTensor has the same torch.dtype as this tensor.
- Parameters:
size – a list, tuple, or torch.Size of integers defining the shape of the output tensor.
dtype – the desired type of the returned tensor. If None, same torch.dtype as this SambaTensor.
requires_grad – if autograd should record operations on the returned tensor. Defaults to False.
Example
>>> sambatensor = samba.randn((), dtype=torch.bfloat16)
>>> sambatensor.new_zeros((2, 3)).data
tensor([[0., 0., 0.],
        [0., 0., 0.]], dtype=torch.bfloat16)
- numel() int¶
Returns the number of elements.
- rdu() None¶
Synchronizes the host memory of the tensor (and its gradient if it exists) to its device memory. Similar to an in-place version of torch.Tensor.cuda(). Avoid using rdu() because this operation is bandwidth intensive.
- requires_grad_(requires_grad: bool = True) None¶
Change if autograd should record operations on this tensor by setting this tensor's requires_grad attribute in-place. Returns this tensor.
requires_grad_()'s main use case is to tell autograd to begin recording operations on a SambaTensor tensor. If tensor has requires_grad=False (because it was obtained through a DataLoader, or required preprocessing or initialization), tensor.requires_grad_() causes autograd to record operations on tensor.
- Parameters:
requires_grad – If autograd should record operations on this tensor. Default: True.
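Example
A brief sketch; torch.tensor creates tensors with requires_grad=False by default:

>>> x = samba.SambaTensor(torch.tensor([1.0, 2.0]), name="x")
>>> x.requires_grad
False
>>> x.requires_grad_()
>>> x.requires_grad
True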
- reusable() bool¶
Returns True if the tensor memory can be reused for host-to-device data transfers.
Note
reusable() assumes that the host PyTorch tensor's NumPy array is contiguous.
- short() SambaTensor¶
self.short() is equivalent to self.type(torch.short). See type().
- size(dim: int | None = None) Size | int¶
Returns the size of the self tensor. If dim is not specified, the returned value is a torch.Size, a subclass of tuple. If dim is specified, returns an int holding the size of that dimension.
- Parameters:
dim – the dimension for which to retrieve the size. Defaults to None.
Example
>>> t = samba.empty(3, 4, 5)
>>> t.size()
torch.Size([3, 4, 5])
>>> t.size(dim=1)
4
- stride(dim: int | None = None) int | Tuple[int]¶
Returns the stride of the tensor. Stride is the jump necessary to go from one element to the next one in the specified dimension dim. A tuple of all strides is returned when no argument is passed in. Otherwise, an integer value is returned as the stride in the particular dimension dim.
See also
See torch.Tensor.stride().
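Example
A brief sketch for a contiguous 2 x 3 tensor:

>>> t = samba.zeros(2, 3)
>>> t.stride()
(3, 1)
>>> t.stride(0)
3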
- to(*args, **kwargs) SambaTensor¶
Performs Tensor dtype conversion. A dtype is inferred from the arguments of self.to(*args, **kwargs).
New in version 1.18.
Here are the ways to call to:
- to(dtype) SambaTensor
Returns a SambaTensor with the specified dtype.
- to(other) SambaTensor
Returns a SambaTensor with the same torch.dtype as the SambaTensor other.
Example
>>> samba.set_seed(1)
>>> sambatensor = samba.randn(2, 2)  # Initially dtype=float32
>>> sambatensor.to(torch.bfloat16).data
tensor([[0.6602, 0.2676],
        [0.0618, 0.6211]], dtype=torch.bfloat16)
>>> other_torch = torch.randn((), dtype=torch.float64)
>>> sambatensor.to(other_torch).data
tensor([[0.6614, 0.2669],
        [0.0617, 0.6213]], dtype=torch.float64)
- torch() Tensor¶
Returns the SambaTensor's underlying torch.Tensor. This method is the equivalent of torch.Tensor.numpy().
- torch_tensor() Tensor¶
Returns the underlying PyTorch tensor if it has data. Otherwise, materializes the tensor and returns the materialized tensor. If the tensor was lazily created and randomly initialized, then successive calls to
torch_tensor() may produce different results.
- type(dtype: str | dtype | None = None, non_blocking: bool = False, **kwargs) str | SambaTensor¶
Returns the type if dtype is not provided, else casts this object to the specified type.
If self is already of the correct type, no copy is performed and the original object is returned.
- Parameters:
dtype – The desired dtype.
non_blocking – If True, and the source is in pinned memory and the destination is on the GPU, or the source is on the GPU and the destination is in pinned memory, then the copy is performed asynchronously with respect to the host. Otherwise, the argument has no effect.
kwargs – For compatibility, may contain the key async in place of the non_blocking argument. The async arg is deprecated.
Note
The PyTorch API optional keyword args non_blocking (bool, optional) and **kwargs are not supported on RDU and throw an exception.
See also
For details, see torch.Tensor.type().
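Example
A brief sketch of a dtype cast, assuming samba.ones defaults to float32 as in the examples above:

>>> t = samba.ones(2, 2)
>>> t.type(torch.int16).data   # cast; equivalent to t.short()
tensor([[1, 1],
        [1, 1]], dtype=torch.int16)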
- type_as(tensor: SambaTensor) SambaTensor¶
Returns this tensor cast to the type of the given tensor.
New in version 1.18.
This is a no-op if the tensor is already of the correct type. This is equivalent to self.type(tensor.type()).
- Parameters:
tensor – the tensor which has the desired type
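Example
A brief sketch, assuming samba.zeros accepts a dtype keyword, as samba.ones and samba.randn do in the examples above:

>>> a = samba.ones(2, 2)                         # float32
>>> b = samba.zeros(2, 2, dtype=torch.bfloat16)
>>> a.type_as(b).data
tensor([[1., 1.],
        [1., 1.]], dtype=torch.bfloat16)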
- view_as(other: SambaTensor) SambaTensor¶
Returns a new SambaTensor with the same data as self but with other's shape.
- Parameters:
other – the SambaTensor whose shape is used for the new SambaTensor.
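Example
A brief sketch:

>>> x = samba.randn(2, 3)
>>> other = samba.zeros(3, 2)
>>> x.view_as(other).size()
torch.Size([3, 2])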
- property T: SambaTensor¶
Alias for samba.t().
- property data: Tensor¶
Handle to the data of the underlying PyTorch tensor.
- Getter:
Gets the data of the underlying PyTorch tensor.
- Setter:
Sets the data of the underlying PyTorch tensor.
- property device: device¶
The torch.device where the host tensor is.
- property dtype: dtype¶
Returns the type of the SambaTensor
- property fast_access: bool¶
Fast access SambaTensors use the pinned_memory API. By default, SambaFlow automatically marks all input tensors as fast access after tracing. See samba.session.enable_pinned_memory for details.
- Getter:
Returns True if self is a fast access tensor, otherwise returns False.
- Setter:
Sets the fast_access property of self. Can set the fast_access property for an output gradient tensor even if it does not have a SambaTensor.
- property grad: Tensor¶
Handle to the underlying PyTorch tensor’s gradient
- Getter:
Gets the underlying PyTorch tensor’s gradient
- Setter:
Sets the underlying PyTorch tensor’s gradient
- property materializer: Callable[[SambaTensor, MultithreadedRNG], Tensor]¶
The SambaTensor’s materializer. The materializer is used to initialize a tensor with values when the tensor was lazily initialized.
- Getter:
Gets the SambaTensor’s materializer
- Setter:
Sets the SambaTensor’s materializer
- property materializer_provided: bool¶
Returns True if a materializer is specified, otherwise returns False.
- property requires_grad: bool¶
True if gradients need to be computed for this SambaTensor, False otherwise.
- Getter:
Returns True if gradients need to be computed for this SambaTensor, otherwise returns False.
- Setter:
Sets whether gradients need to be computed for this SambaTensor.
See also
See torch.Tensor.requires_grad.
- sn_data¶
Handle to the RDU device memory of a SambaTensor.
- Getter:
When accessed, returns a new torch.Tensor with a copy of its device memory.
- Setter:
When set, copies the data from the given tensor to its device memory.
- sn_grad¶
Similar to sn_data, handle to the RDU device memory of its gradient tensor. self must have been compiled with requires_grad = True.
- Getter:
When accessed, copies self's gradient from device memory to the host as a new torch.Tensor.
- Setter:
When set, copies the data from the given tensor to self's gradient in device memory.
- property sn_grad_name: str¶
Name of the SambaTensor’s gradient tensor.
- property sn_name: str¶
Unique string identifier of each tensor that is initialized on the RDU device memory. If not initialized, it is the empty string ('').
Gets the SambaTensor’s sn_name.
- Setter:
Sets the SambaTensor’s sn_name.
- property sn_region_name: str¶
Handle to self's region name, used to denote a tensor's location in memory. If the tensor was created without a region name, the sn_name is set as the region name. If two SambaTensors share the same sn_region_name, then they share the same location in device memory.
- Getter:
Gets the SambaTensor’s sn_region_name.
- Setter:
Sets the SambaTensor’s sn_region_name.
Example:
# if region_name is unspecified, sn_region_name defaults to the sn_name, so sn_region_name will be "t0"
t0 = samba.SambaTensor(torch.Tensor([1, 2]), name="t0")

# sn_region_name will be "t1_other"
t1 = samba.SambaTensor(torch.Tensor([3, 4]), name="t1", region_name="t1_other")

# sn_region_name will be "t0", so SambaTensors t0 and t2 will share the same memory
t2 = samba.SambaTensor(torch.Tensor([1, 2]), name="t2", region_name="t0")
SambaTensor Utility Functions¶
- from_torch_tensor(tensor: torch.Tensor, name: str | None = None, batch_dim: int | None = None, named_dims: Iterable[str | None] | None = None, region_name: str | None = None) SambaTensor¶
Converts a PyTorch tensor to a SambaTensor. If tensor is a SambaTensor, from_torch_tensor does nothing.
- Parameters:
tensor – the torch.Tensor or SambaTensor to convert to a SambaTensor.
name – user-provided name of the source tensor.
batch_dim – Deprecated.
named_dims – Deprecated.
region_name – name for tensor’s location in memory.
- Returns:
A SambaTensor, or None if tensor is None.
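Example
A brief sketch showing that the converted tensor shares memory with the original, as described at the top of this page:

>>> t = torch.ones(2)
>>> st = samba.from_torch_tensor(t, name="w0")
>>> t[0] = 5.0   # changes to the original tensor are reflected in the SambaTensor
>>> st.data
tensor([5., 1.])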
- to_torch(obj: SambaTensor | Tensor | None) Tensor | None¶
Converts a SambaTensor to a PyTorch tensor. If obj is a PyTorch tensor, to_torch does nothing.
- Parameters:
obj – The tensor to convert to a PyTorch tensor.
- Returns:
A PyTorch tensor if obj is a SambaTensor or a PyTorch tensor. If obj is None, returns None.
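Example
A brief sketch, assuming to_torch is exposed on the samba namespace like from_torch_tensor:

>>> st = samba.ones(2)
>>> pt = samba.to_torch(st)   # pt is a plain torch.Tensor
>>> pt
tensor([1., 1.])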