Loading Generic Array Data
Even if your data is not strictly related to fields commonly used in astrophysical codes, or your code is not yet supported, you can still feed it to yt to use its advanced visualization and analysis facilities. The only requirement is that your data can be represented as three-dimensional NumPy arrays with a consistent grid structure. What follows are some common examples of loading generic array data that you may find useful.
Generic Unigrid Data
The simplest case is that of a single grid of data spanning the domain, with one or more fields. The data could be generated from a variety of sources; we'll just give three common examples:
Data generated "on-the-fly"
The most common example is that of data that is generated in memory from the currently running script or notebook.
# yt's notebook import convenience module (also provides NumPy as np)
from yt.imods import *
# Unit conversion factors from kpc and Mpc to cm
from yt.utilities.physical_constants import cm_per_kpc, cm_per_mpc
In this example, we'll just create a 3-D array of random floating-point data using NumPy:
arr = np.random.random(size=(64,64,64))
To load this data into yt, we need to assign it a field name, in this case "Density", and place it into a dictionary. Then, we call load_uniform_grid:
data = dict(Density = arr)
bbox = np.array([[-1.5, 1.5], [-1.5, 1.5], [-1.5, 1.5]])
pf = load_uniform_grid(data, arr.shape, cm_per_mpc, bbox=bbox, nprocs=64)
load_uniform_grid takes the following arguments and optional keywords:
- data: a dict of NumPy arrays, where the keys are the field names
- domain_dimensions: the domain dimensions of the unigrid
- sim_unit_to_cm: the conversion factor from simulation units to centimeters
- bbox: the size of the computational domain in units of sim_unit_to_cm
- nprocs: if greater than 1, the number of subarrays to create out of data
- sim_time: the simulation time in seconds
- periodicity: a tuple of booleans that determines whether the data will be treated as periodic along each axis
This example creates a yt-native parameter file pf that will treat your array as a density field in a cubic domain of 3 Mpc edge size (3 * 3.0856e24 cm) and simultaneously divide the domain into nprocs = 64 chunks, so that you can take advantage of the underlying parallelism.
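As an aside, the optional keywords not used above can be supplied in the same call. Below is a hypothetical sketch with purely illustrative values, which also verifies the chunking; it assumes the hierarchy exposes a num_grids attribute, as in yt 2.x:
pf = load_uniform_grid(data, arr.shape, cm_per_mpc, bbox=bbox, nprocs=64,
                       sim_time=1.0e14, periodicity=(True, True, True))
print pf.h.num_grids  # 64, one grid per chunk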
The resulting pf functions exactly like a parameter file from any other dataset. For example, it can be sliced, and we can show the grid boundaries:
slc = SlicePlot(pf, 2, ["Density"])
slc.set_cmap("Density", "Blues")
slc.annotate_grids(cmap=None)
slc.show()
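If you are working from a script rather than the notebook, the plot can be written to disk instead of displayed inline. A minimal sketch using the plot's save method (the filename prefix here is arbitrary):
slc.save("unigrid_density")  # writes a PNG whose name starts with this prefix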
Particle fields are detected as one-dimensional fields. The number of particles is set by the number_of_particles key in data. Particle fields are then added as one-dimensional arrays in a similar manner to the three-dimensional grid fields:
posx_arr = np.random.uniform(low=-1.5, high=1.5, size=10000)
posy_arr = np.random.uniform(low=-1.5, high=1.5, size=10000)
posz_arr = np.random.uniform(low=-1.5, high=1.5, size=10000)
data = dict(Density = np.random.random(size=(64,64,64)),
            number_of_particles = 10000,
            particle_position_x = posx_arr,
            particle_position_y = posy_arr,
            particle_position_z = posz_arr)
bbox = np.array([[-1.5, 1.5], [-1.5, 1.5], [-1.5, 1.5]])
pf = load_uniform_grid(data, data["Density"].shape, cm_per_mpc, bbox=bbox, nprocs=4)
In this example, only the particle position fields have been assigned. The number_of_particles value must match the size of the particle arrays. If no particle arrays are supplied, then number_of_particles is assumed to be zero. Take a slice, and overlay particle positions:
slc = SlicePlot(pf, "z", ["Density"])
slc.set_cmap("Density", "Blues")
slc.annotate_particles(0.25, p_size=12.0, col="Red")
slc.show()
HDF5 data
HDF5 is a convenient format to store data. If you have unigrid data stored in an HDF5 file, it is possible to load it into memory and then use load_uniform_grid to get it into yt:
import h5py
from yt.config import ytcfg
data_dir = ytcfg.get('yt','test_data_dir')
f = h5py.File(data_dir+"/UnigridData/turb_vels.h5", "r") # Read-only access to the file
The HDF5 file handle's keys correspond to the datasets stored in the file:
print f.keys()
We can iterate over the items in the file handle to get the data into a dictionary, which we will then load:
data = {k:v for k,v in f.items()}
bbox = np.array([[-0.5, 0.5], [-0.5, 0.5], [-0.5, 0.5]])
pf = load_uniform_grid(data, data["Density"].shape, 250.*cm_per_kpc, bbox=bbox, nprocs=8, periodicity=(False,False,False))
In this case, the data came from a simulation which was 250 kpc on a side. An example projection of two fields:
prj = ProjectionPlot(pf, "z", ["z-velocity","Temperature"], weight_field="Density")
prj.show()
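One caveat about the dictionary comprehension above: it stores h5py Dataset objects rather than plain NumPy arrays. If you would rather copy everything into memory and close the file before loading, a minimal sketch:
data = {k: v[:] for k, v in f.items()}  # v[:] reads the full dataset into a NumPy array
f.close()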
Volume Rendering Loaded Data
Volume rendering requires defining a TransferFunction to map data to color and opacity, and a camera to create a viewport and render the image.
# Find the min and max of the field
mi, ma = pf.h.all_data().quantities["Extrema"]('Temperature')[0]
# Reduce the dynamic range
mi += 1.5e7
ma -= 0.81e7
Create a Transfer Function that goes from the minimum to the maximum of the data:
tf = ColorTransferFunction((mi, ma), grey_opacity=False)
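As an aside, rather than mapping a single colormap across the whole range (as we do with map_to_colormap below), the transfer function can also be sampled with discrete Gaussian layers. A minimal sketch, assuming ColorTransferFunction provides an add_layers method, as in yt 2.x:
tf.add_layers(5, colormap='algae')  # five evenly spaced Gaussian layers between mi and ma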
Define the properties and size of the camera viewport:
# Choose a vector representing the viewing direction.
L = [0.5, 0.5, 0.5]
# Define the center of the camera to be the domain center
c = pf.domain_center
# Define the width of the image
W = 1.5*pf.domain_width
# Define the number of pixels to render
Npixels = 512
Create a camera object, map the transfer function onto the data range, and show the rendered image:
cam = pf.h.camera(c, L, W, Npixels, tf, fields=['Temperature'],
                  north_vector=[0,0,1], steady_north=True,
                  sub_samples=5, no_ghost=False, log_fields=False)
cam.transfer_function.map_to_colormap(mi, ma, scale=15.0, colormap='algae')
cam.show()
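To write the rendered image to disk as well, the camera's snapshot method can be used. A minimal sketch, assuming the yt 2.x Camera.snapshot interface (the filename is arbitrary):
im = cam.snapshot("temperature_rendering.png")  # renders and saves a PNG, returning the image array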
FITS image data
The FITS file format is a common astronomical format for 2-D images, but it can store three-dimensional data as well. The AstroPy project has modules for FITS reading and writing, which were incorporated from the PyFITS library.
import astropy.io.fits as pyfits
# Or, just import pyfits if that's what you have installed
Using pyfits, we can open a FITS file. If we call info() on the file handle, we can figure out some information about the file's contents. The file in this example has a primary HDU (header-data unit) with no data, and three HDUs with 3-D data. In this case, the data consists of three velocity fields:
f = pyfits.open(data_dir+"/UnigridData/velocity_field_20.fits.gz")
f.info()
We can put it into a dictionary in the same way as before, but we slice the file handle f so that we don't use the PrimaryHDU. Here, hdu.name is the field name and hdu.data is the actual data. We can check that we got the correct fields:
data = {hdu.name.lower():hdu.data for hdu in f[1:]}
print data.keys()
Now we load the data into yt. This particular file doesn't have any coordinate information, but let's assume that the box is 1 Mpc on a side. Since these are velocity fields, we can overlay velocity vectors on slices, just as if we had loaded in data from a supported code.
pf = load_uniform_grid(data, data["x-velocity"].shape, cm_per_mpc)
slc = SlicePlot(pf, "x", ["x-velocity","y-velocity","z-velocity"])
slc.annotate_velocity()
slc.show()
Generic AMR Data
In a similar fashion to unigrid data, data gridded into rectangular patches at varying levels of resolution may also be loaded into yt. In this case, a list of grid dictionaries should be provided, with the requisite information about each grid's properties. This example sets up two grids: a top-level grid (level == 0) covering the entire domain and a subgrid at level == 1.
grid_data = [
    dict(left_edge = [0.0, 0.0, 0.0],
         right_edge = [1.0, 1.0, 1.0],
         level = 0,
         dimensions = [32, 32, 32]),
    dict(left_edge = [0.25, 0.25, 0.25],
         right_edge = [0.75, 0.75, 0.75],
         level = 1,
         dimensions = [32, 32, 32])
]
We'll just fill each grid with random density data, with the amplitude scaled by the grid's refinement level:
for g in grid_data:
    g["Density"] = np.random.random(g["dimensions"]) * 2**g["level"]
Particle fields are supported by adding one-dimensional arrays to each grid and setting the number_of_particles key in each grid's dict. If a grid has no particles, set number_of_particles = 0, but the particle fields still have to be defined, since they are defined on the other grids; set them to empty NumPy arrays:
grid_data[0]["number_of_particles"] = 0 # Set no particles in the top-level grid
grid_data[0]["particle_position_x"] = np.array([]) # No particles, so set empty arrays
grid_data[0]["particle_position_y"] = np.array([])
grid_data[0]["particle_position_z"] = np.array([])
grid_data[1]["number_of_particles"] = 1000
grid_data[1]["particle_position_x"] = np.random.uniform(low=0.25, high=0.75, size=1000)
grid_data[1]["particle_position_y"] = np.random.uniform(low=0.25, high=0.75, size=1000)
grid_data[1]["particle_position_z"] = np.random.uniform(low=0.25, high=0.75, size=1000)
Then, call load_amr_grids:
pf = load_amr_grids(grid_data, [32, 32, 32], 1.0)
load_amr_grids also takes the same keywords bbox and sim_time as load_uniform_grid, as the sketch below shows.
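A hypothetical call passing those keywords explicitly, with purely illustrative values:
bbox = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])
pf = load_amr_grids(grid_data, [32, 32, 32], 1.0, bbox=bbox, sim_time=1.0e15)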
Now let's take a slice:
slc = SlicePlot(pf, "z", ["Density"])
slc.annotate_particles(0.25, p_size=15.0, col="Pink")
slc.show()
Caveats for Loading Generic Array Data
- Units will be incorrect unless the data has already been converted to cgs (see the sketch after this list).
- Particles may be difficult to integrate.
- Data must already reside in memory before loading it into yt, whether it is generated at runtime or loaded from disk.
- Some functions may behave oddly, and parallelism will be disappointing or non-existent in most cases.
- No consistency checks are performed on the hierarchy.
- Consistency between particle positions and grids is not checked; load_amr_grids assumes that particle positions associated with one grid are not bounded within another grid at a higher level, so this must be ensured by the user prior to loading the grid data.
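To address the first caveat, convert your arrays to cgs before loading them. A minimal sketch, assuming a density array in Msun/kpc**3 and that yt's physical_constants module provides mass_sun_cgs (cm_per_kpc was imported at the top of this notebook):
from yt.utilities.physical_constants import mass_sun_cgs  # assumed available, as in yt 2.x
dens = np.random.random(size=(64,64,64))        # hypothetical density in Msun/kpc**3
dens_cgs = dens * mass_sun_cgs / cm_per_kpc**3  # now in g/cm**3
data = dict(Density = dens_cgs)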
These examples are also available as Loading_Generic_Array_Data.ipynb, Loading_Generic_Array_Data_evaluated.ipynb, and Loading_Generic_Array_Data.py.