mirror of
https://github.com/AllenDowney/AstronomicalData.git
synced 2025-12-05 20:40:17 -08:00
943 lines
24 KiB
Plaintext
943 lines
24 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# 4. Transformation and Selection\n",
|
|
"\n",
|
|
"In the previous lesson, we identified stars with the proper motion we expect for GD-1.\n",
|
|
"\n",
|
|
"Now we'll do the same selection in an ADQL query, which will make it possible to work with a larger region of the sky and still download less data."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Outline\n",
|
|
"\n",
|
|
"Here are the steps in this lesson:\n",
|
|
"\n",
|
|
"1. Using data from the previous lesson, we'll identify the values of proper motion for stars likely to be in GD-1.\n",
|
|
"\n",
|
|
"2. Then we'll compose an ADQL query that selects stars based on proper motion, so we can download only the data we need.\n",
|
|
"\n",
|
|
"That will make it possible to search a bigger region of the sky in a single query.\n",
|
|
"We'll also see how to write the results to a CSV file.\n",
|
|
"\n",
|
|
"After completing this lesson, you should be able to\n",
|
|
"\n",
|
|
"* Transform proper motions from one frame to another.\n",
|
|
"\n",
|
|
"* Compute the convex hull of a set of points.\n",
|
|
"\n",
|
|
"* Write an ADQL query that selects based on proper motion.\n",
|
|
"\n",
|
|
"* Save data in CSV format."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"source": [
|
|
"## Installing libraries\n",
|
|
"\n",
|
|
"If you are running this notebook on Colab, you can run the following cell to install the libraries we'll use.\n",
|
|
"\n",
|
|
"If you are running this notebook on your own computer, you might have to install these libraries yourself. See the instructions in the preface."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 37,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# If we're running on Colab, install libraries\n",
|
|
"\n",
|
|
"import sys\n",
|
|
"IN_COLAB = 'google.colab' in sys.modules\n",
|
|
"\n",
|
|
"if IN_COLAB:\n",
|
|
" !pip install astroquery astro-gala"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Reload the data\n",
|
|
"\n",
|
|
"You can [download the data from the previous lesson](https://github.com/AllenDowney/AstronomicalData/raw/main/data/gd1_data.hdf) or run the following cell, which downloads it if necessary."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 38,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from os.path import basename, exists\n",
|
|
"\n",
|
|
"def download(url):\n",
|
|
" filename = basename(url)\n",
|
|
" if not exists(filename):\n",
|
|
" from urllib.request import urlretrieve\n",
|
|
" local, _ = urlretrieve(url, filename)\n",
|
|
" print('Downloaded ' + local)\n",
|
|
"\n",
|
|
"download('https://github.com/AllenDowney/AstronomicalData/raw/main/' +\n",
|
|
" 'data/gd1_data.hdf')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now we can reload `centerline_df` and `selected_df`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 39,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import pandas as pd\n",
|
|
"\n",
|
|
"filename = 'gd1_data.hdf'\n",
|
|
"centerline_df = pd.read_hdf(filename, 'centerline_df')\n",
|
|
"selected_df = pd.read_hdf(filename, 'selected_df')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Selection by proper motion\n",
|
|
"\n",
|
|
"Let's review how we got to this point.\n",
|
|
"\n",
|
|
"1. We made an ADQL query to the Gaia server to get data for stars in the vicinity of GD-1.\n",
|
|
"\n",
|
|
"2. We transformed the coordinates to the `GD1Koposov10` frame so we could select stars along the centerline of GD-1.\n",
|
|
"\n",
|
|
"3. We plotted the proper motion of the centerline stars to identify the bounds of the overdense region.\n",
|
|
"\n",
|
|
"4. We made a mask that selects stars whose proper motion is in the overdense region.\n",
|
|
"\n",
|
|
"At this point we have downloaded data for a relatively large number of stars (more than 100,000) and selected a relatively small number (around 1000).\n",
|
|
"\n",
|
|
"It would be more efficient to use ADQL to select only the stars we need. That would also make it possible to download data covering a larger region of the sky.\n",
|
|
"\n",
|
|
"However, the selection we did was based on proper motion in the `GD1Koposov10` frame. In order to do the same selection in ADQL, we have to work with proper motions in ICRS.\n",
|
|
"\n",
|
|
"As a reminder, here's the rectangle we selected based on proper motion in the `GD1Koposov10` frame."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 40,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"pm1_min = -8.9\n",
|
|
"pm1_max = -6.9\n",
|
|
"pm2_min = -2.2\n",
|
|
"pm2_max = 1.0"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 41,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"def make_rectangle(x1, x2, y1, y2):\n",
|
|
" \"\"\"Return the corners of a rectangle.\"\"\"\n",
|
|
" xs = [x1, x1, x2, x2, x1]\n",
|
|
" ys = [y1, y2, y2, y1, y1]\n",
|
|
" return xs, ys"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 42,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"pm1_rect, pm2_rect = make_rectangle(\n",
|
|
" pm1_min, pm1_max, pm2_min, pm2_max)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Since we'll need to plot proper motion several times, we'll use the following function."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 43,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import matplotlib.pyplot as plt\n",
|
|
"\n",
|
|
"def plot_proper_motion(df):\n",
|
|
" \"\"\"Plot proper motion.\n",
|
|
" \n",
|
|
" df: DataFrame with `pm_phi1` and `pm_phi2`\n",
|
|
" \"\"\"\n",
|
|
" x = df['pm_phi1']\n",
|
|
" y = df['pm_phi2']\n",
|
|
" plt.plot(x, y, 'ko', markersize=0.3, alpha=0.3)\n",
|
|
"\n",
|
|
" plt.xlabel('Proper motion phi1 (GD1 frame)')\n",
|
|
" plt.ylabel('Proper motion phi2 (GD1 frame)')\n",
|
|
"\n",
|
|
" plt.xlim(-12, 8)\n",
|
|
" plt.ylim(-10, 10)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"The following figure shows:\n",
|
|
"\n",
|
|
"* Proper motion for the stars we selected along the center line of GD-1,\n",
|
|
"\n",
|
|
"* The rectangle we selected, and\n",
|
|
"\n",
|
|
"* The stars inside the rectangle highlighted in green."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 44,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"plot_proper_motion(centerline_df)\n",
|
|
"\n",
|
|
"plt.plot(pm1_rect, pm2_rect)\n",
|
|
"\n",
|
|
"x = selected_df['pm_phi1']\n",
|
|
"y = selected_df['pm_phi2']\n",
|
|
"plt.plot(x, y, 'gx', markersize=0.3, alpha=0.3);"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now we'll make the same plot using proper motions in the ICRS frame, which are stored in columns `pmra` and `pmdec`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 45,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"x = centerline_df['pmra']\n",
|
|
"y = centerline_df['pmdec']\n",
|
|
"plt.plot(x, y, 'ko', markersize=0.3, alpha=0.3)\n",
|
|
"\n",
|
|
"x = selected_df['pmra']\n",
|
|
"y = selected_df['pmdec']\n",
|
|
"plt.plot(x, y, 'gx', markersize=1, alpha=0.3)\n",
|
|
" \n",
|
|
"plt.xlabel('Proper motion ra (ICRS frame)')\n",
|
|
"plt.ylabel('Proper motion dec (ICRS frame)')\n",
|
|
"\n",
|
|
"plt.xlim([-10, 5])\n",
|
|
"plt.ylim([-20, 5]);"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"The proper motions of the selected stars are more spread out in this frame, which is why it was preferable to do the selection in the GD-1 frame.\n",
|
|
"\n",
|
|
"But now we can define a polygon that encloses the proper motions of these stars in ICRS, and use that polygon as a selection criterion in an ADQL query."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Convex Hull\n",
|
|
"\n",
|
|
"SciPy provides a function that computes the [convex hull](https://en.wikipedia.org/wiki/Convex_hull) of a set of points, which is the smallest convex polygon that contains all of the points.\n",
|
|
"\n",
|
|
"To use it, we'll select columns `pmra` and `pmdec` and convert them to a NumPy array."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 46,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import numpy as np\n",
|
|
"\n",
|
|
"points = selected_df[['pmra','pmdec']].to_numpy()\n",
|
|
"points.shape"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"NOTE: If you are using an older version of Pandas, you might not have `to_numpy()`; you can use `values` instead, like this:\n",
|
|
"\n",
|
|
"```\n",
|
|
"points = selected_df[['pmra','pmdec']].values\n",
|
|
"\n",
|
|
"```\n",
|
|
"\n",
|
|
"We'll pass the points to `ConvexHull`, which returns an object that contains the results. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 47,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from scipy.spatial import ConvexHull\n",
|
|
"\n",
|
|
"hull = ConvexHull(points)\n",
|
|
"hull"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"`hull.vertices` contains the indices of the points that fall on the perimeter of the hull."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 48,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"hull.vertices"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"We can use them as an index into the original array to select the corresponding rows."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 49,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"pm_vertices = points[hull.vertices]\n",
|
|
"pm_vertices"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"To plot the resulting polygon, we have to pull out the x and y coordinates."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 50,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"pmra_poly, pmdec_poly = np.transpose(pm_vertices)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"This use of `transpose` is a useful NumPy idiom. Because `pm_vertices` has two columns, its [matrix transpose](https://en.wikipedia.org/wiki/Transpose) has two rows, which are assigned to the two variables `pmra_poly` and `pmdec_poly`.\n",
|
|
"\n",
|
|
"The following figure shows proper motion in ICRS again, along with the convex hull we just computed."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 51,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"x = centerline_df['pmra']\n",
|
|
"y = centerline_df['pmdec']\n",
|
|
"plt.plot(x, y, 'ko', markersize=0.3, alpha=0.3)\n",
|
|
"\n",
|
|
"x = selected_df['pmra']\n",
|
|
"y = selected_df['pmdec']\n",
|
|
"plt.plot(x, y, 'gx', markersize=0.3, alpha=0.3)\n",
|
|
"\n",
|
|
"plt.plot(pmra_poly, pmdec_poly)\n",
|
|
" \n",
|
|
"plt.xlabel('Proper motion phi1 (ICRS frame)')\n",
|
|
"plt.ylabel('Proper motion phi2 (ICRS frame)')\n",
|
|
"\n",
|
|
"plt.xlim([-10, 5])\n",
|
|
"plt.ylim([-20, 5]);"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"So `pm_vertices` represents the polygon we want to select.\n",
|
|
"The next step is to use it as part of an ADQL query."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Assembling the query\n",
|
|
"\n",
|
|
"In Lesson 2 we used the following query to select stars in a polygonal region."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 52,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"query5_base = \"\"\"SELECT\n",
|
|
"{columns}\n",
|
|
"FROM gaiadr2.gaia_source\n",
|
|
"WHERE parallax < 1\n",
|
|
" AND bp_rp BETWEEN -0.75 AND 2 \n",
|
|
" AND 1 = CONTAINS(POINT(ra, dec), \n",
|
|
" POLYGON({point_list}))\n",
|
|
"\"\"\""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"In this lesson we'll make two changes:\n",
|
|
"\n",
|
|
"1. We'll select stars with coordinates in a larger region.\n",
|
|
"\n",
|
|
"2. We'll add another clause to select stars whose proper motion is in the polygon we just computed, `pm_vertices`.\n",
|
|
"\n",
|
|
"Here are the coordinates of the larger rectangle in the GD-1 frame."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 53,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import astropy.units as u\n",
|
|
"\n",
|
|
"phi1_min = -70 * u.degree\n",
|
|
"phi1_max = -20 * u.degree\n",
|
|
"phi2_min = -5 * u.degree\n",
|
|
"phi2_max = 5 * u.degree"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"We selected these bounds by trial and error, defining the largest region we can process in a single query."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 54,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"phi1_rect, phi2_rect = make_rectangle(\n",
|
|
" phi1_min, phi1_max, phi2_min, phi2_max)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Here's how we transform it to ICRS, as we saw in Lesson 2."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 55,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from gala.coordinates import GD1Koposov10\n",
|
|
"from astropy.coordinates import SkyCoord\n",
|
|
"\n",
|
|
"gd1_frame = GD1Koposov10()\n",
|
|
"corners = SkyCoord(phi1=phi1_rect, \n",
|
|
" phi2=phi2_rect, \n",
|
|
" frame=gd1_frame)\n",
|
|
"\n",
|
|
"corners_icrs = corners.transform_to('icrs')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"To use `corners_icrs` as part of an ADQL query, we have to convert it to a string. \n",
|
|
"Here's the function from Lesson 2 we used to do that."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 56,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"def skycoord_to_string(skycoord):\n",
|
|
" \"\"\"Convert SkyCoord to string.\"\"\"\n",
|
|
" t = skycoord.to_string()\n",
|
|
" s = ' '.join(t)\n",
|
|
" return s.replace(' ', ', ')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 57,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"point_list = skycoord_to_string(corners_icrs)\n",
|
|
"point_list"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Here are the columns we want to select."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 58,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"columns = 'source_id, ra, dec, pmra, pmdec'"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now we have everything we need to assemble the query.\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 59,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"query5 = query5_base.format(columns=columns, \n",
|
|
" point_list=point_list)\n",
|
|
"print(query5)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"But don't try to run that query.\n",
|
|
"Because it selects a larger region, there are too many stars to handle in a single query.\n",
|
|
"Until we select by proper motion, that is."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Selecting proper motion\n",
|
|
"\n",
|
|
"Now we're ready to add a `WHERE` clause to select stars whose proper motion falls in the polygon defined by `pm_vertices`.\n",
|
|
"\n",
|
|
"To use `pm_vertices` as part of an ADQL query, we have to convert it to a string.\n",
|
|
"Using `flatten` and `array2string`, we can almost get the format we need."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 60,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"s = np.array2string(pm_vertices.flatten(), \n",
|
|
" max_line_width=1000,\n",
|
|
" separator=',')\n",
|
|
"s"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"We just have to remove the brackets."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 61,
|
|
"metadata": {
|
|
"scrolled": true
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"pm_point_list = s.strip('[]')\n",
|
|
"pm_point_list"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Exercise\n",
|
|
"\n",
|
|
"Define `query6_base`, starting with `query5_base` and adding a new clause to select stars whose coordinates of proper motion, `pmra` and `pmdec`, fall within the polygon defined by `pm_point_list`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 62,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Solution goes here"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Exercise\n",
|
|
"\n",
|
|
"Use `format` to format `query6_base` and define `query6`, filling in the values of `columns`, `point_list`, and `pm_point_list`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 63,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Solution goes here"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now we can run the query like this:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 64,
|
|
"metadata": {
|
|
"scrolled": true
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from astroquery.gaia import Gaia\n",
|
|
"\n",
|
|
"job = Gaia.launch_job_async(query6)\n",
|
|
"print(job)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"And get the results."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 65,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"candidate_table = job.get_results()\n",
|
|
"len(candidate_table)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"We call the results `candidate_table` because it contains stars that are good candidates for GD-1.\n",
|
|
"\n",
|
|
"For the next lesson, we'll need `point_list` and `pm_point_list` again, so we should save them in a file.\n",
|
|
"There are several ways we could do that, but since we are already storing data in an HDF file, let's do the same with these variables.\n",
|
|
"\n",
|
|
"We've seen how to save a `DataFrame` in an HDF file.\n",
|
|
"We can do the same thing with a Pandas `Series`.\n",
|
|
"To make one, we'll start with a dictionary:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 66,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"d = dict(point_list=point_list, pm_point_list=pm_point_list)\n",
|
|
"d"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"And use it to initialize a `Series.`"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 67,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"point_series = pd.Series(d)\n",
|
|
"point_series"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now we can save it in the usual way."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 68,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"filename = 'gd1_data.hdf'\n",
|
|
"point_series.to_hdf(filename, 'point_series')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Plotting one more time\n",
|
|
"\n",
|
|
"Let's see what the results look like."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 69,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"x = candidate_table['ra']\n",
|
|
"y = candidate_table['dec']\n",
|
|
"plt.plot(x, y, 'ko', markersize=0.3, alpha=0.3)\n",
|
|
"\n",
|
|
"plt.xlabel('ra (degree ICRS)')\n",
|
|
"plt.ylabel('dec (degree ICRS)');"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Here we can see why it was useful to transform these coordinates. In ICRS, it is more difficult to identity the stars near the centerline of GD-1.\n",
|
|
"\n",
|
|
"So let's transform the results back to the GD-1 frame.\n",
|
|
"Here's the code we used to transform the coordinates and make a Pandas `DataFrame`, wrapped in a function."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 70,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from gala.coordinates import reflex_correct\n",
|
|
"\n",
|
|
"def make_dataframe(table):\n",
|
|
" \"\"\"Transform coordinates from ICRS to GD-1 frame.\n",
|
|
" \n",
|
|
" table: Astropy Table\n",
|
|
" \n",
|
|
" returns: Pandas DataFrame\n",
|
|
" \"\"\"\n",
|
|
" skycoord = SkyCoord(\n",
|
|
" ra=table['ra'], \n",
|
|
" dec=table['dec'],\n",
|
|
" pm_ra_cosdec=table['pmra'],\n",
|
|
" pm_dec=table['pmdec'], \n",
|
|
" distance=8*u.kpc, \n",
|
|
" radial_velocity=0*u.km/u.s)\n",
|
|
"\n",
|
|
" gd1_frame = GD1Koposov10()\n",
|
|
" transformed = skycoord.transform_to(gd1_frame)\n",
|
|
" skycoord_gd1 = reflex_correct(transformed)\n",
|
|
"\n",
|
|
" df = table.to_pandas()\n",
|
|
" df['phi1'] = skycoord_gd1.phi1\n",
|
|
" df['phi2'] = skycoord_gd1.phi2\n",
|
|
" df['pm_phi1'] = skycoord_gd1.pm_phi1_cosphi2\n",
|
|
" df['pm_phi2'] = skycoord_gd1.pm_phi2\n",
|
|
" return df"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Here's how we use it:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 71,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"candidate_df = make_dataframe(candidate_table)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"And let's see the results."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 72,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"x = candidate_df['phi1']\n",
|
|
"y = candidate_df['phi2']\n",
|
|
"plt.plot(x, y, 'ko', markersize=0.5, alpha=0.5)\n",
|
|
"\n",
|
|
"plt.xlabel('phi1 (degree GD1)')\n",
|
|
"plt.ylabel('phi2 (degree GD1)');"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"We're starting to see GD-1 more clearly.\n",
|
|
"We can compare this figure with this panel from Figure 1 from the original paper:\n",
|
|
"\n",
|
|
"<img height=\"150\" src=\"https://github.com/datacarpentry/astronomy-python/raw/gh-pages/fig/gd1-2.png\">\n",
|
|
"\n",
|
|
"This panel shows stars selected based on proper motion only, so it is comparable to our figure (although notice that it covers a wider region)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"In the next lesson, we will use photometry data from Pan-STARRS to do a second round of filtering, and see if we can replicate this panel.\n",
|
|
"\n",
|
|
"<img height=\"150\" src=\"https://github.com/datacarpentry/astronomy-python/raw/gh-pages/fig/gd1-4.png\">\n",
|
|
"\n",
|
|
"Later we'll see how to add annotations like the ones in the figure and customize the style of the figure to present the results clearly and compellingly."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Summary\n",
|
|
"\n",
|
|
"In the previous lesson we downloaded data for a large number of stars and then selected a small fraction of them based on proper motion.\n",
|
|
"\n",
|
|
"In this lesson, we improved this process by writing a more complex query that uses the database to select stars based on proper motion. This process requires more computation on the Gaia server, but then we're able to either:\n",
|
|
"\n",
|
|
"1. Search the same region and download less data, or\n",
|
|
"\n",
|
|
"2. Search a larger region while still downloading a manageable amount of data.\n",
|
|
"\n",
|
|
"In the next lesson, we'll learn about the database `JOIN` operation and use it to download photometry data from Pan-STARRS."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Best practices\n",
|
|
"\n",
|
|
"* When possible, \"move the computation to the data\"; that is, do as much of the work as possible on the database server before downloading the data.\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
}
|
|
],
|
|
"metadata": {
|
|
"celltoolbar": "Tags",
|
|
"kernelspec": {
|
|
"display_name": "Python 3",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.8.8"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 4
|
|
}
|