{ "cells": [ { "cell_type": "raw", "metadata": {}, "source": [ "---\n", "title: \"Photometry\"\n", "teaching: 3000\n", "exercises: 0\n", "questions:\n", "\n", "- \"How do we use Matplotlib to select a polygon and Pandas to merge data from multiple tables?\"\n", "\n", "objectives:\n", "\n", "- \"Use Matplotlib to specify a polygon and determine which points fall inside it.\"\n", "\n", "- \"Use Pandas to merge data from multiple `DataFrames`, much like a database `JOIN` operation.\"\n", "\n", "keypoints:\n", "\n", "- \"If you want to perform something like a database `JOIN` operation with data that is in a Pandas `DataFrame`, you can use the `join` or `merge` function. In many cases, `merge` is easier to use because the arguments are more like SQL.\"\n", "\n", "- \"Use Matplotlib options to control the size and aspect ratio of figures to make them easier to interpret.\"\n", "\n", "- \"Matplotlib also provides operations for working with points, polygons, and other geometric entities, so it's not just for making figures.\"\n", "\n", "- \"Be sure to record every element of the data analysis pipeline that would be needed to replicate the results.\"\n", "\n", "---\n", "\n", "{% include links.md %}\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Photometry\n", "\n", "This is the sixth in a series of notebooks related to astronomy data.\n", "\n", "As a continuing example, we will replicate part of the analysis in a recent paper, \"[Off the beaten path: Gaia reveals GD-1 stars outside of the main stream](https://arxiv.org/abs/1805.00425)\" by Adrian M. Price-Whelan and Ana Bonaca.\n", "\n", "In the previous lesson we downloaded photometry data from Pan-STARRS, which is available from the same server we've been using to get Gaia data. \n", "\n", "The next step in the analysis is to select candidate stars based on the photometry data. 
\n", "The following figure from the paper is a color-magnitude diagram showing the stars we previously selected based on proper motion:\n", "\n", "\n", "\n", "In red is a theoretical isochrone, showing where we expect the stars in GD-1 to fall based on the metallicity and age of their original globular cluster. \n", "\n", "By selecting stars in the shaded area, we can further distinguish the main sequence of GD-1 from mostly younger background stars." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Outline\n", "\n", "Here are the steps in this notebook:\n", "\n", "1. We'll reload the data from the previous notebook and make a color-magnitude diagram.\n", "\n", "2. We'll use an isochrone computed by MIST to specify a polygonal region in the color-magnitude diagram and select the stars inside it.\n", "\n", "3. Then we'll merge the photometry data with the list of candidate stars, storing the result in a Pandas `DataFrame`.\n", "\n", "After completing this lesson, you should be able to\n", "\n", "* Use Matplotlib to specify a `Polygon` and determine which points fall inside it.\n", "\n", "* Use Pandas to merge data from multiple `DataFrames`, much like a database `JOIN` operation." ] }, { "cell_type": "markdown", "metadata": { "tags": [ "remove-cell" ] }, "source": [ "## Installing libraries\n", "\n", "If you are running this notebook on Colab, you can run the following cell to install Astroquery and the other libraries we'll use.\n", "\n", "If you are running this notebook on your own computer, you might have to install these libraries yourself. See the instructions in the preface." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "# If we're running on Colab, install libraries\n", "\n", "import sys\n", "IN_COLAB = 'google.colab' in sys.modules\n", "\n", "if IN_COLAB:\n", " !pip install astroquery astro-gala wget" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reload the data\n", "\n", "The following cell downloads the photometry data we created in the previous notebook." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import os\n", "from wget import download\n", "\n", "filename = 'gd1_photo.fits'\n", "filepath = 'https://github.com/AllenDowney/AstronomicalData/raw/main/data/'\n", "\n", "if not os.path.exists(filename):\n", " print(download(filepath+filename))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can read the data back into an Astropy `Table`." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from astropy.table import Table\n", "\n", "photo_table = Table.read(filename)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting photometry data\n", "\n", "Now that we have photometry data from Pan-STARRS, we can replicate the [color-magnitude diagram](https://en.wikipedia.org/wiki/Galaxy_color%E2%80%93magnitude_diagram) from the original paper:\n", "\n", "\n", "\n", "The y-axis shows the apparent magnitude of each source with the [g filter](https://en.wikipedia.org/wiki/Photometric_system).\n", "\n", "The x-axis shows the difference in apparent magnitude between the g and i filters, which indicates color.\n", "\n", "Stars with lower values of (g-i) are brighter in g-band than in i-band, compared to other stars, which means they are bluer.\n", "\n", "Stars in the lower-left quadrant of this diagram are less bright and less metallic than the others, which means they are [likely to be 
older](http://spiff.rit.edu/classes/ladder/lectures/ordinary_stars/ordinary.html).\n", "\n", "Since we expect the stars in GD-1 to be older than the background stars, the stars in the lower-left are more likely to be in GD-1." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following function takes a table containing photometry data and draws a color-magnitude diagram.\n", "The input can be an Astropy `Table` or Pandas `DataFrame`, as long as it has columns named `g_mean_psf_mag` and `i_mean_psf_mag`.\n", "\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "def plot_cmd(table):\n", "    \"\"\"Plot a color magnitude diagram.\n", "    \n", "    table: Table or DataFrame with photometry data\n", "    \"\"\"\n", "    y = table['g_mean_psf_mag']\n", "    x = table['g_mean_psf_mag'] - table['i_mean_psf_mag']\n", "\n", "    plt.plot(x, y, 'ko', markersize=0.3, alpha=0.3)\n", "\n", "    plt.xlim([0, 1.5])\n", "    plt.ylim([14, 22])\n", "    plt.gca().invert_yaxis()\n", "\n", "    plt.ylabel('$g_0$')\n", "    plt.xlabel('$(g-i)_0$')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`plot_cmd` uses a new function, `invert_yaxis`, to invert the `y` axis, which is conventional when plotting magnitudes, since lower magnitude indicates higher brightness.\n", "\n", "`invert_yaxis` is a little different from the other functions we've used. You can't call it like this:\n", "\n", "```\n", "plt.invert_yaxis()  # doesn't work\n", "```\n", "\n", "You have to call it like this:\n", "\n", "```\n", "plt.gca().invert_yaxis()  # works\n", "```\n", "\n", "`gca` stands for \"get current axes\". It returns an object that represents the axes of the current figure, and that object provides `invert_yaxis`.\n", "\n", "**In case anyone asks:** The most likely reason for this inconsistency in the interface is that `invert_yaxis` is a lesser-used function, so it's not made available at the top level of the interface." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's what the results look like." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot_cmd(photo_table)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our figure does not look exactly like the one in the paper because we are working with a smaller region of the sky, so we don't have as many stars. But we can see an overdense region in the lower left that contains stars with the photometry we expect for GD-1.\n", "\n", "In the next section we'll use an isochrone to specify a polygon that contains this overdense regioin." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Isochrone\n", "\n", "Based on our best estimates for the ages of the stars in GD-1 and their metallicity, we can compute a [stellar isochrone](https://en.wikipedia.org/wiki/Stellar_isochrone) that predicts the relationship between their magnitude and color.\n", "\n", "In fact, we can use [MESA Isochrones & Stellar Tracks](http://waps.cfa.harvard.edu/MIST/) (MIST) to compute it for us.\n", "\n", "Using the [MIST Version 1.2 web interface](http://waps.cfa.harvard.edu/MIST/interp_isos.html), we computed an isochrone with the following parameters:\n", " \n", "* Rotation initial v/v_crit = 0.4\n", "\n", "* Single age, linear scale = 12e9\n", "\n", "* Composition [Fe/H] = -1.35\n", "\n", "* Synthetic Photometry, PanStarrs\n", "\n", "* Extinction av = 0\n", "\n", "The following cell downloads the results:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "import os\n", "from wget import download\n", "\n", "filename = 'MIST_iso_5fd2532653c27.iso.cmd'\n", "filepath = 'https://github.com/AllenDowney/AstronomicalData/raw/main/data/'\n", "\n", "if not os.path.exists(filename):\n", " print(download(filepath+filename))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To read this file we'll download a Python module [from this repository](https://github.com/jieunchoi/MIST_codes)." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "import os\n", "from wget import download\n", "\n", "filename = 'read_mist_models.py'\n", "filepath = 'https://github.com/jieunchoi/MIST_codes/raw/master/scripts/'\n", "\n", "if not os.path.exists(filename):\n", " print(download(filepath+filename))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can read the file:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Reading in: MIST_iso_5fd2532653c27.iso.cmd\n" ] } ], "source": [ "import read_mist_models\n", "\n", "filename = 'MIST_iso_5fd2532653c27.iso.cmd'\n", "iso = read_mist_models.ISOCMD(filename)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result is an `ISOCMD` object." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "read_mist_models.ISOCMD" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(iso)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It contains a list of arrays, one for each isochrone." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(iso.isocmds)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We only got one isochrone." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(iso.isocmds)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So we can select it like this:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "iso_array = iso.isocmds[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's a NumPy array:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "numpy.ndarray" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(iso_array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But it's an unusual NumPy array, because it contains names for the columns." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype([('EEP', '= 0) & (iso_array['phase'] < 3)\n", "phase_mask.sum()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "354" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "main_sequence = iso_array[phase_mask]\n", "len(main_sequence)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The other two columns we'll use are `PS_g` and `PS_i`, which contain simulated photometry data for stars with the given age and metallicity, based on a model of the Pan-STARRS sensors.\n", "\n", "We'll use these columns to superimpose the isochrone on the color-magnitude diagram, but first we have to use a [distance modulus](https://en.wikipedia.org/wiki/Distance_modulus) to scale the isochrone based on the estimated distance of GD-1.\n", "\n", "We can use the `Distance` object from Astropy to compute the distance modulus." 
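, "\n", "\n", "As a quick check on the value we expect, the distance modulus $\\mu$ relates apparent magnitude $m$, absolute magnitude $M$, and distance $d$ (this is the standard definition, written out here for reference; the symbols are not used elsewhere in this notebook):\n", "\n", "$$\\mu = m - M = 5 \\log_{10} \\left( \\frac{d}{10~\\mathrm{pc}} \\right)$$\n", "\n", "With $d = 7.8$ kpc $= 7800$ pc, this gives $\\mu = 5 \\log_{10}(780) \\approx 14.46$, so the isochrone's absolute magnitudes shift about 14.5 magnitudes fainter. The next cell computes the same quantity."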
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "14.4604730134524" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import astropy.coordinates as coord\n", "import astropy.units as u\n", "\n", "distance = 7.8 * u.kpc\n", "distmod = coord.Distance(distance).distmod.value\n", "distmod" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can compute the scaled magnitude and color of the isochrone." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "g = main_sequence['PS_g'] + distmod\n", "gi = main_sequence['PS_g'] - main_sequence['PS_i']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make this data easier to work with, we'll put it in a Pandas `Series` that contains `gi` as the index and `g` as the values." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.195021 28.294743\n", "2.166076 28.189718\n", "2.129312 28.051761\n", "2.093721 27.916194\n", "2.058585 27.780024\n", "dtype: float64" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "iso_series = pd.Series(g, index=gi)\n", "iso_series.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can plot it on the color-magnitude diagram like this." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot_cmd(photo_table)\n", "iso_series.plot();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The theoretical isochrone passes through the overdense region where we expect to find stars in GD-1.\n", "\n", "Let's save this result so we can reload it later without repeating the steps in this section." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "filename = 'gd1_isochrone.hdf5'\n", "\n", "iso_series.to_hdf(filename, 'iso_series')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Making a polygon\n", "\n", "The following cell downloads the isochrone series we made in the previous section, if necessary." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "import os\n", "from wget import download\n", "\n", "filename = 'gd1_isochrone.hdf5'\n", "filepath = 'https://github.com/AllenDowney/AstronomicalData/raw/main/data/'\n", "\n", "if not os.path.exists(filename):\n", " print(download(filepath+filename))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can read the isochrone back in." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.195021 28.294743\n", "2.166076 28.189718\n", "2.129312 28.051761\n", "2.093721 27.916194\n", "2.058585 27.780024\n", "dtype: float64" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iso_series = pd.read_hdf(filename, 'iso_series')\n", "iso_series.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To select the stars in the overdense region of the color-magnitude diagram, we want to stretch the isochrone into a polygon.\n", "\n", "We'll use the following formulas to compute the left and right sides of the polygons." 
] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "g = iso_series.to_numpy()\n", "gi = iso_series.index" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "left_gi = gi - 0.4 * (g/28)**5\n", "right_gi = gi + 0.7 * (g/28)**5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To explain the terms:\n", "\n", "* We divide magnitudes by 28 to normalize them onto roughly the range from 0 to 1.\n", "\n", "* Raising the normalized magnitudes to the 5th power makes the shift grow steeply with magnitude, so the selected region is narrow for bright stars and wider for faint ones.\n", "\n", "* Then we add and subtract the result from `gi` to shift the isochrone left and right. The factors 0.4 and 0.7 were chosen by eye to enclose the overdense region." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make the shifted isochrones easier to work with, we'll put each one in a Pandas `Series` that contains `g` as the values and the shifted values of `gi` as the index." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.773520 28.294743\n", "1.752340 28.189718\n", "1.725601 28.051761\n", "1.699671 27.916194\n", "1.674053 27.780024\n", "dtype: float64" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "left_series = pd.Series(g, index=left_gi)\n", "left_series.head()" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.932648 28.294743\n", "2.890114 28.189718\n", "2.835806 28.051761\n", "2.783308 27.916194\n", "2.731517 27.780024\n", "dtype: float64" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "right_series = pd.Series(g, index=right_gi)\n", "right_series.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can plot them on the color-magnitude diagram like this." 
] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot_cmd(photo_table)\n", "left_series.plot()\n", "right_series.plot();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It looks like the scaled isochrones bound the overdense area well, but they also include stars with magnitudes higher than we expect for stars in GD-1, so we'll use another mask to limit the range of `g`." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "117" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g_mask = (g > 18.0) & (g < 21.5)\n", "g_mask.sum()" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(117, 117)" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "left = left_series[g_mask]\n", "right = right_series[g_mask]\n", "\n", "len(left), len(right)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's what they look like:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot_cmd(photo_table)\n", "left.plot()\n", "right.plot();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we want to assemble the two halves into a polygon. We can use `append` to make a new `Series` that contains both halves.\n", "\n", "And we'll use the slice `[::-1]` to reverse the elements of `right` so the result forms a loop. [See here for an explanation of this idiom](https://stackoverflow.com/questions/5876998/reversing-a-list-using-slice-notation)." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.587571 21.411746\n", "0.567801 21.322466\n", "0.548134 21.233380\n", "0.528693 21.144427\n", "0.509300 21.054549\n", "dtype: float64" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "loop = left.append(right[::-1])\n", "loop.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following lines add metadata by assigning names to the values and the index in `loop`." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "gi\n", "0.587571 21.411746\n", "0.567801 21.322466\n", "0.548134 21.233380\n", "0.528693 21.144427\n", "0.509300 21.054549\n", "Name: g, dtype: float64" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "loop.name = 'g'\n", "loop.index.name = 'gi'\n", "loop.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here's what it looks like" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "loop.plot()\n", "plot_cmd(photo_table)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we'll use this polygon to identify stars in the overdense region." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Which points are in the polygon?\n", "\n", "Matplotlib provides a `Path` object that we can use to check which points fall in the polygon we just constructed.\n", "\n", "To make a `Path`, we need a list of coordinates in the form of an array with two columns.\n", "\n", "Currently `loop` is a `Series` with the values of `gi` in the index:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "gi\n", "0.587571 21.411746\n", "0.567801 21.322466\n", "0.548134 21.233380\n", "0.528693 21.144427\n", "0.509300 21.054549\n", "Name: g, dtype: float64" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "loop.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can move them out of the index into a column using `reset_index`:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
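<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>gi</th>\n", "      <th>g</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>0</th>\n", "      <td>0.587571</td>\n", "      <td>21.411746</td>\n", "    </tr>\n", "    <tr>\n", "      <th>1</th>\n", "      <td>0.567801</td>\n", "      <td>21.322466</td>\n", "    </tr>\n", "    <tr>\n", "      <th>2</th>\n", "      <td>0.548134</td>\n", "      <td>21.233380</td>\n", "    </tr>\n", "    <tr>\n", "      <th>3</th>\n", "      <td>0.528693</td>\n", "      <td>21.144427</td>\n", "    </tr>\n", "    <tr>\n", "      <th>4</th>\n", "      <td>0.509300</td>\n", "      <td>21.054549</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>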
" ], "text/plain": [ " gi g\n", "0 0.587571 21.411746\n", "1 0.567801 21.322466\n", "2 0.548134 21.233380\n", "3 0.528693 21.144427\n", "4 0.509300 21.054549" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "loop_df = loop.reset_index()\n", "loop_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result is a `DataFrame` with one column for `gi` and one column for `g`, so we can pass it to `Path` like this:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Path(array([[ 0.58757135, 21.41174601],\n", " [ 0.56780097, 21.32246601],\n", " [ 0.54813409, 21.23338001],\n", " [ 0.5286928 , 21.14442701],\n", " [ 0.50929987, 21.05454901],\n", " [ 0.48991266, 20.96383501],\n", " [ 0.47084777, 20.87386601],\n", " [ 0.45222635, 20.78511001],\n", " [ 0.43438902, 20.69865301],\n", " [ 0.42745198, 20.66469601],\n", " [ 0.42067029, 20.63135301],\n", " [ 0.41402867, 20.59850601],\n", " [ 0.40738016, 20.56529901],\n", " [ 0.40088387, 20.53264001],\n", " [ 0.39449608, 20.50023501],\n", " [ 0.38843797, 20.46871801],\n", " [ 0.38251577, 20.43765101],\n", " [ 0.3766547 , 20.40653701],\n", " [ 0.37088531, 20.37564701],\n", " [ 0.36522325, 20.34505401],\n", " [ 0.35962415, 20.31443001],\n", " [ 0.35413292, 20.28413501],\n", " [ 0.34871894, 20.25390101],\n", " [ 0.34339273, 20.22385701],\n", " [ 0.33815825, 20.19395801],\n", " [ 0.33305724, 20.16427301],\n", " [ 0.32820637, 20.13508501],\n", " [ 0.32348139, 20.10604901],\n", " [ 0.31883343, 20.07716101],\n", " [ 0.31425423, 20.04833101],\n", " [ 0.30974976, 20.01961701],\n", " [ 0.30531997, 19.99097001],\n", " [ 0.30097354, 19.96246401],\n", " [ 0.29669999, 19.93401801],\n", " [ 0.29250157, 19.90573101],\n", " [ 0.28837983, 19.87746501],\n", " [ 0.28441584, 19.84955001],\n", " [ 0.28065057, 19.82188301],\n", " [ 0.27700644, 19.79450101],\n", " [ 0.27342328, 19.76713801],\n", " [ 0.26989305, 19.73985301],\n", " [ 
0.26641258, 19.71265801],\n", " [ 0.26298257, 19.68540001],\n", " [ 0.25960216, 19.65824401],\n", " [ 0.2562733 , 19.63113701],\n", " [ 0.25299978, 19.60409301],\n", " [ 0.24977307, 19.57714401],\n", " [ 0.24660506, 19.55024001],\n", " [ 0.24348829, 19.52341001],\n", " [ 0.24042159, 19.49666601],\n", " [ 0.23741737, 19.46998501],\n", " [ 0.23447423, 19.44339301],\n", " [ 0.23158726, 19.41688701],\n", " [ 0.22876474, 19.39045101],\n", " [ 0.22600432, 19.36410901],\n", " [ 0.22330395, 19.33786601],\n", " [ 0.220663 , 19.31170101],\n", " [ 0.21808571, 19.28560101],\n", " [ 0.21557456, 19.25960101],\n", " [ 0.21312279, 19.23368701],\n", " [ 0.21073349, 19.20785601],\n", " [ 0.20840975, 19.18210401],\n", " [ 0.20614799, 19.15640601],\n", " [ 0.20395119, 19.13076401],\n", " [ 0.20182156, 19.10523201],\n", " [ 0.19975572, 19.07977101],\n", " [ 0.19775195, 19.05436401],\n", " [ 0.19581903, 19.02902801],\n", " [ 0.19395701, 19.00376101],\n", " [ 0.19216276, 18.97857301],\n", " [ 0.19044513, 18.95347601],\n", " [ 0.1888007 , 18.92850001],\n", " [ 0.18723796, 18.90368201],\n", " [ 0.18576648, 18.87905401],\n", " [ 0.18438763, 18.85466301],\n", " [ 0.18310871, 18.83056001],\n", " [ 0.18193706, 18.80672701],\n", " [ 0.18087817, 18.78327401],\n", " [ 0.17993184, 18.76015001],\n", " [ 0.17910244, 18.73740501],\n", " [ 0.17838817, 18.71496101],\n", " [ 0.17779005, 18.69282101],\n", " [ 0.177312 , 18.67099501],\n", " [ 0.17694971, 18.64944001],\n", " [ 0.1767112 , 18.62815801],\n", " [ 0.17659065, 18.60714001],\n", " [ 0.17658939, 18.58636601],\n", " [ 0.17671618, 18.56585701],\n", " [ 0.17696696, 18.54562201],\n", " [ 0.17733781, 18.52565801],\n", " [ 0.1778346 , 18.50597901],\n", " [ 0.17846661, 18.48656801],\n", " [ 0.17922891, 18.46742401],\n", " [ 0.18012796, 18.44859001],\n", " [ 0.18116197, 18.43005501],\n", " [ 0.18233604, 18.41181501],\n", " [ 0.18363223, 18.39379401],\n", " [ 0.18506009, 18.37602901],\n", " [ 0.18660932, 18.35862101],\n", " [ 0.18829849, 18.34153201],\n", 
" [ 0.19012805, 18.32480701],\n", " [ 0.19210919, 18.30851301],\n", " [ 0.19422686, 18.29250401],\n", " [ 0.1964951 , 18.27685701],\n", " [ 0.19890209, 18.26156301],\n", " [ 0.20145338, 18.24666001],\n", " [ 0.20417715, 18.23260501],\n", " [ 0.20705285, 18.21898101],\n", " [ 0.21005661, 18.20562501],\n", " [ 0.21319339, 18.19254201],\n", " [ 0.22126873, 18.16185301],\n", " [ 0.2300065 , 18.13259301],\n", " [ 0.23950909, 18.10508001],\n", " [ 0.24974677, 18.07932501],\n", " [ 0.26066153, 18.05527801],\n", " [ 0.27224553, 18.03295501],\n", " [ 0.28447607, 18.01227601],\n", " [ 0.40566013, 18.01227601],\n", " [ 0.39412682, 18.03295501],\n", " [ 0.38329907, 18.05527801],\n", " [ 0.37320316, 18.07932501],\n", " [ 0.36384734, 18.10508001],\n", " [ 0.35529237, 18.13259301],\n", " [ 0.34756872, 18.16185301],\n", " [ 0.34056407, 18.19254201],\n", " [ 0.33788593, 18.20562501],\n", " [ 0.33535176, 18.21898101],\n", " [ 0.33295648, 18.23260501],\n", " [ 0.33072983, 18.24666001],\n", " [ 0.32870734, 18.26156301],\n", " [ 0.32684482, 18.27685701],\n", " [ 0.3251355 , 18.29250401],\n", " [ 0.32359167, 18.30851301],\n", " [ 0.32219665, 18.32480701],\n", " [ 0.32097089, 18.34153201],\n", " [ 0.31990093, 18.35862101],\n", " [ 0.31898485, 18.37602901],\n", " [ 0.3182056 , 18.39379401],\n", " [ 0.31756993, 18.41181501],\n", " [ 0.31706705, 18.43005501],\n", " [ 0.31671781, 18.44859001],\n", " [ 0.3165174 , 18.46742401],\n", " [ 0.31646817, 18.48656801],\n", " [ 0.3165622 , 18.50597901],\n", " [ 0.31680458, 18.52565801],\n", " [ 0.31718682, 18.54562201],\n", " [ 0.31770268, 18.56585701],\n", " [ 0.31835632, 18.58636601],\n", " [ 0.31915162, 18.60714001],\n", " [ 0.32007915, 18.62815801],\n", " [ 0.3211385 , 18.64944001],\n", " [ 0.32233599, 18.67099501],\n", " [ 0.32366367, 18.69282101],\n", " [ 0.32512771, 18.71496101],\n", " [ 0.32672398, 18.73740501],\n", " [ 0.32845154, 18.76015001],\n", " [ 0.33031546, 18.78327401],\n", " [ 0.33230964, 18.80672701],\n", " [ 0.33443651, 
18.83056001],\n", " [ 0.3366864 , 18.85466301],\n", " [ 0.3390529 , 18.87905401],\n", " [ 0.34152681, 18.90368201],\n", " [ 0.34410502, 18.92850001],\n", " [ 0.34677677, 18.95347601],\n", " [ 0.34953217, 18.97857301],\n", " [ 0.35237348, 19.00376101],\n", " [ 0.35529144, 19.02902801],\n", " [ 0.35828883, 19.05436401],\n", " [ 0.36136575, 19.07977101],\n", " [ 0.36451277, 19.10523201],\n", " [ 0.36773241, 19.13076401],\n", " [ 0.37102978, 19.15640601],\n", " [ 0.37440044, 19.18210401],\n", " [ 0.37784139, 19.20785601],\n", " [ 0.38135736, 19.23368701],\n", " [ 0.38494552, 19.25960101],\n", " [ 0.388603 , 19.28560101],\n", " [ 0.39233725, 19.31170101],\n", " [ 0.39614435, 19.33786601],\n", " [ 0.40002069, 19.36410901],\n", " [ 0.40396796, 19.39045101],\n", " [ 0.40798805, 19.41688701],\n", " [ 0.41208235, 19.44339301],\n", " [ 0.41624335, 19.46998501],\n", " [ 0.42047622, 19.49666601],\n", " [ 0.42478124, 19.52341001],\n", " [ 0.42914714, 19.55024001],\n", " [ 0.43357463, 19.57714401],\n", " [ 0.43806989, 19.60409301],\n", " [ 0.44262347, 19.63113701],\n", " [ 0.44724247, 19.65824401],\n", " [ 0.4519225 , 19.68540001],\n", " [ 0.45666424, 19.71265801],\n", " [ 0.46146067, 19.73985301],\n", " [ 0.46631851, 19.76713801],\n", " [ 0.47124047, 19.79450101],\n", " [ 0.47623175, 19.82188301],\n", " [ 0.48136578, 19.84955001],\n", " [ 0.48671855, 19.87746501],\n", " [ 0.49225451, 19.90573101],\n", " [ 0.49787627, 19.93401801],\n", " [ 0.50358931, 19.96246401],\n", " [ 0.50938655, 19.99097001],\n", " [ 0.51528266, 20.01961701],\n", " [ 0.52126534, 20.04833101],\n", " [ 0.52733726, 20.07716101],\n", " [ 0.53348957, 20.10604901],\n", " [ 0.53973535, 20.13508501],\n", " [ 0.54612384, 20.16427301],\n", " [ 0.55279781, 20.19395801],\n", " [ 0.55962597, 20.22385701],\n", " [ 0.56656311, 20.25390101],\n", " [ 0.57360789, 20.28413501],\n", " [ 0.58074299, 20.31443001],\n", " [ 0.5880138 , 20.34505401],\n", " [ 0.59535596, 20.37564701],\n", " [ 0.60283203, 20.40653701],\n", " [ 
0.61042265, 20.43765101],\n", " [ 0.61808231, 20.46871801],\n", " [ 0.62591386, 20.50023501],\n", " [ 0.63413647, 20.53264001],\n", " [ 0.64249372, 20.56529901],\n", " [ 0.65104657, 20.59850601],\n", " [ 0.659584 , 20.63135301],\n", " [ 0.66830253, 20.66469601],\n", " [ 0.67722496, 20.69865301],\n", " [ 0.70017638, 20.78511001],\n", " [ 0.72413715, 20.87386601],\n", " [ 0.74870785, 20.96383501],\n", " [ 0.77374297, 21.05454901],\n", " [ 0.7988286 , 21.14442701],\n", " [ 0.8240001 , 21.23338001],\n", " [ 0.84950281, 21.32246601],\n", " [ 0.8752204 , 21.41174601]]), None)" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from matplotlib.path import Path\n", "\n", "path = Path(loop_df)\n", "path" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result is a `Path` object that represents the polygon.\n", "\n", "`Path` provides `contains_points`, which figures out which points are inside the polygon.\n", "\n", "To test it, we'll create a list with two points, one inside the polygon and one outside." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "points = [(0.4, 20), \n", " (0.4, 16)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can make sure `contains_points` does what we expect." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ True, False])" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "inside = path.contains_points(points)\n", "inside" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result is an array of Boolean values.\n", "\n", "We are almost ready to select stars whose photometry data falls in this polygon. But first we need to do some data cleaning." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reloading the data\n", "\n", "Now we need to combine the photometry data with the list of candidate stars we identified in a previous notebook. The following cell downloads it:\n", "\n" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "import os\n", "from wget import download\n", "\n", "filename = 'gd1_candidates.hdf5'\n", "filepath = 'https://github.com/AllenDowney/AstronomicalData/raw/main/data/'\n", "\n", "if not os.path.exists(filename):\n", "    print(download(filepath+filename))" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "candidate_df = pd.read_hdf(filename, 'candidate_df')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`candidate_df` is the Pandas `DataFrame` that contains the results from Lesson 4, which selects stars likely to be in GD-1 based on proper motion. It also includes position and proper motion transformed to the GD-1 frame (`phi1`, `phi2`, `pm_phi1`, `pm_phi2`)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Merging photometry data\n", "\n", "Before we select stars based on photometry data, we have to solve two problems:\n", "\n", "1. We only have Pan-STARRS data for some stars in `candidate_df`.\n", "\n", "2. Even for the stars where we have Pan-STARRS data in `photo_table`, some photometry data is missing.\n", "\n", "We will solve these problems in two steps:\n", "\n", "1. We'll merge the data from `candidate_df` and `photo_table` into a single Pandas `DataFrame`.\n", "\n", "2. We'll use Pandas functions to deal with missing data.\n", "\n", "`candidate_df` is already a `DataFrame`, but `photo_table` is an Astropy `Table`. 
Let's convert it to Pandas:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "source_id\n", "g_mean_psf_mag\n", "i_mean_psf_mag\n" ] } ], "source": [ "photo_df = photo_table.to_pandas()\n", "\n", "for colname in photo_df.columns:\n", " print(colname)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we want to combine `candidate_df` and `photo_df` into a single table, using `source_id` to match up the rows.\n", "\n", "You might recognize this task; it's the same as the JOIN operation in ADQL/SQL.\n", "\n", "Pandas provides a function called `merge` that does what we want. Here's how we use it." ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>source_id</th><th>ra</th><th>dec</th><th>pmra</th><th>pmdec</th><th>parallax</th><th>radial_velocity</th><th>phi1</th><th>phi2</th><th>pm_phi1</th><th>pm_phi2</th><th>g_mean_psf_mag</th><th>i_mean_psf_mag</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>635559124339440000</td><td>137.586717</td><td>19.196544</td><td>-3.770522</td><td>-12.490482</td><td>0.791393</td><td>NaN</td><td>-59.630489</td><td>-1.216485</td><td>-7.361363</td><td>-0.592633</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>1</th><td>635860218726658176</td><td>138.518707</td><td>19.092339</td><td>-5.941679</td><td>-11.346409</td><td>0.307456</td><td>NaN</td><td>-59.247330</td><td>-2.016078</td><td>-7.527126</td><td>1.748779</td><td>17.8978</td><td>17.517401</td></tr>\n",
"    <tr><th>2</th><td>635674126383965568</td><td>138.842874</td><td>19.031798</td><td>-3.897001</td><td>-12.702780</td><td>0.779463</td><td>NaN</td><td>-59.133391</td><td>-2.306901</td><td>-7.560608</td><td>-0.741800</td><td>19.2873</td><td>17.678101</td></tr>\n",
"    <tr><th>3</th><td>635535454774983040</td><td>137.837752</td><td>18.864007</td><td>-4.335041</td><td>-14.492309</td><td>0.314514</td><td>NaN</td><td>-59.785300</td><td>-1.594569</td><td>-9.357536</td><td>-1.218492</td><td>16.9238</td><td>16.478100</td></tr>\n",
"    <tr><th>4</th><td>635497276810313600</td><td>138.044516</td><td>19.009471</td><td>-7.172931</td><td>-12.291499</td><td>0.425404</td><td>NaN</td><td>-59.557744</td><td>-1.682147</td><td>-9.000831</td><td>2.334407</td><td>19.9242</td><td>18.334000</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " source_id ra dec pmra pmdec parallax \\\n", "0 635559124339440000 137.586717 19.196544 -3.770522 -12.490482 0.791393 \n", "1 635860218726658176 138.518707 19.092339 -5.941679 -11.346409 0.307456 \n", "2 635674126383965568 138.842874 19.031798 -3.897001 -12.702780 0.779463 \n", "3 635535454774983040 137.837752 18.864007 -4.335041 -14.492309 0.314514 \n", "4 635497276810313600 138.044516 19.009471 -7.172931 -12.291499 0.425404 \n", "\n", " radial_velocity phi1 phi2 pm_phi1 pm_phi2 g_mean_psf_mag \\\n", "0 NaN -59.630489 -1.216485 -7.361363 -0.592633 NaN \n", "1 NaN -59.247330 -2.016078 -7.527126 1.748779 17.8978 \n", "2 NaN -59.133391 -2.306901 -7.560608 -0.741800 19.2873 \n", "3 NaN -59.785300 -1.594569 -9.357536 -1.218492 16.9238 \n", "4 NaN -59.557744 -1.682147 -9.000831 2.334407 19.9242 \n", "\n", " i_mean_psf_mag \n", "0 NaN \n", "1 17.517401 \n", "2 17.678101 \n", "3 16.478100 \n", "4 18.334000 " ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged = pd.merge(candidate_df, \n", " photo_df, \n", " on='source_id', \n", " how='left')\n", "merged.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first argument is the \"left\" table, the second argument is the \"right\" table, and the keyword argument `on='source_id'` specifies a column to use to match up the rows.\n", "\n", "The argument `how='left'` means that the result should have all rows from the left table, even if some of them don't match up with a row in the right table.\n", "\n", "If you are interested in the other options for `how`, you can [read the documentation of `merge`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html).\n", "\n", "You can also do different types of join in ADQL/SQL; [you can read about that here](https://www.w3schools.com/sql/sql_join.asp).\n", "\n", "The result is a `DataFrame` that contains the same number of rows as `candidate_df`. 
" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(7346, 3724, 7346)" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(candidate_df), len(photo_df), len(merged)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And all columns from both tables." ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "source_id\n", "ra\n", "dec\n", "pmra\n", "pmdec\n", "parallax\n", "radial_velocity\n", "phi1\n", "phi2\n", "pm_phi1\n", "pm_phi2\n", "g_mean_psf_mag\n", "i_mean_psf_mag\n" ] } ], "source": [ "for colname in merged.columns:\n", " print(colname)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Detail** You might notice that Pandas also provides a function called `join`; it does almost the same thing, but the interface is slightly different. We think `merge` is a little easier to use, so that's what we chose. It's also more consistent with JOIN in SQL, so if you learn how to use `pd.merge`, you are also learning how to use SQL JOIN.\n", "\n", "Also, someone might ask why we have to use Pandas to do this join; why didn't we do it in ADQL. The answer is that we could have done that, but since we already have the data we need, we should probably do the computation locally rather than make another round trip to the Gaia server." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Missing data\n", "\n", "Let's add columns to the merged table for magnitude and color." 
] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "merged['mag'] = merged['g_mean_psf_mag']\n", "merged['color'] = merged['g_mean_psf_mag'] - merged['i_mean_psf_mag']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These columns contain the special value `NaN` where we are missing data.\n", "\n", "We can use `notnull` to see which rows contain valid data, that is, values that are not null." ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 False\n", "1 True\n", "2 True\n", "3 True\n", "4 True\n", " ... \n", "7341 True\n", "7342 False\n", "7343 False\n", "7344 True\n", "7345 False\n", "Name: color, Length: 7346, dtype: bool" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged['color'].notnull()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can use `sum` to count the number of valid values." ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3724" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged['color'].notnull().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For scientific purposes, it's not obvious what we should do with candidate stars if we don't have photometry data. Should we give them the benefit of the doubt or leave them out?\n", "\n", "In part the answer depends on the goal: are we trying to identify more stars that might be in GD-1, or a smaller set of stars that have higher probability?\n", "\n", "In the next section, we'll leave them out, but you can experiment with the alternative." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Selecting based on photometry\n", "\n", "Now let's see how many of these points are inside the polygon we chose.\n", "\n", "We can use a list of column names to select `color` and `mag`."
] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>color</th><th>mag</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>1</th><td>0.3804</td><td>17.8978</td></tr>\n",
"    <tr><th>2</th><td>1.6092</td><td>19.2873</td></tr>\n",
"    <tr><th>3</th><td>0.4457</td><td>16.9238</td></tr>\n",
"    <tr><th>4</th><td>1.5902</td><td>19.9242</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " color mag\n", "0 NaN NaN\n", "1 0.3804 17.8978\n", "2 1.6092 19.2873\n", "3 0.4457 16.9238\n", "4 1.5902 19.9242" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "points = merged[['color', 'mag']]\n", "points.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result is a `DataFrame` that can be treated as a sequence of coordinates, so we can pass it to `contains_points`:" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([False, False, False, ..., False, False, False])" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "inside = path.contains_points(points)\n", "inside" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result is a Boolean array. We can use `sum` to see how many stars fall in the polygon." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "464" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "inside.sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can use `inside` as a mask to select stars that fall inside the polygon." ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "selected2 = merged[inside]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's make a color-magnitude plot one more time, highlighting the selected stars with green `x` marks." ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot_cmd(photo_table)\n", "plt.plot(gi, g)\n", "loop.plot()\n", "\n", "plt.plot(selected2['color'], selected2['mag'], 'g.');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It looks like the selected stars are, in fact, inside the polygon, which means they have photometry data consistent with GD-1.\n", "\n", "Finally, we can plot the coordinates of the selected stars:" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.figure(figsize=(10,2.5))\n", "\n", "x = selected2['phi1']\n", "y = selected2['phi2']\n", "\n", "plt.plot(x, y, 'ko', markersize=0.7, alpha=0.9)\n", "\n", "plt.xlabel('ra (degree GD1)')\n", "plt.ylabel('dec (degree GD1)')\n", "\n", "plt.axis('equal');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This example includes two new Matplotlib commands:\n", "\n", "* `figure` creates the figure. In previous examples, we didn't have to use this function; the figure was created automatically. But when we call it explicitly, we can provide arguments like `figsize`, which sets the size of the figure.\n", "\n", "* `axis` with the parameter `equal` sets up the axes so a unit is the same size along the `x` and `y` axes.\n", "\n", "In an example like this, where `x` and `y` represent coordinates in space, equal axes ensures that the distance between points is represented accurately. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Write the data\n", "\n", "Let's write the merged DataFrame to a file." ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "filename = 'gd1_merged.hdf5'\n", "\n", "merged.to_hdf(filename, 'merged')\n", "selected2.to_hdf(filename, 'selected2')" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-rw-rw-r-- 1 downey downey 1.1M Dec 14 14:24 gd1_merged.hdf5\r\n" ] } ], "source": [ "!ls -lh gd1_merged.hdf5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you are using Windows, `ls` might not work; in that case, try:\n", "\n", "```\n", "!dir gd1_merged.hdf5\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Save the polygon\n", "\n", "[Reproducibile research](https://en.wikipedia.org/wiki/Reproducibility#Reproducible_research) is \"the idea that ... 
the full computational environment used to produce the results in the paper such as the code, data, etc. can be used to reproduce the results and create new work based on the research.\"\n", "\n", "This Jupyter notebook is an example of reproducible research because it contains all of the code needed to reproduce the results, including the database queries that download the data and the analysis.\n", "\n", "In this lesson we used an isochrone to derive a polygon, which we used to select stars based on photometry. \n", "So it is important to record the polygon as part of the data analysis pipeline.\n", "\n", "Here's how we can save it in an HDF file." ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [], "source": [ "filename = 'gd1_polygon.hdf5'\n", "loop.to_hdf(filename, key='loop')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can read it back like this." ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "loop2 = pd.read_hdf(filename, 'loop')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And verify that the data we read back is the same." ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "np.all(loop == loop2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "In this notebook, we worked with two datasets: the list of candidate stars from Gaia and the photometry data from Pan-STARRS.\n", "\n", "We drew a color-magnitude diagram and used it to identify stars we think are likely to be in GD-1.\n", "\n", "Then we used a Pandas `merge` operation to combine the data into a single `DataFrame`."
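] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To recap the merge step, here is a minimal sketch with made-up tables; the names `left`, `right`, `toy`, and `n_matched` and all of the values are hypothetical. It illustrates that a left join keeps every row of the left table and fills the unmatched columns with `NaN`, which `notnull` can then count.

```python
import pandas as pd

# Hypothetical toy tables; names and values are made up for illustration
left = pd.DataFrame({'source_id': [1, 2, 3],
                     'pm': [-7.4, -7.5, -9.4]})
right = pd.DataFrame({'source_id': [2, 3],
                      'g_mag': [17.9, 19.3]})

# Left join: keep every row of left; unmatched rows get NaN in g_mag
toy = pd.merge(left, right, on='source_id', how='left')

# Count the rows that found a match in the right table
n_matched = toy['g_mag'].notnull().sum()

print(len(toy), n_matched)
```

Changing `how` to `'inner'` would instead keep only the rows that appear in both tables."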
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Best practices\n", "\n", "* If you want to perform something like a database `JOIN` operation with data that is in a Pandas `DataFrame`, you can use the `join` or `merge` function. In many cases, `merge` is easier to use because the arguments are more like SQL.\n", "\n", "* Use Matplotlib options to control the size and aspect ratio of figures to make them easier to interpret. In this example, we scaled the axes so the size of a degree is equal along both axes.\n", "\n", "* Matplotlib also provides operations for working with points, polygons, and other geometric entities, so it's not just for making figures.\n", "\n", "* Be sure to record every element of the data analysis pipeline that would be needed to replicate the results." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "celltoolbar": "Tags", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 2 }