{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Operaciones avanzadas con DataFrames\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Contenidos de esta lección\n",
"\n",
"- Creación de DataFrames con índices jerárquicos: `MultiIndex`\n",
"- Patrón Split-Apply-Combine con `groupBy` \n",
"\n",
"Consideremos la siguiente tabla de la lección anterior con los datos del Censo 2017 de Chile"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" NOMBRE REGIÓN | \n",
" Código Región | \n",
" NOMBRE PROVINCIA | \n",
" Código Provincia | \n",
" NOMBRE COMUNA | \n",
" Código Comuna | \n",
" Viviendas Particulares Ocupadas con Moradores Presentes | \n",
" Viviendas Particulares Ocupadas con Moradores Ausentes | \n",
" Viviendas Particulares Desocupadas (en Venta, para arriendo, Abandonada u otro) | \n",
" Viviendas Particulares Desocupadas\\n(de Temporada) | \n",
" Viviendas Colectivas | \n",
" TOTAL VIVIENDAS | \n",
"
\n",
" \n",
" ORDEN | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" ARICA Y PARINACOTA | \n",
" 15.0 | \n",
" ARICA | \n",
" 151.0 | \n",
" ARICA | \n",
" 15101.0 | \n",
" 62129.0 | \n",
" 4574.0 | \n",
" 4045.0 | \n",
" 1666.0 | \n",
" 225.0 | \n",
" 72639.0 | \n",
"
\n",
" \n",
" 2 | \n",
" ARICA Y PARINACOTA | \n",
" 15.0 | \n",
" ARICA | \n",
" 151.0 | \n",
" CAMARONES | \n",
" 15102.0 | \n",
" 431.0 | \n",
" 96.0 | \n",
" 158.0 | \n",
" 242.0 | \n",
" 21.0 | \n",
" 948.0 | \n",
"
\n",
" \n",
" 3 | \n",
" ARICA Y PARINACOTA | \n",
" 15.0 | \n",
" PARINACOTA | \n",
" 152.0 | \n",
" PUTRE | \n",
" 15201.0 | \n",
" 540.0 | \n",
" 197.0 | \n",
" 143.0 | \n",
" 995.0 | \n",
" 42.0 | \n",
" 1917.0 | \n",
"
\n",
" \n",
" 4 | \n",
" ARICA Y PARINACOTA | \n",
" 15.0 | \n",
" PARINACOTA | \n",
" 152.0 | \n",
" GENERAL LAGOS | \n",
" 15202.0 | \n",
" 218.0 | \n",
" 90.0 | \n",
" 162.0 | \n",
" 216.0 | \n",
" 11.0 | \n",
" 697.0 | \n",
"
\n",
" \n",
" 5 | \n",
" TARAPACÁ | \n",
" 1.0 | \n",
" IQUIQUE | \n",
" 11.0 | \n",
" IQUIQUE | \n",
" 1101.0 | \n",
" 56007.0 | \n",
" 3673.0 | \n",
" 5481.0 | \n",
" 1564.0 | \n",
" 261.0 | \n",
" 66986.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" NOMBRE REGIÓN Código Región NOMBRE PROVINCIA Código Provincia \\\n",
"ORDEN \n",
"1 ARICA Y PARINACOTA 15.0 ARICA 151.0 \n",
"2 ARICA Y PARINACOTA 15.0 ARICA 151.0 \n",
"3 ARICA Y PARINACOTA 15.0 PARINACOTA 152.0 \n",
"4 ARICA Y PARINACOTA 15.0 PARINACOTA 152.0 \n",
"5 TARAPACÁ 1.0 IQUIQUE 11.0 \n",
"\n",
" NOMBRE COMUNA Código Comuna \\\n",
"ORDEN \n",
"1 ARICA 15101.0 \n",
"2 CAMARONES 15102.0 \n",
"3 PUTRE 15201.0 \n",
"4 GENERAL LAGOS 15202.0 \n",
"5 IQUIQUE 1101.0 \n",
"\n",
" Viviendas Particulares Ocupadas con Moradores Presentes \\\n",
"ORDEN \n",
"1 62129.0 \n",
"2 431.0 \n",
"3 540.0 \n",
"4 218.0 \n",
"5 56007.0 \n",
"\n",
" Viviendas Particulares Ocupadas con Moradores Ausentes \\\n",
"ORDEN \n",
"1 4574.0 \n",
"2 96.0 \n",
"3 197.0 \n",
"4 90.0 \n",
"5 3673.0 \n",
"\n",
" Viviendas Particulares Desocupadas (en Venta, para arriendo, Abandonada u otro) \\\n",
"ORDEN \n",
"1 4045.0 \n",
"2 158.0 \n",
"3 143.0 \n",
"4 162.0 \n",
"5 5481.0 \n",
"\n",
" Viviendas Particulares Desocupadas\\n(de Temporada) \\\n",
"ORDEN \n",
"1 1666.0 \n",
"2 242.0 \n",
"3 995.0 \n",
"4 216.0 \n",
"5 1564.0 \n",
"\n",
" Viviendas Colectivas TOTAL VIVIENDAS \n",
"ORDEN \n",
"1 225.0 72639.0 \n",
"2 21.0 948.0 \n",
"3 42.0 1917.0 \n",
"4 11.0 697.0 \n",
"5 261.0 66986.0 "
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import pickle\n",
"with open(\"censo.pkl\", \"rb\") as f:\n",
" df = pickle.load(f)\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multi-índices\n",
"\n",
"Estudiando la tabla anterior notamos que tiene una estructura jerárquica\n",
"\n",
"> REGIÓN, PROVINCIA, COMUNA\n",
"\n",
"Podemos representar este tipo de estructuras en pandas usando un `MultiIndex` \n",
"\n",
"**Asignar y reestablecer índices en un DataFrame**\n",
"\n",
"Para asignar un índice a un DataFrame que ya está creado podemos usar el atributo\n",
"\n",
"```python\n",
"df.set_index(keys, # Una etiqueta o una lista de etiquetas que serán los nuevos índices\n",
" drop=True, # Eliminar las columnas que pasarán a ser índices\n",
" inplace=False, # Retornar un nuevo dataframe o modificar df\n",
" ...\n",
" )\n",
"```\n",
"\n",
":::{important}\n",
"\n",
"- Si keys es una etiqueta crearemos un índice regular\n",
"- Si keys es una lista crearemos un `MultiIndex`\n",
"\n",
":::\n",
"\n",
"Si queremos que nuestro índice o multi-índice vuelva a convertirse en columna podemos usar el atributo\n",
"\n",
"```python\n",
"df.reset_index(level = None, # Permite especificar cuantos niveles de índices se removeran\n",
" drop: bool = False, # Si los índices se deben eliminar o agregar como columnas\n",
" inplace: bool = False, # Retornar un nuevo dataframe o modificar df\n",
" ...\n",
" )\n",
"```\n",
"\n",
"**Ejemplo**\n",
"\n",
"Crearemos un `MultiIndex` para la tabla anterior. En este caso usaremos tres niveles de jerarquía:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"df.set_index([\"NOMBRE REGIÓN\", \"NOMBRE PROVINCIA\", \"NOMBRE COMUNA\"], inplace=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Si inspeccionamos el DataFrame veremos que:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" | \n",
" | \n",
" Código Región | \n",
" Código Provincia | \n",
" Código Comuna | \n",
" Viviendas Particulares Ocupadas con Moradores Presentes | \n",
" Viviendas Particulares Ocupadas con Moradores Ausentes | \n",
" Viviendas Particulares Desocupadas (en Venta, para arriendo, Abandonada u otro) | \n",
" Viviendas Particulares Desocupadas\\n(de Temporada) | \n",
" Viviendas Colectivas | \n",
" TOTAL VIVIENDAS | \n",
"
\n",
" \n",
" NOMBRE REGIÓN | \n",
" NOMBRE PROVINCIA | \n",
" NOMBRE COMUNA | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" ARICA Y PARINACOTA | \n",
" ARICA | \n",
" ARICA | \n",
" 15.0 | \n",
" 151.0 | \n",
" 15101.0 | \n",
" 62129.0 | \n",
" 4574.0 | \n",
" 4045.0 | \n",
" 1666.0 | \n",
" 225.0 | \n",
" 72639.0 | \n",
"
\n",
" \n",
" CAMARONES | \n",
" 15.0 | \n",
" 151.0 | \n",
" 15102.0 | \n",
" 431.0 | \n",
" 96.0 | \n",
" 158.0 | \n",
" 242.0 | \n",
" 21.0 | \n",
" 948.0 | \n",
"
\n",
" \n",
" PARINACOTA | \n",
" PUTRE | \n",
" 15.0 | \n",
" 152.0 | \n",
" 15201.0 | \n",
" 540.0 | \n",
" 197.0 | \n",
" 143.0 | \n",
" 995.0 | \n",
" 42.0 | \n",
" 1917.0 | \n",
"
\n",
" \n",
" GENERAL LAGOS | \n",
" 15.0 | \n",
" 152.0 | \n",
" 15202.0 | \n",
" 218.0 | \n",
" 90.0 | \n",
" 162.0 | \n",
" 216.0 | \n",
" 11.0 | \n",
" 697.0 | \n",
"
\n",
" \n",
" TARAPACÁ | \n",
" IQUIQUE | \n",
" IQUIQUE | \n",
" 1.0 | \n",
" 11.0 | \n",
" 1101.0 | \n",
" 56007.0 | \n",
" 3673.0 | \n",
" 5481.0 | \n",
" 1564.0 | \n",
" 261.0 | \n",
" 66986.0 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" MAGALLANES Y DE LA ANTÁRTICA CHILENA | \n",
" TIERRA DEL FUEGO | \n",
" PORVENIR | \n",
" 12.0 | \n",
" 123.0 | \n",
" 12301.0 | \n",
" 2097.0 | \n",
" 57.0 | \n",
" 462.0 | \n",
" 511.0 | \n",
" 60.0 | \n",
" 3187.0 | \n",
"
\n",
" \n",
" PRIMAVERA | \n",
" 12.0 | \n",
" 123.0 | \n",
" 12302.0 | \n",
" 190.0 | \n",
" 47.0 | \n",
" 62.0 | \n",
" 100.0 | \n",
" 73.0 | \n",
" 472.0 | \n",
"
\n",
" \n",
" TIMAUKEL | \n",
" 12.0 | \n",
" 123.0 | \n",
" 12303.0 | \n",
" 87.0 | \n",
" 9.0 | \n",
" 34.0 | \n",
" 76.0 | \n",
" 8.0 | \n",
" 214.0 | \n",
"
\n",
" \n",
" ÚLTIMA ESPERANZA | \n",
" NATALES | \n",
" 12.0 | \n",
" 124.0 | \n",
" 12401.0 | \n",
" 7274.0 | \n",
" 754.0 | \n",
" 900.0 | \n",
" 331.0 | \n",
" 110.0 | \n",
" 9369.0 | \n",
"
\n",
" \n",
" TORRES DEL PAINE | \n",
" 12.0 | \n",
" 124.0 | \n",
" 12402.0 | \n",
" 127.0 | \n",
" 16.0 | \n",
" 7.0 | \n",
" 38.0 | \n",
" 38.0 | \n",
" 226.0 | \n",
"
\n",
" \n",
"
\n",
"
346 rows × 9 columns
\n",
"
"
],
"text/plain": [
" Código Región \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"ARICA Y PARINACOTA ARICA ARICA 15.0 \n",
" CAMARONES 15.0 \n",
" PARINACOTA PUTRE 15.0 \n",
" GENERAL LAGOS 15.0 \n",
"TARAPACÁ IQUIQUE IQUIQUE 1.0 \n",
"... ... \n",
"MAGALLANES Y DE LA ANTÁRTICA CHILENA TIERRA DEL FUEGO PORVENIR 12.0 \n",
" PRIMAVERA 12.0 \n",
" TIMAUKEL 12.0 \n",
" ÚLTIMA ESPERANZA NATALES 12.0 \n",
" TORRES DEL PAINE 12.0 \n",
"\n",
" Código Provincia \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"ARICA Y PARINACOTA ARICA ARICA 151.0 \n",
" CAMARONES 151.0 \n",
" PARINACOTA PUTRE 152.0 \n",
" GENERAL LAGOS 152.0 \n",
"TARAPACÁ IQUIQUE IQUIQUE 11.0 \n",
"... ... \n",
"MAGALLANES Y DE LA ANTÁRTICA CHILENA TIERRA DEL FUEGO PORVENIR 123.0 \n",
" PRIMAVERA 123.0 \n",
" TIMAUKEL 123.0 \n",
" ÚLTIMA ESPERANZA NATALES 124.0 \n",
" TORRES DEL PAINE 124.0 \n",
"\n",
" Código Comuna \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"ARICA Y PARINACOTA ARICA ARICA 15101.0 \n",
" CAMARONES 15102.0 \n",
" PARINACOTA PUTRE 15201.0 \n",
" GENERAL LAGOS 15202.0 \n",
"TARAPACÁ IQUIQUE IQUIQUE 1101.0 \n",
"... ... \n",
"MAGALLANES Y DE LA ANTÁRTICA CHILENA TIERRA DEL FUEGO PORVENIR 12301.0 \n",
" PRIMAVERA 12302.0 \n",
" TIMAUKEL 12303.0 \n",
" ÚLTIMA ESPERANZA NATALES 12401.0 \n",
" TORRES DEL PAINE 12402.0 \n",
"\n",
" Viviendas Particulares Ocupadas con Moradores Presentes \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"ARICA Y PARINACOTA ARICA ARICA 62129.0 \n",
" CAMARONES 431.0 \n",
" PARINACOTA PUTRE 540.0 \n",
" GENERAL LAGOS 218.0 \n",
"TARAPACÁ IQUIQUE IQUIQUE 56007.0 \n",
"... ... \n",
"MAGALLANES Y DE LA ANTÁRTICA CHILENA TIERRA DEL FUEGO PORVENIR 2097.0 \n",
" PRIMAVERA 190.0 \n",
" TIMAUKEL 87.0 \n",
" ÚLTIMA ESPERANZA NATALES 7274.0 \n",
" TORRES DEL PAINE 127.0 \n",
"\n",
" Viviendas Particulares Ocupadas con Moradores Ausentes \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"ARICA Y PARINACOTA ARICA ARICA 4574.0 \n",
" CAMARONES 96.0 \n",
" PARINACOTA PUTRE 197.0 \n",
" GENERAL LAGOS 90.0 \n",
"TARAPACÁ IQUIQUE IQUIQUE 3673.0 \n",
"... ... \n",
"MAGALLANES Y DE LA ANTÁRTICA CHILENA TIERRA DEL FUEGO PORVENIR 57.0 \n",
" PRIMAVERA 47.0 \n",
" TIMAUKEL 9.0 \n",
" ÚLTIMA ESPERANZA NATALES 754.0 \n",
" TORRES DEL PAINE 16.0 \n",
"\n",
" Viviendas Particulares Desocupadas (en Venta, para arriendo, Abandonada u otro) \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"ARICA Y PARINACOTA ARICA ARICA 4045.0 \n",
" CAMARONES 158.0 \n",
" PARINACOTA PUTRE 143.0 \n",
" GENERAL LAGOS 162.0 \n",
"TARAPACÁ IQUIQUE IQUIQUE 5481.0 \n",
"... ... \n",
"MAGALLANES Y DE LA ANTÁRTICA CHILENA TIERRA DEL FUEGO PORVENIR 462.0 \n",
" PRIMAVERA 62.0 \n",
" TIMAUKEL 34.0 \n",
" ÚLTIMA ESPERANZA NATALES 900.0 \n",
" TORRES DEL PAINE 7.0 \n",
"\n",
" Viviendas Particulares Desocupadas\\n(de Temporada) \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"ARICA Y PARINACOTA ARICA ARICA 1666.0 \n",
" CAMARONES 242.0 \n",
" PARINACOTA PUTRE 995.0 \n",
" GENERAL LAGOS 216.0 \n",
"TARAPACÁ IQUIQUE IQUIQUE 1564.0 \n",
"... ... \n",
"MAGALLANES Y DE LA ANTÁRTICA CHILENA TIERRA DEL FUEGO PORVENIR 511.0 \n",
" PRIMAVERA 100.0 \n",
" TIMAUKEL 76.0 \n",
" ÚLTIMA ESPERANZA NATALES 331.0 \n",
" TORRES DEL PAINE 38.0 \n",
"\n",
" Viviendas Colectivas \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"ARICA Y PARINACOTA ARICA ARICA 225.0 \n",
" CAMARONES 21.0 \n",
" PARINACOTA PUTRE 42.0 \n",
" GENERAL LAGOS 11.0 \n",
"TARAPACÁ IQUIQUE IQUIQUE 261.0 \n",
"... ... \n",
"MAGALLANES Y DE LA ANTÁRTICA CHILENA TIERRA DEL FUEGO PORVENIR 60.0 \n",
" PRIMAVERA 73.0 \n",
" TIMAUKEL 8.0 \n",
" ÚLTIMA ESPERANZA NATALES 110.0 \n",
" TORRES DEL PAINE 38.0 \n",
"\n",
" TOTAL VIVIENDAS \n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"ARICA Y PARINACOTA ARICA ARICA 72639.0 \n",
" CAMARONES 948.0 \n",
" PARINACOTA PUTRE 1917.0 \n",
" GENERAL LAGOS 697.0 \n",
"TARAPACÁ IQUIQUE IQUIQUE 66986.0 \n",
"... ... \n",
"MAGALLANES Y DE LA ANTÁRTICA CHILENA TIERRA DEL FUEGO PORVENIR 3187.0 \n",
" PRIMAVERA 472.0 \n",
" TIMAUKEL 214.0 \n",
" ÚLTIMA ESPERANZA NATALES 9369.0 \n",
" TORRES DEL PAINE 226.0 \n",
"\n",
"[346 rows x 9 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Slicing* con `MultiIndex`**\n",
"\n",
"Para recuperar un elemento de un DataFrame con `MultiIndex` podemos indexar usando una tupla especificando cada uno de los niveles de índices, por ejemplo:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Código Región 14.0\n",
"Código Provincia 141.0\n",
"Código Comuna 14101.0\n",
"Viviendas Particulares Ocupadas con Moradores Presentes 53624.0\n",
"Viviendas Particulares Ocupadas con Moradores Ausentes 2939.0\n",
"Viviendas Particulares Desocupadas (en Venta, para arriendo, Abandonada u otro) 2876.0\n",
"Viviendas Particulares Desocupadas\\n(de Temporada) 1516.0\n",
"Viviendas Colectivas 237.0\n",
"TOTAL VIVIENDAS 61192.0\n",
"Name: (LOS RÍOS, VALDIVIA, VALDIVIA), dtype: float64"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[(\"LOS RÍOS\", \"VALDIVIA\", \"VALDIVIA\")]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"retorna la comuna de Valdivia, mientras que:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_35404/853128477.py:1: PerformanceWarning: indexing past lexsort depth may impact performance.\n",
" df.loc[(\"LOS RÍOS\", \"VALDIVIA\")]\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Código Región | \n",
" Código Provincia | \n",
" Código Comuna | \n",
" Viviendas Particulares Ocupadas con Moradores Presentes | \n",
" Viviendas Particulares Ocupadas con Moradores Ausentes | \n",
" Viviendas Particulares Desocupadas (en Venta, para arriendo, Abandonada u otro) | \n",
" Viviendas Particulares Desocupadas\\n(de Temporada) | \n",
" Viviendas Colectivas | \n",
" TOTAL VIVIENDAS | \n",
"
\n",
" \n",
" NOMBRE COMUNA | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" VALDIVIA | \n",
" 14.0 | \n",
" 141.0 | \n",
" 14101.0 | \n",
" 53624.0 | \n",
" 2939.0 | \n",
" 2876.0 | \n",
" 1516.0 | \n",
" 237.0 | \n",
" 61192.0 | \n",
"
\n",
" \n",
" CORRAL | \n",
" 14.0 | \n",
" 141.0 | \n",
" 14102.0 | \n",
" 1842.0 | \n",
" 93.0 | \n",
" 158.0 | \n",
" 470.0 | \n",
" 11.0 | \n",
" 2574.0 | \n",
"
\n",
" \n",
" LANCO | \n",
" 14.0 | \n",
" 141.0 | \n",
" 14103.0 | \n",
" 5643.0 | \n",
" 265.0 | \n",
" 308.0 | \n",
" 217.0 | \n",
" 17.0 | \n",
" 6450.0 | \n",
"
\n",
" \n",
" LOS LAGOS | \n",
" 14.0 | \n",
" 141.0 | \n",
" 14104.0 | \n",
" 6446.0 | \n",
" 304.0 | \n",
" 601.0 | \n",
" 539.0 | \n",
" 23.0 | \n",
" 7913.0 | \n",
"
\n",
" \n",
" MÁFIL | \n",
" 14.0 | \n",
" 141.0 | \n",
" 14105.0 | \n",
" 2317.0 | \n",
" 152.0 | \n",
" 161.0 | \n",
" 77.0 | \n",
" 5.0 | \n",
" 2712.0 | \n",
"
\n",
" \n",
" MARIQUINA | \n",
" 14.0 | \n",
" 141.0 | \n",
" 14106.0 | \n",
" 6808.0 | \n",
" 351.0 | \n",
" 519.0 | \n",
" 727.0 | \n",
" 12.0 | \n",
" 8417.0 | \n",
"
\n",
" \n",
" PAILLACO | \n",
" 14.0 | \n",
" 141.0 | \n",
" 14107.0 | \n",
" 6476.0 | \n",
" 363.0 | \n",
" 542.0 | \n",
" 222.0 | \n",
" 23.0 | \n",
" 7626.0 | \n",
"
\n",
" \n",
" PANGUIPULLI | \n",
" 14.0 | \n",
" 141.0 | \n",
" 14108.0 | \n",
" 11615.0 | \n",
" 840.0 | \n",
" 1155.0 | \n",
" 3142.0 | \n",
" 58.0 | \n",
" 16810.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Código Región Código Provincia Código Comuna \\\n",
"NOMBRE COMUNA \n",
"VALDIVIA 14.0 141.0 14101.0 \n",
"CORRAL 14.0 141.0 14102.0 \n",
"LANCO 14.0 141.0 14103.0 \n",
"LOS LAGOS 14.0 141.0 14104.0 \n",
"MÁFIL 14.0 141.0 14105.0 \n",
"MARIQUINA 14.0 141.0 14106.0 \n",
"PAILLACO 14.0 141.0 14107.0 \n",
"PANGUIPULLI 14.0 141.0 14108.0 \n",
"\n",
" Viviendas Particulares Ocupadas con Moradores Presentes \\\n",
"NOMBRE COMUNA \n",
"VALDIVIA 53624.0 \n",
"CORRAL 1842.0 \n",
"LANCO 5643.0 \n",
"LOS LAGOS 6446.0 \n",
"MÁFIL 2317.0 \n",
"MARIQUINA 6808.0 \n",
"PAILLACO 6476.0 \n",
"PANGUIPULLI 11615.0 \n",
"\n",
" Viviendas Particulares Ocupadas con Moradores Ausentes \\\n",
"NOMBRE COMUNA \n",
"VALDIVIA 2939.0 \n",
"CORRAL 93.0 \n",
"LANCO 265.0 \n",
"LOS LAGOS 304.0 \n",
"MÁFIL 152.0 \n",
"MARIQUINA 351.0 \n",
"PAILLACO 363.0 \n",
"PANGUIPULLI 840.0 \n",
"\n",
" Viviendas Particulares Desocupadas (en Venta, para arriendo, Abandonada u otro) \\\n",
"NOMBRE COMUNA \n",
"VALDIVIA 2876.0 \n",
"CORRAL 158.0 \n",
"LANCO 308.0 \n",
"LOS LAGOS 601.0 \n",
"MÁFIL 161.0 \n",
"MARIQUINA 519.0 \n",
"PAILLACO 542.0 \n",
"PANGUIPULLI 1155.0 \n",
"\n",
" Viviendas Particulares Desocupadas\\n(de Temporada) \\\n",
"NOMBRE COMUNA \n",
"VALDIVIA 1516.0 \n",
"CORRAL 470.0 \n",
"LANCO 217.0 \n",
"LOS LAGOS 539.0 \n",
"MÁFIL 77.0 \n",
"MARIQUINA 727.0 \n",
"PAILLACO 222.0 \n",
"PANGUIPULLI 3142.0 \n",
"\n",
" Viviendas Colectivas TOTAL VIVIENDAS \n",
"NOMBRE COMUNA \n",
"VALDIVIA 237.0 61192.0 \n",
"CORRAL 11.0 2574.0 \n",
"LANCO 17.0 6450.0 \n",
"LOS LAGOS 23.0 7913.0 \n",
"MÁFIL 5.0 2712.0 \n",
"MARIQUINA 12.0 8417.0 \n",
"PAILLACO 23.0 7626.0 \n",
"PANGUIPULLI 58.0 16810.0 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[(\"LOS RÍOS\", \"VALDIVIA\")]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"retorna todas las comunas de la provincia de Valdivia."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Para hacer *slices* o `fancy indexing` lo más simple es usar el objeto [`IndexSlice`](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.IndexSlice.html), por ejemplo:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" | \n",
" | \n",
" Código Región | \n",
" Código Provincia | \n",
" Código Comuna | \n",
" Viviendas Particulares Ocupadas con Moradores Presentes | \n",
" Viviendas Particulares Ocupadas con Moradores Ausentes | \n",
" Viviendas Particulares Desocupadas (en Venta, para arriendo, Abandonada u otro) | \n",
" Viviendas Particulares Desocupadas\\n(de Temporada) | \n",
" Viviendas Colectivas | \n",
" TOTAL VIVIENDAS | \n",
"
\n",
" \n",
" NOMBRE REGIÓN | \n",
" NOMBRE PROVINCIA | \n",
" NOMBRE COMUNA | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" LOS RÍOS | \n",
" VALDIVIA | \n",
" VALDIVIA | \n",
" 14.0 | \n",
" 141.0 | \n",
" 14101.0 | \n",
" 53624.0 | \n",
" 2939.0 | \n",
" 2876.0 | \n",
" 1516.0 | \n",
" 237.0 | \n",
" 61192.0 | \n",
"
\n",
" \n",
" LOS LAGOS | \n",
" OSORNO | \n",
" OSORNO | \n",
" 10.0 | \n",
" 103.0 | \n",
" 10301.0 | \n",
" 53821.0 | \n",
" 3082.0 | \n",
" 2685.0 | \n",
" 471.0 | \n",
" 73.0 | \n",
" 60132.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Código Región Código Provincia \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"LOS RÍOS VALDIVIA VALDIVIA 14.0 141.0 \n",
"LOS LAGOS OSORNO OSORNO 10.0 103.0 \n",
"\n",
" Código Comuna \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"LOS RÍOS VALDIVIA VALDIVIA 14101.0 \n",
"LOS LAGOS OSORNO OSORNO 10301.0 \n",
"\n",
" Viviendas Particulares Ocupadas con Moradores Presentes \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"LOS RÍOS VALDIVIA VALDIVIA 53624.0 \n",
"LOS LAGOS OSORNO OSORNO 53821.0 \n",
"\n",
" Viviendas Particulares Ocupadas con Moradores Ausentes \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"LOS RÍOS VALDIVIA VALDIVIA 2939.0 \n",
"LOS LAGOS OSORNO OSORNO 3082.0 \n",
"\n",
" Viviendas Particulares Desocupadas (en Venta, para arriendo, Abandonada u otro) \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"LOS RÍOS VALDIVIA VALDIVIA 2876.0 \n",
"LOS LAGOS OSORNO OSORNO 2685.0 \n",
"\n",
" Viviendas Particulares Desocupadas\\n(de Temporada) \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"LOS RÍOS VALDIVIA VALDIVIA 1516.0 \n",
"LOS LAGOS OSORNO OSORNO 471.0 \n",
"\n",
" Viviendas Colectivas \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"LOS RÍOS VALDIVIA VALDIVIA 237.0 \n",
"LOS LAGOS OSORNO OSORNO 73.0 \n",
"\n",
" TOTAL VIVIENDAS \n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"LOS RÍOS VALDIVIA VALDIVIA 61192.0 \n",
"LOS LAGOS OSORNO OSORNO 60132.0 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"idx = pd.IndexSlice\n",
"df.loc[idx[:, :, [\"VALDIVIA\", \"OSORNO\"]], :] "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"retorna las comunas de Valdivia y Osorno, sin importar que estén en distintas provincias, mientras que:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" | \n",
" | \n",
" Código Región | \n",
" Código Provincia | \n",
" Código Comuna | \n",
" Viviendas Particulares Ocupadas con Moradores Presentes | \n",
" Viviendas Particulares Ocupadas con Moradores Ausentes | \n",
" Viviendas Particulares Desocupadas (en Venta, para arriendo, Abandonada u otro) | \n",
" Viviendas Particulares Desocupadas\\n(de Temporada) | \n",
" Viviendas Colectivas | \n",
" TOTAL VIVIENDAS | \n",
"
\n",
" \n",
" NOMBRE REGIÓN | \n",
" NOMBRE PROVINCIA | \n",
" NOMBRE COMUNA | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" LOS LAGOS | \n",
" LLANQUIHUE | \n",
" PUERTO MONTT | \n",
" 10.0 | \n",
" 101.0 | \n",
" 10101.0 | \n",
" 78820.0 | \n",
" 5503.0 | \n",
" 6655.0 | \n",
" 2194.0 | \n",
" 208.0 | \n",
" 93380.0 | \n",
"
\n",
" \n",
" CALBUCO | \n",
" 10.0 | \n",
" 101.0 | \n",
" 10102.0 | \n",
" 11146.0 | \n",
" 745.0 | \n",
" 1006.0 | \n",
" 637.0 | \n",
" 36.0 | \n",
" 13570.0 | \n",
"
\n",
" \n",
" COCHAMÓ | \n",
" 10.0 | \n",
" 101.0 | \n",
" 10103.0 | \n",
" 1600.0 | \n",
" 90.0 | \n",
" 172.0 | \n",
" 358.0 | \n",
" 20.0 | \n",
" 2240.0 | \n",
"
\n",
" \n",
" FRESIA | \n",
" 10.0 | \n",
" 101.0 | \n",
" 10104.0 | \n",
" 4356.0 | \n",
" 236.0 | \n",
" 435.0 | \n",
" 227.0 | \n",
" 8.0 | \n",
" 5262.0 | \n",
"
\n",
" \n",
" FRUTILLAR | \n",
" 10.0 | \n",
" 101.0 | \n",
" 10105.0 | \n",
" 6267.0 | \n",
" 626.0 | \n",
" 594.0 | \n",
" 398.0 | \n",
" 24.0 | \n",
" 7909.0 | \n",
"
\n",
" \n",
" LOS MUERMOS | \n",
" 10.0 | \n",
" 101.0 | \n",
" 10106.0 | \n",
" 5990.0 | \n",
" 327.0 | \n",
" 619.0 | \n",
" 321.0 | \n",
" 7.0 | \n",
" 7264.0 | \n",
"
\n",
" \n",
" LLANQUIHUE | \n",
" 10.0 | \n",
" 101.0 | \n",
" 10107.0 | \n",
" 5550.0 | \n",
" 200.0 | \n",
" 258.0 | \n",
" 115.0 | \n",
" 12.0 | \n",
" 6135.0 | \n",
"
\n",
" \n",
" MAULLÍN | \n",
" 10.0 | \n",
" 101.0 | \n",
" 10108.0 | \n",
" 5112.0 | \n",
" 259.0 | \n",
" 743.0 | \n",
" 502.0 | \n",
" 14.0 | \n",
" 6630.0 | \n",
"
\n",
" \n",
" PUERTO VARAS | \n",
" 10.0 | \n",
" 101.0 | \n",
" 10109.0 | \n",
" 14231.0 | \n",
" 1034.0 | \n",
" 1168.0 | \n",
" 1108.0 | \n",
" 102.0 | \n",
" 17643.0 | \n",
"
\n",
" \n",
" PALENA | \n",
" CHAITÉN | \n",
" 10.0 | \n",
" 104.0 | \n",
" 10401.0 | \n",
" 1625.0 | \n",
" 151.0 | \n",
" 195.0 | \n",
" 190.0 | \n",
" 60.0 | \n",
" 2221.0 | \n",
"
\n",
" \n",
" FUTALEUFÚ | \n",
" 10.0 | \n",
" 104.0 | \n",
" 10402.0 | \n",
" 1002.0 | \n",
" 130.0 | \n",
" 101.0 | \n",
" 176.0 | \n",
" 24.0 | \n",
" 1433.0 | \n",
"
\n",
" \n",
" HUALAIHUÉ | \n",
" 10.0 | \n",
" 104.0 | \n",
" 10403.0 | \n",
" 3093.0 | \n",
" 270.0 | \n",
" 616.0 | \n",
" 439.0 | \n",
" 21.0 | \n",
" 4439.0 | \n",
"
\n",
" \n",
" PALENA | \n",
" 10.0 | \n",
" 104.0 | \n",
" 10404.0 | \n",
" 715.0 | \n",
" 61.0 | \n",
" 58.0 | \n",
" 165.0 | \n",
" 3.0 | \n",
" 1002.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Código Región Código Provincia \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"LOS LAGOS LLANQUIHUE PUERTO MONTT 10.0 101.0 \n",
" CALBUCO 10.0 101.0 \n",
" COCHAMÓ 10.0 101.0 \n",
" FRESIA 10.0 101.0 \n",
" FRUTILLAR 10.0 101.0 \n",
" LOS MUERMOS 10.0 101.0 \n",
" LLANQUIHUE 10.0 101.0 \n",
" MAULLÍN 10.0 101.0 \n",
" PUERTO VARAS 10.0 101.0 \n",
" PALENA CHAITÉN 10.0 104.0 \n",
" FUTALEUFÚ 10.0 104.0 \n",
" HUALAIHUÉ 10.0 104.0 \n",
" PALENA 10.0 104.0 \n",
"\n",
" Código Comuna \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"LOS LAGOS LLANQUIHUE PUERTO MONTT 10101.0 \n",
" CALBUCO 10102.0 \n",
" COCHAMÓ 10103.0 \n",
" FRESIA 10104.0 \n",
" FRUTILLAR 10105.0 \n",
" LOS MUERMOS 10106.0 \n",
" LLANQUIHUE 10107.0 \n",
" MAULLÍN 10108.0 \n",
" PUERTO VARAS 10109.0 \n",
" PALENA CHAITÉN 10401.0 \n",
" FUTALEUFÚ 10402.0 \n",
" HUALAIHUÉ 10403.0 \n",
" PALENA 10404.0 \n",
"\n",
" Viviendas Particulares Ocupadas con Moradores Presentes \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"LOS LAGOS LLANQUIHUE PUERTO MONTT 78820.0 \n",
" CALBUCO 11146.0 \n",
" COCHAMÓ 1600.0 \n",
" FRESIA 4356.0 \n",
" FRUTILLAR 6267.0 \n",
" LOS MUERMOS 5990.0 \n",
" LLANQUIHUE 5550.0 \n",
" MAULLÍN 5112.0 \n",
" PUERTO VARAS 14231.0 \n",
" PALENA CHAITÉN 1625.0 \n",
" FUTALEUFÚ 1002.0 \n",
" HUALAIHUÉ 3093.0 \n",
" PALENA 715.0 \n",
"\n",
" Viviendas Particulares Ocupadas con Moradores Ausentes \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"LOS LAGOS LLANQUIHUE PUERTO MONTT 5503.0 \n",
" CALBUCO 745.0 \n",
" COCHAMÓ 90.0 \n",
" FRESIA 236.0 \n",
" FRUTILLAR 626.0 \n",
" LOS MUERMOS 327.0 \n",
" LLANQUIHUE 200.0 \n",
" MAULLÍN 259.0 \n",
" PUERTO VARAS 1034.0 \n",
" PALENA CHAITÉN 151.0 \n",
" FUTALEUFÚ 130.0 \n",
" HUALAIHUÉ 270.0 \n",
" PALENA 61.0 \n",
"\n",
" Viviendas Particulares Desocupadas (en Venta, para arriendo, Abandonada u otro) \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"LOS LAGOS LLANQUIHUE PUERTO MONTT 6655.0 \n",
" CALBUCO 1006.0 \n",
" COCHAMÓ 172.0 \n",
" FRESIA 435.0 \n",
" FRUTILLAR 594.0 \n",
" LOS MUERMOS 619.0 \n",
" LLANQUIHUE 258.0 \n",
" MAULLÍN 743.0 \n",
" PUERTO VARAS 1168.0 \n",
" PALENA CHAITÉN 195.0 \n",
" FUTALEUFÚ 101.0 \n",
" HUALAIHUÉ 616.0 \n",
" PALENA 58.0 \n",
"\n",
" Viviendas Particulares Desocupadas\\n(de Temporada) \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"LOS LAGOS LLANQUIHUE PUERTO MONTT 2194.0 \n",
" CALBUCO 637.0 \n",
" COCHAMÓ 358.0 \n",
" FRESIA 227.0 \n",
" FRUTILLAR 398.0 \n",
" LOS MUERMOS 321.0 \n",
" LLANQUIHUE 115.0 \n",
" MAULLÍN 502.0 \n",
" PUERTO VARAS 1108.0 \n",
" PALENA CHAITÉN 190.0 \n",
" FUTALEUFÚ 176.0 \n",
" HUALAIHUÉ 439.0 \n",
" PALENA 165.0 \n",
"\n",
" Viviendas Colectivas \\\n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"LOS LAGOS LLANQUIHUE PUERTO MONTT 208.0 \n",
" CALBUCO 36.0 \n",
" COCHAMÓ 20.0 \n",
" FRESIA 8.0 \n",
" FRUTILLAR 24.0 \n",
" LOS MUERMOS 7.0 \n",
" LLANQUIHUE 12.0 \n",
" MAULLÍN 14.0 \n",
" PUERTO VARAS 102.0 \n",
" PALENA CHAITÉN 60.0 \n",
" FUTALEUFÚ 24.0 \n",
" HUALAIHUÉ 21.0 \n",
" PALENA 3.0 \n",
"\n",
" TOTAL VIVIENDAS \n",
"NOMBRE REGIÓN NOMBRE PROVINCIA NOMBRE COMUNA \n",
"LOS LAGOS LLANQUIHUE PUERTO MONTT 93380.0 \n",
" CALBUCO 13570.0 \n",
" COCHAMÓ 2240.0 \n",
" FRESIA 5262.0 \n",
" FRUTILLAR 7909.0 \n",
" LOS MUERMOS 7264.0 \n",
" LLANQUIHUE 6135.0 \n",
" MAULLÍN 6630.0 \n",
" PUERTO VARAS 17643.0 \n",
" PALENA CHAITÉN 2221.0 \n",
" FUTALEUFÚ 1433.0 \n",
" HUALAIHUÉ 4439.0 \n",
" PALENA 1002.0 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"idx = pd.IndexSlice\n",
"df.loc[idx[:, [\"LLANQUIHUE\", \"PALENA\"], : ], :]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"retorna las comunas pertenecientes a las provincias de Llanquihue y Palena\n",
"\n",
":::{important}\n",
"\n",
"Con IndexSlice podemos recuperar subconjuntos arbitrarios de filas en base al índice jerárquico\n",
"\n",
":::"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Patrón *Split-Apply-Combine* en DataFrames\n",
"\n",
"Digamos que queremos obtener los totales de todos los tipos de vivienda a nivel de provincia\n",
"\n",
"Si asignamos \"NOMBRE PROVINCIA\" como índice podríamos usar\n",
"\n",
"```python\n",
"result = []\n",
"for provincia in df.index.unique(): \n",
" sub_df = df.loc[provincia, col_mask]\n",
" if sub_df.ndim>1: \n",
" result.append(df.loc[provincia, col_mask].sum())\n",
" else: # No hacer reducción suma si la provincia tiene una sola comuna\n",
" result.append(df.loc[provincia, col_mask])\n",
"pd.DataFrame(result, columns=col_mask, index=df.index.unique())\n",
"```\n",
"que obtiene el resultado deseado, pero es ineficiente y bastante engorroso\n",
"\n",
"El ejemplo anterior representa un patrón de \"operaciones condicionadas por llave\" que es muy utilizado en bases de datos y se conoce como *split-apply-combine*\n",
"\n",
"
\n",
"\n",
"Donde\n",
"\n",
"- *split*: Divide los datos según una **llave**\n",
"- *apply*: Realiza una función sobre cada grupo\n",
"- *combine*: Mezcla el resultado en un nuevo dataframe donde la **llave** se convierte en el índice\n",
"\n",
"En el ejemplo anterior \n",
"\n",
"- *split*: Crea subconjuntos con las comunas de cada provincia\n",
"- *apply*: Hace una reducción suma en las columnas de viviendas\n",
"- *combine*: Construye un nuevo dataframe con los resultados donde la llave son las provincias "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Atributo `groupby`\n",
"\n",
"El patrón *split-apply-combine* está implementado de forma muy eficiente en `pandas` a través del atributo [`groupby()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html) \n",
"\n",
"Los argumentos básicos de `groupby` son\n",
"\n",
"```python\n",
"df.groupby(by=None, # Columna o lista de columnas con se hace el split\n",
" axis=0, # Dividir en filas (0) o en columnas (1)\n",
" as_index: bool = True, # Retornar las etiquetas de cada grupo como índice\n",
" sort: bool = True, # Retornar las etiquetas de grupo ordenadas alfabeticamente\n",
" ...\n",
" ) \n",
"```\n",
"\n",
"Notemos que `groupby` actua como un iterador\n",
"\n",
"```python\n",
"for (region, sub_df) in df.groupby('NOMBRE REGIÓN'):\n",
" display(region, # La etiqueta\n",
" sub_df # El dataframe con las filas que comparten esa etiqueta\n",
" )\n",
"```\n",
"\n",
"La función que se ejecuta a cada grupo en el paso *apply* es un atributo de `groupby`, existen cuatro atributos\n",
"\n",
"- `aggregate`: Para hacer reducciones\n",
"- `filter`: Para eliminar filas\n",
"- `transform`: Para modificar filas\n",
"- `apply`: Función flexible que puede combinar lo que hace `aggregate` y `transform` \n",
"\n",
"A continuación revisaremos las primeras tres en detalle"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Reducción con `aggregate`\n",
"\n",
"La sintaxis básica de este atributo es\n",
"\n",
"```python\n",
"# Para aplicar la misma función a todos las columnas\n",
"df.groupby(by=llave).aggregate(funcion1) \n",
"# Para aplicar varias funciones a todos las columnas\n",
"df.groupby(by=llave).aggregate([funcion1, funcion2, ...]) \n",
"# Para aplicar funciones específicas a columnas específicas\n",
"df.groupby(by=llave).aggregate({columna1: funcion1, columna2: funcion2}) \n",
"```\n",
"Las funciones debe entregar un sólo valor por cada columna del grupo\n",
"\n",
"En general las reducciones se usan para hacer resumenes, por ejemplo sumas, promedios o varianzas\n",
"\n",
"**Ejemplo**\n",
"\n",
"Podemos encontrar los totales de vivienda por provincia en una sola linea usando\n",
"\n",
"```python\n",
"df.groupby(by=\"NOMBRE PROVINCIA\", sort=False).aggregate(np.sum)\n",
"```\n",
"\n",
"O usando el alias\n",
"\n",
"```python\n",
"df.groupby(by=\"NOMBRE PROVINCIA\", sort=False).sum() \n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Filtrado con `filter`\n",
"\n",
"La sintaxis básica de este atributo es\n",
"\n",
"```python\n",
"df.groupby(by=llave).filter(funcion)\n",
"```\n",
"\n",
"La función debe retornar `True` o `False`\n",
"\n",
"El resultado es un nuevo DataFrame con todos los grupos que \"pasaron el filtro\"\n",
"\n",
"En general este atributo se usa para eliminar/descartar grupos de filas (drop)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Modificando el `DataFrame` con `transform`\n",
"\n",
"La sintaxis básica de este atributo es\n",
"\n",
"```python\n",
"df.groupby(by=llave).transform(funcion)\n",
"```\n",
"\n",
"La función debe retornar un dataframe con la misma dimensión y tamaño que el original y se aplica columna a columna\n",
"\n",
"La función puede ser explicita o anónima (lambda)\n",
"\n",
"Un uso típico de este atributo es el reescalamiento/normalización a nivel de grupo "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {
"height": "calc(100% - 180px)",
"left": "10px",
"top": "150px",
"width": "294.4px"
},
"toc_section_display": true,
"toc_window_display": true
}
},
"nbformat": 4,
"nbformat_minor": 2
}