pandas groupby and mean aggregation on more columns


I can't find a way to get the desired output from my input. I want to group by user and question and get the mean of each answer column. The question information is already encoded in the answer columns: an answer column is NaN when the question is not related to that answer (see below).
Q stands for Question, A stands for Answer
Input:
import pandas as pd
import numpy as np

# np.nan is used instead of np.NaN, which was removed in NumPy 2.0
df = pd.DataFrame(
    data={
        'userid': [11, 11, 11, 12, 13, 13],
        'Q': ['Q1', 'Q2', 'Q1', 'Q3', 'Q1', 'Q1'],
        'A1': [1, np.nan, 0, np.nan, 0.8, 0.6],
        'A2': [np.nan, 1, np.nan, np.nan, np.nan, np.nan],
    },
    index=range(1, 7)
)
My expected intermediate state (you don't have to use it):
temp_df = pd.DataFrame(
    data={
        'userid': [11, 12, 13],
        'A1': [0.5, np.nan, 0.7],
        'A2': [1, np.nan, np.nan],
    },
    index=range(1, 4)
)
The final, desired dataframe:
desired_df = pd.DataFrame(
    data={
        'userid': [11, 12, 13],
        'A1': [0.5, 0.6, 0.7],
        'A2': [1, 1, 1],
    },
    index=range(1, 4)
)
You can use groupby on the userid column and calculate the means for the answer columns to get your intermediate dataframe (temp_df). Then you can just fill the missing values with the column means to get your final dataframe (desired_df).
temp_df = df.groupby('userid')[['A1', 'A2']].mean()
desired_df = temp_df.fillna(temp_df.mean())
print(desired_df) gives:
         A1   A2
userid
11      0.5  1.0
12      0.6  1.0
13      0.7  1.0
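
For completeness, here is a self-contained sketch of the same approach that can be run as-is. It rebuilds the question's df, applies the groupby and fillna steps, and then moves userid back into a regular column with a 1-based index so the layout matches the question's desired_df. The names filled and result are only illustrative, and the reset_index step is an assumption about the layout you want, not something the answer above requires.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data={
        'userid': [11, 11, 11, 12, 13, 13],
        'Q': ['Q1', 'Q2', 'Q1', 'Q3', 'Q1', 'Q1'],
        'A1': [1, np.nan, 0, np.nan, 0.8, 0.6],
        'A2': [np.nan, 1, np.nan, np.nan, np.nan, np.nan],
    },
    index=range(1, 7),
)

# Per-user means. mean() skips NaN by default, so a user with no values
# in a column (user 12 here) is left with NaN for that column.
temp_df = df.groupby('userid')[['A1', 'A2']].mean()

# Fill the remaining gaps with each column's mean of the per-user means.
filled = temp_df.fillna(temp_df.mean())

# Optional (assumed layout): userid as a column, rows numbered from 1.
result = filled.reset_index()
result.index = range(1, len(result) + 1)

print(result)
```

Note that the fill value is the mean of the per-user means, which in general is not the same as the mean of all raw answers in that column. If you want the latter, df[['A1', 'A2']].mean() could be passed to fillna instead.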