pandas groupby and mean aggregation on more columns


I can't find a way to get the desired output from my input. I want to group by user and question and get the mean of each answer column. The question information is already encoded in the answer columns: an answer value is NaN when the question in that row is not related to that answer (see below).
Q stands for Question and A stands for Answer.
Input:
import pandas as pd
import numpy as np

# np.nan is used instead of np.NaN, which was removed in NumPy 2.0
df = pd.DataFrame(
    data={
        'userid': [11, 11, 11, 12, 13, 13],
        'Q': ['Q1', 'Q2', 'Q1', 'Q3', 'Q1', 'Q1'],
        'A1': [1, np.nan, 0, np.nan, 0.8, 0.6],
        'A2': [np.nan, 1, np.nan, np.nan, np.nan, np.nan],
    },
    index=range(1, 7)
)
My expected intermediate state (you don't have to use it):
temp_df = pd.DataFrame(
    data={
        'userid': [11, 12, 13],
        'A1': [0.5, np.nan, 0.7],
        'A2': [1, np.nan, np.nan],
    },
    index=range(1, 4)
)
The final, desired dataframe:
desired_df = pd.DataFrame(
    data={
        'userid': [11, 12, 13],
        'A1': [0.5, 0.6, 0.7],
        'A2': [1, 1, 1],
    },
    index=range(1, 4)
)
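You can use groupby on the userid column and calculate the means of the answer columns to get your intermediate dataframe (temp_df). mean skips NaN values by default, so a user with no answers in a column gets NaN there. Then you can fill those remaining missing values with the column means to get your final dataframe (desired_df). Note that userid ends up as the index of the result rather than as a regular column.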
temp_df = df.groupby('userid')[['A1', 'A2']].mean()
desired_df = temp_df.fillna(temp_df.mean())
print(desired_df) gives:
         A1   A2
userid
11      0.5  1.0
12      0.6  1.0
13      0.7  1.0
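For reference, here is a minimal end-to-end sketch of the same approach that can be run as is. It rebuilds the input from the question, and it assumes that the layout of the question's desired_df is wanted, with userid as a regular column and a 1-based index. The name filled and the final reset_index step are illustrative additions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Input from the question (np.nan instead of the removed np.NaN alias).
df = pd.DataFrame(
    data={
        'userid': [11, 11, 11, 12, 13, 13],
        'Q': ['Q1', 'Q2', 'Q1', 'Q3', 'Q1', 'Q1'],
        'A1': [1, np.nan, 0, np.nan, 0.8, 0.6],
        'A2': [np.nan, 1, np.nan, np.nan, np.nan, np.nan],
    },
    index=range(1, 7),
)

# Per-user means; NaN entries are skipped, so user 12 stays all-NaN here.
temp_df = df.groupby('userid')[['A1', 'A2']].mean()

# Fill what is still missing with the mean of the per-user means.
filled = temp_df.fillna(temp_df.mean())

# Assumption: userid should be a column again, with a 1-based index
# as in the question's desired_df.
desired_df = filled.reset_index()
desired_df.index = range(1, len(desired_df) + 1)

print(desired_df)
```

One design point to be aware of: temp_df.mean() is the mean of the per-user means, which can differ from the mean over all raw rows (df[['A1', 'A2']].mean()) when users have different numbers of answers. If the overall mean is the intended fill value, that expression could be passed to fillna instead.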
