Two unaligned Pandas Series: concat raises error, adding does not but it returns a weird answer
Note: I am working with a rather old-ish Pandas 0.16.2, in Python 2.7.11.

My simplistic conceptual model for adding two Series was that it involves an index-matching step similar to what goes on in pd.concat(..., axis=1), i.e. the Series indexes are lined up and then the values are added. Therefore (modulo NaN handling, I guess), I would expect u+v to work if, and only if, concat([u, v], axis=1) works.

In the example below I build two Series with 'unalignable' indexes. My confusion is that concat does raise an error (as expected) but the addition does not, and, even more confusing, the result of the addition comes back with everything duplicated.

First I create a couple of series which have equal indexes (containing duplicates):

    import string, pandas as pd

    # Create a series with an index that has duplicates
    u = pd.Series(range(5), index=list(string.ascii_lowercase)[:5])
    u = pd.concat([u, u])

    # Create another, same index but values reversed
    v = pd.Series(range(5)[::-1], index=list(string.ascii_lowercase)[:5])
    v = pd.concat([v, v])

Here they are:

    In : u
    Out:
    a    0
    b    1
    c    2
    d    3
    e    4
    a    0
    b    1
    c    2
    d    3
    e    4
    dtype: int64

    In : v
    Out:
    a    4
    b    3
    c    2
    d    1
    e    0
    a    4
    b    3
    c    2
    d    1
    e    0
    dtype: int64

They can be added since the indexes are equal:

    In : u+v
    Out:
    a    4
    b    4
    c    4
    d    4
    e    4
    a    4
    b    4
    c    4
    d    4
    e    4
    dtype: int64

If we sort v, its index gets reordered. Since there is no longer an obvious way to line up v with u, it is not surprising that concat raises an error:

    In : v.sort()

    In : v
    Out:
    e    0
    e    0
    d    1
    d    1
    c    2
    c    2
    b    3
    b    3
    a    4
    a    4
    dtype: int64

    In : pd.concat([u, v], axis=1)
    ....
    ValueError: cannot reindex from a duplicate axis

However, adding still works but bizarrely returns a longer series (20 rows instead of 10):

    In : u+v
    Out:
    a    4
    a    4
    a    4
    a    4
    b    4
    b    4
    b    4
    b    4
    c    4
    c    4
    c    4
    c    4
    d    4
    d    4
    d    4
    d    4
    e    4
    e    4
    e    4
    e    4
    dtype: int64

What happened here?
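For anyone who wants to poke at this on a smaller scale, here is a minimal sketch that reproduces the effect with only two labels. It assumes a recent pandas rather than the 0.16.2 used above, so `sort_values()` stands in for the in-place `v.sort()`, and the names `v_sorted` and `total` are only for illustration. As far as I can tell, once the two indexes are no longer identical, the addition lines them up label by label like an outer join, so each duplicate label in `u` is paired with every matching label in `v`, which would account for the extra rows.

```python
import pandas as pd

# Two Series sharing the same index, which contains duplicate labels.
u = pd.Series([0, 1, 0, 1], index=["a", "b", "a", "b"])
v = pd.Series([1, 0, 1, 0], index=["a", "b", "a", "b"])

# Identical indexes: values are added position by position, 4 rows out.
print(u + v)

# Sorting by value reorders the index to b, b, a, a
# (assumption: recent pandas, where sort_values() returns a new Series).
v_sorted = v.sort_values()
print(u.index.equals(v_sorted.index))  # False, so alignment is needed

# Each of the two "a" rows in u pairs with each of the two "a" rows in
# v_sorted, and likewise for "b": 2 x 2 pairings per label.
total = u + v_sorted
print(total)
print(len(total))  # 2 labels x 4 pairings = 8 rows
```

With the five-label Series from the question the same pairing gives 5 x 4 = 20 rows, which matches the length of the output above.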