Two unaligned Pandas Series: concat raises error, adding does not but it returns a weird answer
Note: I am working with a rather old Pandas 0.16.2, in Python 2.7.11.

My simplistic conceptual model for adding two Series was that it involves an index-matching step similar to what goes on in pd.concat(..., axis=1), i.e. the Series indexes are lined up and then the values are added. Therefore (modulo NaN handling, I guess), I would expect u+v to work if, and only if, concat([u, v], axis=1) works.

In the example below I build two Series with 'unalignable' indexes. My confusion is that concat does raise an error (as expected) but the addition does not, and, even more confusing, the result of the addition comes back with everything duplicated.

First I create a couple of series which have equal indexes (containing duplicates):

    import string, pandas as pd

    # Create a series with an index that has duplicates
    u = pd.Series(range(5), index=list(string.ascii_lowercase)[:5])
    u = pd.concat([u, u])

    # Create another, same index but values reversed
    v = pd.Series(range(5)[::-1], index=list(string.ascii_lowercase)[:5])
    v = pd.concat([v, v])

Here they are:

    In : u
    Out:
    a    0
    b    1
    c    2
    d    3
    e    4
    a    0
    b    1
    c    2
    d    3
    e    4
    dtype: int64

    In : v
    Out:
    a    4
    b    3
    c    2
    d    1
    e    0
    a    4
    b    3
    c    2
    d    1
    e    0
    dtype: int64

They can be added since the indices are equal:

    In : u+v
    Out:
    a    4
    b    4
    c    4
    d    4
    e    4
    a    4
    b    4
    c    4
    d    4
    e    4
    dtype: int64

If we sort v then its index gets reordered, and since there is no longer an obvious way to line up v with u, it is not surprising that concat raises an error:

    In : v.sort()

    In : v
    Out:
    e    0
    e    0
    d    1
    d    1
    c    2
    c    2
    b    3
    b    3
    a    4
    a    4
    dtype: int64

    In : pd.concat([u, v], axis=1)
    ....
    ValueError: cannot reindex from a duplicate axis

However, adding still works but bizarrely returns a longer series:

    In : u+v
    Out:
    a    4
    a    4
    a    4
    a    4
    b    4
    b    4
    b    4
    b    4
    c    4
    c    4
    c    4
    c    4
    d    4
    d    4
    d    4
    d    4
    e    4
    e    4
    e    4
    e    4
    dtype: int64

What happened here?
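One plausible reading of the longer result, offered as a sketch rather than a definitive account of the 0.16.2 internals: when the two indexes are not identical, the addition first aligns them with an outer join on the labels, and with duplicate labels every occurrence on the left pairs with every occurrence on the right. Each label appears twice in u and twice in v, so it comes out 2 x 2 = 4 times, giving 20 rows. The snippet below assumes a recent pandas, where the in-place `Series.sort()` no longer exists and `sort_values()` is used instead; the names `v_sorted`, `same_index`, `unaligned` and `per_label` are made up for illustration.

```python
import string
import pandas as pd

labels = list(string.ascii_lowercase[:5])

# Two series sharing the same index, each label appearing twice.
u = pd.Series(list(range(5)), index=labels)
u = pd.concat([u, u])
v = pd.Series(list(range(4, -1, -1)), index=labels)
v = pd.concat([v, v])

# Assumption: modern pandas, so sort_values() replaces the old in-place sort().
v_sorted = v.sort_values()

# Identical indexes: values are added position by position, no join needed.
same_index = u + v

# Different index order plus duplicate labels: alignment pairs every
# occurrence of a label in u with every occurrence in v_sorted.
unaligned = u + v_sorted

# Expected rows per label = (count in u) * (count in v_sorted).
per_label = u.index.value_counts() * v_sorted.index.value_counts()
expected_len = int(per_label.sum())

print(len(same_index), len(unaligned), expected_len)
```

If this reading is right, the duplication is a property of joining on non-unique labels rather than of addition itself, so de-duplicating the labels (or adding the underlying arrays positionally) avoids it.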