I have an (n, m)-dimensional DataFrame whose columns all have dtype object and hold strings of varying lengths. The df looks like this:
col1 col2 col3 col4 ... colm
|---------------------------------------------
row1| str1,1 str1,2 str1,3 str1,4 ... str1,m
row2| str2,1 str2,2 str2,3 str2,4 ... str2,m
. | . . . . ... .
. | . . . . ... .
. | . . . . ... .
rown| strn,1 strn,2 strn,3 strn,4 ... strn,m
I want to replace a string with NaN whenever its length is less than 10, but only in certain columns.
Here's my code:
column_list = ['col1','col3']
df.loc[:,column_list] = df.apply(lambda x: x.str.replace(x,np.NaN) if len(x) < 10 else x)
The code runs without error, but it doesn't actually change any of the values in those columns. I believe my issue has to do with the following part:
x.str.replace(x,np.NaN)
I don't think "x" should be in the "replace" function.
Appreciate the help.
Thanks
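Just use mask, after getting the string lengths with str.len:
# True where the string is shorter than 10 characters (selected columns only)
s = df[column_list].apply(lambda x: x.str.len()) < 10
# mask() replaces values with NaN wherever the condition is True
df.loc[:, column_list] = df.loc[:, column_list].mask(s)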
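For context on why the original attempt does not work: inside df.apply(lambda x: ...), x is an entire column (a Series), so len(x) is the number of rows rather than the length of any individual string. The mask approach can be tried end to end with a small self-contained sketch like the one below. The sample values and the name too_short are made up for illustration; only column_list and the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question
df = pd.DataFrame({
    "col1": ["short", "a much longer string"],
    "col2": ["tiny", "another long string here"],
    "col3": ["exactly10!", "abc"],
})
column_list = ["col1", "col3"]

# Boolean frame: True where a string has fewer than 10 characters
too_short = df[column_list].apply(lambda col: col.str.len()) < 10

# mask() sets entries to NaN wherever the condition is True;
# columns outside column_list (here col2) are left untouched
df[column_list] = df[column_list].mask(too_short)

print(df)
```

Restricting the length check to column_list keeps the condition the same shape as the frame being masked, and avoids calling .str on columns that might not contain strings.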