Comparing values in a Python dict of lists

2017-03-12 python dictionary

Question

I have a dict of lists with numbers as keys and lists of strings as values. E.g.,

my_dict = {
    1: ['bush', 'barck obama', 'general motors corporation'],
    2: ['george bush', 'obama'],
    3: ['general motors', 'george w. bush']
}

What I want is to compare every item in every list (across all keys) and, if an item is a substring of another item, replace it with the longer one. So, kind of a very dirty coreference resolution thing.

I can't really wrap my head around how to do it. Here's pseudocode for what I had in mind:

for key, value in dict:
    for item in value:
        if item is substring of other item in any other key, value:
            item = other item

So that my dictionary ends up looking like this:

my_dict = {
    1: ['george w. bush', 'barck obama', 'general motors corporation'],
    2: ['george w. bush', 'barck obama'],
    3: ['general motors corporation', 'george w. bush']
}

Sorry if I didn't express what the problem is clearly enough.

Answer

The fact that this is a dictionary of lists is irrelevant: some strings have to be modified depending on other strings.

These are the strings:

all_strings = [s for string_list in my_dict.values() for s in string_list]
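
To make the flattening step concrete, here is a small sketch on the sample data from the question. The `itertools.chain.from_iterable` variant is only shown as an equivalent alternative for comparison; the answer itself uses the comprehension.

```python
from itertools import chain

my_dict = {
    1: ['bush', 'barck obama', 'general motors corporation'],
    2: ['george bush', 'obama'],
    3: ['general motors', 'george w. bush'],
}

# Flatten every list in the dict into one list of strings.
all_strings = [s for string_list in my_dict.values() for s in string_list]

# Equivalent flattening with the standard library (alternative, for comparison).
all_strings_alt = list(chain.from_iterable(my_dict.values()))

print(all_strings)
```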

To replace a string:

def expand_string(s, all_strings):
    # s2 matches when every word of s also appears as a word in s2
    words = s.split()
    matches = [s2 for s2 in all_strings
               if all(word in s2.split() for word in words)]
    if matches:
        # pick the longest match (the first one wins on ties)
        return max(matches, key=len)
    else:
        # won't really happen while s itself is in all_strings, but anyway
        return s
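
Note that this compares whole words rather than raw substrings, which is what lets 'george bush' expand to 'george w. bush'. The sketch below contrasts the two tests; `is_substring` and `words_contained` are hypothetical helper names introduced only for this illustration.

```python
def is_substring(short, long):
    # Plain substring test, as described in the question.
    return short in long


def words_contained(short, long):
    # Word-level test, as used inside expand_string.
    long_words = long.split()
    return all(word in long_words for word in short.split())


print(is_substring('george bush', 'george w. bush'))     # False
print(words_contained('george bush', 'george w. bush'))  # True
print(is_substring('bus', 'bush'))                       # True
print(words_contained('bus', 'bush'))                    # False
```

The word-level test is stricter about partial words ('bus' does not match 'bush') but more lenient about words in between, so which one fits depends on the data.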

To replace everything:

result = {k: [expand_string(s, all_strings) for s in v]
          for k, v in my_dict.items()}
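
Put together, the pieces above can be run end to end on the sample data from the question. This is a minimal sketch that picks the longest match with `max(..., key=len)`; the `expected` dict is copied from the desired output in the question.

```python
my_dict = {
    1: ['bush', 'barck obama', 'general motors corporation'],
    2: ['george bush', 'obama'],
    3: ['general motors', 'george w. bush'],
}

# All strings from all lists, flattened.
all_strings = [s for string_list in my_dict.values() for s in string_list]


def expand_string(s, all_strings):
    # A candidate matches when it contains every word of s.
    words = s.split()
    matches = [s2 for s2 in all_strings
               if all(word in s2.split() for word in words)]
    # Fall back to s itself if nothing matches.
    return max(matches, key=len) if matches else s


# Build a new dict; my_dict itself is left unchanged.
result = {k: [expand_string(s, all_strings) for s in v]
          for k, v in my_dict.items()}

expected = {
    1: ['george w. bush', 'barck obama', 'general motors corporation'],
    2: ['george w. bush', 'barck obama'],
    3: ['general motors corporation', 'george w. bush'],
}
print(result == expected)  # True
```

Each string is compared against every other string, so the running time grows quadratically with the total number of strings. That is fine for small inputs like this one.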