• 首页 首页 icon
  • 工具库 工具库 icon
    • IP查询 IP查询 icon
  • 内容库 内容库 icon
    • 快讯库 快讯库 icon
    • 精品库 精品库 icon
    • 问答库 问答库 icon
  • 更多 更多 icon
    • 服务条款 服务条款 icon

为什么要在 HashMap 使用哈希方法

用户头像
it1352
帮助5

问题说明

hash 方法状态的Java doc,

Java doc of hash method states,

检索对象散列码并将补充散列函数应用于结果散列,以防止劣质散列函数.这很关键,因为 HashMap 使用长度为二的幂的哈希表,否则会遇到低位没有差异的 hashCode 的冲突.

Retrieve object hash code and applies a supplemental hash function to the result hash, which defends against poor quality hash functions. This is critical because HashMap uses power-of-two length hash tables, that otherwise encounter collisions for hashCodes that do not differ in lower bits.

我无法理解的是,

1) 为什么 HashMap 使用长度为二的幂的哈希表?

1) Why HashMap uses power-of-two length hash tables?

在声明表时也有说明:

/**
 * The table, resized as necessary. Length MUST Always be a power of two.
 */
transient Entry<K,V>[] table;

为什么会有这个限制?

2) 否则会遇到低位相同的 hashCode 冲突. 是什么意思?

正确答案

#1

hashmap 的目的是快速缩小在搜索特定键时需要查看的对象数量(理想情况下为 0 或 1).

The purpose of a hashmap is to very quickly narrow down how many objects you need to look at (ideally 0 or 1) when searching for a specific key.

HashMap.get(key)的一般方法如下:

  1. 调用 key.hashCode() 以获取表示对象的单个整数.

  1. Call key.hashCode() to get a single integer that represents the object.

查看基于该哈希码的哈希桶",其中可以包含零个或多个条目.

Look in a hash "bucket" based on that hashcode, which can contain zero or more entries.

遍历bucket中的每一个entry,看看是否有entry的key是.equals(key).如果是这样,请将其退回.如果存储桶中没有条目与搜索的条目具有相同的键,则返回 null.

Go through each entry in the bucket and find if any entry's key is .equals(key). If so, return it. If no entry in the bucket has an equal key to the one searched for, return null.

good hashmap 和 bad hashmap 的区别在于速度.您必须平衡所有这三个问题:

The difference between a good hashmap and a bad hashmap is speed. You have to balance all three of these concerns:

  1. 您可以多快将密钥转换为哈希码?

  1. How quickly can you transform the key into a hashcode?

两个不同的键映射到同一个哈希码的频率如何?

How often do two different keys map to the same hashcode?

您多久会将具有不同哈希码的两个密钥放入同一个桶"中?

How often will you put two keys with different hashcodes into the same "bucket"?

Java 的设计者选择了一组他们认为最平衡的折衷方案.没有正确的答案,但您必须选择一种特定的方法,并在文档中写入该方法是什么.

Java's designers have chosen a set of tradeoffs they think balances best. There is no right answer, but you do have to choose a specific approach, and write into the documentation what that approach is.

Java 的设计者可能有一些基于添加到哈希图中的典型数据的统计证据.

Java's designers likely have some statistic evidence based on typical data added to hashmaps.

他们选择通过提取哈希码的最低 n 位来将哈希码转换为桶,因为这些位的变化比高位更频繁.他们选择提取位而不是另一种将哈希码转换为桶的典型方法(除以素数后的整数余数),因为在 Java 最常部署到的平台上,这通常是一种更快的操作.

They chose to convert hashcode to bucket by extracting the lowest n bits of the hashcode, because those vary more often than the upper bits. They chose extracting bits over another typical method of converting hashcode to bucket (integer remainder after dividing by a prime number) because it's typically a faster operation on the platforms Java is most commonly deployed to.

Java 的设计者可能发现,第 1 步,hashCode() 的实现,是由 Java 用户编写的,而且通常很糟糕,为他们想要的许多对象返回相同的 hashCode存储在同一个哈希图中.想象一下,如果 hashCode 是这样的:

What Java's designers may have found is that step 1, the implementation of hashCode(), is written by Java users, and can often be terrible, returning the same hashCode for lots of objects they want to store in the same hashmap. Imagine if the hashCode was this:

public class MyInteger {
    final int i;
    public MyInteger(int i) {
        this.i = i;
    }
    public int hashCode() {
        return i << 24; // will return 0x00000000, 0x01000000, etc.
    }
    public boolean equals(Object o) {
        return (o != null) && (o instanceof MyInteger) && ((MyInteger)o).i == i;
    }
}

这就是他们所说的劣质";哈希码的低位变化不大.在这种病态的实现中,低 24 位根本没有变化!

This is what they call "poor quality"; the lower bits of the hashcode don't vary very much. In this pathological implementation, the lower 24 bits don't vary at all!

在这种情况下,对于任何小于 16,777,216 个桶的 hashmap,可以进入 hashmap 的每个键都将转到桶 0.其他 16,777,215 个桶将为空.

In this case, for hashmaps any smaller than 16,777,216 buckets, every single key that could go in the hashmap will go to bucket 0. The other 16,777,215 buckets will be empty.

其他人的哈希码可能没有这么糟糕,但它们已经够糟糕了,以至于 Java 的设计者选择添加第二个哈希码来帮助提高两个不同键进入两个不同存储桶的机会,从而减少对象的数量每次检索给定键时都需要检查是否相等.

Other people's hashcodes may not be as bad as this, but they're bad enough that Java's designers chose to add a second hashcode to help improve the chance that two different keys will go into two different buckets, thus reducing how many objects need to be checked for equality each time a given key is retrieved.

这篇好文章是转载于:学新通技术网

  • 版权申明: 本站部分内容来自互联网,仅供学习及演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,请提供相关证据及您的身份证明,我们将在收到邮件后48小时内删除。
  • 本站站名: 学新通技术网
  • 本文地址: /reply/detail/tangicake
系列文章
更多 icon
同类精品
更多 icon
继续加载