
Why numbering should start at zero

Today, while reading Fluent Python, I came across the section "Why Slices and Range Exclude the Last Item", which lists several reasons. The original text is as follows:

The Pythonic convention of excluding the last item in slices and ranges works well with the zero-based indexing in Python, C, and many other languages. Some convenient features of the convention are:

  • It’s easy to see the length of a slice or range when only the stop position is given: range(3) and my_list[:3] both produce three items.
  • It’s easy to compute the length of a slice or range when start and stop are given: just subtract stop - start.
  • It’s easy to split a sequence in two parts at any index x, without overlapping: simply get my_list[:x] and my_list[x:].
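The three properties above are easy to check directly in Python. This is a minimal sketch; the list `my_list`, the bounds `start`/`stop`, and the loop over split points `x` are hypothetical values chosen only for illustration:

```python
# Hypothetical sample data for illustration
my_list = [10, 20, 30, 40, 50, 60]

# 1. With only a stop position, the stop equals the number of items.
assert len(range(3)) == 3
assert len(my_list[:3]) == 3

# 2. With start and stop, the length is simply stop - start.
start, stop = 2, 5
assert len(my_list[start:stop]) == stop - start
assert len(range(start, stop)) == stop - start

# 3. Splitting at any index x gives two non-overlapping parts
#    that together reproduce the original list exactly.
for x in range(len(my_list) + 1):
    left, right = my_list[:x], my_list[x:]
    assert left + right == my_list
    assert len(left) + len(right) == len(my_list)

print("all three properties hold")
```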

But the best arguments for this convention were written by the Dutch computer scientist Edsger W. Dijkstra.

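The last paragraph above says the best arguments were written by the famous computer scientist Edsger W. Dijkstra. (In my undergraduate Operations Research course I ran into one of his algorithms; if I remember correctly it was the labeling algorithm for shortest paths, also called Dijkstra's algorithm. Back then none of us could remember the name, so we gave it a Chinese nickname that sounds like it: 立即刻死去啦, roughly "die right this instant".) So I googled this "best argument":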

Why numbering should start at zero

To denote the subsequence of natural numbers 2, 3, …, 12 without the pernicious three dots, four conventions are open to us

  a)     2 ≤ i < 13
  b)     1 < i ≤ 12
  c)     2 ≤ i ≤ 12
  d)     1 < i < 13
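As a quick sanity check, the four conventions can be written as Python predicates. Applied to the first few natural numbers, each one should select exactly 2 through 12. The dictionary name `conventions` and the cutoff of 20 are arbitrary choices for this sketch:

```python
# Each convention written as a predicate on a natural number i.
conventions = {
    "a": lambda i: 2 <= i < 13,
    "b": lambda i: 1 < i <= 12,
    "c": lambda i: 2 <= i <= 12,
    "d": lambda i: 1 < i < 13,
}

# A finite prefix of the natural numbers (0..19) is enough here.
naturals = range(20)
selected = {name: [i for i in naturals if pred(i)]
            for name, pred in conventions.items()}

for name, items in selected.items():
    print(name, items)
    # Every convention denotes the same subsequence 2, 3, ..., 12.
    assert items == list(range(2, 13))
```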

Are there reasons to prefer one convention to the other? Yes, there are. The observation that conventions a) and b) have the advantage that the difference between the bounds as mentioned equals the length of the subsequence is valid. So is the observation that, as a consequence, in either convention two subsequences are adjacent means that the upper bound of the one equals the lower bound of the other. Valid as these observations are, they don’t enable us to choose between a) and b); so let us start afresh.
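The two observations in the paragraph above (difference of bounds equals length, and adjacent subsequences share a bound) can be illustrated with a small sketch. It assumes integer bounds and uses Python's `range(lo, hi)`, which follows convention a); the helper `members_b` is hypothetical and simply enumerates convention b) over 0..99:

```python
# (lower, upper) bounds denoting 2, 3, ..., 12 under each convention
bounds = {"a": (2, 13), "b": (1, 12), "c": (2, 12), "d": (1, 13)}
length = len(range(2, 13))  # 11 elements

# Only a) and b) satisfy "upper - lower == length".
matches = {name for name, (lo, hi) in bounds.items() if hi - lo == length}
print(sorted(matches))  # ['a', 'b']

# Adjacency under a): range(lo, hi) means lo <= i < hi, so the upper
# bound of the first part is the lower bound of the second.
assert list(range(2, 7)) + list(range(7, 13)) == list(range(2, 13))

# Adjacency under b): lo < i <= hi has the same shared-bound property.
def members_b(lo, hi):
    """Natural numbers i with lo < i <= hi, checked over 0..99."""
    return [i for i in range(100) if lo < i <= hi]

assert members_b(1, 6) + members_b(6, 12) == members_b(1, 12)
```

As Dijkstra notes, this alone cannot decide between a) and b); both pass every check here.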

There is a smallest natural number. Exclusion of the lower bound —as in b) and d)— forces for a subsequence starting at the smallest natural number the lower bound as mentioned into the realm of the unnatural numbers. That is ugly, so for the lower bound we prefer the ≤ as in a) and c). Consider now the subsequences starting at the smallest natural number: inclusion of the upper bound would then force the latter to be unnatural by the time the sequence has shrunk to the empty one. That is ugly, so for the upper bound we prefer < as in a) and d). We conclude that convention a) is to be preferred.

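I stared at that last paragraph for a long time without understanding it. After thinking it over carefully, I believe I have finally figured it out: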

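  • With convention b), representing {0,1,2,3,…,12} requires writing -1 < i ≤ 12. That is ugly, because it drags in -1, which is not a natural number.
  • With convention c), representing an empty sequence that starts at 0 (presumably a common operation in computing) requires writing 0 ≤ i ≤ -1. That is also ugly, because once again it brings in the unnatural number -1.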
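Both cases above can be generalized into a small sketch: for a sequence 0, 1, ..., n-1 (empty when n = 0), compute the bounds each convention needs and flag any that leave the natural numbers. The names `PREDICATES`, `bounds_from_zero`, and `members` are hypothetical helpers written for this illustration, and the check range -5..99 is an arbitrary choice:

```python
# Membership test for each convention, given integer bounds lo and hi.
PREDICATES = {
    "a": lambda i, lo, hi: lo <= i < hi,
    "b": lambda i, lo, hi: lo < i <= hi,
    "c": lambda i, lo, hi: lo <= i <= hi,
    "d": lambda i, lo, hi: lo < i < hi,
}

def bounds_from_zero(conv, n):
    """Bounds that make `conv` denote 0, 1, ..., n-1 (empty when n == 0)."""
    return {"a": (0, n), "b": (-1, n - 1), "c": (0, n - 1), "d": (-1, n)}[conv]

def members(conv, lo, hi):
    """Integers in -5..99 satisfying the convention (enough for these examples)."""
    return [i for i in range(-5, 100) if PREDICATES[conv](i, lo, hi)]

natural_everywhere = set("abcd")
for n in (13, 0):  # {0, 1, ..., 12} and the empty sequence
    for conv in "abcd":
        lo, hi = bounds_from_zero(conv, n)
        # Confirm the computed bounds really denote 0, 1, ..., n-1.
        assert members(conv, lo, hi) == list(range(n))
        if lo < 0 or hi < 0:
            natural_everywhere.discard(conv)
            print(f"n={n}: {conv}) needs bounds ({lo}, {hi})")

print(sorted(natural_everywhere))  # ['a']
```

Only convention a), which is also what Python's `range` and slicing use, keeps both bounds natural in both cases.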