Zipping (computer science)

In computer science, zipping is a function which maps a tuple of sequences into a sequence of tuples. This name zip derives from the action of a zipper in that it interleaves two formerly disjoint sequences. The inverse function is unzip.

Example

Given the three words cat, fish and be where |cat| is 3, |fish| is 4 and |be| is 2. Let denote the length of the longest word which is fish; . The zip of cat, fish, be is then 4 tuples of elements:

where # is a symbol not in the original alphabet. In Haskell this truncates to the shortest sequence , where :

zip3 "cat" "fish" "be"
-- [('c','f','b'),('a','i','e')]

Definition

Let Σ be an alphabet, # a symbol not in Σ.

Let x1x2... x|x|, y1y2... y|y|, z1z2... z|z|, ... be n words (i.e. finite sequences) of elements of Σ. Let denote the length of the longest word, i.e. the maximum of |x|, |y|, |z|, ... .

The zip of these words is a finite sequence of n-tuples of elements of (Σ ∪ {#}), i.e. an element of :

,

where for any index i > |w|, the wi is #.

The zip of x, y, z, ... is denoted zip(x, y, z, ...) or xyz ⋆ ...

The inverse to zip is sometimes denoted unzip.

A variation of the zip operation is defined by:

where is the minimum length of the input words. It avoids the use of an adjoined element , but destroys information about elements of the input sequences beyond .

In programming languages

Zip functions are often available in programming languages, often referred to as

zip. In Lisp-dialects one can simplymap the desired function over the desired lists,map is variadic in Lisp so it can take an arbitrary number of lists as argument. An example from Clojure:

;; `nums' contains an infinite list of numbers (0 1 2 3 ...)
(def nums (range))
(def tens [10 20 30])
(def firstname "Alice")

;; To zip (0 1 2 3 ...) and [10 20 30] into a vector, invoke `map vector' on them; same with list
(map vector nums tens)           ; ⇒ ([0 10] [1 20] [2 30])
(map list nums tens)             ; ⇒ ((0 10) (1 20) (2 30))
(map str nums tens)              ; ⇒ ("010" "120" "230")

;; `map' truncates to the shortest sequence; note missing \c and \e from "Alice"
(map vector nums tens firstname) ; ⇒ ([0 10 \A] [1 20 \l] [2 30 \i])
(map str nums tens firstname)    ; ⇒ ("010A" "120l" "230i")

;; To unzip, apply `map vector' or `map list'
(apply map list (map vector nums tens firstname))
;; ⇒ ((0 1 2) (10 20 30) (\A \l \i))

In Common Lisp:

(defparameter nums '(1 2 3))
(defparameter tens '(10 20 30))
(defparameter firstname "Alice")

(mapcar #'list nums tens)
;; ⇒ ((1 10) (2 20) (3 30))

(mapcar #'list nums tens (coerce firstname 'list))
;; ⇒ ((1 10 #\A) (2 20 #\l) (3 30 #\i)) — truncates on shortest list

;; Unzips
(apply #'mapcar #'list (mapcar #'list nums tens (coerce firstname 'list)))
;; ⇒ ((1 2 3) (10 20 30) (#\A #\l #\i))

Languages such as Python provide azip() function.zip() in conjunction with the* operator unzips a list:

>>> nums = [1, 2, 3]
>>> tens = [10, 20, 30]
>>> firstname = 'Alice'

>>> zipped = list(zip(nums, tens))
>>> zipped
[(1, 10), (2, 20), (3, 30)]

>>> list(zip(*zipped)) # unzip
[(1, 2, 3), (10, 20, 30)]

>>> zipped2 = list(zip(nums, tens, list(firstname)))
>>> zipped2 # zip, truncates on shortest
[(1, 10, 'A'), (2, 20, 'l'), (3, 30, 'i')] 
>>> list(zip(*zipped2)) # unzip
[(1, 2, 3), (10, 20, 30), ('A', 'l', 'i')]

Haskell has a method of zipping sequences but requires a specific function for each arity (zip for two sequences,zip3 for three etc.), similarly the functionsunzip andunzip3 are available for unzipping:

-- nums contains an infinite list of numbers [1, 2, 3, ...] 
nums = [1..]
tens = [10, 20, 30]
firstname = "Alice"

zip nums tens
-- ⇒ [(1,10), (2,20), (3,30)] — zip, truncates infinite list
unzip $ zip nums tens
-- ⇒ ([1,2,3], [10,20,30]) — unzip

zip3 nums tens firstname
-- ⇒ [(1,10,'A'), (2,20,'l'), (3,30,'i')] — zip, truncates
unzip3 $ zip3 nums tens firstname
-- ⇒ ([1,2,3], [10,20,30], "Ali") — unzip

Language comparison

List of languages by support of zip:

See also

References

Uses material from the Wikipedia article Zipping (computer science), released under the CC BY-SA 4.0 license.