diff options
author | Péter Diviánszky <divipp@gmail.com> | 2016-04-07 17:02:26 +0200 |
---|---|---|
committer | Péter Diviánszky <divipp@gmail.com> | 2016-04-07 17:02:26 +0200 |
commit | 2e65df79cda76000ae0b69b8b9b85f4abac7f084 (patch) | |
tree | 50a662e00214996b0ebe33b87c0d90da77e45f27 /doc | |
parent | 1fb8403f0c3f56ddcc61f467e47dd8722ac55293 (diff) |
more complete documentation of pattern match compilation
Diffstat (limited to 'doc')
-rw-r--r-- | doc/guide.pandoc | 309 |
1 files changed, 256 insertions, 53 deletions
diff --git a/doc/guide.pandoc b/doc/guide.pandoc index 4fba4fe0..6b50db80 100644 --- a/doc/guide.pandoc +++ b/doc/guide.pandoc | |||
@@ -292,18 +292,18 @@ Pattern match compilation | |||
292 | 292 | ||
293 | Main task: | 293 | Main task: |
294 | 294 | ||
295 | - transform pattern matching and guards to case expressions | 295 | - transform pattern matching and guards to simple case expressions |
296 | 296 | ||
297 | Other tasks: | 297 | Other tasks: |
298 | 298 | ||
299 | - exhaustiveness check (is pattern matching total?) | 299 | - exhaustiveness check (is pattern matching total?) |
300 | - reachability check (are there superfluous expressions involved?) | 300 | - reachability check (are there superfluous expressions involved?) |
301 | - pattern guards desugaring | 301 | - [pattern guards](https://wiki.haskell.org/Pattern_guard) desugaring |
302 | - view patterns desugaring | 302 | - [view patterns](https://ocharles.org.uk/blog/posts/2014-12-02-view-patterns.html) desugaring |
303 | 303 | ||
304 | Design decisions: | 304 | Design decisions: |
305 | 305 | ||
306 | - The same algorithm is used with different parameters to compile all syntactic structures which may use pattern matching: | 306 | - The same algorithm is used to compile all language structures which contain pattern matching: |
307 | - function alternatives | 307 | - function alternatives |
308 | - case expressions | 308 | - case expressions |
309 | - lambda expressions | 309 | - lambda expressions |
@@ -311,11 +311,14 @@ Design decisions: | |||
311 | - open type family instances | 311 | - open type family instances |
312 | - type class instances, type class instance heads (look at type class desugaring) | 312 | - type class instances, type class instance heads (look at type class desugaring) |
313 | - A new data structure, *pattern guard trees* are used as intermediate representation. | 313 | - A new data structure, *pattern guard trees* are used as intermediate representation. |
314 | Pattern match compilation is done in two steps: | 314 | Pattern match compilation is done in three steps: |
315 | 1. all syntax sugars (pattern matching, guards, view patterns, ...) are transformed to pattern guard trees | 315 | |
316 | 2. pattern guard trees are optimized and compiled to case expressions | 316 | 1. language structures are transformed into pattern guard trees |
317 | - Exhaustiveness and reachability checks are done also in step 2. | 317 | 1. pattern guard trees are normalized |
318 | This approach is less error-prone then doing them in separate phase. | 318 | 1. normalized pattern guard trees are compiled to case expressions |
319 | |||
320 | - Exhaustiveness and reachability checks are done also in step 3. | ||
321 | This approach is less error-prone and more cheap than doing them in a separate phase. | ||
319 | 322 | ||
320 | 323 | ||
321 | ### Pattern guard trees | 324 | ### Pattern guard trees |
@@ -325,21 +328,19 @@ Pattern guard trees is an intermediate representation to which all pattern match | |||
325 | Syntax of pattern guards trees (used only in this documentation): | 328 | Syntax of pattern guards trees (used only in this documentation): |
326 | 329 | ||
327 | ----------- -- ---------------------------------------------------------- ----------------------- | 330 | ----------- -- ---------------------------------------------------------- ----------------------- |
328 | *guardtree* := *expression* **rhs** | 331 | *guardtree* := *expression* **success** |
329 | | *guardtree* `|` *guardtree* **alternative patterns** | 332 | | `MatchFailure` **failure** |
330 | | `MatchFailure` | 333 | | *guardtree* `|` *guardtree* **alternatives** |
331 | | *pattern* `<-` *expression* `,` *guardtree* **pattern guard** | 334 | | *pattern* `<-` *expression* `,` *guardtree* **pattern guard** |
332 | | `let` *bindings* `in` *guardtree* **let binding** | 335 | | `let` *bindings* `in` *guardtree* **let binding** |
333 | ----------- -- ---------------------------------------------------------- ----------------------- | 336 | ----------- -- ---------------------------------------------------------- ----------------------- |
334 | 337 | ||
335 | Remarks: | 338 | Pattern guard trees are generalizations of [pattern guards](https://wiki.haskell.org/Pattern_guard): |
336 | 339 | ||
337 | - Pattern guard trees are generalizations of [pattern guards](https://wiki.haskell.org/Pattern_guard): | 340 | - pattern guard trees are expressions (may appear whereever an expression may appear) |
338 | - pattern guard trees are expressions (may appear whereever an expression may appear) | 341 | - pattern guard trees may have alternatives not only on the top level |
339 | - pattern guard trees may have alternatives not only on the top level | 342 | For example, in `[] <- x, (Nothing <- y, ... | True <- z, ...)` both alternatives will match only if `x` is an empty list. |
340 | For example, in `[] <- x, (Nothing <- y, ... | True <- z, ...)` both alternatives will match only if `x` is an empty list. | 343 | - pattern guard trees may have let bindings not only on the top level |
341 | - pattern guard trees may have let bindings not only on the top level | ||
342 | - Pattern guard trees form a monoid with alternative patterns and `MatchFailure`. | ||
343 | 344 | ||
344 | Examples: | 345 | Examples: |
345 | 346 | ||
@@ -349,22 +350,6 @@ Examples: | |||
349 | not x = (True <- x, False) | (False <- x, True) | 350 | not x = (True <- x, False) | (False <- x, True) |
350 | ~~~~~ | 351 | ~~~~~ |
351 | 352 | ||
352 | 2. Definition of `zip`: | ||
353 | |||
354 | ~~~~~ {.haskell} | ||
355 | zip [] _ = [] | ||
356 | zip x [] = [] | ||
357 | zip (a:as) (b:bs) = (a,b) : zip as bs | ||
358 | ~~~~~ | ||
359 | |||
360 | A possible translation to pattern guard trees: | ||
361 | |||
362 | ~~~~~ {.haskell} | ||
363 | zip v0 v1 | ||
364 | = ([] <- v0, []) | ||
365 | | (let x = v0 in [] <- v1, []) | ||
366 | | ((:) a as <- v0, (:) b bs <- v1, (a, b): zip as bs) | ||
367 | ~~~~~ | ||
368 | 353 | ||
369 | #### Semantics | 354 | #### Semantics |
370 | 355 | ||
@@ -393,6 +378,21 @@ data GuardTree a where | |||
393 | PatternGuard :: Pattern b -> Expression b -> GuardTree a -> GuardTree a | 378 | PatternGuard :: Pattern b -> Expression b -> GuardTree a -> GuardTree a |
394 | ~~~~~ | 379 | ~~~~~ |
395 | 380 | ||
381 | #### Basic properties of pattern guard trees | ||
382 | |||
383 | ------------------------------------- -------------------------- ------------------------------- | ||
384 | t `|` (t' `|` t'') = (t `|` t') `|` t'' (`|`) is associative | ||
385 | `MatchFailure` `|` t = t `MatchFailure` is left unit | ||
386 | t `|` `MatchFailure` = t `MatchFailure` is right unit | ||
387 | t `|` t = t (`|`) is idempotent | ||
388 | e `|` t = e success ignores alternative | ||
389 | C1 p1 ... pn `<-` C2 e1 ... em`,` t = `MatchFailure` constructor match failure | ||
390 | ------------------------------------- -------------------------- ------------------------------- | ||
391 | |||
392 | These are not true because of different strictness properties: | ||
393 | |||
394 | - (p `<-` e, `MatchFailure`) = `MatchFailure` | ||
395 | |||
396 | 396 | ||
397 | ### Desugaring to pattern guard trees | 397 | ### Desugaring to pattern guard trees |
398 | 398 | ||
@@ -405,24 +405,26 @@ data GuardTree a where | |||
405 | 405 | ||
406 | Example: | 406 | Example: |
407 | 407 | ||
408 | Definition of `zip`: | ||
409 | |||
408 | ~~~~~ {.haskell} | 410 | ~~~~~ {.haskell} |
409 | zip [] _ = [] | 411 | zip [] _ = [] |
410 | zip x [] = [] | 412 | zip x [] = [] |
411 | zip (a:as) (b:bs) = (a,b) : zip as bs | 413 | zip (a:as) (b:bs) = (a,b) : zip as bs |
412 | ~~~~~ | 414 | ~~~~~ |
413 | 415 | ||
414 | Transformed code: | 416 | Translation to pattern guard trees: |
415 | 417 | ||
416 | ~~~~~ {.haskell} | 418 | ~~~~~ {.haskell} |
417 | zip v0 v1 | 419 | zip v0 v1 |
418 | = ([] <- v0, _ <- v1, []) | 420 | = ([] <- v0, _ <- v1, []) |
419 | | (x <- v0, [] <- v1, []) | 421 | | (x <- v0, [] <- v1, []) |
420 | | (a:as <- v0, b:bs <- v1, (a,b) : zip as bs) | 422 | | (a:as <- v0, b:bs <- v1, (a,b) : zip as bs) |
421 | ~~~~~ | 423 | ~~~~~ |
422 | 424 | ||
423 | |||
424 | #### View patterns | 425 | #### View patterns |
425 | 426 | ||
427 | ... | ||
426 | 428 | ||
427 | ### Normalization of pattern guard trees | 429 | ### Normalization of pattern guard trees |
428 | 430 | ||
@@ -432,22 +434,199 @@ The normal form of pattern guard trees contains only simple constructor patterns | |||
432 | *conpattern* := *conname* (*variable*)* | 434 | *conpattern* := *conname* (*variable*)* |
433 | ------------- -- ---------------------------------------------------------- ----------------------- | 435 | ------------- -- ---------------------------------------------------------- ----------------------- |
434 | 436 | ||
435 | The normalization process is driven by the shape of patterns: | 437 | The following equations turn all patterns into simple constructor patterns: |
436 | |||
437 | 438 | ||
438 | pattern shape input output | 439 | ---------------------------- -------------------------------------------------------------- ------------------- |
439 | -------------- ---------------------------- -- -------- | 440 | `_` `<-` e`,` t = t wildcard elim |
440 | wildcard `_` `<-` e `,` t ~> t | 441 | v `<-` e`,` t = `let` v = e `in` t variable elim |
441 | variable v `<-` e `,` t ~> `let` v = e `in` t | 442 | v`@`p `<-` e`,` t = `let` v = e `in` p `<-` v`,` t at-pattern elim |
442 | at-pattern v`@`p `<-` e `,` t ~> `let` v = e `in` p `<-` v `,` t | 443 | C p1 ... pn `<-` e`,` t = C w1 ... wn `<-` e`,` p1 `<-` w1`,` ...`,` pn `<-` wn`,` t nested pattern elim |
443 | nested pattern C p1 ... pn `<-` e `,` t ~> C w1 ... wn `<-` e `,` p1 `<-` w1 `,` ... `,` pn `<-` wn `,` t | 444 | (f `->` p) `<-` e`,` t = p `<-` f e`,` t view pattern elim |
444 | view pattern (f `->` p) `<-` e `,` t ~> p `<-` f e `,` t | 445 | ---------------------------- -------------------------------------------------------------- ------------------- |
445 | -------------- ---------------------------- -- -------- | ||
446 | 446 | ||
447 | 447 | ||
448 | ### Compilation of pattern guard trees | 448 | ### Compilation of pattern guard trees |
449 | 449 | ||
450 | #### Filtering | 450 | |
451 | Compilation is driven by the following equations: | ||
452 | |||
453 | 1. C ... `<-` e`,` t | t' | ||
454 | = `seq` e (C ... `<-` e`,` t | t') | ||
455 | |||
456 | 1. `seq` e t | ||
457 | = `case` e `of` p1 `->` t; ...; pn `->` t | ||
458 | if p1, ..., pn is complete | ||
459 | |||
460 | 1. `case` e `of` ...; C e1 ... en `->` ... (C v1 ... vn `<-` e`,` t | t') ...; ... | ||
461 | = `case` e `of` ...; C e1 ... en `->` ... (`let` v1 = e1; ...; vn = en `in` t) ...; ... | ||
462 | |||
463 | 1. `case` e `of` ...; C1 e1 ... en `->` ... (C2 v1 ... vn `<-` e`,` t | t') ...; ... | ||
464 | = `case` e `of` ...; C1 e1 ... en `->` ... t' ...; ... | ||
465 | |||
466 | |||
467 | |||
468 | # Complete example | ||
469 | |||
470 | Source code: | ||
471 | |||
472 | ~~~~~ {.haskell} | ||
473 | zip [] _ = [] | ||
474 | zip x [] = [] | ||
475 | zip (a:as) (b:bs) = (a,b) : zip as bs | ||
476 | ~~~~~ | ||
477 | |||
478 | Transformed to pattern guard trees: | ||
479 | |||
480 | ~~~~~ {.haskell} | ||
481 | zip v0 v1 | ||
482 | = ([] <- v0, _ <- v1, []) | ||
483 | | (x <- v0, [] <- v1, []) | ||
484 | | (a:as <- v0, b:bs <- v1, (a,b) : zip as bs) | ||
485 | ~~~~~ | ||
486 | |||
487 | Simplified pattern guard trees: | ||
488 | |||
489 | ~~~~~ {.haskell} | ||
490 | zip v0 v1 | ||
491 | = ([] <- v0, []) | ||
492 | | (let x = v0 in [] <- v1, []) | ||
493 | | (a:as <- v0, b:bs <- v1, (a,b) : zip as bs) | ||
494 | ~~~~~ | ||
495 | |||
496 | Apply (1): | ||
497 | |||
498 | ~~~~~ {.haskell} | ||
499 | zip v0 v1 = v0 `seq` | ||
500 | ([] <- v0, []) | ||
501 | | (let x = v0 in [] <- v1, []) | ||
502 | | (a:as <- v0, b:bs <- v1, (a,b) : zip as bs) | ||
503 | ~~~~~ | ||
504 | |||
505 | Apply (2) with patterns `[]` and `x:xs`: | ||
506 | |||
507 | ~~~~~ {.haskell} | ||
508 | zip v0 v1 = case v0 of | ||
509 | [] -> | ||
510 | ([] <- v0, []) | ||
511 | | (let x = v0 in [] <- v1, []) | ||
512 | | (a:as <- v0, b:bs <- v1, (a,b) : zip as bs) | ||
513 | x: xs -> | ||
514 | ([] <- v0, []) | ||
515 | | (let x = v0 in [] <- v1, []) | ||
516 | | (a:as <- v0, b:bs <- v1, (a,b) : zip as bs) | ||
517 | ~~~~~ | ||
518 | |||
519 | Apply (3); empty let is eliminated: | ||
520 | |||
521 | ~~~~~ {.haskell} | ||
522 | zip v0 v1 = case v0 of | ||
523 | [] -> [] | ||
524 | x: xs -> | ||
525 | ([] <- v0, []) | ||
526 | | (let x = v0 in [] <- v1, []) | ||
527 | | (a:as <- v0, b:bs <- v1, (a,b) : zip as bs) | ||
528 | ~~~~~ | ||
529 | |||
530 | Apply (4): | ||
531 | |||
532 | ~~~~~ {.haskell} | ||
533 | zip v0 v1 = case v0 of | ||
534 | [] -> [] | ||
535 | x: xs -> | ||
536 | (let x = v0 in [] <- v1, []) | ||
537 | | (a:as <- v0, b:bs <- v1, (a,b) : zip as bs) | ||
538 | ~~~~~ | ||
539 | |||
540 | Let floating: | ||
541 | |||
542 | ~~~~~ {.haskell} | ||
543 | zip v0 v1 = case v0 of | ||
544 | [] -> [] | ||
545 | x: xs -> let x = v0 in | ||
546 | ([] <- v1, []) | ||
547 | | (a:as <- v0, b:bs <- v1, (a,b) : zip as bs) | ||
548 | ~~~~~ | ||
549 | |||
550 | Apply (1): | ||
551 | |||
552 | ~~~~~ {.haskell} | ||
553 | zip v0 v1 = case v0 of | ||
554 | [] -> [] | ||
555 | x: xs -> let x = v0 in v1 `seq` | ||
556 | ([] <- v1, []) | ||
557 | | (a:as <- v0, b:bs <- v1, (a,b) : zip as bs) | ||
558 | ~~~~~ | ||
559 | |||
560 | Apply (2) with patterns `[]` and `y: ys`: | ||
561 | |||
562 | ~~~~~ {.haskell} | ||
563 | zip v0 v1 = case v0 of | ||
564 | [] -> [] | ||
565 | x: xs -> let x = v0 in case v1 of | ||
566 | [] -> | ||
567 | ([] <- v1, []) | ||
568 | | (a:as <- v0, b:bs <- v1, (a,b) : zip as bs) | ||
569 | y: ys -> | ||
570 | ([] <- v1, []) | ||
571 | | (a:as <- v0, b:bs <- v1, (a,b) : zip as bs) | ||
572 | ~~~~~ | ||
573 | |||
574 | Apply (3): | ||
575 | |||
576 | ~~~~~ {.haskell} | ||
577 | zip v0 v1 = case v0 of | ||
578 | [] -> [] | ||
579 | x: xs -> let x = v0 in case v1 of | ||
580 | [] -> [] | ||
581 | y: ys -> | ||
582 | ([] <- v1, []) | ||
583 | | (a:as <- v0, b:bs <- v1, (a,b) : zip as bs) | ||
584 | ~~~~~ | ||
585 | |||
586 | Apply (4): | ||
587 | |||
588 | ~~~~~ {.haskell} | ||
589 | zip v0 v1 = case v0 of | ||
590 | [] -> [] | ||
591 | x: xs -> let x = v0 in case v1 of | ||
592 | [] -> [] | ||
593 | y: ys -> | ||
594 | a:as <- v0, b:bs <- v1, (a,b) : zip as bs | ||
595 | ~~~~~ | ||
596 | |||
597 | Apply (3): | ||
598 | |||
599 | ~~~~~ {.haskell} | ||
600 | zip v0 v1 = case v0 of | ||
601 | [] -> [] | ||
602 | x: xs -> let x = v0 in case v1 of | ||
603 | [] -> [] | ||
604 | y: ys -> let a = x; as = xs in b:bs <- v1, (a,b) : zip as bs | ||
605 | ~~~~~ | ||
606 | |||
607 | Apply (3): | ||
608 | |||
609 | ~~~~~ {.haskell} | ||
610 | zip v0 v1 = case v0 of | ||
611 | [] -> [] | ||
612 | x: xs -> let x = v0 in case v1 of | ||
613 | [] -> [] | ||
614 | y: ys -> let a = x; as = xs in let b = y; bs = ys in (a,b) : zip as bs | ||
615 | ~~~~~ | ||
616 | |||
617 | Simplification: | ||
618 | |||
619 | ~~~~~ {.haskell} | ||
620 | zip v0 v1 = case v0 of | ||
621 | [] -> [] | ||
622 | x: xs -> case v1 of | ||
623 | [] -> [] | ||
624 | y: ys -> (x, y) : zip xs ys | ||
625 | ~~~~~ | ||
626 | |||
627 | |||
628 | |||
629 | #### Concrete algorithm | ||
451 | 630 | ||
452 | Filtering is a transformation on pattern guard trees, given an expession and a constructor name: | 631 | Filtering is a transformation on pattern guard trees, given an expession and a constructor name: |
453 | 632 | ||
@@ -463,12 +642,36 @@ filterTree e a@(Con c es) (PatternGuard (PCon c' vs) e' t) | |||
463 | | otherwise = filterTree e a (let $(bind vs es) t) | 642 | | otherwise = filterTree e a (let $(bind vs es) t) |
464 | ~~~~~ | 643 | ~~~~~ |
465 | 644 | ||
645 | ... | ||
646 | |||
647 | |||
648 | #### Checks | ||
649 | |||
650 | Exhaustiveness check is just finding `MatchFailure` values in the result. | ||
651 | |||
652 | Reachability check is just finding parts of source code not present in result. | ||
653 | |||
654 | |||
655 | #### Extensions | ||
656 | |||
657 | Step (2) is very general: | ||
658 | |||
659 | - For pattern matching just on a single constructor C, the patterns {C v1 ... vn, _} can be used. | ||
660 | This can prevent quadratic code blowup in case of ADT with many constructors. | ||
661 | - Pattern matching on types is possible in a similar way. | ||
662 | - In case of GADTs, for better warnings, type information can be taken into account to restrict the set of matching constructor patterns, | ||
663 | similar to [GADTs Meet Their Match](https://people.cs.kuleuven.be/~george.karachalias/papers/p424-karachalias.pdf), implemented in GHC 8. | ||
664 | |||
665 | ### Discussion | ||
466 | 666 | ||
467 | ### TODO | 667 | - Extension with dependent pattern matching seems possible but has to be explored. |
668 | - The algorithm produce fast code, but in some cases the size of produced code may blow up. | ||
669 | Compilation driven by more than 4 equations might prevent this. | ||
670 | - [A prototype was implemented.](https://github.com/lambdacube3d/lambdacube-compiler/blob/master/prototypes/PatCompile.hs) | ||
468 | 671 | ||
469 | - pattern matching on GADTs | 672 | #### Related work |
470 | - dependent pattern matching | ||
471 | 673 | ||
674 | Have not seen this approach before; help to compare it with others. | ||
472 | 675 | ||
473 | 676 | ||
474 | Desugared source code | 677 | Desugared source code |