Author:  Rony G. Flatscher

For:     ANSI NCITS J18

Purpose: Collections and Set(like) Operations

Version: 0.9.2

As of:   1997-09-16

Status:  FINAL

Please send comments to Rony.Flatscher@wu-wien.ac.at

Abstract: This paper discusses set(like) operations on Object Rexx collections, as a basis for further discussion of these issues within the ANSI Rexx committee (as decided at the ANSI Rexx meeting in Heidelberg in April 1997).

Table of Contents


        1     Set-Operations - Overview
        1.1   Basic Set-Operations and Object Rexx
        1.1.1 Set-Operations on Sets ("Collections without Duplicates")
        1.1.2 Set(like)-Operations on Collections with Duplicates
        1.1.3 Rexx Code
        1.2   ORACLE 7.3
        1.2.1 "Relational Relations"
        1.2.2 The Reality
        1.2.3 SQL for ORACLE 7.3
        1.3   Conclusions
        2     Set-Operations in Object Rexx - More Detailed
        2.1   OOI Documentation of the Set-Operations
        2.2   Collection Classes and the Identity of Collection Elements
        2.3   Semantics of Other
        2.3.1 "Semantics" of Indexes
        2.3.2 Using the Object Rexx Builtin Collection Classes "Array" ...
        2.3.3 General Rules for Treating "Other"
        3     Conclusions

1 Set-Operations - Overview

This section discusses some set operations under Object Rexx and under a relational database (in this case ORACLE 7.3).

Wherever appropriate, the Object Rexx set operations DIFFERENCE, INTERSECTION, SUBSET, UNION and XOR are taken into account.

1.1 Basic Set-Operations and Object Rexx

The examples in this section use Object Rexx sets and Object Rexx bags (for collections with duplicate elements).

1.1.1 Set-Operations on Sets ("Collections without Duplicates")

Consider the sets A={a,b} and B={b,c}. Every set operation yields a set. The set operators are written in infix notation.

UNION operation

All elements of A and B are united:

    A UNION B = {a,b,c}

DIFFERENCE (MINUS) operation

The resulting set contains all elements of the first set with those elements removed which also appear in the second set:

    A DIFFERENCE B = {a}
    B DIFFERENCE A = {c}

XOR operation

The resulting set contains all elements of the first set which are not in the second set and all elements of the second set which are not in the first set:

    A XOR B = {a,c}

Please note that XOR can be defined with a combination of the UNION and DIFFERENCE operations:

    A XOR B := (A DIFFERENCE B) UNION (B DIFFERENCE A)
            ... {a} UNION {c}
            ... {a,c}

INTERSECTION operation

The resulting set contains all elements which appear in both sets:

    A INTERSECTION B = {b}

Please note that INTERSECTION can be defined with the DIFFERENCE operation:

    A INTERSECTION B := A DIFFERENCE (A DIFFERENCE B)
                     ... {a,b} DIFFERENCE {a}
                     ... {b}

or:

    A INTERSECTION B := B DIFFERENCE (B DIFFERENCE A)
                     ... {b,c} DIFFERENCE {c}
                     ... {b}

SUBSET operation

Returns .true if every element of the first set also appears in the second set, i.e. if A DIFFERENCE B yields the empty set {}; returns .false otherwise:

    A SUBSET B = .false

as:

    A DIFFERENCE B = {a}
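As a cross-check outside of Object Rexx, the results of this section can be reproduced with Python's built-in set type. This is only an illustration under an assumed mapping (UNION to |, DIFFERENCE to -, XOR to ^, INTERSECTION to &, SUBSET to <=), not Object Rexx itself; the sketch also checks the derived definitions of XOR, INTERSECTION and SUBSET given above.

```python
# Python sets as a stand-in for Object Rexx Sets (illustration only).
# Assumed mapping: UNION -> |, DIFFERENCE -> -, XOR -> ^,
#                  INTERSECTION -> &, SUBSET -> <=
A = {"a", "b"}
B = {"b", "c"}

union = A | B                # {a,b,c}
difference_ab = A - B        # {a}
difference_ba = B - A        # {c}
xor = A ^ B                  # {a,c}
intersection = A & B         # {b}
subset_ab = A <= B           # False, as A - B = {a} is not empty

# The derived definitions given above hold for these sets as well.
assert xor == (A - B) | (B - A)
assert intersection == A - (A - B) == B - (B - A)
assert subset_ab == (not (A - B))

def show(s):
    """Render a set in the notation used in this paper, e.g. {a,c}."""
    return "{" + ",".join(sorted(s)) + "}"

print("A UNION B =", show(union))
print("A DIFFERENCE B =", show(difference_ab))
print("B DIFFERENCE A =", show(difference_ba))
print("A XOR B =", show(xor))
print("A INTERSECTION B =", show(intersection))
print("A SUBSET B =", subset_ab)
```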

1.1.2 Set(like)-Operations on Collections with Duplicates

Consider the collections A={a,b,b} and B={b,b,c,c}. Every set(like) operation on them yields another collection. The set operators are written in infix notation.

UNION operation

All elements of A and B are united:

    A UNION B = {a,b,b,b,b,c,c}

DIFFERENCE (MINUS) operation

The resulting collection contains all elements of the first collection with those elements removed which also appear in the second collection (by iterating over the elements of the second collection and removing them from the first collection one by one):

    A DIFFERENCE B = {a}
    B DIFFERENCE A = {c,c}
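The one-by-one removal rule can be sketched as follows, assuming Python lists as a stand-in for bags. The helper name bag_difference is hypothetical; it copies the first collection and removes a single occurrence for every element of the second collection that is still present.

```python
def bag_difference(first, second):
    """DIFFERENCE for collections with duplicates, as described above:
    iterate over the second collection and remove each of its elements
    (one occurrence at a time) from a copy of the first collection."""
    result = list(first)            # copy, so the receiver stays unchanged
    for element in second:
        if element in result:
            result.remove(element)  # list.remove deletes the first occurrence only
    return result

A = ["a", "b", "b"]
B = ["b", "b", "c", "c"]
print(sorted(bag_difference(A, B)))   # ['a']
print(sorted(bag_difference(B, A)))   # ['c', 'c']
```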

XOR operation

The resulting collection contains all elements of the first collection which are not in the second collection and all elements of the second collection which are not in the first collection:

    A XOR B = {a,c,c}

Please note that XOR can be defined with a combination of the UNION and DIFFERENCE operations:

    A XOR B := (A DIFFERENCE B) UNION (B DIFFERENCE A)
            ... {a} UNION {c,c}
            ... {a,c,c}

INTERSECTION operation

The resulting collection contains all elements which appear in both collections:

    A INTERSECTION B = {b,b}

Please note that INTERSECTION can be defined with the DIFFERENCE operation:

    A INTERSECTION B := A DIFFERENCE (A DIFFERENCE B)
                     ... {a,b,b} DIFFERENCE {a}
                     ... {b,b}

or:

    A INTERSECTION B := B DIFFERENCE (B DIFFERENCE A)
                     ... {b,b,c,c} DIFFERENCE {c,c}
                     ... {b,b}

SUBSET operation

Returns .true if A DIFFERENCE B yields an empty collection {}, i.e. if every element of the first collection (counting duplicates) also appears in the second collection; returns .false otherwise:

    A SUBSET B = .false

as:

    A DIFFERENCE B = {a}
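For collections with duplicates, Python's collections.Counter (a multiset mapping each element to its number of occurrences) can serve as a model of a bag. The sketch below is an illustration under that assumption, not the Object Rexx implementation: UNION adds the counts (UNION ALL semantics), DIFFERENCE subtracts them and drops elements whose count reaches zero, and XOR, INTERSECTION and SUBSET are derived exactly as defined above.

```python
from collections import Counter

# Bags modelled as Counters: element -> number of occurrences.
A = Counter(["a", "b", "b"])
B = Counter(["b", "b", "c", "c"])

def union(x, y):          # keeps all occurrences (like SQL92 UNION ALL)
    return x + y

def difference(x, y):     # each occurrence in y cancels one occurrence in x;
    return x - y          # Counter '-' keeps only positive counts

def xor(x, y):            # derived: (x DIFFERENCE y) UNION (y DIFFERENCE x)
    return union(difference(x, y), difference(y, x))

def intersection(x, y):   # derived: x DIFFERENCE (x DIFFERENCE y)
    return difference(x, difference(x, y))

def subset(x, y):         # .true if x DIFFERENCE y is empty
    return not difference(x, y)

def show(bag):
    """Render a bag in the notation used in this paper, e.g. {a,c,c}."""
    return "{" + ",".join(sorted(bag.elements())) + "}"

print("A UNION B        =", show(union(A, B)))
print("A DIFFERENCE B   =", show(difference(A, B)))
print("B DIFFERENCE A   =", show(difference(B, A)))
print("A XOR B          =", show(xor(A, B)))
print("A INTERSECTION B =", show(intersection(A, B)))
print("A SUBSET B       =", subset(A, B))
```

The printed results agree with the values listed in this section, including the alternative derivation B DIFFERENCE (B DIFFERENCE A) for INTERSECTION.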

1.1.3 Rexx Code

The following code uses sets and bags to demonstrate the Set-operators.

/* code to permutate Set-Operations under OOI, ---rgf, 97-08-22 */

PARSE VERSION version
SAY "Version:" version
SAY

run. = 0
run.1.1 = .set ~ of( "a", "b" )                 /* define sets          */
run.1.2 = .set ~ of( "b", "c" )
run.2.1 = .bag ~ of( "a", "b", "b" )            /* define bags          */
run.2.2 = .bag ~ of( "b", "b", "c", "c" )
run.0   = 2

op. = 0
op.1 = "UNION"                                  /* define operations    */
op.2 = "DIFFERENCE"
op.3 = "XOR"
op.4 = "INTERSECTION"
op.5 = "SUBSET"
op.5.1 = .true                                  /* does not return a collection */
op.0 = 5

DO iRuns = 1 TO run.0           /* loop over collections                */
   A = run.iRuns.1
   B = run.iRuns.2
   A_String = display( "A", A )
   B_String = display( "B", B )

   SAY "Run" iRuns":" A_String "(" A ")," B_String "(" B ")"
   SAY
   DO iOps = 1 TO op.0          /* Loop over operations                 */

      DO rev = 1 TO 2           /* exchange order of collections        */
         IF rev = 1 THEN doSTring = "A ~" op.iOps || "( B )"
                    ELSE doSTring = "B ~" op.iOps || "( A )"

         INTERPRET "tmp =" doString
         IF op.iOps.1 = .true THEN tmp_String = doString "=" yes_no( tmp )
                              ELSE tmp_String = display( doSTring, tmp )
         SAY A_String"," B_String":" tmp_String
      END
   END iOps
   SAY LEFT( "", 70, "-" )
END iRuns
EXIT


YES_NO : PROCEDURE
  IF ARG( 1 ) = .true THEN RETURN "<yes>"
                      ELSE RETURN "<no>"

DISPLAY : PROCEDURE
   USE ARG symbol, collection

   tmpArray = sortCollection( collection )
   tmp = ""
   DO i = 1 TO tmpArray ~ items / 2
      tmp = tmp || "," || tmpArray[ i, 2 ]
   END
   tmp = STRIP( tmp, "L", "," ) 

   IF symbol = "" THEN RETURN "{" || tmp || "}"
                  ELSE RETURN symbol "= {" || tmp || "}"

   RETURN

:: REQUIRES rgf_util            /* for sorting, from ORX8.ZIP,
                                   documentation in ORX8DOC.ZIP */

The output:


Version: OBJREXX 6.00 21 Jul 1997

Run 1: A = {a,b} ( a Set ), B = {b,c} ( a Set )

A = {a,b}, B = {b,c}: A ~ UNION( B ) = {a,b,c}
A = {a,b}, B = {b,c}: B ~ UNION( A ) = {a,b,c}
A = {a,b}, B = {b,c}: A ~ DIFFERENCE( B ) = {a}
A = {a,b}, B = {b,c}: B ~ DIFFERENCE( A ) = {c}
A = {a,b}, B = {b,c}: A ~ XOR( B ) = {a,c}
A = {a,b}, B = {b,c}: B ~ XOR( A ) = {a,c}
A = {a,b}, B = {b,c}: A ~ INTERSECTION( B ) = {b}
A = {a,b}, B = {b,c}: B ~ INTERSECTION( A ) = {b}
A = {a,b}, B = {b,c}: A ~ SUBSET( B ) = <no>
A = {a,b}, B = {b,c}: B ~ SUBSET( A ) = <no>
----------------------------------------------------------------------
Run 2: A = {a,b,b} ( a Bag ), B = {b,b,c,c} ( a Bag )

A = {a,b,b}, B = {b,b,c,c}: A ~ UNION( B ) = {a,b,b,b,b,c,c}
A = {a,b,b}, B = {b,b,c,c}: B ~ UNION( A ) = {a,b,b,b,b,c,c}
A = {a,b,b}, B = {b,b,c,c}: A ~ DIFFERENCE( B ) = {a}
A = {a,b,b}, B = {b,b,c,c}: B ~ DIFFERENCE( A ) = {c,c}
A = {a,b,b}, B = {b,b,c,c}: A ~ XOR( B ) = {a,c,c}
A = {a,b,b}, B = {b,b,c,c}: B ~ XOR( A ) = {a,c,c}
A = {a,b,b}, B = {b,b,c,c}: A ~ INTERSECTION( B ) = {b,b}
A = {a,b,b}, B = {b,b,c,c}: B ~ INTERSECTION( A ) = {b,b}
A = {a,b,b}, B = {b,b,c,c}: A ~ SUBSET( B ) = <no>
A = {a,b,b}, B = {b,b,c,c}: B ~ SUBSET( A ) = <no>
----------------------------------------------------------------------

1.2 ORACLE 7.3

ORACLE 7.3 is an up-to-date relational database management system (RDBMS) that implements SQL92 plus a few extensions from the not yet standardized "SQL3" (unfortunately, I do not have access to DB2 or Universal Database). ORACLE contains a FLAGGER that can be set to indicate whether an SQL statement complies with the SQL92 standard.

In this section only the set operations available in ORACLE 7.3 are discussed.

1.2.1 "Relational Relations"

A "relation" in relational theory is a set of tuples. Any operation on a "relational relation" :) yields another "relational relation", i.e. a set of tuples. The same is true for intermediate results. Relational algebra (and relational calculus) therefore work with sets (of tuples) only.

1.2.2 The Reality

In reality, all RDBMS vendors started out in the 1980s by allowing tables to contain duplicate rows (i.e. duplicate tuples). Tables are database objects representing relations, rows are database objects representing tuples, and columns are database objects representing attributes. The number of attributes of a tuple determines its arity.

One of the many implications of duplicates is that SQL (Structured Query Language) statements cannot be optimized such that different parts of a statement are executed concurrently, because the order in which the parts are executed may matter. (In relational algebra every intermediate result is a set, so duplicate tuples cannot occur and cannot influence the course of execution.)

Another implication is that the set operators (at least UNION, which has meanwhile been standardized) have to take into account that relational tables may contain duplicate tuples. SQL92 therefore defines two versions of the UNION operator: UNION and UNION ALL.

UNION first turns all tables participating in the union into sets of tuples, applies the union and makes sure that the resulting intermediate table is itself a set of tuples; its behavior is therefore exactly that of the operation Object Rexx implements for Sets. UNION ALL allows for duplicate tuples and behaves exactly like the operation Object Rexx implements for Bags.

ORACLE 7.3 supplies the set operators MINUS and INTERSECT, both of which make sure to work on non-duplicates (i.e. true sets). Unfortunately, no ALL versions of these two operators are available in ORACLE 7.3. (The SQL92 standard itself names these operators EXCEPT and INTERSECT and also defines EXCEPT ALL and INTERSECT ALL variants for them.)

1.2.3 SQL for ORACLE 7.3

The following sequence of SQL statements can be submitted directly to the command-line interface of the ORACLE 7.3 RDBMS.

The ORACLE set-operator MINUS is called DIFFERENCE in Object Rexx and INTERSECT is named INTERSECTION in Object Rexx. In ORACLE 7.3 there are no single operators available for XOR and SUBSET.


REM SET ECHO OFF
REM ---rgf, 97-08-22, ANSI Rexx
SET PAGESIZE 9999
HOST test_orx_dupl.out
SPOOL test_orx_dupl.out

REM make sure tables do not exist, so CREATE ... works without errors
DROP table A;
DROP table B;

REM ===================> create tables
CREATE TABLE A ( element VARCHAR( 1 ) );
CREATE TABLE B ( element VARCHAR( 1 ) );

REM ===================> fill tables
INSERT INTO A VALUES ( 'a' );
INSERT INTO A VALUES ( 'b' );
INSERT INTO A VALUES ( 'b' );

INSERT INTO B VALUES ( 'b' );
INSERT INTO B VALUES ( 'b' );
INSERT INTO B VALUES ( 'c' );
INSERT INTO B VALUES ( 'c' );

SET ECHO ON
REM ===================> Show contents of tables
SELECT * from A;
SELECT * from B;

REM ===================> Show contents of tables without duplicates
SELECT DISTINCT * from A;
SELECT DISTINCT * from B;

REM --------------------> A UNION B 
SELECT * FROM A UNION SELECT * FROM B ORDER BY 1 ;

REM --------------------> A UNION ALL B
SELECT * FROM A UNION ALL SELECT * from B ORDER BY 1 ;

REM --------------------> A INTERSECT B
SELECT * FROM A INTERSECT SELECT * FROM B ORDER BY 1 ;

REM --------------------> A MINUS B
SELECT * FROM A MINUS SELECT * FROM B ORDER BY 1 ;

REM --------------------> B MINUS A
SELECT * FROM B MINUS SELECT * FROM A ORDER BY 1 ;

spool off

In the following output, the heading of the column ELEMENT is shown as just 'E':



SQL> REM ===================> Show contents of tables
SQL> SELECT * from A;

E                                                                                                   
-                                                                                                   
a                                                                                                   
b                                                                                                   
b                                                                                                   

SQL> SELECT * from B;

E                                                                                                   
-                                                                                                   
b                                                                                                   
b                                                                                                   
c                                                                                                   
c                                                                                                   

SQL> REM ===================> Show contents of tables without duplicates
SQL> SELECT DISTINCT * from A;

E                                                                                                   
-                                                                                                   
a                                                                                                   
b                                                                                                   

SQL> SELECT DISTINCT * from B;

E                                                                                                   
-                                                                                                   
b                                                                                                   
c                                                                                                   

SQL> REM --------------------> A UNION B
SQL> SELECT * FROM A UNION SELECT * FROM B ORDER BY 1 ;

E                                                                                                   
-                                                                                                   
a                                                                                                   
b                                                                                                   
c                                                                                                   

SQL> REM --------------------> A UNION ALL B
SQL> SELECT * FROM A UNION ALL SELECT * from B ORDER BY 1 ;

E                                                                                                   
-                                                                                                   
a                                                                                                   
b                                                                                                   
b                                                                                                   
b                                                                                                   
b                                                                                                   
c                                                                                                   
c                                                                                                   

7 rows selected.

SQL> REM --------------------> A INTERSECT B
SQL> SELECT * FROM A INTERSECT SELECT * FROM B ORDER BY 1 ;

E                                                                                                   
-                                                                                                   
b                                                                                                   

SQL> REM --------------------> A MINUS B
SQL> SELECT * FROM A MINUS SELECT * FROM B ORDER BY 1 ;

E                                                                                                   
-                                                                                                   
a                                                                                                   

SQL> REM --------------------> B MINUS A
SQL> SELECT * FROM B MINUS SELECT * FROM A ORDER BY 1 ;

E                                                                                                   
-                                                                                                   
c                                                                                                   

SQL> spool off
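Readers without access to ORACLE can approximate the script above with Python's standard sqlite3 module and an in-memory SQLite database. This is a substitute chosen for illustration, not part of the original test: SQLite is a different RDBMS, and it spells ORACLE's MINUS as EXCEPT (the SQL92 name). Table and column names follow the ORACLE script.

```python
import sqlite3

# In-memory SQLite database as a stand-in for ORACLE 7.3 (illustration only).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE A (element VARCHAR(1))")
con.execute("CREATE TABLE B (element VARCHAR(1))")
con.executemany("INSERT INTO A VALUES (?)", [("a",), ("b",), ("b",)])
con.executemany("INSERT INTO B VALUES (?)", [("b",), ("b",), ("c",), ("c",)])

# EXCEPT replaces ORACLE's MINUS; ORDER BY 1 sorts the whole compound result.
queries = {
    "A UNION B":     "SELECT * FROM A UNION SELECT * FROM B ORDER BY 1",
    "A UNION ALL B": "SELECT * FROM A UNION ALL SELECT * FROM B ORDER BY 1",
    "A INTERSECT B": "SELECT * FROM A INTERSECT SELECT * FROM B ORDER BY 1",
    "A EXCEPT B":    "SELECT * FROM A EXCEPT SELECT * FROM B ORDER BY 1",
    "B EXCEPT A":    "SELECT * FROM B EXCEPT SELECT * FROM A ORDER BY 1",
}

results = {}
for label, sql in queries.items():
    results[label] = [row[0] for row in con.execute(sql)]
    print(label, "->", results[label])
con.close()
```

With the same table contents, these queries return the same rows as the ORACLE 7.3 output above: only UNION ALL keeps duplicates, while UNION, INTERSECT and EXCEPT return duplicate-free results.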

1.3 Conclusions

In the Object Rexx documentation, the first collection taking part in a set operation is usually called the receiver; the second collection is called other and is supplied as the argument of the set-operator message.

For Sets, Object Rexx implements the set operations such that the "classic" set operations take place.

For the collection classes containing duplicates (Relations and Bags), Object Rexx implements the set operation UNION with the SQL92 semantics of UNION ALL. For UNION, the results in Object Rexx are therefore the same as in ORACLE 7.3, and hence as defined by the SQL92 standard.

The MINUS (Object Rexx message name: DIFFERENCE) and INTERSECT (Object Rexx message name: INTERSECTION) set operators of ORACLE 7.3 work on sets only; unlike the SQL92-compliant UNION ALL, ORACLE 7.3 offers no MINUS ALL and INTERSECT ALL variants.

Therefore one needs to examine the semantics of the DIFFERENCE operation on collections containing duplicates as implemented by Object Rexx. Once the DIFFERENCE operation is agreed upon, the INTERSECTION and XOR operations for collections containing duplicates can be derived according to the definitions in 1.1.1 and 1.1.2.

The DIFFERENCE definition in 1.1.2 for collections with duplicates seems reasonable:

"DIFFERENCE (MINUS) operation:
The resulting collection contains all elements of the first collection with those elements removed which also appear in the second collection (by iterating over the elements of the second collection and removing them from the first collection one by one)."

2 Set-Operations in Object Rexx - More Detailed

This section first quotes the definitions of the set operators from the online help (the "OOI book") to give an overview of the present documentation. Then the concept of identity is discussed for every collection class, and finally the semantics and implications of the Object Rexx collection classes are discussed with respect to their participation in set operations as arguments ("other").

2.1 OOI Documentation of the Set-Operations

The set-operations are defined as methods for the following collection classes and inherited by their subclasses:

Table (and inherited by its subclass Set)
Directory
Relation (and inherited by its subclass Bag)

Hence, the following collection classes do not have set-operator methods defined, but may be used as arguments ("other") for set-operator messages:

Array
List
Queue

The following overview of the OOI documentation of the set-operator methods concentrates on UNION and DIFFERENCE only, as the other Object Rexx set operations can be derived from these two:

TABLE and its subclass SET
receiver ~ UNION( other )
Returns a new collection of the same class as the receiver that contains all the items from the receiver collection and selected items from the collection other. This method includes an item from other in the new collection only if there is no item with the same associated index in the receiver collection and the method has not already included an item with the same index. The order in which this method selects items in other is unspecified. (The program should not rely on any order.) See also the UNION method of the Directory class and UNION method of the Relation class. The other can be any object that supports the methods the REXX collection classes implement. The other must also allow all of the index values in the receiver collection.
(Remark: as the identity of a table "element" is defined by its index, the reference to index-item pairs seems misleading and complicates the matter. Also, it is not clear which 'methods of the collection classes' are meant and therefore needed.)
receiver ~ DIFFERENCE( other )
Returns a new collection (of the same class as the receiver) containing only those index-item pairs from the receiver whose indexes the collection other does not contain (with the same associated index). The other can be any object that supports the methods the REXX collection classes implement. The other must also allow all of the index values in the receiver collection.
(Remark: as the identity of a table "element" is defined by its index, the reference to index-item pairs seems misleading and complicates the matter. Also, it is not clear which 'methods of the collection classes' are meant and therefore needed.)

DIRECTORY
(Remark: the index of a directory "requires a string value".
Wouldn't it therefore make sense to add the behavior already present with SAY, PARSE VAR etc.? I.e., if Object Rexx receives a non-string object as a directory index, it would convert that object to its string representation (as it already does with SAY or PARSE VAR). If no string value is available, a NOSTRING condition should be raised. Cf. the online documentation about "Required String Values".)
receiver ~ UNION( other )
Returns a new collection of the same class as the receiver that contains all the items from the receiver collection and selected items from the collection other. This method includes an item from other in the new collection only if there is no item with the same associated index in the receiver collection and the method has not already included an item with the same index. The order in which this method selects items in other is unspecified. (The program should not rely on any order.) See also the UNION method of the Table class and the UNION method of the Relation class. The other can be any object that supports the methods the REXX collection classes implement. The other must also allow all of the index values in the receiver collection.
(Remark: note the usage of the terms "index" and "item". Again, it is the index, and not the value, which determines whether an element from other has to be put into the resulting collection. The Object Rexx documentation uses the term "item" in the explanatory text, but the term "value" in the syntax diagrams, which is another possible source of confusion.)
receiver ~ DIFFERENCE( other )
Returns a new collection (of the same class as the receiver) containing only those items from the receiver whose indexes the collection other does not contain.
The other can be any object that supports the methods the REXX collection classes implement. The other must also allow all of the index values in the receiver collection.
(Remark: which methods must be available in other for it to be eligible to participate in set(like) operations?)

RELATION and its subclass BAG
receiver ~ UNION( other )
Returns a new collection containing all items from the receiver collection and the collection other. The other can be any object that supports a HASITEM method and the methods the REXX collection classes implement.
(Remark: is this really true? Or is the reference to HASITEM merely a hint that both the INDEX and the ITEM of an "element" are needed to identify individual "elements" of relations?)
receiver ~ DIFFERENCE( other )
Returns a new collection (of the same class as the receiver) containing only those items that the collection other does not contain (with the same associated index). The other can be any object that supports a HASITEM method and the methods the REXX collection classes implement.
(Remark: is this really true? Or is the reference to HASITEM merely a hint that both the INDEX and the ITEM of an "element" are needed to identify individual "elements" of relations?)

At present, the documentation of the set operators does not seem to be as clear as it could be, e.g.:

Which methods must other supply in order to be eligible as a valid argument for set operations?
Is it really necessary for other to define a HASITEM method if the receiver is a relation?
What determines the identity of collection "elements", making it possible to test for existence if necessary?
The Object Rexx documentation uses the term "item" in the explanatory text, but the term "value" in the syntax diagrams, which seems to be another possible source of confusion.

2.2 Collection Classes and the Identity of Collection Elements

For set(like) operations it becomes necessary to define what constitutes the identity of collection elements, in order to determine whether an element is present in a receiver collection or not.

All Object Rexx collection elements consist of a tuple of the form "(value,index)" (with the notable exception of arrays, which may have more than one index, depending on the number of their dimensions). [To verify that this is true even for sets and bags, look up the description of the PUT method: both classes are defined such that value and index are in effect the same object.]

Of those collection classes which have set-operation methods defined, only the Relation class needs both the value and the index part of the tuple to determine whether an element already exists. The other classes need just the index part of the tuple.

The following table shows which parts of the tuple are used to determine whether a collection element already exists in the receiver. Only the collection classes which have set-operation methods defined are shown:

Determining whether an Element Exists in the Receiver Already

    Receiver         Identity portion of tuple (value, index)   Test method
    --------------   ----------------------------------------   -----------
    Table (Set)      index                                      HASINDEX
    Directory        index                                      HASINDEX
    Relation (Bag)   value, index                               HASITEM

From the above table it should become clear that in all cases except for relation and its subclasses the index portion of the tuple is sufficient for testing whether an element exists in the receiver already or not. For the relation class and its subclasses both parts of the tuple are needed, the value as well as the index part.

To define how the UNION and DIFFERENCE set-operators work, one could state:

For the purpose of the UNION set operation it would be sufficient to create a supplier object of the other collection, iterate over it and

PUT the element into the receiver only if it does not exist there already [using the appropriate HAS method from the above table to test for existence]: this pertains to the Table and Directory classes and their respective subclasses, or
PUT the element into the receiver anyway (cf. UNION ALL above): pertains to the Relation class and its subclasses.

For the purpose of the DIFFERENCE set operation it would be sufficient to create a supplier object of the other collection, iterate over it and REMOVE the elements from the receiver (or use REMOVEITEM if the receiver is a relation). An interesting feature of these definitions is that, so far, other may be any collection which merely supports the SUPPLIER method.
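The supplier-based definitions above can be modelled in a few lines. The following sketch is hypothetical and uses Python only as a neutral notation: every collection is represented by the list of (index, value) tuples its supplier would deliver, the identity rule from the table above selects an index-only test (cf. HASINDEX) or an (index, value) test (cf. HASITEM), and a new collection is returned instead of modifying the receiver. The function names are illustrative and are not Object Rexx methods.

```python
def has(collection, kind, index, value):
    """Identity test chosen according to the table above (hypothetical model)."""
    if kind == "relation":                          # value and index (cf. HASITEM)
        return (index, value) in collection
    return any(i == index for i, _ in collection)   # index only (cf. HASINDEX)

def supplier_union(receiver, kind, other_supplier):
    result = list(receiver)                         # the receiver is left unchanged
    for index, value in other_supplier:
        if kind == "relation" or not has(result, kind, index, value):
            result.append((index, value))           # relation: PUT anyway (UNION ALL)
    return result

def supplier_difference(receiver, kind, other_supplier):
    result = list(receiver)
    for index, value in other_supplier:
        if kind == "relation":
            if has(result, kind, index, value):
                result.remove((index, value))       # cf. REMOVEITEM: one occurrence
        else:
            result = [(i, v) for i, v in result if i != index]  # cf. REMOVE by index
    return result

# Table: the index decides; the value delivered by other is ignored.
table = [("x", 1), ("y", 2)]
other = [("y", 99), ("z", 3)]
print(supplier_union(table, "table", other))       # [('x', 1), ('y', 2), ('z', 3)]
print(supplier_difference(table, "table", other))  # [('x', 1)]

# Relation: index and value together decide.
relation = [("k", "v1"), ("k", "v2")]
print(supplier_difference(relation, "relation", [("k", "v1")]))  # [('k', 'v2')]
```

Note that other only has to be iterable as (index, value) tuples here, which mirrors the observation that a SUPPLIER method is all that these definitions require.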

2.3 Semantics of Other

This section briefly considers the meaning of indexes, which will be taken into account later when discussing the semantics applied to other collections. [The reasoning will be: if an index does not convey application-related semantic information, then the supplier object should be built as if it were a supplier for a bag, i.e. by setting the index part of each tuple equal to its value part, so that index and value are the same object.]

2.3.1 "Semantics" of Indexes

It is interesting to note that for those collection classes for which set-operator methods have been defined (TABLE, DIRECTORY and RELATION), both the index and the value part of the tuple are user-defined. One may infer that programmers use the index to represent a specific value. In such a case there exists a "functional dependency" (FD) between the index and the associated value, expressed as: index->value.

In the case of a relation, the index may represent a "multivalued dependency" (MVD), as one index may determine multiple values, expressed as: index->->value.
[Both FD and MVD are terms from RDBMS design, used in the "normalization" of table layouts in order to minimize redundancy and all problems related to it.]
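Whether an index->value FD holds for a given collection can be checked mechanically from its (index, value) tuples. The following sketch uses hypothetical helper names and sample data: it reports an FD when no index is associated with more than one value, and otherwise shows the multiple values per index that correspond to the index->->value case described above.

```python
from collections import defaultdict

def values_per_index(pairs):
    """Group the values of (index, value) tuples by their index."""
    grouped = defaultdict(set)
    for index, value in pairs:
        grouped[index].add(value)
    return grouped

def has_functional_dependency(pairs):
    """index->value holds if no index is associated with more than one value."""
    return all(len(values) == 1 for values in values_per_index(pairs).values())

# Hypothetical sample data.
phone_directory = [("Anna", "4711"), ("Bob", "0815")]               # index->value
courses = [("Anna", "Math"), ("Anna", "Physics"), ("Bob", "Math")]  # index->->value

print(has_functional_dependency(phone_directory))  # True
print(has_functional_dependency(courses))          # False
print(sorted(values_per_index(courses)["Anna"]))   # ['Math', 'Physics']
```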

Turning to Array, List and Queue it is interesting to note that the indexes convey no special meaning related to the values associated with them. Rather, the index merely serves as an addressing mechanism and may imply some order, but does not represent a specific value. The indexes are supplied or predefined by Object Rexx and therefore no FD or MVD can be inferred.
[Speculation: because FDs and MVDs are missing, the developers did not see any sense in adding set-operation methods to these collection classes.]

2.3.2 Using the Object Rexx Builtin Collection Classes "Array", "List" and "Queue" as Other

As argued above, it does not make much sense to use the index part of the tuples of these collection classes when they are used as other in a set-operation method, because no FD or MVD is present.

Even worse, if the receiver is a Set or a Bag and a UNION operation has to be carried out, an error may occur: the value and index retrieved from the other collection cause the PUT method of Set and Bag to fail. This is because Set and Bag mandate that both arguments (value and index) be the same object.

Array, List and Queue store user-defined values, which should be extracted and used in set(like) operations both as value and as index. In effect, what should happen is a coercion of other to a Bag for the purpose of the set(like) operation, using the value part of each tuple of the other collection.
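The following sketch models both the failure and the proposed coercion under stated assumptions. It is a Python model, not Object Rexx code. `ModelBag` is a hypothetical stand-in for a Bag whose PUT insists that index and value coincide, with value equality standing in for object identity. `supplier` mimics a List supplier by pairing each value with a runtime-generated position.

```python
from collections import Counter


class ModelBag:
    """Hypothetical Bag model: PUT insists that index and value coincide.

    Value equality stands in for Object Rexx object identity here.
    """

    def __init__(self, counts=None):
        self.counts = Counter(counts or {})

    def put(self, value, index):
        if value != index:
            raise ValueError(f"PUT rejected: index {index!r} is not value {value!r}")
        self.counts[value] += 1

    def union(self, other_tuples):
        """UNION: PUT every supplied (index, value) tuple into a copy of the receiver."""
        result = ModelBag(self.counts)
        for index, value in other_tuples:
            result.put(value, index)
        return result


def supplier(sequence):
    """Tuples as a List/Queue/Array supplier would deliver them:
    the index is a runtime-generated position (1-based), not user data."""
    return list(enumerate(sequence, start=1))


def coerce_to_bag(tuples):
    """Proposed coercion: drop the index and use the value as both parts."""
    return [(value, value) for _index, value in tuples]


winner_list = ["Emile", "Berta", "Anton"]

try:
    ModelBag().union(supplier(winner_list))      # index 1 is paired with "Emile"
    naive_ok = True
except ValueError as err:
    naive_ok = False
    print("naive UNION fails:", err)

winner_bag = ModelBag().union(coerce_to_bag(supplier(winner_list)))
print(sorted(winner_bag.counts.elements()))      # ['Anton', 'Berta', 'Emile']
```

Because the coerced tuples always carry the value in both positions, the Bag's PUT rule is satisfied by construction, and duplicates in the source (as in a queue) are preserved as bag multiplicities.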

The following Object Rexx example makes the effect of this rule clear and demonstrates the usage of arrays, lists and queues as other in set(like) operations:

An amateur tennis club organizes a small tennis tournament. Every member may become eligible as a player if she or he pays the appropriate fee. For this particular example the following data is needed:

MemberSet: This is a set containing all members of the tennis club, hence an Object Rexx set was chosen.
PlayerSet: This is a set containing all members who paid the fee for the tournament, hence an Object Rexx set was chosen.
WinnerList: This is a list which collects all winners in the order of their achievement, hence an Object Rexx list was chosen.
WaitingQueue: There is a queue of members who are waiting to play. New entries are queued at the tail and people leaving the queue are removed from the head, hence an Object Rexx queue was chosen.
RosterArray: For the tournament a roster of players playing against each other was determined. A two-dimensional Object Rexx array was chosen, representing each pair as (x, 1) and (x, 2), where x is the number of a particular pair.

As the organization of the tournament is rather chaotic, some of the above rules may be violated:


/* code to demonstrate coercion of 'other' to bags, ---rgf, 97-08-24 */

        /* define all members                                   */
MemberSet = .set ~ of( "Anton", "Berta", "Caesar", "Dora", "Emile", "Franz" )

        /* define members who paid for playing                  */
PlayerSet = .set ~ of( "Anton", "Caesar", "Dora", "Emile" )

        /* define the list of winners (1st, 2nd, 3rd):          */
WinnerList   = .list  ~ of( "Emile", "Berta", "Anton" )

        /* define a queue containing members who wish to play   */
WaitingQueue = .queue ~ new ~~ queue( "Berta" ) ~~ queue( "Berta" ) 
        WaitingQueue ~~ queue( "Franz" ) ~~ queue( "Anton" )

        /* define an array representing the players playing against each other  */
RosterArray = .array ~ new      /* build pairs of players:      */
        RosterArray ~~ put( "Emile", 1, 1 ) ~~ put( "Anton", 1, 2)
        RosterArray ~~ put( "Berta", 2, 1 ) ~~ put( "Franz", 2, 2)
        RosterArray ~~ put( "Dora" , 3, 1 ) ~~ put( "Franz", 3, 2)
        RosterArray ~~ put( "Emile", 4, 1 ) ~~ put( "Franz", 4, 2)

        /* coerce "other" into bags */
MemberBag = .bag ~ NEW ~ UNION( MemberSet )
PlayerBag = .bag ~ NEW ~ UNION( PlayerSet )

        /* coerce a list        */
WinnerBag = .bag ~ NEW ~ UNION( WinnerList )
WinnerSet = .set ~ NEW ~ UNION( WinnerList )

        /* coerce a queue       */
WaitingBag = .bag ~ NEW ~ UNION( WaitingQueue )
WaitingSet = .set ~ NEW ~ UNION( WaitingQueue )

        /* coerce an array      */
RosterBag  = .bag ~ NEW ~ UNION( RosterArray )
RosterSet  = .set ~ NEW ~ UNION( RosterArray )

        /* display objects (type and content)   */
SAY "MemberSet" pp( MemberSet ) "MemberBag " pp( MemberBag )
SAY "   " DISPLAY( "MemberSet   ", MemberSet ) 
SAY
SAY "PlayerSet" pp( PlayerSet ) "PlayerBag " pp( PlayerBag )
SAY "   " DISPLAY( "PlayerSet   ", PlayerSet ) 
SAY "   " "('Players' are members and non-members who PAID for playing.)"
SAY
SAY "WinnerList  " pp( WinnerList   ) "WinnerBag " pp( WinnerBag ) , 
    "WinnerSet " pp( WinnerSet )
SAY "   " DUMP_OVER( "WinnerList  ", WinnerList ) 
SAY "   " DISPLAY( "WinnerBag   ", WinnerBag ) 
SAY "   " DISPLAY( "WinnerSet   ", WinnerSet ) 
SAY
SAY "WaitingQueue" pp( WaitingQueue ) "WaitingBag" pp( WaitingBag ) ,
    "WaitingSet" pp( WaitingSet )
SAY "   " DUMP_OVER( "WaitingQueue", WaitingQueue )
SAY "   " DISPLAY( "WaitingBag  ", WaitingBag )
SAY "   " DISPLAY( "WaitingSet  ", WaitingSet )
SAY
SAY "RosterArray " pp( RosterArray )       "RosterBag " pp( RosterBag ) ,
    "RosterSet " pp( RosterSet )
SAY "   " DUMP_OVER( "RosterArray ", RosterArray )
SAY "   " DISPLAY( "RosterBag   ", RosterBag )
SAY "   " DISPLAY( "RosterSet   ", RosterSet )
SAY


/* using LIST, QUEUE, ARRAY as "other" in set(like) operations  */
SAY LEFT( "", 70, "-" )
SAY "Are all players members ?                   ",
    yes_no( PlayerSet ~ SUBSET( MemberSet ) )
SAY "    show players who are members:           ",
    DISPLAY( "", MemberSet ~ INTERSECTION( PlayerSet ) )
SAY "    which members do not play officially ?  ",
    DISPLAY( "", MemberSet ~ DIFFERENCE( PlayerSet ) )
SAY

SAY "Are all winners official players ?          ",               
    yes_no( .set ~ new ~ UNION( WinnerList ) ~ SUBSET( PlayerSet ) )
    /* or:
    yes_no( WinnerSet ~ SUBSET( PlayerSet ) )
    */
SAY "    who did not pay to become a player?     ",    
    DISPLAY( "",  WinnerSet ~ DIFFERENCE( PlayerSet ) )
SAY

SAY "Did some players play more than once ?      ",   
    yes_no( RosterBag ~ DIFFERENCE( RosterSet ) ~ ITEMS  > 0 )

SAY "    # of players who played more than once ?",   
    ( .set ~ NEW ~ UNION( RosterBag ~ DIFFERENCE( RosterSet ) ) ~ ITEMS  )

SAY "    which players played more than once ?   ",   
    DISPLAY( "", ( .set ~ NEW ~ UNION( RosterBag ~ DIFFERENCE( RosterSet )  ) ) )
SAY

SAY "Which players are in the waiting queue ?    ",   
    DISPLAY( "", ( WaitingSet ~ INTERSECTION( PlayerSet )  ) ) 

SAY "    and who waits and is not a player ?     ",   
    DISPLAY( "", ( WaitingSet ~ DIFFERENCE( PlayerSet )  ) )
EXIT


yes_no : 
    IF ARG( 1 ) = 1 THEN RETURN "yes"
                    ELSE RETURN "no"

pp : PROCEDURE
   IF ARG( 2, "E" ) THEN RETURN LEFT( "[" || ARG( 1 ) || "]", ARG( 2 ) )
                    ELSE RETURN LEFT( "[" || ARG( 1 ) || "]" , 10 )

DISPLAY : PROCEDURE            
   USE ARG symbol, collection

   tmpArray = sortCollection( collection )
   tmp = ""
   DO i = 1 TO tmpArray ~ items / 2
      tmp = tmp || "," || tmpArray[ i, 2 ]
   END
   tmp = STRIP( tmp, "L", "," ) 

   IF symbol = "" THEN RETURN "{" || tmp || "} - sorted"
                  ELSE RETURN symbol "= {" || tmp || "} - sorted"


DUMP_OVER : PROCEDURE
   USE ARG symbol, collection

   tmp = ""
   DO item OVER collection
      tmp = tmp || "," || item
   END
   tmp = STRIP( tmp, "L", "," ) 

   IF symbol = "" THEN RETURN "{" || tmp || "}"
                  ELSE RETURN symbol "= {" || tmp || "}"


:: REQUIRES rgf_util            /* for sorting, from ORX8.ZIP,
                                   documentation in ORX8DOC.ZIP */

The output:


MemberSet [a Set]    MemberBag  [a Bag]
    MemberSet    = {Anton,Berta,Caesar,Dora,Emile,Franz} - sorted

PlayerSet [a Set]    PlayerBag  [a Bag]
    PlayerSet    = {Anton,Caesar,Dora,Emile} - sorted
    ('Players' are members and non-members who PAID for playing.)

WinnerList   [a List]   WinnerBag  [a Bag]    WinnerSet  [a Set]
    WinnerList   = {Emile,Berta,Anton}
    WinnerBag    = {Anton,Berta,Emile} - sorted
    WinnerSet    = {Anton,Berta,Emile} - sorted

WaitingQueue [a Queue]  WaitingBag [a Bag]    WaitingSet [a Set]
    WaitingQueue = {Berta,Berta,Franz,Anton}
    WaitingBag   = {Anton,Berta,Berta,Franz} - sorted
    WaitingSet   = {Anton,Berta,Franz} - sorted

RosterArray  [an Array] RosterBag  [a Bag]    RosterSet  [a Set]
    RosterArray  = {Emile,Anton,Berta,Franz,Dora,Franz,Emile,Franz}
    RosterBag    = {Anton,Berta,Dora,Emile,Emile,Franz,Franz,Franz} - sorted
    RosterSet    = {Anton,Berta,Dora,Emile,Franz} - sorted

----------------------------------------------------------------------
Are all players members ?                    yes
    show players who are members:            {Anton,Caesar,Dora,Emile} - sorted
    which members do not play officially ?   {Berta,Franz} - sorted

Are all winners official players ?           no
    who did not pay to become a player?      {Berta} - sorted

Did some players play more than once ?       yes
    # of players who played more than once ? 2
    which players played more than once ?    {Emile,Franz} - sorted

Which players are in the waiting queue ?     {Anton} - sorted
    and who waits and is not a player ?      {Berta,Franz} - sorted

From this example it can be seen that the index parts of the elements of arrays, lists and queues are semantically irrelevant for the problem to be solved.

To further support the argument that indexes are irrelevant in the case of arrays, lists and queues [because they are surrogates generated or predefined by the Object Rexx runtime system], one can turn to the implementation of MAKEARRAY, which in these cases returns a single-dimensional array of the value parts of the collected tuples! Were the index part semantically important, MAKEARRAY would return the index parts instead, as is the case with tables, sets, directories, relations and bags.

2.3.3 General Rules for "Other" Collections

The other collection must implement the SUPPLIER method. The receiver set(like) method iterates over the supplied tuples.
If the other collection does not implement a UNION method, then the receiver set(like) method treats other as a Bag. This bag is constructed by using the value part of each other tuple as its index at the same time.
Reasoning: if other has a set(like) operator method implemented, then it is implied that an FD or MVD exists between the index and value part of the tuple, so the index is part of the identity of each tuple.
If a set(like) operator is missing in other, then it is implied that the index carries no identity semantics and therefore needs to be dropped from the set(like) operation. In other words: in such a case the value object is the only relevant part of a tuple, and therefore the receiver's set(like) method treats other as a bag of values.
Input from Kurt Märker: Other should optionally implement the methods ITEMS, PUT, AT, HASINDEX, HASITEM (if available), REMOVE and REMOVEITEM (if available).
The availability of these methods allows the interpreter to optimize (speed up) the set-operations internally (e.g., by internally exchanging other with the receiver, depending on the number of tuples available in both collections).
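The general rules above can be sketched as a small dispatch function. This is a Python model, not Object Rexx. `ModelQueue`, `ModelTable` and `tuples_of_other` are hypothetical names. The test "implements a UNION method" is approximated with `getattr`, which assumes that a collection class without set(like) methods also lacks a `union` attribute.

```python
class ModelQueue:
    """Hypothetical Queue model: supplies (position, value) tuples, no UNION."""

    def __init__(self, *values):
        self.values = list(values)

    def supplier(self):
        return list(enumerate(self.values, start=1))


class ModelTable:
    """Hypothetical Table model: user-defined index -> value, with UNION."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)

    def supplier(self):
        return list(self.mapping.items())

    def union(self, other):
        merged = dict(self.mapping)
        for index, value in tuples_of_other(other):
            merged.setdefault(index, value)  # keep the receiver's tuple if the index exists
        return ModelTable(merged)


def tuples_of_other(other):
    """Return the (index, value) tuples a set(like) method should iterate over."""
    if not callable(getattr(other, "supplier", None)):
        raise TypeError("other must implement SUPPLIER")
    tuples = other.supplier()
    if callable(getattr(other, "union", None)):
        return tuples  # index assumed meaningful (FD/MVD): keep it
    return [(value, value) for _index, value in tuples]  # treat other as a bag of values


waiting = ModelQueue("Berta", "Berta", "Franz", "Anton")
courts = ModelTable({"Anton": "court 1", "Dora": "court 2"})  # hypothetical data
print(tuples_of_other(waiting))
# [('Berta', 'Berta'), ('Berta', 'Berta'), ('Franz', 'Franz'), ('Anton', 'Anton')]
print(tuples_of_other(courts))
# [('Anton', 'court 1'), ('Dora', 'court 2')]
```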

3 Conclusions

The present OOI documentation is rather unclear, and therefore confusing, with respect to collections and the set(like) operations defined for them. It needs to be rewritten in these parts.
The tuples of a collection always consist of a value (i.e. item) and one or more indexes.
The other collection must implement the SUPPLIER method. The receiver set(like) method iterates over the supplied tuples.
Input from Kurt Märker: Other should implement the following messages: ITEMS, PUT, AT, HASINDEX, HASITEM (if available), REMOVE and REMOVEITEM (if available).
[The availability of these methods allows the interpreter to optimize (speed up) the set-operations internally (e.g., by internally exchanging other with the receiver, depending on the number of tuples available in both collections).]
If the other collection does not implement a UNION method, then the receiver set(like) method treats other as a value bag. This bag is constructed by using the value part of each other tuple as its index part at the same time.
For the purpose of set(like) operations the identity of a tuple is determined by its index in the case of the following OOI collection classes: tables, sets and directories. There are no duplicate tuples in these collections.
Sets are special in the sense that the index and value parts of their tuples are the same object.
The UNION set(like) operation for these collection classes works like this:
- a copy of the receiver collection is created, dubbed 'recv',
- a supplier object of the other collection is created,
- for each supplied tuple, the recv UNION method tests whether a tuple with the same index part is already present in recv by sending it the HASINDEX message. Only if no tuple with the supplied index is present is the supplied tuple PUT into the recv collection.
In the case of the OOI relation collection and its subclass bag, the identity of a tuple is determined by its index and its value. There may be duplicate tuples in these collections.
Bags are special in the sense that the index and value parts of their tuples are the same object.
The UNION set(like) operation for these collection classes works like this:
- a copy of the receiver collection is created, dubbed 'recv',
- a supplier object of the other collection is created,
- each supplied tuple is PUT into the recv collection by its UNION method, irrespective of whether the supplied tuple already exists in recv or not.
The DIFFERENCE set(like) operation for all collection classes works like this:
- a copy of the receiver collection is created, dubbed 'recv',
- a supplier object of the other collection is created,
- each supplied tuple is removed from the recv collection by its DIFFERENCE method, which sends a REMOVE message, or a REMOVEITEM message if available, to recv with the supplied tuple as its argument.
The INTERSECTION and XOR set(like) operations for all collection classes work by applying the definitions given in 1.1.1 and 1.1.2.
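The UNION and DIFFERENCE procedures listed above can be sketched as follows. This is a Python model under explicit assumptions, not the OOI implementation:

- A table-like receiver (table, set, directory) is modelled as a dict from index to value.
- A relation-like receiver (relation, bag) is modelled as a `collections.Counter` of (index, value) tuples.
- `other` is reduced to the list of tuples its supplier would deliver.

All function names are hypothetical. INTERSECTION and XOR are omitted here because their definitions are given in 1.1.1 and 1.1.2.

```python
from collections import Counter

# --- Table / Set / Directory model: identity of a tuple is its index ---

def union_by_index(receiver, other_tuples):
    recv = dict(receiver)                  # copy of the receiver, dubbed 'recv'
    for index, value in other_tuples:      # iterate over other's supplied tuples
        if index not in recv:              # HASINDEX test on recv
            recv[index] = value            # PUT only if the index is not yet present
    return recv


def difference_by_index(receiver, other_tuples):
    recv = dict(receiver)
    for index, _value in other_tuples:
        recv.pop(index, None)              # REMOVE by index (no error if absent)
    return recv


# --- Relation / Bag model: identity is (index, value); duplicates allowed ---

def union_relation(receiver, other_tuples):
    recv = Counter(receiver)               # multiset of (index, value) tuples
    for index, value in other_tuples:
        recv[(index, value)] += 1          # PUT unconditionally
    return recv


def difference_relation(receiver, other_tuples):
    recv = Counter(receiver)
    for index, value in other_tuples:
        if recv[(index, value)] > 0:       # REMOVEITEM removes one occurrence
            recv[(index, value)] -= 1
    return +recv                           # unary + drops entries whose count fell to 0


def as_bag(values):
    """Bag tuples: index and value are the same object."""
    return Counter((v, v) for v in values)


# Data from the tennis example above
member_set = {v: v for v in ["Anton", "Berta", "Caesar", "Dora", "Emile", "Franz"]}
player_tuples = [(v, v) for v in ["Anton", "Caesar", "Dora", "Emile"]]
print(sorted(difference_by_index(member_set, player_tuples)))  # ['Berta', 'Franz']

roster_bag = as_bag(["Emile", "Anton", "Berta", "Franz", "Dora", "Franz", "Emile", "Franz"])
roster_set = [(v, v) for v in ["Anton", "Berta", "Dora", "Emile", "Franz"]]
repeats = difference_relation(roster_bag, roster_set)
print(sorted(i for i, _v in repeats.elements()))               # ['Emile', 'Franz', 'Franz']

courts = {"Anton": "court 1"}                                   # hypothetical table data
print(union_by_index(courts, [("Anton", "court 2"), ("Dora", "court 3")]))
# {'Anton': 'court 1', 'Dora': 'court 3'}
```

The two DIFFERENCE results agree with the Object Rexx output shown earlier for "which members do not play officially" and for the players who played more than once.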

Please send comments to Rony.Flatscher@wu-wien.ac.at

`TABLE` and its subclass `SET`
	`receiver ~ UNION( other )`
	Returns a new collection of the same class as the receiver that contains all the items from the receiver collection and selected items from the collection other. This method includes an item from other in the new collection only if there is no item with the same associated index in the receiver collection and the method has not already included an item with the same index. The order in which this method selects items in other is unspecified. (The program should not rely on any order.) See also the UNION method of the Directory class and UNION method of the Relation class. The other can be any object that supports the methods the REXX collection classes implement. The other must also allow all of the index values in the receiver collection. (Remark: as the identity of a table "element" is defined by its index, the reference to index-item pairs seems misleading and complicates the matter. Also, it is not clear which 'methods of the collection classes' are meant and therefore needed.)
	`receiver ~ DIFFERENCE( other )`
	Returns a new collection (of the same class as the receiver) containing only those index-item pairs from the receiver whose indexes the collection other does not contain (with the same associated index). The other can be any object that supports the methods the REXX collection classes implement. The other must also allow all of the index values in the receiver collection. (Remark: as the identity of a table "element" is defined by its index, the reference to index-item pairs seems misleading and complicates the matter. Also, it is not clear which 'methods of the collection classes' are meant and therefore needed.)

`DIRECTORY` (Remark: the index of a directory "requires a string value". Wouldn't it therefore make sense to add the same behavior as is already present, e.g., with `SAY`, `PARSE VAR` etc.? I.e., if Object Rexx receives a non-string object as a directory index, it would convert that object to its string representation (as it already does, e.g., with `SAY` or `PARSE VAR`). If no string value is available to Object Rexx, then a `NOSTRING` condition should be raised. Cf. the online docs about "Required String Values".)
	`receiver ~ UNION( other )`
	Returns a new collection of the same class as the receiver that contains all the items from the receiver collection and selected items from the collection other. This method includes an item from other in the new collection only if there is no item with the same associated index in the receiver collection and the method has not already included an item with the same index. The order in which this method selects items in other is unspecified. (The program should not rely on any order.) See also the UNION method of the Table class and the UNION method of the Relation class. The other can be any object that supports the methods the REXX collection classes implement. The other must also allow all of the index values in the receiver collection. (Remark: note the usage of the terms "index" and "item". Again, it is the index, not the value, which determines whether an element from other has to be put into the resulting collection. The Object Rexx documentation uses the term "item" in the explanatory text, but in the syntax diagrams it uses the term "value" instead, another possible source of confusion.)
	`receiver ~ DIFFERENCE( other )`
	Returns a new collection (of the same class as the receiver) containing only those items from the receiver whose indexes the collection other does not contain. The other can be any object that supports the methods the REXX collection classes implement. The other must also allow all of the index values in the receiver collection. (Remark: which methods must be available in other for it to be eligible to participate in set(like) operations?)

`RELATION` and its subclass `BAG`
	`receiver ~ UNION( other )`
	Returns a new collection containing all items from the receiver collection and the collection other. The other can be any object that supports a `HASITEM` method and the methods the REXX collection classes implement. (Remark: is this really true? Or is the reference to `HASITEM` nothing but a hint that, for identifying individual "elements" of `relations`, both the `INDEX` and the `ITEM` of an "element" need to be used?)
	`receiver ~ DIFFERENCE( other )`
	Returns a new collection (of the same class as the receiver) containing only those items that the collection other does not contain (with the same associated index). The other can be any object that supports a `HASITEM` method and the methods the REXX collection classes implement. (Remark: is this really true? Or is the reference to `HASITEM` nothing but a hint that, for identifying individual "elements" of `relations`, both the `INDEX` and the `ITEM` of an "element" need to be used?)

Determining Whether an Element Already Exists in the Receiver

	Receiver             Identity portion of tuple (value, index)   Test method
	`Table` (`Set`)      index                                      `HASINDEX`
	`Directory`          index                                      `HASINDEX`
	`Relation` (`Bag`)   value, index                               `HASITEM`
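The table can be read as a dispatch on the receiver's class. The following is a minimal Python sketch. It assumes that the receiver's contents are available as a list of (value, index) pairs. `IDENTITY_PORTION` and `element_exists` are hypothetical names, and the membership tests only imitate what `HASINDEX` and `HASITEM` decide.

```python
# Which part of a (value, index) tuple establishes identity, per the table above.
IDENTITY_PORTION = {
    "Table": "index", "Set": "index", "Directory": "index",
    "Relation": "value, index", "Bag": "value, index",
}


def element_exists(receiver_class, recv_tuples, value, index):
    """Hypothetical model of the existence test applied to the receiver.

    recv_tuples is a list of (value, index) pairs held by the receiver.
    """
    if IDENTITY_PORTION[receiver_class] == "index":
        # HASINDEX: only the index is compared
        return any(i == index for _v, i in recv_tuples)
    # HASITEM: both value and index must match
    return (value, index) in recv_tuples


courts = [("court 1", "Anton")]          # Table (hypothetical): index "Anton" -> "court 1"
pairs = [("Emile", 1), ("Anton", 1)]     # Relation: pair number 1 -> two players
print(element_exists("Table", courts, "court 2", "Anton"))  # True: the index alone decides
print(element_exists("Relation", pairs, "Berta", 1))        # False: index 1 exists, not with "Berta"
print(element_exists("Relation", pairs, "Anton", 1))        # True
```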
