Re: Ideas about data types (fwd)

Date: Tue, 22 Jul 1997 17:44:28 -0400
From: Erik Ostrom <eostrom@research.att.com>
Content-Type: text/plain; charset=us-ascii
Michael sent this to me, and then realized it should probably have gone to 
moo-cows.  I'll write a reply in a while.

------- Forwarded Message

Received: from research.att.com (research.research.att.com [135.205.32.20])
	by amontillado.research.att.com (8.8.5/8.8.5) with SMTP id RAA20152
	for <eostrom@issr.research.att.com>; Tue, 22 Jul 1997 17:05:18 -0400 (EDT)
Received: from castor.ipac.caltech.edu ([131.215.11.35]) by research; Tue Jul 
22 17:04:12 EDT 1997
Received: from laurel (laurel.ipac.caltech.edu [134.4.20.87]) 
	  by castor.ipac.caltech.edu (8.7.4/8.6.4)
	  with ESMTP id OAA00211
	  for <eostrom@research.att.com>; Tue, 22 Jul 1997 14:04:04 -0700 (PDT)
Received: (brundage@localhost) by laurel (8.6.8.1/8.6.4) id OAA01413; Tue, 22 
Jul 1997 14:02:43 -0700
Date: Tue, 22 Jul 1997 14:02:39 -0700 (PDT)
From: Michael Brundage <brundage@laurel.ipac.caltech.edu>
To: Erik Ostrom <eostrom@research.att.com>
Subject: Re: Ideas about data types 
In-Reply-To: <199707221914.PAA01663@radish.research.att.com>
Message-ID: <Pine.SOL.3.91.970722124833.1376A-100000@laurel>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Content-Length: 11270

On Tue, 22 Jul 1997, Erik Ostrom wrote:
>     5 as boolean

I suggest that you use this "var as type" syntax; furthermore, let the 
type correspond to those values which are returned from typeof() -- so 
for example,  5 as STR,  5 as FLOAT

This type should be computable, so that, for example,

x = STR;
y = 5 as x;

results in y = "5"
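
A rough Python model of the idea, for readers who want to see it run.  The 
names convert() and CONVERTERS are hypothetical, and the string tags merely 
stand in for whatever typeof() returns; none of this is server code.

```python
# Hypothetical model of "value as type" where the type is itself a value.
# The tags stand in for typeof() results; nothing here is real server code.
STR, INT, FLOAT = "STR", "INT", "FLOAT"

# One converter per type tag; a new data type would register its own entry.
CONVERTERS = {
    STR: str,
    INT: int,
    FLOAT: float,
}

def convert(value, type_tag):
    """Model of `value as type_tag`; type_tag may come from any expression."""
    return CONVERTERS[type_tag](value)

x = STR
y = convert(5, x)   # models: y = 5 as x;
```

Because the type is looked up at run time, the same call works whether the 
tag is a constant or the result of another computation.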

Some type conversions will result in loss of information, such as
y = 5.3 as INT;
I don't know whether the programmer should be notified of this; one 
could always add a flag to each variable and make some sort of function
like conversion_error() which would return a false value if no error 
occurred and would return an error message (possibly just a string) if 
one did occur.  For example,

y = 5 as INT;
if(conversion_error(y))
  "no conversion error, so this line is never reached";
endif
y = 5.3 as INT;
if(conversion_error(y))
  notify(player, conversion_error(y));
  "conversion_error() might indicate loss of precision or whatever here";
endif


Some types may not be convertible to certain other types.  For example, 
lists may not be convertible to integers.  In this case, an error should 
be raised, say E_CONVERSION, to let the programmer know something is amiss.
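
The two cases (a lossy conversion that still succeeds, and an impossible one 
that raises) might be modelled like this in Python.  ConversionError stands 
in for the proposed E_CONVERSION, and returning a (result, note) pair is only 
one assumed way of carrying the information conversion_error() would report.

```python
class ConversionError(Exception):
    """Stands in for the proposed E_CONVERSION error."""

def to_int(value):
    """Model of `value as INT`.

    Returns (result, note). The note is "" when nothing was lost, which
    mirrors conversion_error() returning a false value.
    """
    # Only numbers are convertible in this sketch; lists and the like raise.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ConversionError("cannot convert %s to INT" % type(value).__name__)
    try:
        result = int(value)
    except (OverflowError, ValueError) as exc:   # infinity or NaN
        raise ConversionError(str(exc)) from exc
    note = "" if result == value else "lost precision converting %r to INT" % (value,)
    return result, note
```

Here to_int(5) yields an empty note, to_int(5.3) yields 5 with a non-empty 
note, and to_int([1, 2]) raises.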

> Many types won't need a conversion function or "literal" value syntax at all; 
> their values are generated by other means.

Although it is true that there may be no explicit literal syntax for 
types such as file handles, it is nevertheless true that there should be 
an orthogonal syntax for constructing objects of various types with 
particular values.  I don't want to have to write code that looks like:

  if(typeof(x) == LIST)
    y = toliteral(x);
  elseif(typeof(x) == FILE)
    y = open(x.filename, x.permissions);
  ...

My understanding of literals (and perhaps this is flawed) is that a 
literal value is not merely its representation in MOO code, but is a 
value from which the exact state of the value may be recreated.  So it 
works like serialization in Java -- toliteral() should have a reverse, 
fromliteral().  fromliteral() is implicitly understood to apply to 
"built-in" types such as lists, and the parser knows this.  But for new 
types, one should have to write this explicitly:

  x = toliteral(z);
  y = fromliteral(x);
  if(y == z)
    "always true";
  endif

This is analogous to the problem encountered with builtin functions, 
where MOO code may fail to work because a "built-in" function may have 
been removed.  The workaround was to introduce call_function(), which is 
a more general method of calling builtins and has the advantage that it 
doesn't break with dynamic functions.  So I propose that we add a 
fromliteral() function which is the counterpart of call_function() for 
literals.

Furthermore, I recommend that literals always be strings (as they are now),
because this facilitates storing them in the DB and transporting them 
across the network.

Having non-printable data types would be bad, IMHO.  Custom data types 
should always provide a meaningful toliteral() that will allow their 
reconstruction using fromliteral().
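
To make the round-trip requirement concrete, here is a small Python sketch.  
The Vector class and the "vector:" prefix are assumptions for illustration, 
and Python's repr()/ast.literal_eval() stand in for the server's handling of 
built-in literals.

```python
import ast

class Vector:
    """Hypothetical custom data type holding a sequence of values."""
    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        return isinstance(other, Vector) and self.items == other.items

def to_literal(value):
    """Always returns a string, so it can be stored or sent over the network."""
    if isinstance(value, Vector):
        # Each element is itself stored as a literal string, so nesting works.
        return "vector:" + repr([to_literal(item) for item in value.items])
    return repr(value)

def from_literal(text):
    """Inverse of to_literal() for the types this sketch knows about."""
    if text.startswith("vector:"):
        parts = ast.literal_eval(text[len("vector:"):])
        return Vector(from_literal(part) for part in parts)
    return ast.literal_eval(text)
```

For any value z built from numbers, strings, lists and Vectors, 
from_literal(to_literal(z)) == z holds in this sketch.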

> Note that in my scheme, this may not even be true (as a naive implementation 
> will simply parse it as a list and an identifier).  However, it IS necessary 
> to load the data type before loading _stored values_ of that type--for 
> example, in properties or in suspended tasks.
> [...] Types are also 
> given their own hooks for reading from and writing to the DB file; these use 
> the DB module's I/O mechanisms, rather than being directly string-based.
> This  creates a bit of difficulty if you have a database containing 
> values of a type that your server doesn't support.  I haven't 
> finalized a plan for that yet.

Another advantage to having explicit fromliteral() calls is that when the 
MOO encounters a data type it doesn't know, it can substitute the 
literal, just as it currently substitutes call_function() for builtins it 
doesn't know.  Of course, the code won't run in a meaningful way, but at 
least it will still compile.  This can be useful if, for example, code is 
written to take advantage of new data types before they are actually 
added.
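
One way to picture the fallback, sketched in Python.  DECODERS and 
UnknownValue are hypothetical names, and the "name:payload" literal format is 
assumed; the point is only that an unrecognized literal is carried along 
unchanged rather than rejected.

```python
import ast

# Type name -> function turning the payload string into a value.
# Empty at first, modelling a server that lacks the "vector" type.
DECODERS = {}

class UnknownValue:
    """Placeholder that keeps the original literal for an unsupported type."""
    def __init__(self, literal):
        self.literal = literal

def from_literal(text):
    name, sep, payload = text.partition(":")
    if sep and name.isidentifier():
        decoder = DECODERS.get(name)
        if decoder is None:
            return UnknownValue(text)      # substitute the literal itself
        return decoder(payload)
    return ast.literal_eval(text)          # built-in types

def to_literal(value):
    if isinstance(value, UnknownValue):
        return value.literal               # written back exactly as read
    return repr(value)
```

Once a decoder for "vector" is registered, the same literal starts producing 
real values with no change to the calling code.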

> I'm not planning on any implicit coercion, as you suggest.

Yay!

> Possibly.  I'm not sure it's necessary to overload the arithmetic operators, 
> but I suppose it'd be handy if one wanted to implement new kinds of numbers, 
> and as Nick pointed out, they are already overloaded, so what's a few more 
> meanings.

Ack.  Just don't make it too hard to use.  It would be nice to be able to 
do things like notify(player, "The time is "+time()) without having to 
manually wrap tostr() around everything, and this would have the 
additional advantage that if you ever switch your functions from 
returning ints to returning strs (or vice-versa) then code like
notify(player, "The time is "+someobj:time()) will still work.  *BUT* 
overloaded operators can make it harder to catch bugs, even when they are 
restricted to operating on values of the same type.  Also, integer and 
floating additions are commutative, but string and list additions 
presumably would not be.  So in a way this would change how + works...

> > I suppose that if the new datatype is a container type, that
> > it should be able to handle indexing of a variable like lists and strings
> > currently allow.
> 
> Yep.  This is planned but not yet implemented.

Cool!

May I also recommend that literals be able to have properties and methods?
For example, we currently have the notation
#10:someverb() which executes a verb on an object
and
#10.someproperty which accesses a property on an object

But modifying built-in types requires special builtin functions for each 
type -- so things become cluttered with bf's like listappend(), 
listdelete() -- and with custom types, I expect we'll see things like 
vectorappend(), setadd(), hashtableput() -- aaaaagh!

Better, I think, would be to have verbs and properties on any data type, 
not just objects.  I suppose the difference is that (initially at least), 
types would not be able to inherit from other types, and one would not be 
able to change the verbs for a data type.  However, one would be able to 
execute verbs and get/set property values.

Then one can not only do some_string[5], some_list[5] and some_vector[5], but 
also some_string.length, some_list.length and some_vector.length.  
Instead of cluttering the global namespace with listappend() and so forth 
(possibly creating conflicts among different custom data types), one 
could just do  some_vector:append(some_element) or 
some_list:append(some_element) or some_string:append(some_other_string).

In this way, one could code naturally, as:

#42:copy_container_to_vector
"Takes any container and returns a vector copy of it";
{container} = args;
try
  len = container.length;
except e (E_PROPNF)
  "container isn't really a container if it doesn't let us get its length";
  raise(E_TYPE);
endtry
result = this:new_vector(len);
for i in [1..len]
  result[i] = container[i];
endfor
return result;
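
For comparison, the same shape in Python, where every container already 
answers to a common length and indexing protocol.  The Vector class is a 
hypothetical stand-in, and note that Python indexes from 0 where MOO indexes 
from 1.

```python
class Vector:
    """Hypothetical custom container; only length and indexing are assumed."""
    def __init__(self, size):
        self._items = [0] * size

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        self._items[index] = value

def copy_container_to_vector(container):
    """Copy any sized, indexable container into a new Vector."""
    try:
        size = len(container)
    except TypeError:
        # Not a container if it cannot report its length (E_TYPE in the MOO code).
        raise TypeError("not a container: it has no length") from None
    result = Vector(size)
    for i in range(size):
        result[i] = container[i]
    return result
```

The function never asks which type it was given, so strings, lists and other 
Vectors all go through the same code path.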


If data types do not have properties and methods we can work with in this 
fashion, then not only will data type writers have to write many built-in 
functions to accommodate their new data types (or else hack the 
pre-existing built-ins to correctly work with them), but we will have to 
special case our code to deal with each new type of builtin.  For 
example, the code above would have to start out

{container} = args;
len = this:length(container);

where #42:length would have special cases for every recognized type:

#42:length
{container} = args;
if(typeof(container)==LIST)
  return length(container);
elseif(typeof(container)==VECTOR)
  return vectorlength(container);
elseif(typeof(container)==SET)
  return setlength(container);
...

Yuck!

The only alternative is to wrap data types in objects (in order to use 
the object's verbs to accomplish the same thing), but this defeats the 
whole purpose of new data types.

I would also like to have the equivalent of "class methods" for custom data 
types.  Otherwise, constructing new instances of data types with no 
in-MOO-code literal representation will be difficult.  This is somewhat 
mitigated if the data type has a stable literal representation that works 
with toliteral/fromliteral.  For example, suppose vectors have a literal 
representation "vector:{item1,item2,...,itemN}" where each item is a 
literal for that item.  I.e., the literal for a vector whose only element 
is another vector of length one containing the number 3 might look like 
"vector:{\"vector:{3}\"}".  In this case, we could get a new vector with 
some length using:

#42:new_vector
  {size} = args;
  if(typeof(size)!=INT)
    raise(E_TYPE);
  endif
  return fromliteral("vector:"+toliteral($list_utils:make_list(size)));

or even
#42:new_vector
  {size} = args;
  if(typeof(size)!=INT)
    raise(E_TYPE);
  endif
  v = fromliteral("vector:{}");    " construct an empty vector";
  for i in [1..size]
    v:append(0);  " increase the vector to be the size we want";
  endfor
  return v;

But neither of these is as concise or efficient as just doing

#42:new_vector
  {size} = args;
  return VECTOR:new(size);
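
The contrast between a class-level constructor and building a value up piece 
by piece can be modelled in Python.  Vector, new() and 
new_vector_by_appending() are hypothetical names; the classmethod plays the 
role of the proposed VECTOR:new().

```python
class Vector:
    """Hypothetical custom data type."""
    def __init__(self, items):
        self.items = list(items)

    @classmethod
    def new(cls, size, fill=0):
        """Class-level constructor, modelling the proposed VECTOR:new(size)."""
        if isinstance(size, bool) or not isinstance(size, int) or size < 0:
            raise TypeError("size must be a non-negative integer")
        return cls([fill] * size)

    def append(self, item):
        self.items.append(item)

def new_vector_by_appending(size):
    """The roundabout route: start empty and grow one element at a time."""
    v = Vector([])
    for _ in range(size):
        v.append(0)
    return v
```

Both routes give the same vector, but the class-level constructor does it in 
one step and can check its argument in one place.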


> > If the new datatype is a range type, I would expect the
> > for var in [literal..literal]  construct to allow you to iterate over
> > a sequence of literal values in that type.
> 
> Can you give an example of this?

This only works for (partially) ordered types, of course.  For totally 
ordered sets there is no problem, and it would work just like 0..3 or 
"a".."c".  For partially ordered ones, such as complex integers, there is 
the problem that the range may not be constructible.  For example, 
fromliteral("complex_int:1+3i")..fromliteral("complex_int:1+5i") makes 
sense, but 
fromliteral("complex_int:2+3i")..fromliteral("complex_int:1+5i") does not.

It's not clear to me that this functionality would be all that useful -- 
in fact, I have a hard time thinking up examples of useful ordered 
data types other than the ones we already have.  Note that ranges are 
only good for data types which are intrinsically ordered; they wouldn't 
be good for things like depth-first searching trees (the vertices are 
ordered *in the tree*, but by themselves have no inherent ordering).  
Even for things like permutations (or the complex integers above), there 
may be no canonical ordering (lexical, transposition, etc. for 
permutations).
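
A Python sketch of the complex-integer case.  It assumes the componentwise 
(product) order, which is only one possible choice, and it assumes a range is 
defined only when the endpoints differ in a single part; ComplexInt and 
complex_range() are hypothetical names.

```python
class ComplexInt:
    """Hypothetical complex integer with a componentwise partial order."""
    def __init__(self, re, im):
        self.re, self.im = re, im

    def __eq__(self, other):
        return (isinstance(other, ComplexInt)
                and (self.re, self.im) == (other.re, other.im))

    def __le__(self, other):
        # Assumed product order: both parts must be <=.
        # Pairs such as 2+3i and 1+5i are incomparable under it.
        return self.re <= other.re and self.im <= other.im

def complex_range(lo, hi):
    """Model of [lo..hi] for complex integers."""
    if not (lo <= hi):
        raise ValueError("range not constructible: lo does not precede hi")
    if lo.re == hi.re:
        return [ComplexInt(lo.re, im) for im in range(lo.im, hi.im + 1)]
    if lo.im == hi.im:
        return [ComplexInt(re, lo.im) for re in range(lo.re, hi.re + 1)]
    raise ValueError("range not constructible: endpoints differ in both parts")
```

With this order, 1+3i..1+5i yields three values while 2+3i..1+5i raises, 
because neither endpoint precedes the other.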

> At the moment, I'm just using the string-valued name of the data type as a 
> return value for typeof().  I really don't want to add a new built-in 
> variable for each data type.  (The type name used in the conversion 
> expressions above  isn't really a variable; it's an identifier valid 
> only in the context of  conversion.

Just as long as typeof() returns the same kind of data for all types.
It would suck to have to special-case the built-in types in our MOO code.

>  There'll also be a way to convert to an arbitrary named type, 
> analogous to call_function() for functions.)

I thought this was what the "5 as some_type" notation was for?  Just make 
the expected type an expression instead of a literal, and you already can 
convert to any named type by using, for example,
  5 as #123:foo()




Cheers,

michael
brundage@ipac.caltech.edu


------- End of Forwarded Message