mgetty/vgetty mailing list archives now working

"Robert J. Brown" (rj@eli.elilabs.com)
Tue, 9 Nov 1999 21:57:43 -0600


>>>>> "Gert" == Gert Doering <gert@greenie.muc.de> writes:

    Gert> Hi, On Mon, Nov 08, 1999 at 12:48:11AM -0600, Robert
    Gert> J. Brown wrote:
    >> Thanks to the diligent efforts and help of Chip Atkinson
    >> <chip@pupman.com>, the archives now have a nice search engine.
    >> This has been something many of you have requested for quite
    >> some time.  It is now up and running in "beta" mode.
    >> Everything seems to be working well, but we want to give it a
    >> thorough live test.

    Gert> Looks quite good.

    Gert> NB: It's pretty amazing to see all the stuff since
    Gert> February 1994 being online!  (Unfortunately, at least in the
    Gert> 1994 e-mails, there is a fair number of duplicates - any way
    Gert> to get them out?)

I guess we could write a filter script that strips everything except
the body text of each message, forms an md5 digest of that text, and
writes it to a file together with the filename of the message and the
posting date from the header.  Next we would sort the file by md5
hash as primary key and by posting date as secondary key.  This would
bring all files with identical message bodies together, and put the
oldest post date first.  We then delete all but the first file in each
md5 hash group.  Voila!  No more duplicate messages.

Do we have any volunteers to write such a script?
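
As a starting point, here is a rough Python sketch of the steps
described above.  It is only an illustration under assumptions: the
function names (body_digest, posting_time, find_duplicates) are made
up for this example, messages are assumed to be available as raw bytes
keyed by filename, and a message with a missing or unparseable Date
header is assumed to sort last in its group.  It reports the
duplicates rather than deleting anything.

```python
import hashlib
from email import message_from_bytes
from email.utils import mktime_tz, parsedate_tz


def body_digest(raw: bytes) -> str:
    """Return the md5 hex digest of the body text (everything after the headers)."""
    # Normalize CRLF to LF so the same body hashes identically either way.
    text = raw.replace(b"\r\n", b"\n")
    # The headers end at the first blank line; a message without one has an empty body.
    _, _, body = text.partition(b"\n\n")
    return hashlib.md5(body.strip()).hexdigest()


def posting_time(raw: bytes) -> float:
    """Return the Date header as a UTC timestamp, or +inf if missing or unparseable."""
    date_header = message_from_bytes(raw).get("Date")
    parsed = parsedate_tz(str(date_header)) if date_header else None
    return float(mktime_tz(parsed)) if parsed else float("inf")


def find_duplicates(messages: dict[str, bytes]) -> list[str]:
    """Return filenames of all but the oldest message in each md5 group."""
    # Sort by md5 digest (primary key), then posting date (secondary key).
    records = sorted(
        (body_digest(raw), posting_time(raw), name)
        for name, raw in messages.items()
    )
    duplicates = []
    previous_digest = None
    for digest, _, name in records:
        # The first record of each digest group is the oldest; keep it.
        if digest == previous_digest:
            duplicates.append(name)
        previous_digest = digest
    return duplicates
```

The list returned by find_duplicates could be reviewed by hand before
any files are removed.  Note that messages whose bodies differ only in
an appended footer or signature will not hash the same, so this would
catch exact duplicates only.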

-- 
--------  "And there came a writing to him from Elijah"  [2Ch 21:12]  --------
R. J. Brown III  rj@elilabs.com http://www.elilabs.com/~rj  voice 847 543-4060
Elijah Laboratories Inc. 457 Signal Lane, Grayslake IL 60030  fax 847 543-4061
-----  M o d e l i n g   t h e   M e t h o d s   o f   t h e   M i n d  ------