<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="http://www.loc.gov/MARC21/slim" xmlns:zs="http://docs.oasis-open.org/ns/search-ws/sruResponse">
  <leader>03089cmm a22004935i 4500</leader>
  <controlfield tag="001">21906383</controlfield>
  <controlfield tag="005">20250607104600.2</controlfield>
  <controlfield tag="006">m     o  d f      </controlfield>
  <controlfield tag="007">cr |||||||||||</controlfield>
  <controlfield tag="008">210216s2019    dcu     o  |        eng  </controlfield>
  <datafield ind1=" " ind2=" " tag="035">
    <subfield code="a">21906383</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="906">
    <subfield code="a">0</subfield>
    <subfield code="b">ibc</subfield>
    <subfield code="c">orignew</subfield>
    <subfield code="d">u</subfield>
    <subfield code="e">ncip</subfield>
    <subfield code="f">20</subfield>
    <subfield code="g">y-gencatlg</subfield>
  </datafield>
  <datafield ind1="0" ind2=" " tag="925">
    <subfield code="a">acquire</subfield>
    <subfield code="x">policy default</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="955">
    <subfield code="b">hh20 2021-02-12 TW Situational</subfield>
    <subfield code="i">hk15 2021-02-16 TW Situational</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="010">
    <subfield code="a">  2020445557</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="040">
    <subfield code="a">DLC</subfield>
    <subfield code="b">eng</subfield>
    <subfield code="c">DLC</subfield>
    <subfield code="e">rda</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="043">
    <subfield code="a">n-us---</subfield>
  </datafield>
  <datafield ind1="0" ind2="0" tag="050">
    <subfield code="a">JF1525.A8</subfield>
  </datafield>
  <datafield ind1="0" ind2="0" tag="245">
    <subfield code="a">3000 .gov tabular dataset.</subfield>
  </datafield>
  <datafield ind1="3" ind2=" " tag="246">
    <subfield code="a">Dot gov tabular dataset</subfield>
  </datafield>
  <datafield ind1="3" ind2=" " tag="246">
    <subfield code="a">Three thousand dot gov tabular dataset</subfield>
  </datafield>
  <datafield ind1=" " ind2="0" tag="264">
    <subfield code="a">Washington, D.C. :</subfield>
    <subfield code="b">Library of Congress Web Archiving Program,</subfield>
    <subfield code="c">[2018]</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="300">
    <subfield code="a">Online resource (datasets)</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="336">
    <subfield code="a">text</subfield>
    <subfield code="b">txt</subfield>
    <subfield code="2">rdacontent</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="336">
    <subfield code="a">computer dataset</subfield>
    <subfield code="b">cod</subfield>
    <subfield code="2">rdacontent</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="337">
    <subfield code="a">computer</subfield>
    <subfield code="b">c</subfield>
    <subfield code="2">rdamedia</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="338">
    <subfield code="a">other</subfield>
    <subfield code="b">cz</subfield>
    <subfield code="2">rdacarrier</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="500">
    <subfield code="a">"Each of these datasets consist of 1,000 files generated from indexes of the Web archives, which were used to derive a random list of 1,000 items identified as CSV, tab-separated (TSV), or Excel (XLS) files and hosted on .gov domains. Each set includes 1,000 unique CSV, TSV, and XLS files and minimal metadata about them, including links to their locations within the Library's web archive."-- Web archive datasets website.</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="500">
    <subfield code="a">"Dataset originally created 11/6/2018."--README file</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="500">
    <subfield code="a">"This dataset is based on exploratory work begun by the Library of Congress's Web Archiving Team in 2018. The goal of the work is to explore the contents of the Library's web archives through analysis of the indexes containing metadata from the harvested web content, as stored in CDX files. The metadata contained in the indexes was used for initial analysis, rather than the archived content stored in WARC and ARC container files, since W/ARC files present significant challenges due to large size and high processing requirements. The CDX indexes used in this initial analysis were six terabytes (TB) in size, which is a fraction of the web archive content in W/ARC files constituting nearly 1.5 petabytes (PB) at the time of analysis (November 2018)."-- README file</subfield>
  </datafield>
  <datafield ind1="0" ind2=" " tag="505">
    <subfield code="a">[Comma-separated values (CSV) dataset]. -- [Tab-separated values (TSV) dataset]. -- [Excel (XLS) dataset].</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="588">
    <subfield code="a">Title from Web Archive Datasets website, viewed February 16, 2021.</subfield>
  </datafield>
  <datafield ind1=" " ind2="0" tag="650">
    <subfield code="a">Electronic government information</subfield>
    <subfield code="z">United States.</subfield>
  </datafield>
  <datafield ind1=" " ind2="0" tag="650">
    <subfield code="a">Electronic spreadsheets</subfield>
    <subfield code="z">United States.</subfield>
  </datafield>
  <datafield ind1=" " ind2="0" tag="650">
    <subfield code="a">Web archives</subfield>
    <subfield code="z">United States.</subfield>
  </datafield>
  <datafield ind1=" " ind2="7" tag="655">
    <subfield code="a">Data sets.</subfield>
    <subfield code="2">lcgft</subfield>
  </datafield>
  <datafield ind1=" " ind2="7" tag="655">
    <subfield code="a">Web archives.</subfield>
    <subfield code="2">lcgft</subfield>
  </datafield>
  <datafield ind1="2" ind2=" " tag="710">
    <subfield code="a">Library of Congress Web Archiving Program</subfield>
  </datafield>
  <datafield ind1="8" ind2=" " tag="852">
    <subfield code="b">s-Online</subfield>
    <subfield code="h">Electronic Resource</subfield>
  </datafield>
  <datafield ind1="4" ind2="2" tag="856">
    <subfield code="3">Web Archives Datasets website</subfield>
    <subfield code="u">https://labs.loc.gov/work/experiments/webarchive-datasets/</subfield>
  </datafield>
  <datafield ind1="4" ind2="0" tag="856">
    <subfield code="3">dataset</subfield>
    <subfield code="d">gdcdatasets</subfield>
    <subfield code="f">2020445557</subfield>
    <subfield code="u">https://hdl.loc.gov/loc.gdc/gdcdatasets.2020445557</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="985">
    <subfield code="a">gdcdatasets</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="985">
    <subfield code="a">gdcusgovdigresources</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="985">
    <subfield code="a">gdc/usgovdigresources</subfield>
  </datafield>
</record>
