Note: This topic has been unedited for 180 days. It is considered archived - the discussion is over. Information in this thread may be out of date. Do not add to it unless it really needs a response.

This isn't a bug or glitch, I was just wondering if there is a way to do this. I have two computers, and only one is connected to the internet. I do most of my work on my personal computer and because it's not connected to the internet, I have to ferry files between two computers. I use Wikia a lot to quickly check facts, and I was wondering if there was a way I could download a whole wiki instead of saving each page. Basically, is there a way to download an archive of all the pages in a wiki? The wiki I am trying to download is the STARGATE wiki.

You can get an XML dump of the site at the bottom of the Special:Statistics page. There are no images with that file though. You can search inside the file, though it is messy with all the formatting. You would probably need to run MediaWiki software on your computer with that file in order to see it as a webpage. - Sxerks 22:28, Novem(UTC)

Thanks, I downloaded the XML file. I don't need images, but would like it to show up as a webpage. Thanks for the quick response and moving it to the right section. :) Sorry, I am a noob, what is MediaWiki software?

You basically need to be running a web server on your computer with the MediaWiki software (see install & config). I'm not sure there is an easy way to view it as a webpage, other than using a site-ripping program (minus the images) to crawl through the site and download every page.

Contents
- 4 Where are the uploaded files (image, audio, video, etc.)?
- 7 Why not just retrieve data from at runtime?
- 7.2 Doing SQL queries on the current database dump
- 9 Help to parse dumps for use in scripts
- 9.1 Doing Hadoop MapReduce on the Wikipedia current database dump
- 12 Static HTML tree dumps for mirroring or CD distribution
- 13 Dynamic HTML generation from a local XML database dump
- 13.4 BzReader and MzReader (for Windows)

Some of the many ways to read Wikipedia while offline:
- BzReader: § BzReader and MzReader (for Windows)
- Selected Wikipedia articles as a printed document: Help:Printing
- Wikipedia on rockbox: § Wikiviewer for Rockbox

Some of them are mobile applications – see "list of Wikipedia mobile applications".

Where do I get it?

English-language Wikipedia
- Dumps from any Wikimedia Foundation project: dumps.
- English Wikipedia dumps in SQL and XML: dumps.
- Download the data dump using a BitTorrent client (torrenting has many benefits and reduces server load, saving bandwidth costs).
- 2 – Current revisions only, no talk or user pages; this is probably what you want, and is over 19 GB compressed (expands to over 86 GB when decompressed).
- 2 – Current revisions only, all pages (including talk).
- all-titles-in-ns0.gz – Article titles only (with redirects).
- SQL files for the pages and links are also available.
- All revisions, all pages: These files expand to multiple terabytes of text. Please only download these if you know you can cope with this quantity of data. Go to Latest Dumps and look out for all the files that have 'pages-meta-history' in their name.
- To download a subset of the database in XML format, such as a specific category or a list of articles, see: Special:Export, usage of which is described at Help:Export.

GET THE MULTISTREAM VERSION! (and the corresponding index file, 2)

2 and 2 both contain the same XML contents, so if you unpack either, you get the same data. But with multistream, it is possible to get an article from the archive without unpacking the whole thing. Your reader should handle this for you. If your reader doesn't support it, it will work anyway, since multistream and non-multistream contain the same XML. The only downside to multistream is that it is marginally larger. You might be tempted to get the smaller non-multistream archive, but this will be useless if you don't unpack it, and it will unpack to ~5-10 times its original size.

NOTE THAT the multistream dump file contains multiple bz2 'streams' (bz2 header, body, footer) concatenated together into one file, in contrast to the vanilla file, which contains one stream. Each separate 'stream' (or really, file) in the multistream dump contains 100 pages, except possibly the last one.

For multistream, you can get an index file, 2.
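The multistream layout described above (several complete bz2 streams concatenated into one file) is what makes it possible to pull out one block without unpacking everything. The following is a minimal Python sketch of that idea, not the way any particular reader does it. The function names `read_stream_at` and `build_demo_multistream` are hypothetical, and the demo builds a tiny multistream file locally instead of using a real dump. It assumes you already know the byte offset at which a stream starts; in practice that would have to come from the index file, whose format is not described here.

```python
import bz2


def read_stream_at(path, offset, chunk_size=64 * 1024):
    """Decompress only the single bz2 stream that starts at byte `offset`.

    Assumption: `offset` points at the first byte of a bz2 stream header.
    """
    decompressor = bz2.BZ2Decompressor()
    parts = []
    with open(path, "rb") as dump:
        dump.seek(offset)
        # Stop as soon as this stream's footer has been consumed; any bytes
        # read past it belong to the next stream and are simply ignored.
        while not decompressor.eof:
            chunk = dump.read(chunk_size)
            if not chunk:
                break  # file ended before the stream was complete
            parts.append(decompressor.decompress(chunk))
    return b"".join(parts)


def build_demo_multistream(path, blocks):
    """Write each block as its own bz2 stream and return the start offsets.

    This only imitates the multistream layout for demonstration purposes.
    """
    offsets = []
    with open(path, "wb") as out:
        for block in blocks:
            offsets.append(out.tell())
            out.write(bz2.compress(block))
    return offsets
```

With a demo file built by `build_demo_multistream`, calling `read_stream_at(path, offsets[1])` returns just the second block, while `bz2.decompress` on the whole file returns all blocks joined together, which mirrors the point that both variants unpack to the same data.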