Tar file extraction


#1



http://optima.jrc.it/Resources/DCEP-2013/DCEP-extract-README.html
I've installed python and the wget module.
I'm trying to extract bilingual sentence pairs from two bz2 files. I followed the instructions at the URL above, but the command prompt returns this:

File "<stdin>", line 1
tar jxf DCEP-DA-LV.tar.bz2
^
SyntaxError: invalid syntax

The "wget" command works fine but the "tar" one doesn't. I'm a newbie to Python so I don't know what to do. The reference page doesn't have much information on how to work around this issue.
My computer runs Windows 10. Is that a problem?


wget http://optima.jrc.it/Resources/DCEP-2013/langpairs/DCEP-DA-LV.tar.bz2
tar jxf DCEP-DA-LV.tar.bz2


#2

That isn't valid Python code. Perhaps you googled how to use tar and found that.
tar is a program typically found on Linux and probably on other Unix-like systems as well. But you're not running that program, you're typing into Python, where the equivalent is presumably the tarfile module. So you would need to use what that module makes available to you (which certainly doesn't involve changing Python's syntax).


#3

Thank you ionatan for your quick feedback!
Frankly, I don't have much of a clue what I'm doing. I'm a newbie in Python and I copy-pasted the code I found on the reference page.
Do you know what code I have to use in the command prompt? The file Python should work on is "DCEP-DA-LV.tar.bz2".
Or can you suggest a webpage where I could find that info?
Thanks


#4

The place to look would be the module's documentation:
https://docs.python.org/3/library/tarfile.html
(change the 3 to 2 if you're using python2)
There are some examples near the bottom, and the description or name of the function you're looking for probably involves the word "extract" so you could also search for that and see if the description matches what you want.
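
For instance, here is a minimal sketch of what extraction with the tarfile module could look like. The helper name extract_archive is made up for illustration (it is not part of tarfile), and it assumes the archive is bzip2-compressed, as the .tar.bz2 name suggests:

```python
import tarfile


def extract_archive(archive_path, dest_dir="."):
    """Unpack a bzip2-compressed tar archive into dest_dir.

    Hypothetical helper for illustration; returns the member names.
    """
    # "r:bz2" opens the archive for reading with bzip2 decompression,
    # which is the job the j flag does in `tar jxf`.
    with tarfile.open(archive_path, "r:bz2") as archive:
        names = archive.getnames()
        archive.extractall(path=dest_dir)
    return names
```

Calling extract_archive("DCEP-DA-LV.tar.bz2") from a script in the folder that holds the download should then unpack it next to the archive. Only extract archives from sources you trust, since extractall writes whatever paths the archive contains.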

Same thing with wget: wget is a program, and the module you downloaded isn't that program. (Also, it might be preferable to use a module from the standard library instead, namely urllib.request.)
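
A rough sketch of a standard-library download, in case it helps. The function name download is hypothetical, and this is just one way to do it with urllib.request:

```python
import shutil
import urllib.request


def download(url, dest_path):
    """Fetch url and save the response body to dest_path.

    Hypothetical helper for illustration; returns dest_path.
    """
    # Stream the response into the file in chunks, so a large archive
    # is never held in memory all at once.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

Something like download("http://optima.jrc.it/Resources/DCEP-2013/langpairs/DCEP-DA-LV.tar.bz2", "DCEP-DA-LV.tar.bz2") would then stand in for the wget line.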

The right tools for the job are of course wget (or curl) and tar, not Python. You'd just put your two lines in a file and run it with some shell, or type them directly into the shell.

Or if you're just trying to get it open at all, then the windows-y solution is to download 7zip (because it's easier to install for windows than tar, and because it has a gui which is what windows users are used to interacting with)


#5

I managed to download the tar files and open them (by clicking the links on the website). The files inside the tar folder have no extension.
Now I'm following the next instructions provided on the website:

wget http://optima.jrc.it/Resources/DCEP-2013/DCEP-extract-scripts.tar.bz2
tar jxvf DCEP-extract-scripts.tar.bz2
./src/languagepair.py DA-LV > DA-LV-bisentences.txt

I assume the first two lines:
1 download the bz2 file;
2 unpack it.
I did both manually too, by clicking the link on the website I mentioned at the beginning of the post.
I now have that "src" folder containing:
- two (python) files named "ladder2text.py" and "languagepair.py"
- one "README" file with no extension;
- one "README.md" file.
So I think I have the complete "toolbox" on my machine, but the step I can't get working is running the "./src/languagepair.py DA-LV > DA-LV-bisentences.txt" command.

In the command prompt, I started python first, then typed the command:
./src/languagepair.py DA-LV > DA-LV-bisentences.txt
but I received the following error message:
"
Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

./src/languagepair.py EN-IT > EN-IT-bisentences.txt
File "<stdin>", line 1
./src/languagepair.py EN-IT > EN-IT-bisentences.txt
^
SyntaxError: invalid syntax
"
I tried that command outside python, in the command prompt with a similar result:
"
'.' is not recognized as an internal or external command,
operable program or batch file.
"
Do you have any other suggestions?


#6

Those commands (tar, wget, ./src/languagepair.py) are meant for bash, a Unix command-line shell, not for Windows's cmd.exe.

The > redirects stdout to a file, so that command says to run the Python script and put its output in a file. cmd.exe supports that redirection as well.

The python file probably has a first line that looks something like:

#!/usr/bin/env python

Which says how to run it, but windows doesn't look at that line.

You can try putting py followed by a space at the front of that command. py is a program that looks at that line and figures out how to execute the script. I'm not 100% sure it comes with Python itself; it's a Windows thing and therefore not something I use or encounter. Otherwise you can try python2 or python3 instead (if that first line says only python, it probably means python2 and the script might not be compatible with python3).
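
If the shell side keeps fighting you, the same "run the script and send its output to a file" step can also be sketched from Python itself. The helper name run_script_to_file is made up for illustration, and by default it runs the script with whatever interpreter is running the sketch, which may not be the version the script was written for:

```python
import subprocess
import sys


def run_script_to_file(script_path, args, output_path, interpreter=sys.executable):
    """Run a Python script and write its stdout to output_path.

    Hypothetical helper for illustration; returns output_path.
    """
    with open(output_path, "wb") as out:
        # stdout=out plays the role of the shell's > redirection.
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run([interpreter, script_path, *args], stdout=out, check=True)
    return output_path
```

For the command in question that would be roughly run_script_to_file("src/languagepair.py", ["DA-LV"], "DA-LV-bisentences.txt").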

Your instructions appear to assume a basic Unix-like environment. If there is more command-line stuff to run, then I suggest installing Cygwin, where you can just paste the commands in.

