unix - Remove rows in a csv which have a specific entry in one column and repeated entries in the other columns


I stumbled upon a problem and solved it, but only by hook or by crook, and I need a more precise solution. Being a beginner in awk/sed, I could not solve it with a one-liner (which I am sure exists) or with an awk script, though I could with a lot of pipes.

Here is the question:

I have a large .csv file with entries similar to this:

File (space delimited):

$ cat file d e r none c f g r c f g r none d d e c d d e none g f r t none g f t r none k f r e d r e t y none s c d er d g f r t 4 

There are no duplicate rows. If you look carefully, some entries in columns 1, 2, 3, 4 repeat, and the only change is in the 5th column, which is 'none'. I need to remove the rows (records) whose fields 1, 2, 3, 4 repeat and which have none in the 5th column.

Here is the code I wrote. It worked, but it is not something anyone would recommend:

awk '{print $5,$4,$3,$2,$1}' file | sed 's/none/zzz/g' | sort | awk '!array[$2,$3,$4,$5]++' | sed 's/zzz/none/g' 

And here is the output I got, which is what I am expecting.

4 t r f g r g f c c e d d d e r f k d er d c s none r e d none r t f g none y t e r 

The purpose of replacing none with zzz is that, after sorting, those rows appear at the end, so awk removes them as the second occurrences of duplicates in the remaining columns. That is also the reason for inverting the column sequence and re-inverting it back: it is only for the sort.

Please help if you can. Thanks!
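To make that reasoning easier to follow, here is the same pipeline spread over several lines with one comment per stage. The wrapper name dedupe_pipeline and the sample rows are made up for illustration and are not from the original post; the commands themselves are unchanged. The trick assumes that no real 5th-column value sorts after zzz and that none never occurs inside another value.

```shell
# dedupe_pipeline FILE
# Hypothetical wrapper around the pipeline from the question.
dedupe_pipeline() {
  awk '{print $5,$4,$3,$2,$1}' "$1" |  # reverse the columns so column 5 leads each row
    sed 's/none/zzz/g' |               # rename none so those rows sort last
    sort |                             # non-none rows now come before none rows
    awk '!array[$2,$3,$4,$5]++' |      # keep only the first row per original columns 4,3,2,1
    sed 's/zzz/none/g'                 # restore none in the rows that survived
}

# Illustrative sample data (not the data from the question).
sample=$(mktemp)
printf '%s\n' \
  'c f g r x' \
  'c f g r none' \
  'd d e c none' \
  'g f r t none' \
  'g f r t 4' > "$sample"

dedupe_pipeline "$sample"
```

With this sample, the two none rows that share columns 1 to 4 with another row are dropped, and the surviving rows are printed with their columns still in reversed order.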

I got this solution:

awk '{s=$4" "$3" "$2" "$1; if($5=="none"&& s in a)next;else a[s]=$5" "s}END{for(i in a)print a[i]}' file|sort 

It outputs:

kent$  awk '{s=$4" "$3" "$2" "$1; if($5=="none"&& s in a)next;else a[s]=$5" "s}END{for(i in a)print a[i]}' file|sort
4 t r f g r g f c c e d d d er d c s d e r f k none r e d none r t f g none y t e r 

It seems to be the same as your expectation.
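The one-liner above prints the columns in reversed order, and because the order of for (i in a) is unspecified in awk, it needs the trailing sort. If the original row and column order should be kept, one possible variant reads the file twice. This is only a sketch: the helper name drop_none_dups and the sample rows are made up for illustration, and it assumes every row has five space-separated fields.

```shell
# drop_none_dups FILE
# Hypothetical helper: drops rows whose 5th field is none when another
# row with the same fields 1-4 has a different 5th field.
# FILE is read twice, so it must be a regular file, not a pipe.
drop_none_dups() {
  awk '
    # First pass: remember every key (fields 1-4) seen with a non-none 5th field.
    NR == FNR { if ($5 != "none") seen[$1,$2,$3,$4] = 1; next }
    # Second pass: print each row unless it is a none row with a remembered key.
    !($5 == "none" && ($1,$2,$3,$4) in seen)
  ' "$1" "$1"
}

# Illustrative sample data (not the data from the question).
sample=$(mktemp)
printf '%s\n' \
  'c f g r x' \
  'c f g r none' \
  'd d e c none' \
  'g f r t none' \
  'g f r t 4' > "$sample"

drop_none_dups "$sample"
```

A none row whose first four fields appear nowhere else is kept, and a none row is dropped even when it comes before its non-none counterpart, because the keys are collected in the first pass.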

